1. What is Tablesaw?
Tablesaw is a data visualization library for Java that consists of two parts:
- The data parsing library, which is mainly used to load data, manipulate data (transformation, filtering, summarization, etc.), analogous to the Pandas library in Python;
- A data visualization library that converts target data into visual charts, analogous to the Matplotlib library in Python.
Unlike Pandas, tables in Tablesaw are based on columns, so most operations are based on columns. Of course, it also includes some functions that operate on rows, but the functions are relatively limited
Tablesaw directory description:
- aggregate: the parent project of Maven, which mainly defines the configuration of project packaging.
- Beakerx: The registry for the Tablesaw library, the main registry and columns.
- core: The core code of the Tablesaw library, which is mainly used for data processing operations: data appending, sorting, grouping, querying, etc.
- data: the directory of project test data.
- docs: The directory of the project’s MarkDown documentation.
- docs-src: the source code directory of project documents, which is mainly used to generate MarkDown documents.
- Excel: A subproject that parses the data in an Excel file.
- HTML: A subproject that parses HTML file data.
- JSON: A subproject that parses data in a JSON file.
- jsplot: a sub-project of data visualization, which is mainly used to load data and generate visualization charts.
- saw: a sub-item of tablesaw that reads and writes chart data.
2. Environmental preparation
mysql setup
docker run --name docker-mysql-5.7 -e MYSQL_ROOT_PASSWORD=123456 -p 3306:3306 -d mysql:5.7
init data
create database demo;
create table user_info
(
user_id varchar(64) not null primary key,
username varchar(100) null ,
age int(3) null ,
gender tinyint(1) null ,
remark varchar(255) null ,
create_time datetime null ,
create_id varchar(64) null ,
update_time datetime null ,
update_id varchar(64) null ,
enabled tinyint(1) default 1 null
);
INSERT INTO demo.user_info
(user_id, username, age, gender, remark, create_time, create_id, update_time, update_id, enabled)
VALUES('1', '1', 1, 1, '1', NULL, '1', NULL, NULL, 1);
remark
msyql username:root
mysql password:123456
3. Code engineering
Objectives:
Two-dimensional data is processed and visualized using Tablesaw
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>springboot-demo</artifactId>
<groupId>com.et</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>tablesaw</artifactId>
<properties>
<maven.compiler.source>8</maven.compiler.source>
<maven.compiler.target>8</maven.compiler.target>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-autoconfigure</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>tech.tablesaw</groupId>
<artifactId>tablesaw-core</artifactId>
<version>0.43.1</version>
</dependency>
<dependency>
<groupId>tech.tablesaw</groupId>
<artifactId>tablesaw-jsplot</artifactId>
<version>0.43.1</version>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-jdbc</artifactId>
</dependency>
<!--mysql-->
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>8.0.29</version>
</dependency>
</dependencies>
</project>
application.yaml
server:
port: 8088
spring:
datasource:
driver-class-name: com.mysql.cj.jdbc.Driver
type: com.zaxxer.hikari.HikariDataSource
url: jdbc:mysql://127.0.0.1:3306/demo?useUnicode=true&characterEncoding=utf-8&useSSL=false
username: root
password: 123456
Read CSV data
@Before
public void before() throws IOException {
log.info("init some data");
tornadoes = Table.read().csv("/Users/liuhaihua/IdeaProjects/springboot-demo/tablesaw/src/main/resources/data/tornadoes_1950-2014.csv");
}
Print the column name
@Test
public void columnNames() throws IOException {
System.out.println(tornadoes.columnNames());
}
outcome
[Date, Time, State, State No, Scale, Injuries, Fatalities, Start Lat, Start Lon, Length, Width]
Check out the shape
@Test
public void shape() throws IOException {
System.out.println(tornadoes.shape());
}
outcome
tornadoes_1950-2014.csv: 59945 rows X 11 cols
View the table structure
@Test
public void structure() throws IOException {
System.out.println(tornadoes.structure().printAll());
}
outcome
2024-06-05 21:13:01.297 INFO 41685 --- [ main] com.et.tablesaw.DemoTests : init some data
Structure of tornadoes_1950-2014.csv
Index | Column Name | Column Type |
-----------------------------------------
0 | Date | LOCAL_DATE |
1 | Time | LOCAL_TIME |
2 | State | STRING |
3 | State No | INTEGER |
4 | Scale | INTEGER |
5 | Injuries | INTEGER |
6 | Fatalities | INTEGER |
7 | Start Lat | DOUBLE |
8 | Start Lon | DOUBLE |
9 | Length | DOUBLE |
10 | Width | INTEGER |
2024-06-05 21:13:02.907 INFO 41685 --- [ main] com.et.tablesaw.DemoTests : clean some data
Review the data
@Test
public void show() throws IOException {
System.out.println(tornadoes);
}
outcome
2024-06-05 21:14:52.181 INFO 41732 --- [ main] com.et.tablesaw.DemoTests : init some data
tornadoes_1950-2014.csv
Date | Time | State | State No | Scale | Injuries | Fatalities | Start Lat | Start Lon | Length | Width |
-----------------------------------------------------------------------------------------------------------------------------------------
1950-01-03 | 11:00:00 | MO | 1 | 3 | 3 | 0 | 38.77 | -90.22 | 9.5 | 150 |
1950-01-03 | 11:00:00 | MO | 1 | 3 | 3 | 0 | 38.77 | -90.22 | 6.2 | 150 |
1950-01-03 | 11:10:00 | IL | 1 | 3 | 0 | 0 | 38.82 | -90.12 | 3.3 | 100 |
1950-01-03 | 11:55:00 | IL | 2 | 3 | 3 | 0 | 39.1 | -89.3 | 3.6 | 130 |
1950-01-03 | 16:00:00 | OH | 1 | 1 | 1 | 0 | 40.88 | -84.58 | 0.1 | 10 |
1950-01-13 | 05:25:00 | AR | 1 | 3 | 1 | 1 | 34.4 | -94.37 | 0.6 | 17 |
1950-01-25 | 19:30:00 | MO | 2 | 2 | 5 | 0 | 37.6 | -90.68 | 2.3 | 300 |
1950-01-25 | 21:00:00 | IL | 3 | 2 | 0 | 0 | 41.17 | -87.33 | 0.1 | 100 |
1950-01-26 | 18:00:00 | TX | 1 | 2 | 2 | 0 | 26.88 | -98.12 | 4.7 | 133 |
1950-02-11 | 13:10:00 | TX | 2 | 2 | 0 | 0 | 29.42 | -95.25 | 9.9 | 400 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2014-12-23 | 14:46:00 | MS | 0 | 0 | 0 | 0 | 31.4247 | -89.5358 | 0.24 | 50 |
2014-12-23 | 15:22:00 | MS | 0 | 2 | 0 | 2 | 31.6986 | -89.2239 | 5.89 | 350 |
2014-12-23 | 16:27:00 | MS | 0 | 0 | 0 | 0 | 32.0005 | -88.4854 | 0.49 | 100 |
2014-12-24 | 09:30:00 | NC | 0 | 1 | 1 | 0 | 34.3124 | -77.917 | 0.2 | 25 |
2014-12-24 | 11:50:00 | GA | 0 | 1 | 0 | 0 | 31.406 | -82.251 | 5.69 | 100 |
2014-12-24 | 16:16:00 | OH | 0 | 0 | 0 | 0 | 39.728 | -82.634 | 0.63 | 30 |
2014-12-27 | 13:12:00 | TX | 0 | 0 | 0 | 0 | 30.3853 | -94.7823 | 0.23 | 75 |
2014-12-27 | 14:00:00 | TX | 0 | 1 | 0 | 0 | 30.6829 | -94.2553 | 4.25 | 50 |
2014-12-27 | 14:35:00 | TX | 0 | 1 | 0 | 0 | 30.864 | -93.979 | 1.98 | 150 |
2014-12-29 | 10:26:00 | GA | 0 | 2 | 9 | 0 | 30.8157 | -83.2833 | 0.74 | 180 |
2024-06-05 21:14:54.430 INFO 41732 --- [ main] com.et.tablesaw.DemoTests : clean some data
Table structure filtering
@Test
public void structurefilter() throws IOException {
System.out.println( tornadoes
.structure()
.where(tornadoes.structure().stringColumn("Column Type").isEqualTo("DOUBLE")));
}
outcome
2024-06-05 21:16:12.489 INFO 41766 --- [ main] com.et.tablesaw.DemoTests : init some data
Structure of tornadoes_1950-2014.csv
Index | Column Name | Column Type |
-----------------------------------------
7 | Start Lat | DOUBLE |
8 | Start Lon | DOUBLE |
9 | Length | DOUBLE |
2024-06-05 21:16:14.104 INFO 41766 --- [ main] com.et.tablesaw.DemoTests : clean some data
Preview the data
@Test
public void previewdata() throws IOException {
System.out.println(tornadoes.first(3));
}
outcome
2024-06-05 21:17:21.268 INFO 41798 --- [ main] com.et.tablesaw.DemoTests : init some data
tornadoes_1950-2014.csv
Date | Time | State | State No | Scale | Injuries | Fatalities | Start Lat | Start Lon | Length | Width |
-----------------------------------------------------------------------------------------------------------------------------------------
1950-01-03 | 11:00:00 | MO | 1 | 3 | 3 | 0 | 38.77 | -90.22 | 9.5 | 150 |
1950-01-03 | 11:00:00 | MO | 1 | 3 | 3 | 0 | 38.77 | -90.22 | 6.2 | 150 |
1950-01-03 | 11:10:00 | IL | 1 | 3 | 0 | 0 | 38.82 | -90.12 | 3.3 | 100 |
2024-06-05 21:17:23.206 INFO 41798 --- [ main] com.et.tablesaw.DemoTests : clean some data
Column operations
@Test
public void ColumnOperate() throws IOException {
StringColumn month = tornadoes.dateColumn("Date").month();
tornadoes.addColumns(month);
System.out.println(tornadoes.first(3));
tornadoes.removeColumns("State No");
System.out.println(tornadoes.first(3));
}
outcome
2024-06-05 21:20:33.554 INFO 41879 --- [ main] com.et.tablesaw.DemoTests : init some data
tornadoes_1950-2014.csv
Date | Time | State | State No | Scale | Injuries | Fatalities | Start Lat | Start Lon | Length | Width | Date month |
--------------------------------------------------------------------------------------------------------------------------------------------------------
1950-01-03 | 11:00:00 | MO | 1 | 3 | 3 | 0 | 38.77 | -90.22 | 9.5 | 150 | JANUARY |
1950-01-03 | 11:00:00 | MO | 1 | 3 | 3 | 0 | 38.77 | -90.22 | 6.2 | 150 | JANUARY |
1950-01-03 | 11:10:00 | IL | 1 | 3 | 0 | 0 | 38.82 | -90.12 | 3.3 | 100 | JANUARY |
tornadoes_1950-2014.csv
Date | Time | State | Scale | Injuries | Fatalities | Start Lat | Start Lon | Length | Width | Date month |
-------------------------------------------------------------------------------------------------------------------------------------------
1950-01-03 | 11:00:00 | MO | 3 | 3 | 0 | 38.77 | -90.22 | 9.5 | 150 | JANUARY |
1950-01-03 | 11:00:00 | MO | 3 | 3 | 0 | 38.77 | -90.22 | 6.2 | 150 | JANUARY |
1950-01-03 | 11:10:00 | IL | 3 | 0 | 0 | 38.82 | -90.12 | 3.3 | 100 | JANUARY |
2024-06-05 21:20:37.052 INFO 41879 --- [ main] com.et.tablesaw.DemoTests : clean some data
sort
@Test
public void sort() throws IOException {
tornadoes.sortOn("-Fatalities");
System.out.println(tornadoes.first(20));
}
outcome
2024-06-05 21:23:22.978 INFO 41945 --- [ main] com.et.tablesaw.DemoTests : init some data
tornadoes_1950-2014.csv
Date | Time | State | State No | Scale | Injuries | Fatalities | Start Lat | Start Lon | Length | Width |
-----------------------------------------------------------------------------------------------------------------------------------------
1950-01-03 | 11:00:00 | MO | 1 | 3 | 3 | 0 | 38.77 | -90.22 | 9.5 | 150 |
1950-01-03 | 11:00:00 | MO | 1 | 3 | 3 | 0 | 38.77 | -90.22 | 6.2 | 150 |
1950-01-03 | 11:10:00 | IL | 1 | 3 | 0 | 0 | 38.82 | -90.12 | 3.3 | 100 |
1950-01-03 | 11:55:00 | IL | 2 | 3 | 3 | 0 | 39.1 | -89.3 | 3.6 | 130 |
1950-01-03 | 16:00:00 | OH | 1 | 1 | 1 | 0 | 40.88 | -84.58 | 0.1 | 10 |
1950-01-13 | 05:25:00 | AR | 1 | 3 | 1 | 1 | 34.4 | -94.37 | 0.6 | 17 |
1950-01-25 | 19:30:00 | MO | 2 | 2 | 5 | 0 | 37.6 | -90.68 | 2.3 | 300 |
1950-01-25 | 21:00:00 | IL | 3 | 2 | 0 | 0 | 41.17 | -87.33 | 0.1 | 100 |
1950-01-26 | 18:00:00 | TX | 1 | 2 | 2 | 0 | 26.88 | -98.12 | 4.7 | 133 |
1950-02-11 | 13:10:00 | TX | 2 | 2 | 0 | 0 | 29.42 | -95.25 | 9.9 | 400 |
1950-02-11 | 13:50:00 | TX | 3 | 3 | 12 | 1 | 29.67 | -95.05 | 12 | 1000 |
1950-02-11 | 21:00:00 | TX | 4 | 2 | 5 | 0 | 32.35 | -95.2 | 4.6 | 100 |
1950-02-11 | 23:55:00 | TX | 5 | 2 | 6 | 0 | 32.98 | -94.63 | 4.5 | 67 |
1950-02-12 | 00:30:00 | TX | 6 | 2 | 8 | 1 | 33.33 | -94.42 | 8 | 833 |
1950-02-12 | 01:15:00 | TX | 7 | 1 | 0 | 0 | 32.08 | -98.35 | 2.3 | 233 |
1950-02-12 | 06:10:00 | TX | 8 | 2 | 0 | 0 | 31.52 | -96.55 | 3.4 | 100 |
1950-02-12 | 11:57:00 | TX | 9 | 1 | 32 | 0 | 31.8 | -94.2 | 7.7 | 100 |
1950-02-12 | 12:00:00 | TX | 10 | 3 | 15 | 3 | 31.8 | -94.2 | 1.9 | 50 |
1950-02-12 | 12:00:00 | MS | 2 | 1 | 0 | 0 | 34.6 | -89.12 | 2 | 10 |
1950-02-12 | 12:00:00 | MS | 1 | 2 | 2 | 3 | 34.6 | -89.12 | 0.1 | 10 |
2024-06-05 21:23:24.864 INFO 41945 --- [ main] com.et.tablesaw.DemoTests : clean some data
summary
@Test
public void summary() throws IOException {
System.out.println( tornadoes.column("Fatalities").summary().print());
}
outcome
2024-06-05 21:24:17.166 INFO 41963 --- [ main] com.et.tablesaw.DemoTests : init some data
Column: Fatalities
Measure | Value |
------------------------------------
Count | 59945 |
sum | 6802 |
Mean | 0.11347068145800349 |
Min | 0 |
Max | 158 |
Range | 158 |
Variance | 2.901978053261765 |
Std. Dev | 1.7035193140266314 |
2024-06-05 21:24:19.131 INFO 41963 --- [ main] com.et.tablesaw.DemoTests : clean some data
Data filtering
@Test
public void filter() throws IOException {
Table result = tornadoes.where(tornadoes.intColumn("Fatalities").isGreaterThan(0));
result = tornadoes.where(result.dateColumn("Date").isInApril());
result =
tornadoes.where(
result
.intColumn("Width")
.isGreaterThan(300) // 300 yards
.or(result.doubleColumn("Length").isGreaterThan(10))); // 10 miles
result = result.select("State", "Date");
System.out.println(result);
}
outcome
2024-06-05 21:25:03.671 INFO 41980 --- [ main] com.et.tablesaw.DemoTests : init some data
tornadoes_1950-2014.csv
State | Date |
------------------------
MO | 1950-01-03 |
IL | 1950-01-03 |
OH | 1950-01-03 |
AR | 1950-01-13 |
MO | 1950-01-25 |
IL | 1950-01-25 |
TX | 1950-02-12 |
TX | 1950-02-12 |
LA | 1950-02-12 |
AR | 1950-02-12 |
... | ... |
KS | 1951-06-21 |
OK | 1951-06-21 |
KS | 1951-06-23 |
WV | 1951-06-26 |
KS | 1951-06-27 |
IL | 1951-06-27 |
PA | 1951-06-27 |
SD | 1951-07-20 |
KS | 1951-08-06 |
NC | 1951-08-09 |
2024-06-05 21:25:05.764 INFO 41980 --- [ main] com.et.tablesaw.DemoTests : clean some data
Write to the file
@Test
public void write() throws IOException {
tornadoes.write().csv("rev_tornadoes_1950-2014-test.csv");
Read data from MySQL
@Resource
private JdbcTemplate jdbcTemplate;
@Test
public void datafrommysql() throws IOException {
Table table = jdbcTemplate.query("SELECT user_id,username,age from user_info", new ResultSetExtractor<Table>() {
@Override
public Table extractData(ResultSet resultSet) throws SQLException, DataAccessException {
return Table.read().db(resultSet);
}
});
System.out.println(table);
}
outcome
2024-06-05 21:26:04.963 INFO 42016 --- [ main] com.et.tablesaw.DemoTests : init some data
2024-06-05 21:26:06.622 INFO 42016 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Starting...
2024-06-05 21:26:06.949 INFO 42016 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Start completed.
user_id | username | age |
--------------------------------
1 | 1 | 1 |
2024-06-05 21:26:07.037 INFO 42016 --- [ main] com.et.tablesaw.DemoTests : clean some data
Data visualization
/*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.et.tablesaw;
import java.io.IOException;
import tech.tablesaw.api.Table;
import tech.tablesaw.plotly.Plot;
import tech.tablesaw.plotly.api.BubblePlot;
import tech.tablesaw.plotly.components.Figure;
/** */
public class BubbleExample2 {
public static void main(String[] args) throws IOException {
Table wines = Table.read().csv("/Users/liuhaihua/IdeaProjects/springboot-demo/tablesaw/src/main/resources/data/test_wines.csv");
Table champagne =
wines.where(
wines
.stringColumn("wine type")
.isEqualTo("Champagne & Sparkling")
.and(wines.stringColumn("region").isEqualTo("California")));
Figure figure =
BubblePlot.create(
"Average retail price for champagnes by year and rating",
champagne, // table namex
"highest pro score", // x variable column name
"year", // y variable column name
"Mean Retail" // bubble size
);
Plot.show(figure);
}
}
The results are shown in the figure below
The above are just some of the key codes, all of which can be found in the repositories below