SeaTunnel was formerly named Waterdrop and was renamed SeaTunnel on October 12, 2021.
SeaTunnel is an easy-to-use, high-performance distributed data integration platform that supports real-time
synchronization of massive data. It can stably and efficiently synchronize tens of billions of records every day, and
has been used in production by nearly 100 companies.
SeaTunnel does its best to solve the problems that may be encountered in synchronizing massive data:
Source[Data Source Input] -> Transform[Data Processing] -> Sink[Result Output]
A data processing pipeline is composed of multiple transforms (filters) to meet a variety of data processing needs. If
you are accustomed to SQL, you can also construct a data processing pipeline directly in SQL, which is simple and
efficient. The list of transform plugins supported by SeaTunnel is still being expanded; furthermore, you can develop
your own data processing plug-ins, because the whole system is easy to extend.
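To make the Source -> Transform -> Sink model concrete, here is a minimal sketch of a pipeline configuration in the HOCON style used by SeaTunnel v2. The plugin names (FakeStream, Console) appear in the connector tables below, but the exact option keys shown are assumptions; consult the quick-start docs for your version:

```hocon
env {
  # Spark job settings (illustrative values)
  spark.app.name = "SeaTunnel-Example"
  spark.executor.instances = 2
  spark.executor.cores = 1
  spark.executor.memory = "1g"
}

source {
  # Generate fake rows for testing; registers a table for the transform stage
  FakeStream {
    content = ["Hello World, SeaTunnel"]
    result_table_name = "fake"
  }
}

transform {
  # SQL-based processing, as described above
  sql {
    sql = "SELECT * FROM fake"
  }
}

sink {
  # Print results to stdout for verification
  Console {}
}
```

Each block maps to one pipeline stage, so swapping the sink from Console to, say, a ClickHouse connector changes only the `sink` block.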
| Spark Connector Plugins | Database Type | Source | Sink |
|---|---|---|---|
| Batch | Fake | doc | |
| | ElasticSearch | doc | doc |
| | File | doc | doc |
| | Hive | doc | doc |
| | Hudi | doc | doc |
| | Jdbc | doc | doc |
| | MongoDB | doc | doc |
| | neo4j | | doc |
| | Phoenix | doc | doc |
| | Redis | doc | doc |
| | Tidb | doc | doc |
| | Clickhouse | | doc |
| | Doris | | doc |
| | Hbase | doc | doc |
| | Kafka | | doc |
| | Console | | doc |
| | Kudu | doc | doc |
| | Redis | doc | doc |
| Stream | FakeStream | doc | |
| | KafkaStream | doc | |
| | SocketStream | doc | |
| Flink Connector Plugins | Database Type | Source | Sink |
|---|---|---|---|
| | Druid | doc | doc |
| | Fake | doc | |
| | File | doc | doc |
| | InfluxDb | doc | doc |
| | Jdbc | doc | doc |
| | Kafka | doc | doc |
| | Socket | doc | |
| | Console | | doc |
| | Doris | | doc |
| | ElasticSearch | | doc |
| Transform Plugins | Spark | Flink |
|---|---|---|
| Add | | |
| CheckSum | | |
| Convert | | |
| Date | | |
| Drop | | |
| Grok | | |
| Json | doc | |
| Kv | | |
| Lowercase | | |
| Remove | | |
| Rename | | |
| Repartition | | |
| Replace | | |
| Sample | | |
| Split | doc | doc |
| Sql | doc | doc |
| Table | | |
| Truncate | | |
| Uppercase | | |
| Uuid | | |
Java runtime environment, Java >= 8
If you want to run SeaTunnel in a cluster environment, any of the following Spark cluster environments is usable:
Spark on Yarn
If the data volume is small, or if the goal is merely functional verification, you can also start in local mode without
a cluster environment, because SeaTunnel supports standalone operation. Note: SeaTunnel 2.0 supports running on both
Spark and Flink.
Download address for the ready-to-run software package: https://github.com/apache/incubator-seatunnel/releases
Spark
https://seatunnel.apache.org/docs/spark/quick-start
Flink
https://seatunnel.apache.org/docs/flink/quick-start
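In practice, the quick starts above come down to unpacking a release and handing a config file to a launcher script. The commands below are a sketch based on the v2 Spark quick start; the script name, flags, and bundled template path are assumptions that may differ between versions:

```shell
# Unpack a downloaded release (version placeholder)
tar -zxvf apache-seatunnel-incubating-<version>-bin.tar.gz
cd apache-seatunnel-incubating-<version>

# Run a bundled example config on Spark in local mode
# (assumes SPARK_HOME points at a local Spark installation)
./bin/start-seatunnel-spark.sh \
  --master 'local[4]' \
  --deploy-mode client \
  --config ./config/spark.streaming.conf.template
```

Pointing `--config` at your own pipeline file and changing `--master` to a Yarn address is, per the docs, all that cluster deployment requires.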
Detailed documentation on SeaTunnel
https://seatunnel.apache.org/docs/introduction
Weibo business uses an internal customized version of SeaTunnel and its sub-project Guardian for SeaTunnel On Yarn task
monitoring for hundreds of real-time streaming computing tasks.
Sina Data Operation Analysis Platform uses SeaTunnel to perform real-time and offline analysis of operation and
maintenance data for Sina News, CDN, and other services, and writes the results into ClickHouse.
Sogou Qiqian System uses SeaTunnel as an ETL tool to help establish a real-time data warehouse system.
Qutoutiao Data Center uses SeaTunnel to support MySQL-to-Hive offline ETL tasks and real-time Hive-to-ClickHouse
backfill, covering most offline and real-time task needs well.
Yixia Technology, Yizhibo Data Platform
Yonghui Superstores Founders' Alliance-Yonghui Yunchuang Technology, Member E-commerce Data Analysis Platform
SeaTunnel provides real-time streaming and offline SQL computing of e-commerce user behavior data for Yonghui Life, a
new retail brand of Yonghui Yunchuang Technology.
Shuidichou adopts SeaTunnel for real-time streaming and regular offline batch processing on Yarn, processing an average
of 3-4 TB of data daily and later writing the data into ClickHouse.
Various logs from business services are collected into Apache Kafka; part of the data in Apache Kafka is consumed and
extracted through SeaTunnel, and then stored into ClickHouse.
For more use cases, please refer to: https://seatunnel.apache.org/blog
This project adheres to the Contributor Covenant code of conduct.
By participating, you are expected to uphold this code. Please follow
the REPORTING GUIDELINES to report
unacceptable behavior.
Thanks to all developers!
Send an email to dev-subscribe@seatunnel.apache.org and follow the instructions in the reply to subscribe.
SeaTunnel enriches the CNCF CLOUD NATIVE Landscape.
Various companies and organizations use SeaTunnel for research, production, and commercial products.
Visit our website to find the user page.