Tree [1de6ed] 2.1.0-release /
File Date Author Commit
 .github 2022-03-10 CalvinKirs CalvinKirs [2c88ac] Ignore license-py tools, since I changed and re...
 .idea 2022-02-07 lamberken lamberken [95b4ef] [Feature][patch] Improve the title of patch (#1...
 .mvn 2022-02-28 Kirs Kirs [28e987] [Improve][CI]Upgrade maven-wrapper to 3.1.0 (#1...
 bin 2022-03-08 Kirs Kirs [31b0b2] [Improve][tools] Move the license check script ...
 config 2022-03-07 mans2singh mans2singh [a1aa84] [hotfix][docs] Updated conf template to indicat...
 deploy 2022-01-29 Benedict Jin Benedict Jin [df0ebf] [SeaTunnel#1183] Fix dead link (#1184)
 docs 2022-03-10 CalvinKirs CalvinKirs [2c88ac] Ignore license-py tools, since I changed and re...
 plugins 2022-01-12 Kirs Kirs [0e1e9e] Revert "[hotfix] remove plugins folder (#1009)"...
 seatunnel-apis 2022-03-10 kirs kirs [1de6ed] [maven-release-plugin] prepare for next develop...
 seatunnel-common 2022-03-10 kirs kirs [1de6ed] [maven-release-plugin] prepare for next develop...
 seatunnel-config 2022-03-10 kirs kirs [1de6ed] [maven-release-plugin] prepare for next develop...
 seatunnel-connectors 2022-03-10 kirs kirs [1de6ed] [maven-release-plugin] prepare for next develop...
 seatunnel-core 2022-03-10 kirs kirs [1de6ed] [maven-release-plugin] prepare for next develop...
 seatunnel-dist 2022-03-10 kirs kirs [1de6ed] [maven-release-plugin] prepare for next develop...
 seatunnel-examples 2022-03-10 kirs kirs [1de6ed] [maven-release-plugin] prepare for next develop...
 seatunnel-transforms 2022-03-10 kirs kirs [1de6ed] [maven-release-plugin] prepare for next develop...
 tools 2022-03-10 CalvinKirs CalvinKirs [2c88ac] Ignore license-py tools, since I changed and re...
 .asf.yaml 2021-12-29 Kirs Kirs [bd6e8e] [Github]add Labels (#887)
 .dlc.json 2021-12-30 lifeng lifeng [f576a2] Update .dlc.json (#897)
 .gitignore 2022-01-19 Kirs Kirs [c4f63c] [License#905] Check dependencies' binary licens...
 .licenserc.yaml 2022-02-26 Kirs Kirs [76973f] [Chore][License]Add license header to META-INF/...
 .scalafmt.conf 2021-12-18 Simon Simon [3bae2d] fix conflicts (#806)
 DISCLAIMER 2021-12-29 Kirs Kirs [02d002] [License]add Notice and Disclaimer (#880)
 LICENSE 2022-02-09 Yves Yves [3e7d04] Fix #1186 (#1187)
 NOTICE 2022-01-03 Kirs Kirs [97048a] Update NOTICE (#928)
 README.md 2022-03-09 Kirs Kirs [e57772] [Docs]Readme added incubating instructions (#1448)
 README_zh_CN.md 2022-03-09 Kirs Kirs [e57772] [Docs]Readme added incubating instructions (#1448)
 build.md 2022-03-08 Kirs Kirs [d6a16b] [Docs] Update build doc (#1435)
 mvnw 2022-02-28 Kirs Kirs [28e987] [Improve][CI]Upgrade maven-wrapper to 3.1.0 (#1...
 mvnw.cmd 2022-02-28 Kirs Kirs [28e987] [Improve][CI]Upgrade maven-wrapper to 3.1.0 (#1...
 pom.xml 2022-03-10 kirs kirs [1de6ed] [maven-release-plugin] prepare for next develop...

Read Me

Apache SeaTunnel (Incubating)

SeaTunnel was formerly named Waterdrop and was renamed SeaTunnel on October 12, 2021.


SeaTunnel is an easy-to-use, high-performance distributed data integration platform that supports real-time
synchronization of massive data. It can synchronize tens of billions of data records stably and efficiently every day,
and is used in production by nearly 100 companies.

Why do we need SeaTunnel

SeaTunnel aims to solve the problems commonly encountered when synchronizing massive data:

  • Data loss and duplication
  • Task accumulation and delay
  • Low throughput
  • Long lead time before use in the production environment
  • Lack of application running status monitoring

SeaTunnel use scenarios

  • Mass data synchronization
  • Mass data integration
  • ETL with massive data
  • Mass data aggregation
  • Multi-source data processing

Features of SeaTunnel

  • Easy to use, flexible configuration, low code development
  • Real-time streaming
  • Offline multi-source data analysis
  • High-performance, massive data processing capabilities
  • Modular and plug-in mechanism, easy to extend
  • Support data processing and aggregation by SQL
  • Support Spark structured streaming
  • Support Spark 2.x

Workflow of SeaTunnel

Source[Data Source Input] -> Transform[Data Processing] -> Sink[Result Output]

The data processing pipeline is composed of multiple filters to meet a variety of data processing needs. If you are
accustomed to SQL, you can also construct a data processing pipeline directly in SQL, which is simple and efficient.
The list of filters supported by SeaTunnel is still being expanded. You can also develop your own data processing
plug-in, because the whole system is designed to be easy to extend.
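
The Source -> Transform -> Sink flow described above can be illustrated with a small, self-contained sketch. This is not SeaTunnel code and does not use the SeaTunnel plug-in API; the names `fake_source`, `split`, `uppercase` and `run_pipeline` are hypothetical and only model the idea of chaining pluggable filters between an input and an output.

```python
from typing import Callable, Iterable, Iterator, List

# Illustrative only: a row is modelled as a plain dict, and a transform
# is any function that maps a stream of rows to a new stream of rows.
Row = dict
Transform = Callable[[Iterable[Row]], Iterator[Row]]


def fake_source() -> Iterator[Row]:
    """Hypothetical source: emits a few tab-separated raw records."""
    for line in ["alice\t3", "bob\t5", "alice\t4"]:
        yield {"raw": line}


def split(fields: List[str], delimiter: str) -> Transform:
    """Hypothetical filter: split the 'raw' column into named fields."""
    def apply(rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            yield dict(zip(fields, row["raw"].split(delimiter)))
    return apply


def uppercase(field: str) -> Transform:
    """Hypothetical filter: upper-case the value of one field."""
    def apply(rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            yield {**row, field: row[field].upper()}
    return apply


def run_pipeline(source: Iterable[Row],
                 transforms: List[Transform],
                 sink: Callable[[Row], None]) -> None:
    """Chain the transforms lazily, then push every resulting row to the sink."""
    rows: Iterable[Row] = source
    for transform in transforms:
        rows = transform(rows)
    for row in rows:
        sink(row)


# The sink here simply collects rows in a list; a real sink would write
# to an external system.
collected: List[Row] = []
run_pipeline(
    fake_source(),
    [split(["name", "count"], "\t"), uppercase("name")],
    collected.append,
)
print(collected)
```

Because each filter only consumes and produces a stream of rows, adding a new processing step in this sketch means writing one more function and appending it to the list, which is the kind of extensibility the paragraph above describes.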

Plugins supported by SeaTunnel

Spark Connector Plugins   Database Type   Source   Sink
Batch                     Fake            doc      -
Batch                     ElasticSearch   doc      doc
Batch                     File            doc      doc
Batch                     Hive            doc      doc
Batch                     Hudi            doc      doc
Batch                     Jdbc            doc      doc
Batch                     MongoDB         doc      doc
Batch                     neo4j           doc      -
Batch                     Phoenix         doc      doc
Batch                     Redis           doc      doc
Batch                     Tidb            doc      doc
Batch                     Clickhouse      -        doc
Batch                     Doris           -        doc
Batch                     Email           -        doc
Batch                     Hbase           doc      doc
Batch                     Kafka           -        doc
Batch                     Console         -        doc
Batch                     Kudu            doc      doc
Stream                    FakeStream      doc      -
Stream                    KafkaStream     doc      -
Stream                    SocketStream    doc      -

Flink Connector Plugins

Database Type   Source   Sink
Druid           doc      doc
Fake            doc      -
File            doc      doc
InfluxDb        doc      doc
Jdbc            doc      doc
Kafka           doc      doc
Socket          doc      -
Console         -        doc
Doris           -        doc
ElasticSearch   -        doc

Transform Plugins   Spark   Flink
Add                 -       -
CheckSum            -       -
Convert             -       -
Date                -       -
Drop                -       -
Grok                -       -
Json                doc     -
Kv                  -       -
Lowercase           -       -
Remove              -       -
Rename              -       -
Repartition         -       -
Replace             -       -
Sample              -       -
Split               doc     doc
Sql                 doc     doc
Table               -       -
Truncate            -       -
Uppercase           -       -
Uuid                -       -

Environmental dependency

  1. Java runtime environment, Java >= 8

  2. If you want to run SeaTunnel in a cluster environment, any of the following Spark cluster environments is usable:

       • Spark on Yarn
       • Spark Standalone

If the data volume is small, or the goal is merely functional verification, you can also start in local mode without
a cluster environment, because SeaTunnel supports standalone operation. Note: SeaTunnel 2.0 supports running on Spark
and Flink.
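
As a convenience, the Java >= 8 requirement listed above can be checked by parsing the text printed by `java -version`. The sketch below is not part of SeaTunnel; the helper names `java_major_version` and `meets_requirement` are hypothetical, and the parsing assumes the usual `version "..."` banner, where Java 8 and earlier report themselves as `1.x`.

```python
import re


def java_major_version(version_output: str) -> int:
    """Extract the major Java version from `java -version` style output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    major = int(match.group(1))
    # Java 8 and earlier use the legacy "1.x" scheme, e.g. "1.8.0_292".
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def meets_requirement(version_output: str, minimum: int = 8) -> bool:
    """Return True if the reported Java version is at least `minimum`."""
    return java_major_version(version_output) >= minimum


print(meets_requirement('openjdk version "1.8.0_292"'))
print(meets_requirement('java version "1.7.0_80"'))
```

Note that `java -version` writes its banner to standard error, so when calling it from a script the text to pass in is the captured stderr, for example via `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr`.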

Downloads

Download the ready-to-run software package from: https://github.com/apache/incubator-seatunnel/releases

Quick start

Spark
https://seatunnel.apache.org/docs/spark/quick-start

Flink
https://seatunnel.apache.org/docs/flink/quick-start

Detailed documentation on SeaTunnel
https://seatunnel.apache.org/docs/introduction

Application practice cases

  • Weibo, Value-added Business Department Data Platform

Weibo uses an internally customized version of SeaTunnel, together with its sub-project Guardian for monitoring
SeaTunnel on Yarn tasks, for hundreds of real-time streaming computing tasks.

  • Sina, Big Data Operation Analysis Platform

Sina Data Operation Analysis Platform uses SeaTunnel to perform real-time and offline analysis of operation and
maintenance data for Sina News, CDN and other services, and writes the results into Clickhouse.

  • Sogou, Sogou Qiqian System

Sogou Qiqian System takes SeaTunnel as an ETL tool to help establish a real-time data warehouse system.

  • Qutoutiao, Qutoutiao Data Center

Qutoutiao Data Center uses SeaTunnel to support MySQL to Hive offline ETL tasks and real-time Hive to Clickhouse
backfill, which together cover most of its offline and real-time task needs.

  • Yixia Technology, Yizhibo Data Platform

  • Yonghui Superstores Founders' Alliance-Yonghui Yunchuang Technology, Member E-commerce Data Analysis Platform

SeaTunnel provides real-time streaming and offline SQL computing of e-commerce user behavior data for Yonghui Life, a
new retail brand of Yonghui Yunchuang Technology.

  • Shuidichou, Data Platform

Shuidichou uses SeaTunnel for real-time streaming and regular offline batch processing on Yarn, processing 3 to 4 TB of
data per day on average and then writing the data to Clickhouse.

  • Tencent Cloud

Various logs from business services are collected into Apache Kafka; part of the data in Apache Kafka is consumed and extracted through SeaTunnel and then stored in Clickhouse.

For more use cases, please refer to: https://seatunnel.apache.org/blog

Code of conduct

This project adheres to the Contributor Covenant code of conduct.
By participating, you are expected to uphold this code. Please follow
the REPORTING GUIDELINES to report
unacceptable behavior.

Developer

Thanks to all developers!

Contact Us

Landscapes



  

SeaTunnel enriches the CNCF CLOUD NATIVE Landscape.

Our Users

Various companies and organizations use SeaTunnel for research, production and commercial products.
Visit our website to find the user page.

License

Apache 2.0 License.
