Apache Gobblin
A distributed data integration framework that simplifies common aspects of Big Data integration such as data ingestion, replication, organization, and lifecycle management for both streaming and batch data ecosystems. Runs as a standalone application on a single box. Also supports embedded mode. Runs as an mapreduce application on multiple Hadoop versions. Also supports Azkaban for launching mapreduce jobs. Runs as a standalone cluster with primary and worker nodes. This mode supports high availability and can run on bare metals as well. Runs as an elastic cluster on public cloud. This mode supports high availability. Gobblin as it exists today is a framework that can be used to build different data integration applications like ingest, replication, etc. Each of these applications is typically configured as a separate job and executed through a scheduler like Azkaban.
Learn more
Apache Spark
Apache Spark™ is a unified analytics engine for large-scale data processing. Apache Spark achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine. Spark offers over 80 high-level operators that make it easy to build parallel apps. And you can use it interactively from the Scala, Python, R, and SQL shells. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application. Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud. It can access diverse data sources. You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, on Mesos, or on Kubernetes. Access data in HDFS, Alluxio, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of other data sources.
Learn more
Apache Heron
Heron is built with a wide array of architectural improvements that contribute to high-efficiency gains. Heron is API compatible with Apache Storm and hence no code change is required for migration. Easily debug and identify the issues in topologies, allowing faster iteration during development. Heron UI gives a visual overview of each topology to visualize hot spot locations and detailed counters for tracking progress and troubleshooting. Heron is highly scalable both in the ability to execute large number of components for each topology and the ability to launch and track large numbers of topologies.
Learn more
Striim
Data integration for your hybrid cloud. Modern, reliable data integration across your private and public cloud. All in real-time with change data capture and data streams. Built by the executive & technical team from GoldenGate Software, Striim brings decades of experience in mission-critical enterprise workloads. Striim scales out as a distributed platform in your environment or in the cloud. Scalability is fully configurable by your team. Striim is fully secure with HIPAA and GDPR compliance. Built ground up for modern enterprise workloads in the cloud or on-premise. Drag and drop to create data flows between your sources and targets. Process, enrich, and analyze your streaming data with real-time SQL queries.
Learn more