Kylo
Kylo is an open source enterprise-ready data lake management software platform for self-service data ingest and data preparation with integrated metadata management, governance, security and best practices inspired by Think Big's 150+ big data implementation projects. Self-service data ingest with data cleansing, validation, and automatic profiling. Wrangle data with visual sql and an interactive transform through a simple user interface. Search and explore data and metadata, view lineage, and profile statistics. Monitor health of feeds and services in the data lake. Track SLAs and troubleshoot performance. Design batch or streaming pipeline templates in Apache NiFi and register with Kylo to enable user self-service. Organizations can expend significant engineering effort moving data into Hadoop yet struggle to maintain governance and data quality. Kylo dramatically simplifies data ingest by shifting ingest to data owners through a simple guided UI.
Learn more
Cloudera DataFlow
Cloudera DataFlow for the Public Cloud (CDF-PC) is a cloud-native universal data distribution service powered by Apache NiFi that lets developers connect to any data source anywhere with any structure, process it, and deliver to any destination. CDF-PC offers a flow-based low-code development paradigm that aligns best with how developers design, develop, and test data distribution pipelines. With over 400+ connectors and processors across the ecosystem of hybrid cloud services—including data lakes, lakehouses, cloud warehouses, and on-premises sources—CDF-PC provides indiscriminate data distribution. These data distribution flows can then be version-controlled into a catalog where operators can self-serve deployments to different runtimes.
Learn more
Apache NiFi
An easy to use, powerful, and reliable system to process and distribute data. Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. Some of the high-level capabilities and objectives of Apache NiFi include web-based user interface, offering a seamless experience between design, control, feedback, and monitoring. Highly configurable, loss tolerant, low latency, high throughput, and dynamic prioritization. Flow can be modified at runtime, back pressure, data provenance, track dataflow from beginning to end, designed for extension. Build your own processors and more. Enables rapid development and effective testing. Secure, SSL, SSH, HTTPS, encrypted content, and much more. Multi-tenant authorization and internal authorization/policy management. NiFi is comprised of a number of web applications (web UI, web API, documentation, custom UI's, etc). So, you'll need to set up your mapping to the root path.
Learn more
Rivery
Rivery’s SaaS ETL platform provides a fully-managed solution for data ingestion, transformation, orchestration, reverse ETL and more, with built-in support for your development and deployment lifecycles.
Key Features:
Data Workflow Templates: Extensive library of pre-built templates that enable teams to instantly create powerful data pipelines with the click of a button.
Fully managed: No-code, auto-scalable, and hassle-free platform. Rivery takes care of the back end, allowing teams to spend time on priorities rather than maintenance.
Multiple Environments: Construct and clone custom environments for specific teams or projects.
Reverse ETL: Automatically send data from cloud warehouses to business applications, marketing clouds, CPD’s, and more.
Learn more