With CueLake, you can use SQL to build ELT (Extract, Load, Transform) pipelines on a data lakehouse. You write Spark SQL statements in Zeppelin notebooks and schedule those notebooks using workflows (DAGs). To extract and load incremental data, you write simple SELECT statements; CueLake executes them against your databases and merges the incremental data into your data lakehouse (powered by Apache Iceberg). To transform data, you write SQL statements that create views and tables in the lakehouse. CueLake uses Celery as the executor and celery-beat as the scheduler. Celery jobs trigger the Zeppelin notebooks, and Zeppelin automatically starts and stops the Spark cluster for each scheduled notebook run.
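To make the idea of scheduling notebooks as a workflow (DAG) concrete, here is a minimal Python sketch of how a dependency graph determines run order. It is not CueLake's scheduler: the notebook names and the `run_order` helper are hypothetical, and the sketch only assumes that a workflow can be modelled as a mapping from each notebook to the notebooks it depends on.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each notebook maps to the set of notebooks
# that must finish before it can start.
workflow = {
    "extract_orders": set(),
    "extract_customers": set(),
    "merge_into_lakehouse": {"extract_orders", "extract_customers"},
    "build_reporting_views": {"merge_into_lakehouse"},
}


def run_order(dag):
    """Return an execution order in which every dependency runs first."""
    return list(TopologicalSorter(dag).static_order())


order = run_order(workflow)
for name in order:
    print("would trigger notebook:", name)
```

A real scheduler would trigger each notebook as a job and wait for it to finish; the sketch only shows why the extract notebooks must run before the merge and the views last.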

Features

  • Upsert incremental data
  • Create views in the data lakehouse
  • Elastically scale cloud infrastructure
  • Automated maintenance of Iceberg tables
  • Versioning in GitHub
  • Your data always stays within your cloud account
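As a rough illustration of what upserting incremental data means, the following Python sketch merges a batch of new rows into existing rows by key: matching keys are updated, unseen keys are inserted. This shows the semantics only. It is not how CueLake or Apache Iceberg performs the merge, and the `upsert` function, the `id` key column, and the sample rows are all hypothetical.

```python
def upsert(target, incremental, key):
    """Merge incremental rows into target rows, matching on a key column."""
    merged = {row[key]: row for row in target}
    for row in incremental:
        # Replace the row when the key already exists, add it otherwise.
        merged[row[key]] = row
    return list(merged.values())


# Hypothetical data: rows already in the lakehouse and a new incremental batch.
lake = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
batch = [{"id": 2, "status": "shipped"}, {"id": 3, "status": "new"}]

result = upsert(lake, batch, "id")
print(result)
```

Row 2 is updated in place and row 3 is appended, so running the same batch twice leaves the result unchanged, which is the property that makes incremental loads safe to retry.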

Categories

Data Pipeline

License

Apache License V2.0

Additional Project Details

Programming Language

JavaScript

Related Categories

JavaScript Data Pipeline Tool

Registered

2023-06-12