With CueLake, you can use SQL to build ELT (Extract, Load, Transform) pipelines on a data lakehouse. You write Spark SQL statements in Zeppelin notebooks, then schedule these notebooks using workflows (DAGs).

To extract and load incremental data, you write simple select statements. CueLake executes these statements against your databases and then merges the incremental data into your data lakehouse (powered by Apache Iceberg). To transform data, you write SQL statements to create views and tables in your data lakehouse.

CueLake uses Celery as the executor and celery-beat as the scheduler. Celery jobs trigger Zeppelin notebooks. Zeppelin automatically starts and stops the Spark cluster for every scheduled run of notebooks.
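
The incremental extract-and-merge step described above can be pictured with a small sketch. This is not CueLake's code: CueLake performs the merge with Spark SQL against Iceberg tables, whereas the example below uses plain Python lists and dictionaries purely to illustrate upsert semantics. The function names (`extract_incremental`, `upsert`) and the column names (`id`, `updated_at`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def extract_incremental(source: Iterable[Row], timestamp_column: str, last_loaded) -> List[Row]:
    """Return only the source rows changed after the last loaded timestamp.

    Stands in for a select statement with a filter on a timestamp column
    (an assumption about how the incremental rows are identified).
    """
    return [row for row in source if row[timestamp_column] > last_loaded]


def upsert(target: Iterable[Row], incremental: Iterable[Row], key: str) -> List[Row]:
    """Merge incremental rows into target by primary key.

    Rows whose key already exists are updated; rows with a new key are inserted.
    Returns a new list and does not mutate the input rows.
    """
    merged: Dict[object, Row] = {row[key]: row for row in target}
    for row in incremental:
        # Existing columns are kept unless the incoming row overrides them.
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return list(merged.values())


# Hypothetical source table and current lakehouse contents.
source_rows = [
    {"id": 1, "name": "alice", "updated_at": 1},
    {"id": 2, "name": "bob-renamed", "updated_at": 5},
    {"id": 3, "name": "carol", "updated_at": 6},
]
lakehouse_rows = [
    {"id": 1, "name": "alice", "updated_at": 1},
    {"id": 2, "name": "bob", "updated_at": 2},
]

# Use the newest timestamp already loaded as the high-water mark.
high_water_mark = max(row["updated_at"] for row in lakehouse_rows)
changed = extract_incremental(source_rows, "updated_at", high_water_mark)
lakehouse_rows = upsert(lakehouse_rows, changed, key="id")
```

In this sketch, row 2 is updated in place and row 3 is inserted, while row 1 is left untouched because it has not changed since the last load.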

Features

  • Upsert incremental data
  • Create views in the data lakehouse
  • Elastically scale cloud infrastructure
  • Automated maintenance of Iceberg tables
  • Versioning in GitHub
  • Your data always stays within your cloud account

Categories

Data Pipeline

License

Apache License V2.0

Additional Project Details

Programming Language

JavaScript

Related Categories

JavaScript Data Pipeline Tool

Registered

2023-06-12