With CueLake, you can use SQL to build ELT (Extract, Load, Transform) pipelines on a data lakehouse.
You write Spark SQL statements in Zeppelin notebooks. You then schedule these notebooks using workflows (DAGs).
To extract and load incremental data, you write simple select statements. CueLake executes these statements against your databases and then merges incremental data into your data lakehouse (powered by Apache Iceberg).
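As an illustration, the merge step against an Iceberg table might look like the following Spark SQL, where the table and column names (`staging.orders_increment`, `lakehouse.orders`, `order_id`) are hypothetical placeholders, not part of CueLake itself:

```sql
-- Hypothetical sketch: merge freshly extracted rows into an Iceberg table.
-- `staging.orders_increment` holds the output of the incremental select;
-- `lakehouse.orders` is the Iceberg target table.
MERGE INTO lakehouse.orders AS target
USING staging.orders_increment AS source
ON target.order_id = source.order_id
WHEN MATCHED THEN
  UPDATE SET *
WHEN NOT MATCHED THEN
  INSERT *;
```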
To transform data, you write SQL statements to create views and tables in your data lakehouse.
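For example, a transformation could be a plain `CREATE TABLE ... AS SELECT` run in a notebook; the table and column names below are again hypothetical:

```sql
-- Hypothetical transformation: build a daily revenue summary table
-- in the lakehouse from the previously loaded orders table.
CREATE TABLE IF NOT EXISTS lakehouse.daily_revenue
USING iceberg
AS
SELECT order_date,
       SUM(amount) AS total_revenue
FROM lakehouse.orders
GROUP BY order_date;
```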
CueLake uses Celery as the executor and celery-beat as the scheduler. Celery jobs trigger Zeppelin notebooks. Zeppelin auto-starts and stops the Spark cluster for every scheduled run of notebooks.
To learn why we are building CueLake, read our viewpoint.
CueLake uses Kubernetes `kubectl` for installation. Create a namespace and then install using the `cuelake.yaml` file. Creating a namespace is optional; you can install in the default namespace or in any existing namespace. In the commands below, we use `cuelake` as the namespace.
```shell
kubectl create namespace cuelake
kubectl apply -f https://raw.githubusercontent.com/cuebook/cuelake/main/cuelake.yaml -n cuelake
kubectl port-forward services/lakehouse 8080:80 -n cuelake
```
Now visit http://localhost:8080 in your browser.
If you don’t want to use Kubernetes and instead want to try CueLake on your local machine first, we’ll soon have a docker-compose version. Let us know if you want that sooner.
CueLake uses Iceberg's `merge into` query to automatically merge incremental data.

For general help using CueLake, read the documentation, or go to GitHub Discussions.
To report a bug or request a feature, open an issue.
Join our CueLake Discord server and ask the developers your questions directly.
We'd love contributions to CueLake. Before you contribute, please first discuss the change you wish to make via an issue or a discussion. Contributors are expected to adhere to our code of conduct.