Ploomber is an open-source framework for developing and deploying data science and machine learning pipelines. It lets developers turn exploratory data analysis workflows into production-ready pipelines without rewriting large portions of code. It integrates with common development environments such as Jupyter Notebook, VS Code, and PyCharm, so data scientists can keep working with familiar tools while building scalable workflows. Ploomber resolves task dependencies and derives the execution order automatically, so each stage of a multi-stage pipeline runs only after the stages it depends on. Pipelines can be deployed to different computing environments, including Kubernetes, Airflow, AWS Batch, and high-performance computing clusters. Ploomber also supports reproducibility by tracking changes in code and rerunning only the tasks that are outdated.
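To illustrate what automatic dependency management involves, here is a minimal sketch that derives an execution order from declared dependencies. It uses only the Python standard library and is not Ploomber's API; the task names and the `upstream` mapping are hypothetical.

```python
# Requires Python 3.9+ for graphlib.
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
upstream = {
    "load": set(),
    "clean": {"load"},
    "features": {"clean"},
    "train": {"features"},
    "report": {"train", "clean"},
}


def execution_order(upstream):
    """Return a task order in which every task follows its dependencies."""
    # TopologicalSorter expects a mapping of node -> predecessors and
    # raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(upstream).static_order())


order = execution_order(upstream)
print(order)
```

A topological sort is one common way to order tasks in a dependency graph; the sketch makes no claim about how Ploomber implements this internally.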
Features
- Framework for building maintainable machine learning and data pipelines
- Integration with development tools such as Jupyter Notebook, VS Code, and PyCharm
- Automatic management of task dependencies and pipeline execution order
- Deployment support for environments such as Kubernetes, Airflow, AWS Batch, and high-performance computing clusters
- Tools for converting exploratory notebooks into modular production pipelines
- Incremental execution that reruns only tasks affected by code changes
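The idea behind incremental execution can be sketched as follows: record a fingerprint of each task's source after a successful run, then rerun only tasks whose fingerprint changed, along with everything downstream of them. This is a simplified illustration using the standard library, not Ploomber's implementation; the function names, the task names, and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib


def fingerprint(source):
    """Hash a task's source code so changes can be detected cheaply."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def outdated_tasks(sources, upstream, stored):
    """Return tasks whose source changed, plus all tasks downstream of them.

    sources:  task name -> current source code (hypothetical representation)
    upstream: task name -> set of task names it depends on
    stored:   task name -> fingerprint recorded at the last successful run
    """
    stale = {
        name for name, src in sources.items()
        if stored.get(name) != fingerprint(src)
    }
    # Propagate staleness: a task is outdated if any dependency is outdated.
    changed = True
    while changed:
        changed = False
        for name, deps in upstream.items():
            if name not in stale and deps & stale:
                stale.add(name)
                changed = True
    return stale


# Hypothetical three-task pipeline that has already run once.
sources = {"load": "read csv", "clean": "drop nulls", "train": "fit model"}
upstream = {"load": set(), "clean": {"load"}, "train": {"clean"}}
stored = {name: fingerprint(src) for name, src in sources.items()}

# Editing one task marks it and its downstream tasks as outdated.
sources["clean"] = "drop nulls and duplicates"
print(sorted(outdated_tasks(sources, upstream, stored)))
```

In this example only `clean` was edited, so `load` is skipped while `clean` and `train` are rerun.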