Data processing for and with foundation models
SDG is a specialized framework
Git-based data version control for machine learning workflows
Collection of useful data science topics along with articles
Data science interview questions and answers
Self-learning data agent that grounds its answers in layers of content
An end-to-end Data Scientist
Synthetic Data Generation for tabular, relational and time series data
A Collection of Cheatsheets, Books, Questions, and Portfolio
Deep Research framework, combining language models with tools
OCRmyPDF adds an OCR text layer to scanned PDF files
Label Studio is a multi-type data labeling and annotation tool
Machine learning in Python
AI coding assistant skill (Claude Code, Codex, OpenCode, OpenClaw)
A reactive notebook for Python
Deterministic LLM Outputs for AI Applications and AI Agents
A Simple and Universal Swarm Intelligence Engine
1 minute of voice data can also be used to train a good TTS model
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine
Effortless data labeling with AI support from Segment Anything
GeoAI: Artificial Intelligence for Geospatial Data
The standard data-centric AI package for data quality and ML
AI-driven multi-agent research assistant automating hypothesis
Central interface to connect your LLMs with external data
Benchmarking synthetic data generation methods