Data processing for and with foundation models
Collection of useful data science topics along with articles
Data science interview questions and answers
Git-based data version control for machine learning workflows
SDG is a specialized framework
Self-learning data agent that grounds its answers in layers of content
An end-to-end Data Scientist
A Collection of Cheatsheets, Books, Questions, and Portfolio
Synthetic Data Generation for tabular, relational and time series data
Official DeiT repository
OCRmyPDF adds an OCR text layer to scanned PDF files
Machine learning in Python
Central interface to connect your LLMs with external data
Training data (data labeling, annotation, workflow) for all data types
Uncover insights, surface problems, monitor, and fine-tune your LLM
Label Studio is a multi-type data labeling and annotation tool
The open-source tool for building high-quality datasets
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine
Conditional GAN for generating synthetic tabular data
Cloud-native open source data warehouse for analytics and AI queries
Wan2.2: Open and Advanced Large-Scale Video Generative Model
Code for running inference and fine-tuning with the SAM 3 model
One minute of voice data can also be used to train a good TTS model
Effortless data labeling with AI support from Segment Anything
A reactive notebook for Python