Data processing for and with foundation models
SDG is a specialized framework
Uncover insights, surface problems, monitor, and fine-tune your LLM
The open-source tool for building high-quality datasets
Synthetic Data Generation for tabular, relational and time series data
Training data (data labeling, annotation, workflow) for all data types
The standard data-centric AI package for data quality and ML
Create HTML profiling reports from pandas DataFrame objects
Wan2.2: Open and Advanced Large-Scale Video Generative Model
Benchmarking synthetic data generation methods
A high-quality tool for converting PDF to Markdown and JSON
Curated list of classic, high-quality computer science books
Synthetic data curation for post-training and data extraction
Flexible Photo Recrafting While Preserving Your Identity
Automatically Visualize any dataset, any size
A high-quality rapid TTS voice cloning model
Extract schema, statistics and entities from datasets
Toloka-Kit is a Python library for working with Toloka API
Collaborative & Open-Source Quality Assurance for all AI models
DeepVariant is an analysis pipeline that uses a deep neural network
Multi-Agents LLM Financial Trading Framework
An unsupervised and free tool for image and video dataset analysis
Claude Code skill for generating production-quality SVG+PNG technical
Code for running inference and finetuning with SAM 3 model
Tooling for the Common Objects In 3D dataset