Data processing for and with foundation models
The open-source tool for building high-quality datasets
Synthetic data generation for tabular, relational and time series data
Uncover insights, surface problems, monitor, and fine-tune your LLM
Create HTML profiling reports from pandas DataFrame objects
Training data (data labeling, annotation, workflow) for all data types
Wan2.2: Open and Advanced Large-Scale Video Generative Model
The standard data-centric AI package for data quality and ML
Benchmarking synthetic data generation methods
A high-quality, rapid TTS voice-cloning model
An unsupervised and free tool for image and video dataset analysis
Collaborative & Open-Source Quality Assurance for all AI models
Flexible Photo Recrafting While Preserving Your Identity
Extract schema, statistics and entities from datasets
Toloka-Kit is a Python library for working with the Toloka API
Code for running inference and fine-tuning with the SAM 3 model
Release for Improved Denoising Diffusion Probabilistic Models
HY-Motion model for 3D character animation generation
An open-source option for scaling, assessing and maintaining natural language data
Tooling for the Common Objects In 3D dataset
DeepVariant is an analysis pipeline that uses a deep neural network
Multi-Agent daTa geneRation Infra and eXperimentation framework
Scalable data preprocessing and curation toolkit for LLMs
Ling-V2 is a MoE LLM developed and open-sourced by InclusionAI
Reference PyTorch implementation and models for DINOv3