The toolkit to test, validate, and evaluate your models and surface
Automatically find issues in image datasets
BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)
Synthetic data generators for structured and unstructured text
The open-source tool for building high-quality datasets
Docker image used to run data processing workloads
Data science on data without acquiring a copy
CKAN is an open-source data management system (DMS) for powering data hubs
The standard data-centric AI package for data quality and ML
Training data (data labeling, annotation, workflow) for all data types
Create HTML profiling reports from pandas DataFrame objects
Uncover insights, surface problems, monitor, and fine-tune your LLM
Make your own running home page
A real-time visualisation of the CO2 emissions of electricity
Always know what to expect from your data
Project structure for doing and sharing data science work
airda (Air Data Agent)
A curated list of data mining papers about fraud detection
The open standard for data logging
AutoGluon: AutoML for Image, Text, and Tabular Data
Panda-Helper: Data profiling utility for Pandas DataFrames and Series
Clean Jupyter notebooks of outputs, metadata, and empty cells
Streamline your ML workflow
Train machine learning models within Docker containers
Production-ready data processing made easy and shareable