cleanlab helps you clean data and labels by automatically detecting issues in an ML dataset. To facilitate machine learning with messy, real-world data, this data-centric AI package uses your existing models to estimate dataset problems that can be fixed to train even better models. cleanlab finds label errors via confident learning algorithms, which are described in the project's accompanying paper and blog post. See some of the datasets cleaned with cleanlab at labelerrors.com. Beyond label issues, the package also detects other data issues, so you can train more reliable ML models. All features of cleanlab work with any dataset and any model, including PyTorch, TensorFlow, Keras, JAX, HuggingFace, OpenAI, XGBoost, and scikit-learn. If you use a sklearn-compatible classifier, all cleanlab methods work out of the box.
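To make the confident learning idea more concrete, here is a minimal pure-Python sketch of its core thresholding step. It is a simplified illustration, not cleanlab's implementation or API: the function name `find_label_issues_sketch` and the toy data are hypothetical, and it assumes `pred_probs` holds predicted class probabilities from a model that did not see the corresponding example during training (for instance, obtained by cross-validation).

```python
def find_label_issues_sketch(labels, pred_probs):
    """Return indices of likely label issues, most suspicious first.

    Simplified illustration of confident-learning thresholding
    (hypothetical helper, not part of cleanlab).
    """
    n_classes = len(pred_probs[0])

    # Per-class threshold: the average probability the model assigns to
    # class k over the examples whose given label is k.
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for y, p in zip(labels, pred_probs):
        sums[y] += p[y]
        counts[y] += 1
    thresholds = [
        sums[k] / counts[k] if counts[k] else 1.0 for k in range(n_classes)
    ]

    issues = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes predicted at or above their own class threshold.
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # the model is not confident about any class
        best = max(confident, key=lambda k: p[k])
        if best != y:
            # Keep the probability of the given label for ranking.
            issues.append((i, p[y]))

    # Rank by self-confidence: lowest probability of the given label first.
    issues.sort(key=lambda pair: pair[1])
    return [i for i, _ in issues]


# Toy data (made up for illustration): example 2 is labeled 0,
# but the model strongly favors class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
suspects = find_label_issues_sketch(labels, pred_probs)
print(suspects)
```

Per-class thresholds, rather than a single global cutoff, let the check adapt to classes the model is systematically more or less confident about. The real package layers further estimation and ranking options on top of this idea.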

Features

  • Binary and multi-class classification
  • Multi-label classification (e.g. image/document tagging)
  • Token classification (e.g. entity recognition in text)
  • Classification with data labeled by multiple annotators
  • Active learning with multiple annotators (suggests which data to label or re-label to improve the model most)
  • Outlier and out-of-distribution detection

License

GNU Affero General Public License

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

Python

Related Categories

Python Data Labeling Tool, Python Data Quality Tool

Registered

2023-05-23