Python Data Labeling Tools

View 85 business solutions

Browse free open source Python Data Labeling Tools and projects below. Use the toggles on the left to filter open source Python Data Labeling Tools by OS, license, language, programming language, and project status.

  • Our Free Plans just got better! | Auth0 by Okta Icon
    Our Free Plans just got better! | Auth0 by Okta

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your secuirty. Auth0 now, thank yourself later.
    Try free now
  • Bright Data - All in One Platform for Proxies and Web Scraping Icon
    Bright Data - All in One Platform for Proxies and Web Scraping

    Say goodbye to blocks, restrictions, and CAPTCHAs

    Bright Data offers the highest quality proxies with automated session management, IP rotation, and advanced web unlocking technology. Enjoy reliable, fast performance with easy integration, a user-friendly dashboard, and enterprise-grade scaling. Powered by ethically-sourced residential IPs for seamless web scraping.
    Get Started
  • 1
    Label Studio

    Label Studio

    Label Studio is a multi-type data labeling and annotation tool

    The most flexible data annotation tool. Quickly installable. Build custom UIs or use pre-built labeling templates. Detect objects on image, bboxes, polygons, circular, and keypoints supported. Partition image into multiple segments. Use ML models to pre-label and optimize the process. Label Studio is an open-source data labeling tool. It lets you label data types like audio, text, images, videos, and time series with a simple and straightforward UI and export to various model formats. It can be used to prepare raw data or improve existing training data to get more accurate ML models. The frontend part of Label Studio app lies in the frontend/ folder and written in React JSX. Multi-user labeling sign up and login, when you create an annotation it's tied to your account. Configurable label formats let you customize the visual interface to meet your specific labeling needs. Support for multiple data types including images, audio, text, HTML, time-series, and video.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 2
    bbox-visualizer

    bbox-visualizer

    Make drawing and labeling bounding boxes easy as cake

    Make drawing and labeling bounding boxes easy as cake. This package helps users draw bounding boxes around objects, without doing the clumsy math that you'd need to do for positioning the labels. It also has a few different types of visualizations you can use for labeling objects after identifying them. There are optional functions that can draw multiple bounding boxes and/or write multiple labels on the same image, but it is advisable to use the above functions in a loop in order to have full control over your visualizations.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    Cleanlab

    Cleanlab

    The standard data-centric AI package for data quality and ML

    cleanlab helps you clean data and labels by automatically detecting issues in a ML dataset. To facilitate machine learning with messy, real-world data, this data-centric AI package uses your existing models to estimate dataset problems that can be fixed to train even better models. cleanlab cleans your data's labels via state-of-the-art confident learning algorithms, published in this paper and blog. See some of the datasets cleaned with cleanlab at labelerrors.com. This package helps you find label issues and other data issues, so you can train reliable ML models. All features of cleanlab work with any dataset and any model. Yes, any model: PyTorch, Tensorflow, Keras, JAX, HuggingFace, OpenAI, XGBoost, scikit-learn, etc. If you use a sklearn-compatible classifier, all cleanlab methods work out-of-the-box.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Compose

    Compose

    A machine learning tool for automated prediction engineering

    Compose is a machine learning tool for automated prediction engineering. It allows you to structure prediction problems and generate labels for supervised learning. An end user defines an outcome of interest by writing a labeling function, then runs a search to automatically extract training examples from historical data. Its result is then provided to Featuretools for automated feature engineering and subsequently to EvalML for automated machine learning. Prediction problems are structured by using a label maker and a labeling function. The label maker automatically extracts data along the time index to generate labels. The process starts by setting the first cutoff time after the minimum amount of data. Then subsequent cutoff times are spaced apart using gaps. Starting from each cutoff time, a window determines the amount of data, also referred to as a data slice, to pass into a labeling function.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Cyber Risk Assessment and Management Platform Icon
    Cyber Risk Assessment and Management Platform

    ConnectWise Identify is a powerful cybersecurity risk assessment platform offering strategic cybersecurity assessments and recommendations.

    When it comes to cybersecurity, what your clients don’t know can really hurt them. And believe it or not, keep them safe starts with asking questions. With ConnectWise Identify Assessment, get access to risk assessment backed by the NIST Cybersecurity Framework to uncover risks across your client’s entire business, not just their networks. With a clearly defined, easy-to-read risk report in hand, you can start having meaningful security conversations that can get you on the path of keeping your clients protected from every angle. Choose from two assessment levels to cover every client’s need, from the Essentials to cover the basics to our Comprehensive Assessment to dive deeper to uncover additional risks. Our intuitive heat map shows you your client’s overall risk level and priority to address risks based on probability and financial impact. Each report includes remediation recommendations to help you create a revenue-generating action plan.
    Learn More
  • 5
    Diffgram

    Diffgram

    Training data (data labeling, annotation, workflow) for all data types

    From ingesting data to exploring it, annotating it, and managing workflows. Diffgram is a single application that will improve your data labeling and bring all aspects of training data under a single roof. Diffgram is world’s first truly open source training data platform that focuses on giving its users an unlimited experience. This is aimed to reduce your data labeling bills and increase your Training Data Quality. Training Data is the art of supervising machines through data. This includes the activities of annotation, which produces structured data; ready to be consumed by a machine learning model. Annotation is required because raw media is considered to be unstructured and not usable without it. That’s why training data is required for many modern machine learning use cases including computer vision, natural language processing and speech recognition.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Lightly

    Lightly

    A python library for self-supervised learning on images

    A python library for self-supervised learning on images. We, at Lightly, are passionate engineers who want to make deep learning more efficient. That's why - together with our community - we want to popularize the use of self-supervised methods to understand and curate raw image data. Our solution can be applied before any data annotation step and the learned representations can be used to visualize and analyze datasets. This allows selecting the best core set of samples for model training through advanced filtering. We provide PyTorch, PyTorch Lightning and PyTorch Lightning distributed examples for each of the models to kickstart your project. Lightly requires Python 3.6+ but we recommend using Python 3.7+. We recommend installing Lightly in a Linux or OSX environment. With lightly, you can use the latest self-supervised learning methods in a modular way using the full power of PyTorch. Experiment with different backbones, models, and loss functions.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Toloka-Kit

    Toloka-Kit

    Toloka-Kit is a Python library for working with Toloka API

    Toloka-Kit is a Python library for working with Toloka API. The API allows you to build scalable and fully automated human-in-the-loop ML pipelines, and integrate them into your processes. The toolkit makes integration easier. You can use it with Jupyter Notebooks. Support for all common Toloka use cases: creating projects, adding pools, uploading tasks, and so on. Toloka entities are represented as Python classes. You can use them instead of accessing the API using JSON representations. There’s no need to validate JSON files and work with them directly. Support of both synchronous and asynchronous (via async/await) executions. Streaming support: build complex pipelines which send and receive data in real-time. For example, you can pass data between two related projects: one for data labeling, and another for its validation. AutoQuality feature which automatically finds the best fitting quality control rules for your project.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next