Showing 8 open source projects for "data quality"

View related business solutions
  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • Build Agents and Models on One Platform Icon
    Build Agents and Models on One Platform

    Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.

    Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.
    Try It Free
  • 1
    Data-Juicer

    Data-Juicer

    Data processing for and with foundation models

    Data-Juicer is an open-source data processing and augmentation framework designed to enhance the quality and diversity of datasets for machine learning tasks. It includes a modular pipeline for scalable data transformation.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Diffgram

    Diffgram

    Training data (data labeling, annotation, workflow) for all data types

    From ingesting data to exploring it, annotating it, and managing workflows. Diffgram is a single application that will improve your data labeling and bring all aspects of training data under a single roof. Diffgram is world’s first truly open source training data platform that focuses on giving its users an unlimited experience. This is aimed to reduce your data labeling bills and increase your Training Data Quality.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    DataProfiler

    DataProfiler

    Extract schema, statistics and entities from datasets

    DataProfiler is an AI-powered tool for automatic data analysis and profiling, designed to detect patterns, anomalies, and schema inconsistencies in structured and unstructured datasets. The DataProfiler is a Python library designed to make data analysis, monitoring, and sensitive data detection easy. Loading Data with a single command, the library automatically formats & loads files into a DataFrame. Profiling the Data, the library identifies the schema, statistics, entities (PII / NPI), and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    AllenNLP

    AllenNLP

    An open-source NLP research library, built on PyTorch

    AllenNLP makes it easy to design and evaluate new deep learning models for nearly any NLP problem, along with the infrastructure to easily run them in the cloud or on your laptop. AllenNLP includes reference implementations of high quality models for both core NLP problems (e.g. semantic role labeling) and NLP applications (e.g. textual entailment). AllenNLP supports loading "plugins" dynamically. A plugin is just a Python package that provides custom registered classes or additional...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 5
    XLM (Cross-lingual Language Model)

    XLM (Cross-lingual Language Model)

    PyTorch original implementation of Cross-lingual Language Model

    XLM (Cross-lingual Language Model) is a family of multilingual pretraining methods that align representations across languages to enable strong zero-shot transfer. It popularized objectives like Masked Language Modeling (MLM) across many languages and Translation Language Modeling (TLM) that jointly trains on parallel sentence pairs to tighten cross-lingual alignment. Using a shared subword vocabulary, XLM learns language-agnostic features that work well for classification and sequence...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    CC-Net

    CC-Net

    Tools to download and cleanup Common Crawl data

    cc_net provides tools to download, segment, clean, and filter Common Crawl to build large-scale text corpora, including monolingual datasets and the multilingual CC-100 collection introduced in the associated paper. It includes pipelines to fetch snapshots, extract text, de-duplicate, identify language, and apply quality filtering based on heuristics and language models. The outputs are intended for pretraining language models and for creating standardized corpora that can be reproduced or...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    NLP Best Practices

    NLP Best Practices

    Natural Language Processing Best Practices & Examples

    In recent years, natural language processing (NLP) has seen quick growth in quality and usability, and this has helped to drive business adoption of artificial intelligence (AI) solutions. In the last few years, researchers have been applying newer deep learning methods to NLP. Data scientists started moving from traditional methods to state-of-the-art (SOTA) deep neural network (DNN) algorithms which use language models pretrained on large text corpora.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Chatito

    Chatito

    Dataset generation for AI chatbots, NLP tasks

    Chatito is a tool that helps generate datasets for training and validating chatbot models using a simple domain-specific language (DSL).
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
Auth0 Logo