Showing 133 open source projects for "data quality"

View related business solutions
  • Build Agents and Models on One Platform Icon
    Build Agents and Models on One Platform

    Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.

    Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.
    Try It Free
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • 1
    LLM Foundry

    LLM Foundry

    LLM training code for MosaicML foundation models

    Introducing MPT-7B, the first entry in our MosaicML Foundation Series. MPT-7B is a transformer trained from scratch on 1T tokens of text and code. It is open source, available for commercial use, and matches the quality of LLaMA-7B. MPT-7B was trained on the MosaicML platform in 9.5 days with zero human intervention at a cost of ~$200k. Large language models (LLMs) are changing the world, but for those outside well-resourced industry labs, it can be extremely difficult to train and deploy...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    FLAML

    FLAML

    A fast library for AutoML and tuning

    ...It frees users from selecting learners and hyperparameters for each learner. For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It supports both classical machine learning models and deep neural networks. It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space, and metric), or full customization (arbitrary training and evaluation code). ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    MentDB Projects

    MentDB Projects

    Generalized Interoperability and Strong AI

    MentDB is an open-source platform driving research into next-generation AI and universal data exchange. Our architecture is built around the revolutionary Mentalese Query Language (MQL). MentDB Weak (Generalized Interoperability): A unified data layer enabling seamless data exchange and application integration (SOA, ETL, Data Quality). We eliminate data silos through a single, generalized data language. MentDB Strong (Strong AI / AGI): The framework for exploring and building Machine Consciousness, free will, and advanced ethical reasoning systems. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    VideoCrafter2

    VideoCrafter2

    Overcoming Data Limitations for High-Quality Video Diffusion Models

    VideoCrafter is an open-source video generation and editing toolbox designed to create high-quality video content. It features models for both text-to-video and image-to-video generation. The system is optimized for generating videos from textual descriptions or still images, leveraging advanced diffusion models. VideoCrafter2, an upgraded version, improves on its predecessor by enhancing motion dynamics and concept combinations, especially in low-data scenarios.
    Downloads: 10 This Week
    Last Update:
    See Project
  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • 5
    PromptSniffer

    PromptSniffer

    View Extract & Remove AI generation metadata with right click

    ...Core Functionality Read EXIF/Metadata: Extract and display comprehensive metadata from images AI Metadata Detection: Automatically identify and highlight AI generation metadata Metadata Removal: Strip AI generation metadata while preserving image quality Batch Processing: Handle multiple files with wildcard patterns Cross-Platform: Works on Windows, macOS, and Linux AI Tool Support ComfyUI: Detects and extracts workflow JSON data Stable Diffusion: Identifies prompts, parameters, and generation settings SwarmUI/StableSwarmUI: Handles JSON-formatted metadata Midjourney, DALL-E, NovelAI: Recognizes generation signatures Automatic1111, InvokeAI: Extracts generation parameters
    Downloads: 4 This Week
    Last Update:
    See Project
  • 6
    Universal Sentence Encoder

    Universal Sentence Encoder

    Encoder of greater-than-word length text trained on a variety of data

    The Universal Sentence Encoder (USE) is a pre-trained deep learning model designed to encode sentences into fixed-length embeddings for use in various natural language processing (NLP) tasks. It leverages Transformer and Deep Averaging Network (DAN) architectures to generate embeddings that capture the semantic meaning of sentences. The model is designed for tasks like sentiment analysis, semantic textual similarity, and clustering, and provides high-quality sentence representations in a...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    crème de la crème of AI courses

    crème de la crème of AI courses

    This repository is a curated collection of links to various courses

    crème de la crème of AI courses is an open-source repository that serves as a curated directory of high-quality educational resources related to artificial intelligence, machine learning, and modern data science. The project aggregates links to online courses, tutorials, lecture series, and learning materials from universities, research labs, and independent educators. The repository organizes courses by topic, difficulty level, format, and release year, allowing learners to quickly identify relevant material depending on their experience and interests. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    LangChain Extract

    LangChain Extract

    Did you say you like data?

    ...Developers can create reusable “extractors” that define what type of information should be pulled from a document, along with example prompts that improve extraction quality through in-context learning.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    GLM-4-32B-0414

    GLM-4-32B-0414

    Open Multilingual Multimodal Chat LMs

    ...It supports multilingual and multimodal chat capabilities with an extensive 32K token context length, making it ideal for dialogue, reasoning, and complex task completion. The model is pre-trained on 15 trillion tokens of high-quality data, including substantial synthetic reasoning datasets, and further enhanced with reinforcement learning and human preference alignment for improved instruction-following and function calling. Variants like GLM-Z1-32B-0414 offer deep reasoning and advanced mathematical problem-solving, while GLM-Z1-Rumination-32B-0414 specializes in long-form, complex research-style writing using scaled reinforcement learning and external search tools. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • AI-powered service management for IT and enterprise teams Icon
    AI-powered service management for IT and enterprise teams

    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
    Try it Free
  • 10
    Improved Diffusion

    Improved Diffusion

    Release for Improved Denoising Diffusion Probabilistic Models

    improved-diffusion is an open source implementation of diffusion probabilistic models created by OpenAI. These models, also known as score-based generative models, are a class of generative models that have shown strong performance in producing high-quality synthetic data such as images. The repository provides code for training and sampling diffusion models with improved techniques that enhance stability, efficiency, and output fidelity. It includes scripts for setting up training runs, generating samples, and reproducing results from OpenAI’s research on diffusion-based generation. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 11
    Bert-VITS2

    Bert-VITS2

    VITS2 backbone with multilingual-bert

    ...It provides emotional modeling through “emo embeddings,” allowing voices to be conditioned on different affective states during synthesis. Releases include optimizations for Japanese and English alignment, expanded training data, spec caching and pre-generation tools, as well as ONNX export for more lightweight inference deployments.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 12
    Adala

    Adala

    Adala: Autonomous DAta (Labeling) Agent framework

    Adala is a data-centric AI framework focused on dataset curation, annotation, and validation. It helps AI teams manage high-quality training datasets by providing tools for data auditing, error detection, and quality assessment.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Resemble Enhance

    Resemble Enhance

    AI powered speech denoising and enhancement

    ...The models are trained on high-quality speech data, which helps the tool produce cleaner output than basic filtering alone. Its main value is giving developers and audio creators an open tool for upgrading imperfect speech recordings.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    axflow

    axflow

    The TypeScript framework for AI development

    ...Its core SDK enables developers to integrate language model capabilities into web applications while maintaining strong modular design principles. Additional components support data ingestion, evaluation, and model interaction workflows that are commonly required when building production AI systems. For example, the framework includes modules for connecting application data to language models, evaluating the quality of model outputs, and building streaming user interfaces. Because each component can be used independently, developers can adopt Axflow incrementally rather than committing to a monolithic framework. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    DPM-Solver

    DPM-Solver

    Fast ODE Solver for Diffusion Probabilistic Model Sampling

    DPM-Solver is a machine learning research implementation focused on accelerating the sampling process in diffusion probabilistic models used for generative AI tasks. Diffusion models are powerful generative systems capable of producing high-quality images and other data, but traditional sampling methods often require hundreds or thousands of computational steps. The project introduces a specialized numerical solver designed to approximate the diffusion process using a small number of high-order integration steps. By reformulating the sampling problem as the solution of a diffusion-related ordinary differential equation, the solver can produce high-quality samples much more efficiently. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    YiVal

    YiVal

    Your Automatic Prompt Engineering Assistant for GenAI Applications

    ...It focuses on experimentation and optimization by allowing users to test multiple prompt variations, configurations, and model parameters in parallel, then evaluate their outputs using structured metrics and scoring systems. The platform is particularly useful in production environments where prompt quality directly impacts user experience, as it provides a repeatable and data-driven approach to refining prompts rather than relying on manual trial and error. YiVal supports integration with various LLM providers and can orchestrate experiments across different models, making it adaptable to evolving AI ecosystems. It also includes evaluation pipelines that help quantify output quality based on criteria such as accuracy, coherence, or task-specific benchmarks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    LLMDataHub

    LLMDataHub

    Quick guide (especially) for trending instruction finetuning dataset

    LLMDataHub is an open-source repository that aggregates and organizes datasets specifically designed for training and fine-tuning large language models. The project aims to solve the challenge of discovering high-quality datasets by collecting resources that are otherwise scattered across multiple research communities and repositories. Each dataset entry typically includes information such as size, language coverage, intended use cases, and links to the original data sources. The repository focuses particularly on datasets useful for chatbot training, instruction-following tasks, and alignment training scenarios. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Autolabel

    Autolabel

    Label, clean and enrich text datasets with LLMs

    Autolabel is a Python library to label, clean and enrich datasets with Large Language Models (LLMs). Autolabel data for NLP tasks such as classification, question-answering and named entity recognition, entity matching and more. Seamlessly use commercial and open-source LLMs from providers such as OpenAI, Anthropic, HuggingFace, Google and more.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Consistency Models

    Consistency Models

    Official repo for consistency models

    consistency_models is the repository for Consistency Models, a new family of generative models introduced by OpenAI that aim to generate high-quality samples by mapping noise directly into data — circumventing the need for lengthy diffusion chains. It builds on and extends diffusion model frameworks (e.g. based on the guided-diffusion codebase), adding techniques like consistency distillation and consistency training to enable fast, often one-step, sample generation. The repo is implemented in PyTorch and includes support for large-scale experiments on datasets like ImageNet-64 and LSUN variants. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    CausalNex

    CausalNex

    A Python library that helps data scientists to infer causation

    CausalNex is a Python library that uses Bayesian Networks to combine machine learning and domain expertise for causal reasoning. You can use CausalNex to uncover structural relationships in your data, learn complex distributions, and observe the effect of potential interventions.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    fastMRI

    fastMRI

    A large open dataset + tools to speed up MRI scans using ML

    fastMRI is a large-scale collaborative research project by Facebook AI Research (FAIR) and NYU Langone Health that explores how deep learning can accelerate magnetic resonance imaging (MRI) acquisition without compromising image quality. By enabling reconstruction of high-fidelity MR images from significantly fewer measurements, fastMRI aims to make MRI scanning faster, cheaper, and more accessible in clinical settings. The repository provides an open-source PyTorch framework with data loaders, subsampling utilities, reconstruction models, and evaluation metrics, supporting both research reproducibility and practical experimentation. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    PRM800K

    PRM800K

    800,000 step-level correctness labels on LLM solutions to MATH problem

    ...The repository releases the raw labels and the labeler instructions used in two project phases, enabling researchers to study how human raters graded intermediate reasoning. Data are stored as newline-delimited JSONL files tracked with Git LFS, where each line is a full solution sample that can contain many step-level labels and rich metadata such as labeler UUIDs, timestamps, generation identifiers, and quality-control flags. Each labeled step can include multiple candidate completions with ratings of -1, 0, or +1, optional human-written corrections (phase 1), and a chosen completion index, along with a final finish reason such as found_error, solution, bad_problem, or give_up.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 23
    VALL-E

    VALL-E

    PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)

    ...Specifically, we train a neural codec language model (called VALL-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS as a conditional language modeling task rather than continuous signal regression as in previous work. During the pre-training stage, we scale up the TTS training data to 60K hours of English speech which is hundreds of times larger than existing systems. VALL-E emerges in-context learning capabilities and can be used to synthesize high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as an acoustic prompt. Experiment results show that VALL-E significantly outperforms the state-of-the-art zero-shot TTS system in terms of speech naturalness and speaker similarity. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 24
    Self-learning-Computer-Science

    Self-learning-Computer-Science

    Resources to learn computer science in your spare time

    Self-learning Computer Science is a curated, open-source guide repository designed to help learners independently study computer science topics using high-quality university-level resources. The author (an undergraduate CS student) assembled links to courses from institutions like MIT, UC Berkeley, Stanford, etc., covering mathematics, programming, data structures/algorithms, computer architecture, machine learning, software engineering and more. It’s aimed at learners who find traditional course structures restrictive and want a flexible, self-paced path through CS, with a focus on building depth and breadth rather than shortcut exam skills. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Super-PDF-Editor

    Super-PDF-Editor

    World's most comprehensive, powerful, process-based PDF editor

    ...OCR performs in pdf files, scanned pdf files and any pdf files. OCR performs in image files, and supports multiple image formats. Auto and manual image enhancement for better OCR accuracy and quality. Supports 165+ languages with three languages data set. Use Multiple Languages at once. International Languages: 127 Languages, High, Medium, and Fast Quality. Scanned Images (jpg, png, gif, tiff, bmp) Multi-Page and TIFF and GIF, Scanned PDFs.
    Downloads: 12 This Week
    Last Update:
    See Project
Auth0 Logo