Showing 766 open source projects for "data quality"

  • 1
    Bert-VITS2

    VITS2 backbone with multilingual-bert

    ...It provides emotional modeling through “emo embeddings,” allowing voices to be conditioned on different affective states during synthesis. Releases include optimizations for Japanese and English alignment, expanded training data, spec caching and pre-generation tools, as well as ONNX export for more lightweight inference deployments.
    Downloads: 2 This Week
  • 2
    re_data

    re_data - fix data issues before your users & CEO discover them

    re_data is an open-source data reliability framework for the modern data stack. Currently, re_data focuses on observing the dbt project together with the underlying data warehouse (Postgres, BigQuery, Snowflake, or Redshift). Gather all relevant outputs about your data in one place using our cloud. Invite your team and debug it easily from there. Go back in time and see your past metadata. Set up Slack notifications to always know when a new report is produced or an existing one is updated.
    Downloads: 0 This Week
  • 3
    Adala

    Adala: Autonomous DAta (Labeling) Agent framework

    Adala is a data-centric AI framework focused on dataset curation, annotation, and validation. It helps AI teams manage high-quality training datasets by providing tools for data auditing, error detection, and quality assessment.
    Downloads: 0 This Week
  • 4
    Resemble Enhance

    AI powered speech denoising and enhancement

    ...The models are trained on high-quality speech data, which helps the tool produce cleaner output than basic filtering alone. Its main value is giving developers and audio creators an open tool for upgrading imperfect speech recordings.
    Downloads: 1 This Week
  • 5
    axflow

    The TypeScript framework for AI development

    ...Its core SDK enables developers to integrate language model capabilities into web applications while maintaining strong modular design principles. Additional components support data ingestion, evaluation, and model interaction workflows that are commonly required when building production AI systems. For example, the framework includes modules for connecting application data to language models, evaluating the quality of model outputs, and building streaming user interfaces. Because each component can be used independently, developers can adopt Axflow incrementally rather than committing to a monolithic framework. ...
    Downloads: 0 This Week
  • 6
    DPM-Solver

    Fast ODE Solver for Diffusion Probabilistic Model Sampling

    DPM-Solver is a machine learning research implementation focused on accelerating the sampling process in diffusion probabilistic models used for generative AI tasks. Diffusion models are powerful generative systems capable of producing high-quality images and other data, but traditional sampling methods often require hundreds or thousands of computational steps. The project introduces a specialized numerical solver designed to approximate the diffusion process using a small number of high-order integration steps. By reformulating the sampling problem as the solution of a diffusion-related ordinary differential equation, the solver can produce high-quality samples much more efficiently. ...
    Downloads: 0 This Week
  • 7
    SolexaQA is software that calculates quality statistics and creates visual representations of data quality for second-generation sequencing data.
    Downloads: 2 This Week
  • 8
    YiVal

    Your Automatic Prompt Engineering Assistant for GenAI Applications

    ...It focuses on experimentation and optimization by allowing users to test multiple prompt variations, configurations, and model parameters in parallel, then evaluate their outputs using structured metrics and scoring systems. The platform is particularly useful in production environments where prompt quality directly impacts user experience, as it provides a repeatable and data-driven approach to refining prompts rather than relying on manual trial and error. YiVal supports integration with various LLM providers and can orchestrate experiments across different models, making it adaptable to evolving AI ecosystems. It also includes evaluation pipelines that help quantify output quality based on criteria such as accuracy, coherence, or task-specific benchmarks.
    Downloads: 0 This Week
  • 9
    Mjograph is an XY (2D) graph editor that runs on Mac OS X and Java, with the goal of giving researchers a quick way to visualize numerical data and create publication-quality plots.
    Downloads: 8 This Week
  • 10
    LLMDataHub

    Quick guide (especially) for trending instruction finetuning datasets

    LLMDataHub is an open-source repository that aggregates and organizes datasets specifically designed for training and fine-tuning large language models. The project aims to solve the challenge of discovering high-quality datasets by collecting resources that are otherwise scattered across multiple research communities and repositories. Each dataset entry typically includes information such as size, language coverage, intended use cases, and links to the original data sources. The repository focuses particularly on datasets useful for chatbot training, instruction-following tasks, and alignment training scenarios. ...
    Downloads: 0 This Week
  • 11
    DAPP2

    The Dairy Agriculture for People and the Planet (DAPP) 2 project

    ...Dept. of Agriculture, Agricultural Research Service initiative that was originally envisioned as a trans-disciplinary group of researchers brought together to share data and insights across various branches of science including dairy science, soil science, microbiology, nutrition science, analytical chemistry, functional food development, and others. The goal of this trans-disciplinary group is to increase the impact of research by combining results from disparate areas to allow correlations and wider overall trends to become evident, leading to greater understanding of the impacts of dairy products on health (cow and human), human nutrition, the environment, and dairy product usefulness and quality. ...
    Downloads: 0 This Week
  • 12
    TurboVNC

    High-speed, 3D-friendly, TightVNC-compatible remote desktop software

    TurboVNC is a high-performance, enterprise-quality version of VNC based on TightVNC, TigerVNC, and X.org. It contains a variant of Tight encoding that is tuned for maximum performance and compression with 3D applications (VirtualGL), video, and other image-intensive workloads. TurboVNC, in combination with VirtualGL, provides a complete solution for remotely displaying 3D applications with interactive performance. TurboVNC's high-speed encoding methods have been adopted by TigerVNC and...
    Downloads: 133,797 This Week
  • 13
    MaxFEM

    Software for electromagnetic simulation

    MaxFEM is an open software package for electromagnetic simulation using finite element methods. The package can solve problems in electrostatics, direct current, magnetostatics and eddy currents. Since version 0.4.0, MaxFEM requires Python 3. We have moved the installers to the MaxFEM website (see below). To help us improve MaxFEM, we ask you to fill out a simple form before downloading them.
    Downloads: 2 This Week
  • 14
    Autolabel

    Label, clean and enrich text datasets with LLMs

    Autolabel is a Python library to label, clean and enrich datasets with Large Language Models (LLMs). Autolabel data for NLP tasks such as classification, question answering, named entity recognition, entity matching and more. Seamlessly use commercial and open-source LLMs from providers such as OpenAI, Anthropic, HuggingFace, Google and more.
    Downloads: 0 This Week
  • 15
    uniCenta POS

    uniCenta oPOS - dynamically evolving POS project

    Keep up-to-date with the latest news - Visit uniCenta's main site https://unicenta.com/about-unicenta-opos/unicenta-news/ uniCenta oPOS v5.0 is the latest community release. Get the latest uniCenta oPOS 5.4.0 https://unicenta.com/download-files/ if you would like to make a contribution and support the project or need business support help. 📢 uniCenta oPOS 5.4.0 is fully integrated with WooCommerce! ✅ Run your website and store with the same data ✅ Support table ordering at your...
    Downloads: 871 This Week
  • 16
    chimp

    Tooling that helps you do quality, faster

    Your Apollo GraphQL development companion for doing quality, faster. Chimp helps you write high-quality code from the get-go. No more treating tests and quality as an afterthought. Quality first, speed for free. Boilerplate is time-consuming, error-prone and boring! Chimp reduces that through its various generators and smart defaults. Modularity leads to maintainable and testable code, and this is a key feature of all Chimp's domain-driven and data-driven generators. ...
    Downloads: 1 This Week
  • 17
    NBi

    NBi is a testing framework (add-on to NUnit)

    NBi is a testing framework (add-on to NUnit) for Business Intelligence. It supports most relational databases (SQL Server, MySQL, PostgreSQL ...) and OLAP platforms (Analysis Services, Mondrian ...) as well as ETL and reporting components (Microsoft technologies). The main goal of this framework is to let users create tests with a declarative approach based on an XML syntax. With NBi, you don't need to develop C# code to specify your tests! Nor do you need Visual...
    Downloads: 0 This Week
  • 18
    GitLab CE Server For Local Intranets

    The Free & Popular Community git Server in a Complete Virtual Machine

    This VM was created for two reasons: 1. Very little initial setup work is required to bring a Git server live within minutes. 2. The system should keep running for years without requiring updates or suffering breakage. If you are new to virtual machines, please watch the video below (taken from my other project; just replace td with gi wherever mentioned). After starting this VM, please log in to its administration panel with: Website Address: https://gi.local/ ( Accept Any Warnings due to...
    Downloads: 1 This Week
  • 19
    Consistency Models

    Official repo for consistency models

    consistency_models is the repository for Consistency Models, a new family of generative models introduced by OpenAI that aim to generate high-quality samples by mapping noise directly into data — circumventing the need for lengthy diffusion chains. It builds on and extends diffusion model frameworks (e.g. based on the guided-diffusion codebase), adding techniques like consistency distillation and consistency training to enable fast, often one-step, sample generation. The repo is implemented in PyTorch and includes support for large-scale experiments on datasets like ImageNet-64 and LSUN variants. ...
    Downloads: 0 This Week
  • 20
    Crane

    Crane is a FinOps Platform for Cloud Resource Analytics and Economics

    Crane is a FinOps Platform for Cloud Resource Analytics and Economics in Kubernetes clusters. The goal is not only to help users to manage cloud cost easily but also to ensure the quality of applications.
    Downloads: 1 This Week
  • 21
    CausalNex

    A Python library that helps data scientists to infer causation

    CausalNex is a Python library that uses Bayesian Networks to combine machine learning and domain expertise for causal reasoning. You can use CausalNex to uncover structural relationships in your data, learn complex distributions, and observe the effect of potential interventions.
    Downloads: 0 This Week
  • 22
    pseudocode.js

    Beautiful pseudocode for the Web

    pseudocode.js is a JavaScript library that typesets pseudocode beautifully to HTML. pseudocode.js takes a LaTeX-style input that supports the algorithmic constructs from LaTeX's algorithm packages. With or without LaTeX experience, a user should find the grammar fairly intuitive. The HTML output produced by pseudocode.js is (almost) identical to the pretty algorithms printed in publications that are typeset by LaTeX. Inserting math formulas in pseudocode.js is as easy as in LaTeX. Just enclose...
    Downloads: 1 This Week
  • 23
    fastMRI

    A large open dataset + tools to speed up MRI scans using ML

    fastMRI is a large-scale collaborative research project by Facebook AI Research (FAIR) and NYU Langone Health that explores how deep learning can accelerate magnetic resonance imaging (MRI) acquisition without compromising image quality. By enabling reconstruction of high-fidelity MR images from significantly fewer measurements, fastMRI aims to make MRI scanning faster, cheaper, and more accessible in clinical settings. The repository provides an open-source PyTorch framework with data loaders, subsampling utilities, reconstruction models, and evaluation metrics, supporting both research reproducibility and practical experimentation. ...
    Downloads: 0 This Week
  • 24
    Swiple

    Swiple enables you to easily observe, understand, and validate data

    ...Seamlessly incorporate data quality checks into your existing workflows without any coding or infrastructure changes, allowing you to focus on what matters most - your data. Save engineers weeks of time generating data quality checks. Swiple analyzes your dataset and builds data quality checks based on what is observed in your data. You just pick the ones you want.
    Downloads: 0 This Week
  • 25
    DataGym.ai

    Open source annotation and labeling tool for image and video assets

    DATAGYM enables data scientists and machine learning experts to label images up to 10x faster. AI-assisted annotation tools reduce manual labeling effort, give you more time to fine-tune ML models, and speed up the time to market for new products. Accelerate your computer vision projects by cutting data preparation time by up to 50%. A machine learning model is only as good as its training data. DATAGYM is an end-to-end workbench to create, annotate, manage, and export the right training data...
    Downloads: 0 This Week