Showing 888 open source projects for "data quality"

  • 1
    LLMDataHub

    Quick guide (especially) for trending instruction finetuning datasets

    LLMDataHub is an open-source repository that aggregates and organizes datasets specifically designed for training and fine-tuning large language models. The project aims to solve the challenge of discovering high-quality datasets by collecting resources that are otherwise scattered across multiple research communities and repositories. Each dataset entry typically includes information such as size, language coverage, intended use cases, and links to the original data sources. The repository focuses particularly on datasets useful for chatbot training, instruction-following tasks, and alignment training scenarios. ...
    Downloads: 0 This Week
  • 2
    DAPP2

    The Dairy Agriculture for People and the Planet (DAPP) 2 project

    ...Dept. of Agriculture, Agricultural Research Service initiative that was originally envisioned as a trans-disciplinary group of researchers brought together to share data and insights across various branches of science including dairy science, soil science, microbiology, nutrition science, analytical chemistry, functional food development, and others. The goal of this trans-disciplinary group is to increase the impact of research by combining results from disparate areas to allow correlations and wider overall trends to become evident, leading to greater understanding of the impacts of dairy products on health (cow and human), human nutrition, the environment, and dairy product usefulness and quality. ...
    Downloads: 0 This Week
  • 3
    TurboVNC

    High-speed, 3D-friendly, TightVNC-compatible remote desktop software

    TurboVNC is a high-performance, enterprise-quality version of VNC based on TightVNC, TigerVNC, and X.org. It contains a variant of Tight encoding that is tuned for maximum performance and compression with 3D applications (VirtualGL), video, and other image-intensive workloads. TurboVNC, in combination with VirtualGL, provides a complete solution for remotely displaying 3D applications with interactive performance. TurboVNC's high-speed encoding methods have been adopted by TigerVNC and...
    Downloads: 133,797 This Week
  • 4
    MaxFEM

    Software for electromagnetic simulation

    MaxFEM is an open software package for electromagnetic simulation using finite element methods. The package can solve problems in electrostatics, direct current, magnetostatics and eddy currents. Since version 0.4.0, MaxFEM requires Python 3. We have moved the installers to the MaxFEM website (see below). In order to improve MaxFEM, we will require you to fill out a simple form before downloading them.
    Downloads: 2 This Week
  • 5
    Autolabel

    Label, clean and enrich text datasets with LLMs

    Autolabel is a Python library to label, clean and enrich datasets with Large Language Models (LLMs). Autolabel data for NLP tasks such as classification, question-answering and named entity recognition, entity matching and more. Seamlessly use commercial and open-source LLMs from providers such as OpenAI, Anthropic, HuggingFace, Google and more.
    Downloads: 0 This Week
  • 6
    uniCenta POS

    uniCenta oPOS - dynamically evolving POS project

    Keep up to date with the latest news: visit uniCenta's main site https://unicenta.com/about-unicenta-opos/unicenta-news/ uniCenta oPOS v5.0 is the latest community release. Get the latest uniCenta oPOS 5.4.0 https://unicenta.com/download-files/ if you would like to make a contribution and support the project or need business support help. 📢 uniCenta oPOS 5.4.0 is fully integrated with WooCommerce! ✅ Run your website and store with the same data ✅ Support table ordering at your...
    Downloads: 871 This Week
  • 7
    chimp

    Tooling that helps you do quality, faster

    Your Apollo GraphQL development companion for doing quality, faster. Chimp helps you write high-quality code from the get-go. No more putting tests and quality as an after-thought. Quality first, speed for free. Boilerplate is time-consuming, error-prone and boring! Chimp reduces that through its various generators and smart defaults. Modularity leads to maintainable and testable code, and this is a key feature of all Chimp's domain-driven and data-driven generators. ...
    Downloads: 1 This Week
  • 8
    NBi

    NBi is a testing framework (add-on to NUnit)

    NBi is a testing framework (add-on to NUnit) for Business Intelligence. It supports most relational databases (SQL Server, MySQL, PostgreSQL ...) and OLAP platforms (Analysis Services, Mondrian ...) as well as ETL and reporting components (Microsoft technologies). The main goal of this framework is to let users create tests with a declarative approach based on an XML syntax. With NBi, you don't need to develop C# code to specify your tests! Nor do you need Visual...
    Downloads: 0 This Week
  • 9
    GitLab CE Server For Local Intranets

    The Free & Popular Community git Server in a Complete Virtual Machine

    This VM was created for two reasons: 1. Very little initial setup work is required to bring a Git server live within minutes. 2. The system should keep running for years without requiring updates or suffering breakages. If you are new to virtual machines, please watch the video below (taken from my other project; just replace td with gi wherever mentioned). After starting this VM, log in to its administration panel with: Website Address: https://gi.local/ ( Accept Any Warnings due to...
    Downloads: 1 This Week
  • 10
    Consistency Models

    Official repo for consistency models

    consistency_models is the repository for Consistency Models, a new family of generative models introduced by OpenAI that aim to generate high-quality samples by mapping noise directly into data — circumventing the need for lengthy diffusion chains. It builds on and extends diffusion model frameworks (e.g. based on the guided-diffusion codebase), adding techniques like consistency distillation and consistency training to enable fast, often one-step, sample generation. The repo is implemented in PyTorch and includes support for large-scale experiments on datasets like ImageNet-64 and LSUN variants. ...
    Downloads: 0 This Week
  • 11
    Crane

    Crane is a FinOps Platform for Cloud Resource Analytics and Economics

    Crane is a FinOps Platform for Cloud Resource Analytics and Economics in Kubernetes clusters. The goal is not only to help users to manage cloud cost easily but also to ensure the quality of applications.
    Downloads: 1 This Week
  • 12
    CausalNex

    A Python library that helps data scientists to infer causation

    CausalNex is a Python library that uses Bayesian Networks to combine machine learning and domain expertise for causal reasoning. You can use CausalNex to uncover structural relationships in your data, learn complex distributions, and observe the effect of potential interventions.
    Downloads: 0 This Week
  • 13
    pseudocode.js

    Beautiful pseudocode for the Web

    pseudocode.js is a JavaScript library that typesets pseudocode beautifully to HTML. Pseudocode.js takes a LaTeX-style input that supports the algorithmic constructs from LaTeX's algorithm packages. With or without LaTeX experience, a user should find the grammar fairly intuitive. The HTML output produced by pseudocode.js is (almost) identical to the pretty algorithms printed on publications that are typeset by LaTeX. Inserting math formulas in pseudocode.js is as easy as LaTeX. Just enclose...
    Downloads: 1 This Week
  • 14
    Swiple

    Swiple enables you to easily observe, understand, and validate data

    ...Seamlessly incorporate data quality checks into your existing workflows without any coding or infrastructure changes, allowing you to focus on what matters most - your data. Save engineers weeks of time generating data quality checks. Swiple analyzes your dataset and builds data quality checks based on what is observed in your data. You just pick the ones you want.
    Downloads: 0 This Week
  • 15
    fastMRI

    A large open dataset + tools to speed up MRI scans using ML

    fastMRI is a large-scale collaborative research project by Facebook AI Research (FAIR) and NYU Langone Health that explores how deep learning can accelerate magnetic resonance imaging (MRI) acquisition without compromising image quality. By enabling reconstruction of high-fidelity MR images from significantly fewer measurements, fastMRI aims to make MRI scanning faster, cheaper, and more accessible in clinical settings. The repository provides an open-source PyTorch framework with data loaders, subsampling utilities, reconstruction models, and evaluation metrics, supporting both research reproducibility and practical experimentation. ...
    Downloads: 0 This Week
  • 16
    DataGym.ai

    Open source annotation and labeling tool for image and video assets

    DATAGYM enables data scientists and machine learning experts to label images up to 10x faster. AI-assisted annotation tools reduce manual labeling effort, give you more time to fine-tune ML models, and speed up your time to market for new products. Accelerate your computer vision projects by cutting data preparation time by up to 50%. A machine learning model is only as good as its training data. DATAGYM is an end-to-end workbench to create, annotate, manage, and export the right training data...
    Downloads: 0 This Week
  • 17
    Amplify

    Automatic enrichment, enhancement, and explanation of your data

    Amplify attaches afterburners to your data. Amplify explains metadata extraction, classification, tagging, and reporting. Enriches derivative data generation like thumbnails, previews, conversions, etc. Enhances batteries-included value-adds like data quality reports, image augmentation, OCR, translations, etc. Amplify leverages the decentralized compute provided by Bacalhau to magically enrich your data.
    Downloads: 0 This Week
  • 18
    PRM800K

    800,000 step-level correctness labels on LLM solutions to MATH problems

    ...The repository releases the raw labels and the labeler instructions used in two project phases, enabling researchers to study how human raters graded intermediate reasoning. Data are stored as newline-delimited JSONL files tracked with Git LFS, where each line is a full solution sample that can contain many step-level labels and rich metadata such as labeler UUIDs, timestamps, generation identifiers, and quality-control flags. Each labeled step can include multiple candidate completions with ratings of -1, 0, or +1, optional human-written corrections (phase 1), and a chosen completion index, along with a final finish reason such as found_error, solution, bad_problem, or give_up.
    Downloads: 2 This Week
  • 19
    linSmith

    Smith chart intended for educational use

    A Smith charting program. You can enter either discrete components or transmission lines, see the results on screen and/or generate Postscript output. Component values can be changed numerically or using scrollbars.
    Downloads: 6 This Week
  • 20
    FastQC

    A quality control analysis tool for high throughput sequencing data

    FastQC is a quality control analysis tool designed to spot potential problems in high throughput sequencing datasets. Its goal is to provide a simple way by which to check the quality of raw sequence data coming from high throughput sequencing pipelines. It does this by running a modular set of analyses on one or more raw sequence files in fastq or bam format.
    Downloads: 22 This Week
  • 21
    TransMCL

    A novel transcriptome optimization tool

    To improve the quality of transcriptome assembly, we introduce TransMCL, a novel transcriptome optimization tool, to reconstruct full-length coding sequences while eliminating redundancy from assembled transcriptomes. TransMCL employs homologs from closely related species to guide the assembly, clustering raw transcripts from transcriptome data and genes from annotated genomes into hierarchical ortholog groups (HOGs).
    Downloads: 0 This Week
  • 22
    SASUnit

    Unit testing for SAS(TM)-programs

    SASUnit is a unit testing framework for SAS(TM)-programs. It can be used for the development, execution and automatic documentation of tests for SAS programs. SASUnit is written purely on the basis of SAS macros and a few shell commands. There are two videos on YouTube: * Getting started: https://www.youtube.com/watch?v=Kc66hADHNyI * Usage of setup scripts: https://www.youtube.com/watch?v=9drW_6eg6G4 SASUnit is brought to you by HMS Analytical Software...
    Downloads: 0 This Week
  • 23
    VALL-E

    PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)

    ...Specifically, we train a neural codec language model (called VALL-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS as a conditional language modeling task rather than continuous signal regression as in previous work. During the pre-training stage, we scale up the TTS training data to 60K hours of English speech, which is hundreds of times larger than existing systems. VALL-E exhibits emergent in-context learning capabilities and can be used to synthesize high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as an acoustic prompt. Experiment results show that VALL-E significantly outperforms the state-of-the-art zero-shot TTS system in terms of speech naturalness and speaker similarity. ...
    Downloads: 1 This Week
  • 24
    Feathr

    A scalable, unified data and AI engineering platform for enterprise

    Feathr is a data and AI engineering platform that has been widely used in production at LinkedIn for many years and was open sourced in 2022. It is currently a project under LF AI & Data Foundation. Define data and feature transformations based on raw data sources (batch and streaming) using Pythonic APIs. Register transformations by name and get transformed data (features) for various use cases including AI modeling, compliance, go-to-market and more. Share transformations and data (features)...
    Downloads: 0 This Week
  • 25
    Aestel

    Applications for data management

    "Information is data in action", and, consequently, having good quality data is essential. The AESTEL package contains two highly configurable applications for data management: A data loader and a reporting application, i.e. DataLoader and AEREA, respectively. The data loader application applies user-defined instructions to validate, process and load data.
    Downloads: 0 This Week