Showing 766 open source projects for "data quality"

  • 1
    DeepSearcher

    Open Source Deep Research Alternative to Reason and Search

    DeepSearcher is an open-source “deep research” style system that combines retrieval with evaluation and reasoning to answer complex questions using private or enterprise data. It is designed around the idea that high-quality answers require more than top-k retrieval, so it orchestrates multi-step search, evidence collection, and synthesis into a comprehensive response. The project integrates with vector databases (including Milvus and related options) so organizations can index internal documents and query them with semantic retrieval. ...
    Downloads: 0 This Week
  • 2
    Claude Code Plugins Directory

    Official, Anthropic-managed directory of high quality Claude Plugins

    Claude Code Plugins Directory repository provides a collection of plugins intended to extend Claude’s capabilities by turning the model into a specialized assistant tailored to specific workflows, teams, or organizational needs. These plugins define how Claude should access tools, retrieve data, and execute structured tasks so that outputs become more consistent and production-ready. The project emphasizes customizable automation by allowing developers to encode preferred workflows, domain...
    Downloads: 4 This Week
  • 3
    Scientific Visualization

    An open access book on scientific visualization using Python

    The Scientific Visualization book is a freely available open-access textbook that introduces how to produce effective scientific visualizations using Python, focusing especially on leveraging the popular plotting library Matplotlib (and related tools). It goes beyond simple plotting tutorials and emphasizes design principles: how to choose colors, layout subplots, annotate graphs, and present data in a way that is both accurate and visually compelling. As such, it serves as a guide for researchers, data scientists, and academic authors who need to create publication-quality figures or explanatory graphics, rather than quick exploratory plots. It includes extensive examples that demonstrate best practices — for instance handling multiple subplots, combining line plots with scatter/density overlays, or rendering high-resolution vector graphics for print.
    Downloads: 0 This Week
  • 4
    HY-Motion 1.0

    HY-Motion model for 3D character animation generation

    ...The training strategy for the HY-Motion series includes extensive pre-training on thousands of hours of varied motion data, fine-tuning on curated high-quality datasets, and reinforcement learning with human feedback, which improves both the plausibility and adaptability of generated motion sequences.
    Downloads: 3 This Week
  • 5
    OpenTelemetry

    OpenTelemetry Go API and SDK

    OpenTelemetry-Go is the Go implementation of OpenTelemetry. It provides a set of APIs to directly measure the performance and behavior of your software and send this data to observability platforms. The project aims to provide high-quality, ubiquitous, and portable telemetry that enables effective observability. OpenTelemetry is a collection of APIs, SDKs, and tools. Use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software’s performance and behavior.
    Downloads: 0 This Week
  • 6
    Learn AI Engineering

    Learn AI and LLMs from scratch using free resources

    ...The curation recognizes modern AI realities, including data pipelines, evaluation, prompt engineering, retrieval-augmented generation, and cost/performance trade-offs. It’s equally useful for refreshers—dipping into a specific module before a project—as it is for a full, self-directed curriculum. By centralizing the best references in one place, the repo reduces the overhead of finding, filtering, and sequencing resources, letting you focus on learning and building.
    Downloads: 1 This Week
  • 7
    Matrix

    Multi-Agent daTa geneRation Infra and eXperimentation framework

    Matrix is a distributed, large-scale engine for multi-agent synthetic data generation and experiments: it provides the infrastructure to run thousands of “agentic” workflows concurrently (e.g. multiple LLMs interacting, reasoning, generating content, data-processing pipelines) by leveraging distributed computing (like Ray + cluster management). The idea is to treat data generation as a “data-to-data” transformation: each input item defines a task, and the runtime orchestrates asynchronous,...
    Downloads: 0 This Week
  • 8
    tsfresh

    Automatic extraction of relevant features from time series

    tsfresh is a Python package that automatically calculates a large number of time series characteristics, the so-called features. Without tsfresh, you would have to calculate all of these characteristics by hand; with tsfresh, the process is automated. Furthermore, tsfresh is compatible with the pandas and scikit-learn APIs, two important packages for data science work in Python. ...
    Downloads: 0 This Week
  • 9
    StabilityMatrix

    Multi-Platform Package Manager for Stable Diffusion

    StabilityMatrix is a multi-platform package manager for Stable Diffusion. It provides a desktop application for installing, updating, and launching Stable Diffusion web UI packages, and for managing the model files they share.
    Downloads: 89 This Week
  • 10
    Unstract

    No-code LLM Platform to launch APIs and ETL Pipelines

    Unstract is a powerful open-source, no-code platform built to automate the extraction and structuring of unstructured documents using large language models and flexible workflows, enabling developers and data teams to turn messy files into organized JSON content without complex coding. It integrates a visual Prompt Studio environment where users can iteratively design extraction schemas, compare outputs from different models, and monitor costs and accuracy side by side, making it easier to...
    Downloads: 3 This Week
  • 11
    LangWatch

    The platform for LLM evaluations and AI agent testing

    ...The platform provides tools for tracking model interactions, analyzing prompt behavior, and identifying issues such as hallucinations, latency problems, or unexpected responses. By collecting telemetry data from AI applications, LangWatch allows developers to understand how their systems perform in real-world usage scenarios. The platform includes dashboards that visualize model behavior, enabling teams to monitor trends in response quality and reliability over time. It also provides evaluation tools that allow developers to test prompts and compare outputs across different models or configurations. ...
    Downloads: 1 This Week
  • 12
    DocETL

    A system for agentic LLM-powered data processing and ETL

    DocETL is an open-source system designed to build and execute data processing pipelines powered by large language models, particularly for analyzing complex collections of documents and unstructured datasets. The platform allows developers and researchers to construct structured workflows that extract, transform, and organize information from sources such as reports, transcripts, legal documents, and other text-heavy data. Instead of relying on single prompts or ad-hoc scripts, DocETL...
    Downloads: 1 This Week
  • 13
    mistletoe

    A fast, extensible and spec-compliant Markdown parser in pure Python

    mistletoe is a Markdown parser in pure Python, designed to be fast, spec-compliant and fully customizable. Apart from being the fastest CommonMark-compliant Markdown parser implementation in pure Python, mistletoe also supports easy definitions of custom tokens. Parsing Markdown into an abstract syntax tree also allows us to swap out renderers for different output formats, without touching any of the core components.
    Downloads: 0 This Week
  • 14
    PGFPlots

    A TeX package to draw normal and/or logarithmic plots directly in TeX

    PGFPlots is a TeX package that draws normal and/or logarithmic plots directly in TeX, in two and three dimensions, with a user-friendly interface, and PGFPlotstable is a TeX package that rounds and formats numerical tables. Examples are available in the manuals and on the website. PGFPlots draws high-quality function plots in normal or logarithmic scaling: the user supplies axis labels, legend entries and the plot coordinates for one or more plots and PGFPlots applies axis...
    Downloads: 1 This Week
  • 15
    DeckTape

    PDF exporter for HTML presentations

    DeckTape is a high-quality PDF exporter for HTML presentation frameworks. It is built on top of Puppeteer, which relies on Google Chrome for laying out and rendering Web pages and provides a headless Chrome instance scriptable with a JavaScript API. DeckTape supports a number of presentation frameworks out of the box. It also provides a generic command that works by emulating the end-user interaction, allowing it to be used to convert presentations from virtually any...
    Downloads: 2 This Week
  • 16
    Amper

    Build tool for the Kotlin and Java languages

    Amper is an open-source build and project configuration tool developed by JetBrains for the Kotlin and Java languages. Its goal is to improve the project configuration experience and IDE support for Kotlin and Java projects.
    Downloads: 0 This Week
  • 17
    NeMo Curator

    Scalable data pre processing and curation toolkit for LLMs

    NeMo Curator is a Python library specifically designed for fast and scalable dataset preparation and curation for large language model (LLM) use cases such as foundation model pretraining, domain-adaptive pretraining (DAPT), supervised fine-tuning (SFT) and parameter-efficient fine-tuning (PEFT). It greatly accelerates data curation by leveraging GPUs with Dask and RAPIDS, resulting in significant time savings. The library provides a customizable and modular interface, simplifying pipeline expansion and accelerating model convergence through the preparation of high-quality tokens. At the core of NeMo Curator is the DocumentDataset, which serves as the main dataset class. ...
    Downloads: 0 This Week
  • 18
    Claude Context

    Code search MCP for Claude Code

    Claude Context is a tool designed to enhance the contextual understanding of large language models by managing and injecting relevant information into prompts. It focuses on improving response quality by ensuring that models have access to the most relevant data when generating outputs. The system integrates with vector databases and retrieval systems, enabling efficient storage and retrieval of contextual information. It supports workflows such as retrieval-augmented generation, where external knowledge is dynamically incorporated into model responses. ...
    Downloads: 0 This Week
  • 19
    DINOv3

    Reference PyTorch implementation and models for DINOv3

    DINOv3 is the third-generation iteration of Meta’s self-supervised visual representation learning framework, building upon the ideas from DINO and DINOv2. It continues the paradigm of learning strong image representations without labels using teacher–student distillation, but introduces a simplified and more scalable training recipe that performs well across datasets and architectures. DINOv3 removes the need for complex augmentations or momentum encoders, streamlining the pipeline while...
    Downloads: 15 This Week
  • 20
    rtk

    CLI proxy that reduces LLM token consumption

    ...RTK intercepts these command outputs and compresses them into concise summaries before sending them to the language model. This process helps maintain important information while removing redundant data such as boilerplate logs, long directory listings, or repetitive test outputs. By minimizing the amount of noise sent to the AI model, the tool improves reasoning quality and allows longer development sessions within the same context window. The system is implemented as a lightweight Rust binary that runs locally and integrates easily with common AI coding environments.
    Downloads: 23 This Week
  • 21
    TIGRE

    TIGRE: Tomographic Iterative GPU-based Reconstruction Toolbox

    TIGRE is an open-source toolbox for fast and accurate 3D tomographic reconstruction for any geometry. Its focus is on iterative algorithms for improved image quality that have all been optimized to run on GPUs (including multi-GPUs) for improved speed. It combines the higher-level abstraction of MATLAB or Python with the performance of CUDA at a lower level in order to make it both fast and easy to use. TIGRE is free to download and distribute: use it, modify it, add to it, and share it. Our...
    Downloads: 0 This Week
  • 22
    Agents 2.0

    An Open-source Framework for Data-centric Language Agents

    ...During training, the system performs a forward execution where the agent completes a task and records the trajectory of prompts, outputs, and tool usage. A prompt-based loss function is then applied to evaluate the quality of the outcome, generating language-based gradients that guide improvements to the agent pipeline.
    Downloads: 1 This Week
  • 23
    prompts.chat

    Share, discover, and collect prompts

    prompts.chat, also known as Awesome ChatGPT Prompts, is an open-source community project that curates high-quality prompt examples for modern AI chat models. The repository functions as a centralized library where users can browse, share, and collect prompt templates designed to improve the usefulness and creativity of AI interactions. Originally built around ChatGPT use cases, the prompts are broadly compatible with many contemporary large language models, making the resource flexible...
    Downloads: 4 This Week
  • 24
    Cinephage

    The AIO solution to your self-hosted media gathering needs

    Cinephage is an ambitious all-in-one media management platform aimed at self-hosters who want a unified interface for movies, TV shows, live TV, downloads, indexers, subtitles, and streaming workflows. Instead of relying on a patchwork of separate tools that each handle one slice of the media stack, Cinephage brings everything under a single database and responsive UI built with modern web frameworks like Svelte. It’s designed so that everything — from content discovery to library...
    Downloads: 0 This Week
  • 25
    Uncertainty Baselines

    High-quality implementations of standard and SOTA methods

    Uncertainty Baselines is a collection of strong, well-documented training pipelines that make it straightforward to evaluate predictive uncertainty in modern machine learning models. Rather than offering toy scripts, it provides end-to-end recipes—data input, model architectures, training loops, evaluation metrics, and logging—so results are comparable across runs and research groups. The library spans canonical modalities and tasks, from image classification and NLP to tabular problems,...
    Downloads: 0 This Week