Search Results for "data quality" - Page 5

Showing 766 open source projects for "data quality"

1

DeepSearcher

Open Source Deep Research Alternative to Reason and Search

DeepSearcher is an open-source “deep research” style system that combines retrieval with evaluation and reasoning to answer complex questions using private or enterprise data. It is designed around the idea that high-quality answers require more than top-k retrieval, so it orchestrates multi-step search, evidence collection, and synthesis into a comprehensive response. The project integrates with vector databases (including Milvus and related options) so organizations can index internal documents and query them with semantic retrieval. ...

Downloads: 0 This Week

Last Update: 2026-03-08
2

Claude Code Plugins Directory

Official, Anthropic-managed directory of high-quality Claude Plugins

Claude Code Plugins Directory repository provides a collection of plugins intended to extend Claude’s capabilities by turning the model into a specialized assistant tailored to specific workflows, teams, or organizational needs. These plugins define how Claude should access tools, retrieve data, and execute structured tasks so that outputs become more consistent and production-ready. The project emphasizes customizable automation by allowing developers to encode preferred workflows, domain...

Downloads: 4 This Week

Last Update: 5 days ago
3

Scientific Visualization

An open access book on scientific visualization using Python

The Scientific Visualization book is a freely available open-access textbook that introduces how to produce effective scientific visualizations using Python, focusing especially on leveraging the popular plotting library Matplotlib (and related tools). It goes beyond simple plotting tutorials and emphasizes design principles: how to choose colors, layout subplots, annotate graphs, and present data in a way that is both accurate and visually compelling. As such, it serves as a guide for researchers, data scientists, and academic authors who need to create publication-quality figures or explanatory graphics, rather than quick exploratory plots. It includes extensive examples that demonstrate best practices — for instance handling multiple subplots, combining line plots with scatter/density overlays, or rendering high-resolution vector graphics for print.

Downloads: 0 This Week

Last Update: 2026-01-04
4

HY-Motion 1.0

HY-Motion model for 3D character animation generation

...The training strategy for the HY-Motion series includes extensive pre-training on thousands of hours of varied motion data, fine-tuning on curated high-quality datasets, and reinforcement learning with human feedback, which improves both the plausibility and adaptability of generated motion sequences.

Downloads: 3 This Week

Last Update: 2026-05-25
5

OpenTelemetry

OpenTelemetry Go API and SDK

OpenTelemetry-Go is the Go implementation of OpenTelemetry. It provides a set of APIs to directly measure the performance and behavior of your software and send this data to observability platforms. The project aims to provide high-quality, ubiquitous, and portable telemetry to enable effective observability. OpenTelemetry is a collection of APIs, SDKs, and tools. Use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software’s performance and behavior.

Downloads: 0 This Week

Last Update: 2026-05-27
6

Learn AI Engineering

Learn AI and LLMs from scratch using free resources

...The curation recognizes modern AI realities, including data pipelines, evaluation, prompt engineering, retrieval-augmented generation, and cost/performance trade-offs. It’s equally useful for refreshers—dipping into a specific module before a project—as it is for a full, self-directed curriculum. By centralizing the best references in one place, the repo reduces the overhead of finding, filtering, and sequencing resources, letting you focus on learning and building.

Downloads: 1 This Week

Last Update: 2026-02-05
7

Matrix

Multi-Agent daTa geneRation Infra and eXperimentation framework

Matrix is a distributed, large-scale engine for multi-agent synthetic data generation and experiments: it provides the infrastructure to run thousands of “agentic” workflows concurrently (e.g. multiple LLMs interacting, reasoning, generating content, data-processing pipelines) by leveraging distributed computing (like Ray + cluster management). The idea is to treat data generation as a “data-to-data” transformation: each input item defines a task, and the runtime orchestrates asynchronous,...

Downloads: 0 This Week

Last Update: 2026-03-05
8

tsfresh

Automatic extraction of relevant features from time series

tsfresh is a Python package. It automatically calculates a large number of time series characteristics, the so-called features. tsfresh is used to extract characteristics from time series. Without tsfresh, you would have to calculate all characteristics by hand. With tsfresh, this process is automated and all your features can be calculated automatically. Further, tsfresh is compatible with Python's pandas and scikit-learn APIs, two important packages for data science work in Python....

Downloads: 0 This Week

Last Update: 2026-05-31
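
To illustrate what calculating time series characteristics "by hand" involves, here is a minimal standard-library sketch. It does not use tsfresh or reproduce its API; the function names, the feature names, and the (id, time, value) row layout are all assumptions made for this example.

```python
import statistics


def hand_rolled_features(series):
    """Compute a few summary features for one time series (a list of floats)."""
    # Differences between consecutive observations.
    diffs = [b - a for a, b in zip(series, series[1:])]
    return {
        "mean": statistics.fmean(series),
        "std": statistics.pstdev(series),
        "minimum": min(series),
        "maximum": max(series),
        "abs_sum_of_changes": sum(abs(d) for d in diffs),
    }


def extract_per_id(rows):
    """Group (series_id, time, value) rows by id, order each group by time,
    and return one feature dictionary per series id."""
    grouped = {}
    for series_id, t, value in rows:
        grouped.setdefault(series_id, []).append((t, value))
    return {
        sid: hand_rolled_features([v for _, v in sorted(points)])
        for sid, points in grouped.items()
    }


# Hypothetical input: two short series, "a" and "b".
rows = [("a", 0, 1.0), ("a", 1, 3.0), ("a", 2, 2.0), ("b", 0, 5.0), ("b", 1, 5.0)]
features = extract_per_id(rows)
```

Each series collapses to a single row of numbers, which is the shape a downstream scikit-learn style model would expect. A package such as tsfresh automates this step across a much larger set of characteristics.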
9

StabilityMatrix

Multi-Platform Package Manager for Stable Diffusion

StabilityMatrix is a multi-platform package manager and launcher for Stable Diffusion web UIs. It provides one-click installation and updating of packages such as the Automatic1111 Stable Diffusion WebUI and ComfyUI, and lets those packages share a common set of model files...

Downloads: 89 This Week

Last Update: 2026-06-16
10

Unstract

No-code LLM Platform to launch APIs and ETL Pipelines

Unstract is a powerful open-source, no-code platform built to automate the extraction and structuring of unstructured documents using large language models and flexible workflows, enabling developers and data teams to turn messy files into organized JSON content without complex coding. It integrates a visual Prompt Studio environment where users can iteratively design extraction schemas, compare outputs from different models, and monitor costs and accuracy side by side, making it easier to...

Downloads: 3 This Week

Last Update: 2 days ago
11

LangWatch

The platform for LLM evaluations and AI agent testing

...The platform provides tools for tracking model interactions, analyzing prompt behavior, and identifying issues such as hallucinations, latency problems, or unexpected responses. By collecting telemetry data from AI applications, LangWatch allows developers to understand how their systems perform in real-world usage scenarios. The platform includes dashboards that visualize model behavior, enabling teams to monitor trends in response quality and reliability over time. It also provides evaluation tools that allow developers to test prompts and compare outputs across different models or configurations. ...

Downloads: 1 This Week

Last Update: 2026-06-19
12

DocETL

A system for agentic LLM-powered data processing and ETL

DocETL is an open-source system designed to build and execute data processing pipelines powered by large language models, particularly for analyzing complex collections of documents and unstructured datasets. The platform allows developers and researchers to construct structured workflows that extract, transform, and organize information from sources such as reports, transcripts, legal documents, and other text-heavy data. Instead of relying on single prompts or ad-hoc scripts, DocETL...

Downloads: 1 This Week

Last Update: 2026-06-17
13

mistletoe

A fast, extensible and spec-compliant Markdown parser in pure Python

mistletoe is a Markdown parser in pure Python, designed to be fast, spec-compliant and fully customizable. Apart from being the fastest CommonMark-compliant Markdown parser implementation in pure Python, mistletoe also supports easy definitions of custom tokens. Parsing Markdown into an abstract syntax tree also allows us to swap out renderers for different output formats, without touching any of the core components.

Downloads: 0 This Week

Last Update: 2025-12-07
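
The idea of parsing into a tree and then swapping renderers can be sketched with the standard library alone. This is a toy illustration, not mistletoe's API: the node classes, `parse`, `render_html`, and `render_plain` are hypothetical names, and the parser handles only ATX headings and plain paragraphs.

```python
import html
from dataclasses import dataclass


@dataclass
class Heading:
    level: int
    text: str


@dataclass
class Paragraph:
    text: str


def parse(markdown_text):
    """Parse a tiny Markdown subset into a flat list of nodes."""
    nodes = []
    for block in markdown_text.strip().split("\n\n"):
        line = " ".join(block.split())  # join wrapped lines of one block
        hashes = len(line) - len(line.lstrip("#"))
        if 1 <= hashes <= 6 and line[hashes:hashes + 1] == " ":
            nodes.append(Heading(hashes, line[hashes + 1:]))
        else:
            nodes.append(Paragraph(line))
    return nodes


def render_html(nodes):
    """Renderer 1: emit HTML, escaping text content."""
    out = []
    for node in nodes:
        if isinstance(node, Heading):
            out.append(f"<h{node.level}>{html.escape(node.text)}</h{node.level}>")
        else:
            out.append(f"<p>{html.escape(node.text)}</p>")
    return "\n".join(out)


def render_plain(nodes):
    """Renderer 2: emit plain text, with headings in upper case."""
    out = [n.text.upper() if isinstance(n, Heading) else n.text for n in nodes]
    return "\n\n".join(out)


tree = parse("# Title\n\nSome *text* & more.")
```

Both renderers consume the same `tree`, so adding an output format means writing a new render function while the parser stays untouched.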
14

PGFPlots

A TeX package to draw normal and/or logarithmic plots directly in TeX

PGFPlots, a TeX package to draw normal and/or logarithmic plots directly in TeX in two and three dimensions with a user-friendly interface, and PGFPlotstable, a TeX package to round and format numerical tables. Examples in manuals and/or on the website. PGFPlots draws high-quality function plots in normal or logarithmic scaling with a user-friendly interface directly in TeX. The user supplies axis labels, legend entries and the plot coordinates for one or more plots and PGFPlots applies axis...

Downloads: 1 This Week

Last Update: 2025-08-14
15

DeckTape

PDF exporter for HTML presentations

DeckTape is a high-quality PDF exporter for HTML presentation frameworks. DeckTape is built on top of Puppeteer which relies on Google Chrome for laying out and rendering Web pages and provides a headless Chrome instance scriptable with a JavaScript API. DeckTape currently supports the following presentation frameworks out of the box. DeckTape also provides a generic command that works by emulating the end-user interaction, allowing it to be used to convert presentations from virtually any...

Downloads: 2 This Week

Last Update: 2026-04-20
16

Amper

Build tool for the Kotlin and Java languages

Amper is an open-source project configuration and build tool developed by JetBrains for Kotlin and Java projects. It focuses on the user experience of project configuration and on IDE support, and modules are described declaratively in YAML files rather than in build scripts.

Downloads: 0 This Week

Last Update: 2026-03-30
17

NeMo Curator

Scalable data preprocessing and curation toolkit for LLMs

NeMo Curator is a Python library specifically designed for fast and scalable dataset preparation and curation for large language model (LLM) use cases such as foundation model pretraining, domain-adaptive pretraining (DAPT), supervised fine-tuning (SFT), and parameter-efficient fine-tuning (PEFT). It greatly accelerates data curation by leveraging GPUs with Dask and RAPIDS, resulting in significant time savings. The library provides a customizable and modular interface, simplifying pipeline expansion and accelerating model convergence through the preparation of high-quality tokens. At the core of NeMo Curator is the DocumentDataset, which serves as the main dataset class. ...

Downloads: 0 This Week

Last Update: 2026-05-12
18

Claude Context

Code search MCP for Claude Code

Claude Context is a tool designed to enhance the contextual understanding of large language models by managing and injecting relevant information into prompts. It focuses on improving response quality by ensuring that models have access to the most relevant data when generating outputs. The system integrates with vector databases and retrieval systems, enabling efficient storage and retrieval of contextual information. It supports workflows such as retrieval-augmented generation, where external knowledge is dynamically incorporated into model responses. ...

Downloads: 0 This Week

Last Update: 2026-04-28
19

DINOv3

Reference PyTorch implementation and models for DINOv3

DINOv3 is the third-generation iteration of Meta’s self-supervised visual representation learning framework, building upon the ideas from DINO and DINOv2. It continues the paradigm of learning strong image representations without labels using teacher–student distillation, but introduces a simplified and more scalable training recipe that performs well across datasets and architectures. DINOv3 removes the need for complex augmentations or momentum encoders, streamlining the pipeline while...

Downloads: 15 This Week

Last Update: 2026-06-15
20

rtk

CLI proxy that reduces LLM token consumption

...RTK intercepts these command outputs and compresses them into concise summaries before sending them to the language model. This process helps maintain important information while removing redundant data such as boilerplate logs, long directory listings, or repetitive test outputs. By minimizing the amount of noise sent to the AI model, the tool improves reasoning quality and allows longer development sessions within the same context window. The system is implemented as a lightweight Rust binary that runs locally and integrates easily with common AI coding environments.

Downloads: 23 This Week

Last Update: 2026-06-12
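
As a rough illustration of compressing command output before it reaches a model, the sketch below collapses runs of identical lines and trims long listings to a head and a tail. It is written in Python for brevity and is not RTK's algorithm or interface; `compress_output`, its `max_lines` parameter, and the marker strings are assumptions made for this example.

```python
from itertools import groupby


def compress_output(text, max_lines=20):
    """Collapse consecutive duplicate lines, then keep only the first and
    last lines of the result if it is still longer than max_lines."""
    if max_lines < 1:
        raise ValueError("max_lines must be at least 1")

    collapsed = []
    for line, run in groupby(text.splitlines()):
        count = sum(1 for _ in run)
        collapsed.append(line if count == 1 else f"{line}  [repeated {count}x]")

    if len(collapsed) > max_lines:
        head = max_lines // 2
        tail = max_lines - head  # always >= 1 because max_lines >= 1
        omitted = len(collapsed) - max_lines
        collapsed = (
            collapsed[:head]
            + [f"... [{omitted} lines omitted] ..."]
            + collapsed[-tail:]
        )
    return "\n".join(collapsed)


# Hypothetical test-runner output with repetitive lines.
short = compress_output("ok\nok\nok\nFAILED test_x\n")

# Hypothetical long directory listing.
listing = "\n".join(f"file{i}.txt" for i in range(10))
trimmed = compress_output(listing, max_lines=4)
```

The failing line survives intact while the repeated "ok" lines shrink to one, which is the trade-off described above: keep the informative lines and drop the redundancy.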
21

TIGRE

TIGRE: Tomographic Iterative GPU-based Reconstruction Toolbox

TIGRE is an open-source toolbox for fast and accurate 3D tomographic reconstruction for any geometry. Its focus is on iterative algorithms for improved image quality that have all been optimized to run on GPUs (including multi-GPUs) for improved speed. It combines the higher-level abstraction of MATLAB or Python with the performance of CUDA at a lower level in order to make it both fast and easy to use. TIGRE is free to download and distribute: use it, modify it, add to it, and share it. Our...

Downloads: 0 This Week

Last Update: 2026-03-06
22

Agents 2.0

An Open-source Framework for Data-centric Language Agents

...During training, the system performs a forward execution where the agent completes a task and records the trajectory of prompts, outputs, and tool usage. A prompt-based loss function is then applied to evaluate the quality of the outcome, generating language-based gradients that guide improvements to the agent pipeline.

Downloads: 1 This Week

Last Update: 2026-03-04
23

prompts.chat

Share, discover, and collect prompts

prompts.chat, also known as Awesome ChatGPT Prompts, is an open-source community project that curates high-quality prompt examples for modern AI chat models. The repository functions as a centralized library where users can browse, share, and collect prompt templates designed to improve the usefulness and creativity of AI interactions. Originally built around ChatGPT use cases, the prompts are broadly compatible with many contemporary large language models, making the resource flexible...

Downloads: 4 This Week

Last Update: 3 days ago
24

Cinephage

The AIO solution to your self-hosted media gathering needs

Cinephage is an ambitious all-in-one media management platform aimed at self-hosters who want a unified interface for movies, TV shows, live TV, downloads, indexers, subtitles, and streaming workflows. Instead of relying on a patchwork of separate tools that each handle one slice of the media stack, Cinephage brings everything under a single database and responsive UI built with modern web frameworks like Svelte. It’s designed so that everything — from content discovery to library...

Downloads: 0 This Week

Last Update: 5 days ago
25

Uncertainty Baselines

High-quality implementations of standard and SOTA methods

Uncertainty Baselines is a collection of strong, well-documented training pipelines that make it straightforward to evaluate predictive uncertainty in modern machine learning models. Rather than offering toy scripts, it provides end-to-end recipes—data input, model architectures, training loops, evaluation metrics, and logging—so results are comparable across runs and research groups. The library spans canonical modalities and tasks, from image classification and NLP to tabular problems,...

Downloads: 0 This Week

Last Update: 2026-03-24