Search Results for "data quality" - Page 5

Showing 888 open source projects for "data quality"

1

SALMONN family

A suite of advanced multi-modal LLMs

SALMONN is a family of advanced multi-modal large language models (LLMs) developed by ByteDance — designed to handle and integrate multiple data modalities (e.g. text, audio, video) rather than just plain text. The repository bundles different branches targeting specialized tasks (e.g. video-SALMONN, speech-quality assessment, general multimodal tasks), suggesting that the project is modular and extensible across domains. SALMONN aims to push the frontier of multi-modal AI by allowing models to process and reason over diverse inputs, which can be useful for applications such as video understanding, speech analytics, cross-modal retrieval, and general AI capable of interpreting rich, multi-sensory data.

Downloads: 0 This Week

Last Update: 2026-05-14
2

Claude Code Plugins Directory

Official, Anthropic-managed directory of high-quality Claude Plugins

Claude Code Plugins Directory repository provides a collection of plugins intended to extend Claude’s capabilities by turning the model into a specialized assistant tailored to specific workflows, teams, or organizational needs. These plugins define how Claude should access tools, retrieve data, and execute structured tasks so that outputs become more consistent and production-ready. The project emphasizes customizable automation by allowing developers to encode preferred workflows, domain...

Downloads: 4 This Week

Last Update: 5 days ago
3

DeepSearcher

Open Source Deep Research Alternative to Reason and Search

DeepSearcher is an open-source “deep research” style system that combines retrieval with evaluation and reasoning to answer complex questions using private or enterprise data. It is designed around the idea that high-quality answers require more than top-k retrieval, so it orchestrates multi-step search, evidence collection, and synthesis into a comprehensive response. The project integrates with vector databases (including Milvus and related options) so organizations can index internal documents and query them with semantic retrieval. ...
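The contrast between single-shot top-k retrieval and multi-step search can be illustrated with a toy. The sketch below is not DeepSearcher's API and uses no vector database: `embed` is a hypothetical bag-of-words stand-in for a real embedding model. Each step retrieves the best unseen document, adds it to the evidence, and expands the query with that evidence before searching again.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical toy "embedding": lowercase bag-of-words term counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query: str, docs: list[str], k: int) -> list[str]:
    # Plain top-k retrieval: rank by similarity, keep only positive matches.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return [d for d in ranked if cosine(q, embed(d)) > 0][:k]


def multi_step_search(query: str, docs: list[str], k: int = 1, steps: int = 2) -> list[str]:
    evidence: list[str] = []
    current = query
    for _ in range(steps):
        remaining = [d for d in docs if d not in evidence]
        evidence.extend(top_k(current, remaining, k))
        # Crude query expansion: terms found in earlier evidence guide the next hop.
        current = query + " " + " ".join(evidence)
    return evidence


docs = ["milvus stores vectors", "vectors enable semantic search", "cats sleep a lot"]
print(multi_step_search("milvus", docs))
```

Plain top-k for "milvus" finds only the first document; the second hop reaches "vectors enable semantic search" because the first result introduced the term "vectors".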

Downloads: 0 This Week

Last Update: 2026-03-08
4

Scientific Visualization

An open access book on scientific visualization using python

The Scientific Visualization book is a freely available open-access textbook that introduces how to produce effective scientific visualizations using Python, focusing especially on leveraging the popular plotting library Matplotlib (and related tools). It goes beyond simple plotting tutorials and emphasizes design principles: how to choose colors, layout subplots, annotate graphs, and present data in a way that is both accurate and visually compelling. As such, it serves as a guide for researchers, data scientists, and academic authors who need to create publication-quality figures or explanatory graphics, rather than quick exploratory plots. It includes extensive examples that demonstrate best practices — for instance handling multiple subplots, combining line plots with scatter/density overlays, or rendering high-resolution vector graphics for print.

Downloads: 0 This Week

Last Update: 2026-01-04
5

plotly.js

JavaScript charting library behind Plotly and Dash

Plotly JavaScript Open Source Graphing Library. Built on top of d3.js and stack.gl, Plotly.js is a high-level, declarative charting library. plotly.js ships with over 40 chart types, including 3D charts, statistical graphs, and SVG maps. plotly.js is free and open source, and you can view the source, report issues, or contribute on GitHub. For plotly.js to build with Webpack, you will need to install ify-loader@v1.1.0+ and add it to your webpack.config.js. This adds Browserify transform...

Downloads: 4 This Week

Last Update: 2026-06-01
6

fireworks-tech-graph

Claude Code skill for generating production-quality SVG+PNG technical

fireworks-tech-graph is an AI-driven project focused on building structured knowledge graphs that map relationships between technologies, concepts, and entities within technical domains. It aims to transform unstructured information into interconnected graphs that can be queried and analyzed for insights, making it easier to understand complex ecosystems such as software stacks or research fields. The system likely leverages AI techniques for entity extraction, relationship mapping, and...

Downloads: 2 This Week

Last Update: 2026-06-03
7

StabilityMatrix

Multi-Platform Package Manager for Stable Diffusion

StabilityMatrix is a multi-platform desktop application for installing and managing Stable Diffusion web UI packages, such as AUTOMATIC1111's Stable Diffusion WebUI and ComfyUI. It handles installation and updates of these packages and lets them share a common model directory, so model files do not need to be duplicated across tools...

Downloads: 86 This Week

Last Update: 2026-06-16
8

Learn AI Engineering

Learn AI and LLMs from scratch using free resources

...The curation recognizes modern AI realities, including data pipelines, evaluation, prompt engineering, retrieval-augmented generation, and cost/performance trade-offs. It’s equally useful for refreshers—dipping into a specific module before a project—as it is for a full, self-directed curriculum. By centralizing the best references in one place, the repo reduces the overhead of finding, filtering, and sequencing resources, letting you focus on learning and building.

Downloads: 1 This Week

Last Update: 2026-02-05
9

OpenTelemetry

OpenTelemetry Go API and SDK

OpenTelemetry-Go is the Go implementation of OpenTelemetry. It provides a set of APIs to directly measure the performance and behavior of your software and send this data to observability platforms. Its goal is high-quality, ubiquitous, and portable telemetry that enables effective observability. OpenTelemetry is a collection of APIs, SDKs, and tools. Use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software's performance and behavior.

Downloads: 0 This Week

Last Update: 2026-05-27
10

tsfresh

Automatic extraction of relevant features from time series

tsfresh is a Python package. It automatically calculates a large number of time series characteristics, the so-called features. tsfresh is used to extract characteristics from time series. Without tsfresh, you would have to calculate all characteristics by hand; with tsfresh, this process is automated and all your features can be calculated automatically. Furthermore, tsfresh is compatible with Python's pandas and scikit-learn APIs, two important packages for data science in Python...
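To make "calculating characteristics by hand" concrete, here is a minimal pure-Python sketch of the manual work tsfresh automates. It does not use tsfresh: `manual_features` and `features_per_id` are hypothetical helpers, and the feature names only loosely echo tsfresh's naming. tsfresh computes hundreds of such features and adds relevance filtering on top.

```python
import statistics


def manual_features(series: list[float]) -> dict[str, float]:
    # A handful of hand-written time series characteristics.
    diffs = [b - a for a, b in zip(series, series[1:])]
    return {
        "mean": statistics.fmean(series),
        "standard_deviation": statistics.pstdev(series),
        "minimum": min(series),
        "maximum": max(series),
        "abs_energy": sum(x * x for x in series),  # sum of squared values
        "mean_abs_change": statistics.fmean(abs(d) for d in diffs) if diffs else 0.0,
    }


def features_per_id(data: dict[str, list[float]]) -> dict[str, dict[str, float]]:
    # One feature vector per time series id, ready to become a row of a feature table.
    return {series_id: manual_features(values) for series_id, values in data.items()}


table = features_per_id({"sensor_a": [1.0, 2.0, 3.0, 2.0], "sensor_b": [5.0, 5.0]})
print(table["sensor_a"])
```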

Downloads: 0 This Week

Last Update: 2026-05-31
11

Matrix

Multi-Agent daTa geneRation Infra and eXperimentation framework

Matrix is a distributed, large-scale engine for multi-agent synthetic data generation and experiments: it provides the infrastructure to run thousands of “agentic” workflows concurrently (e.g. multiple LLMs interacting, reasoning, generating content, data-processing pipelines) by leveraging distributed computing (like Ray + cluster management). The idea is to treat data generation as a “data-to-data” transformation: each input item defines a task, and the runtime orchestrates asynchronous,...
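The "data-to-data" idea, where each input item defines a task and a runtime runs many tasks concurrently, can be sketched on a single machine with `asyncio`. This is an illustration under stated assumptions, not Matrix's API and not a Ray deployment: `agent_task` is a hypothetical stand-in for an agentic workflow, and a semaphore caps how many run at once.

```python
import asyncio


async def agent_task(item: str) -> dict:
    # Hypothetical placeholder for an agentic workflow (e.g. several LLM calls).
    await asyncio.sleep(0.01)  # simulate I/O-bound model latency
    return {"input": item, "output": item.upper()}


async def run_data_to_data(items: list[str], max_concurrency: int = 4) -> list[dict]:
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(item: str) -> dict:
        async with sem:  # at most max_concurrency tasks in flight
            return await agent_task(item)

    # gather returns results in input order, so output row i matches input item i.
    return await asyncio.gather(*(guarded(i) for i in items))


results = asyncio.run(run_data_to_data(["a", "b", "c"]))
print(results)
```

A distributed engine replaces the semaphore and event loop with cluster-wide scheduling, but the contract stays the same: one input item in, one generated record out.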

Downloads: 0 This Week

Last Update: 2026-03-05
12

HY-Motion 1.0

HY-Motion model for 3D character animation generation

...The training strategy for the HY-Motion series includes extensive pre-training on thousands of hours of varied motion data, fine-tuning on curated high-quality datasets, and reinforcement learning with human feedback, which improves both the plausibility and adaptability of generated motion sequences.

Downloads: 2 This Week

Last Update: 2026-05-25
13

mistletoe

A fast, extensible and spec-compliant Markdown parser in pure Python

mistletoe is a Markdown parser in pure Python, designed to be fast, spec-compliant and fully customizable. Apart from being the fastest CommonMark-compliant Markdown parser implementation in pure Python, mistletoe also supports easy definitions of custom tokens. Parsing Markdown into an abstract syntax tree also allows us to swap out renderers for different output formats, without touching any of the core components.
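The benefit of parsing into a syntax tree first is that the same tree can be handed to different renderers. The toy below shows only that separation; it is not mistletoe's classes or API. It understands just ATX headings and plain paragraphs, skips inline markup and HTML escaping, and the renderer names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Heading:
    level: int
    text: str


@dataclass
class Paragraph:
    text: str


def parse(markdown: str) -> list:
    # Build a flat "AST": blank-line-separated blocks become Heading or Paragraph nodes.
    blocks = []
    for chunk in markdown.strip().split("\n\n"):
        line = chunk.strip()
        if line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            blocks.append(Heading(level, line[level:].strip()))
        else:
            blocks.append(Paragraph(" ".join(line.split())))  # join soft line breaks
    return blocks


class HTMLRenderer:
    def render(self, blocks: list) -> str:
        out = []
        for b in blocks:
            if isinstance(b, Heading):
                out.append(f"<h{b.level}>{b.text}</h{b.level}>")
            else:
                out.append(f"<p>{b.text}</p>")
        return "\n".join(out)


class PlainTextRenderer:
    def render(self, blocks: list) -> str:
        return "\n".join(b.text.upper() if isinstance(b, Heading) else b.text for b in blocks)


tree = parse("# Title\n\nSome text\nhere.")
print(HTMLRenderer().render(tree))
print(PlainTextRenderer().render(tree))
```

Adding an output format means writing one more renderer class; the parser is untouched.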

Downloads: 0 This Week

Last Update: 2025-12-07
14

PGFPlots

A TeX package to draw normal and/or logarithmic plots directly in TeX

This project provides PGFPlots, a TeX package for drawing normal and/or logarithmic plots directly in TeX in two and three dimensions with a user-friendly interface, and PGFPlotstable, a TeX package for rounding and formatting numerical tables. Examples are available in the manuals and on the website. PGFPlots draws high-quality function plots in normal or logarithmic scaling with a user-friendly interface directly in TeX. The user supplies axis labels, legend entries and the plot coordinates for one or more plots and PGFPlots applies axis...

Downloads: 1 This Week

Last Update: 2025-08-14
15

Unstract

No-code LLM Platform to launch APIs and ETL Pipelines

Unstract is a powerful open-source, no-code platform built to automate the extraction and structuring of unstructured documents using large language models and flexible workflows, enabling developers and data teams to turn messy files into organized JSON content without complex coding. It integrates a visual Prompt Studio environment where users can iteratively design extraction schemas, compare outputs from different models, and monitor costs and accuracy side by side, making it easier to...

Downloads: 2 This Week

Last Update: 2 days ago
16

rtk

CLI proxy that reduces LLM token consumption

...RTK intercepts these command outputs and compresses them into concise summaries before sending them to the language model. This process helps maintain important information while removing redundant data such as boilerplate logs, long directory listings, or repetitive test outputs. By minimizing the amount of noise sent to the AI model, the tool improves reasoning quality and allows longer development sessions within the same context window. The system is implemented as a lightweight Rust binary that runs locally and integrates easily with common AI coding environments.
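The kind of compression described above can be sketched in a few lines. This is not rtk's algorithm (rtk is a Rust binary) and `compress_output` is a hypothetical helper: it collapses runs of identical lines, such as repeated "ok" test results, and truncates long listings to their head and tail with a marker saying how much was dropped.

```python
def compress_output(text: str, max_lines: int = 20) -> str:
    lines = text.splitlines()
    out: list[str] = []
    i = 0
    while i < len(lines):
        # Find the end of a run of identical consecutive lines.
        j = i
        while j + 1 < len(lines) and lines[j + 1] == lines[i]:
            j += 1
        run = j - i + 1
        out.append(lines[i] if run == 1 else f"{lines[i]}  [repeated {run}x]")
        i = j + 1
    if len(out) > max_lines:
        # Keep the first and last lines, where errors and summaries usually appear.
        head = max_lines // 2
        tail = max_lines - head
        omitted = len(out) - max_lines
        out = out[:head] + [f"... {omitted} lines omitted ..."] + out[len(out) - tail:]
    return "\n".join(out)


print(compress_output("ok\nok\nok\nFAIL test_x"))
```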

Downloads: 26 This Week

Last Update: 2026-06-12
17

DINOv3

Reference PyTorch implementation and models for DINOv3

DINOv3 is the third-generation iteration of Meta’s self-supervised visual representation learning framework, building upon the ideas from DINO and DINOv2. It continues the paradigm of learning strong image representations without labels using teacher–student distillation, but introduces a simplified and more scalable training recipe that performs well across datasets and architectures. DINOv3 removes the need for complex augmentations or momentum encoders, streamlining the pipeline while...

Downloads: 17 This Week

Last Update: 2026-06-15
18

Amper

Build tool for the Kotlin and Java languages

Amper is an open-source project configuration and build tool developed by JetBrains for Kotlin and Java projects. It focuses on simple, declarative project configuration and on a good tooling experience in JetBrains IDEs.

Downloads: 0 This Week

Last Update: 2026-03-30
19

NeMo Curator

Scalable data pre processing and curation toolkit for LLMs

NeMo Curator is a Python library specifically designed for fast and scalable dataset preparation and curation for large language model (LLM) use cases such as foundation model pretraining, domain-adaptive pretraining (DAPT), supervised fine-tuning (SFT), and parameter-efficient fine-tuning (PEFT). It greatly accelerates data curation by leveraging GPUs with Dask and RAPIDS, resulting in significant time savings. The library provides a customizable and modular interface, simplifying pipeline expansion and accelerating model convergence through the preparation of high-quality tokens. At the core of NeMo Curator is the DocumentDataset, which serves as the main dataset class. ...

Downloads: 0 This Week

Last Update: 2026-05-12
20

Claude Context

Code search MCP for Claude Code

Claude Context is a tool designed to enhance the contextual understanding of large language models by managing and injecting relevant information into prompts. It focuses on improving response quality by ensuring that models have access to the most relevant data when generating outputs. The system integrates with vector databases and retrieval systems, enabling efficient storage and retrieval of contextual information. It supports workflows such as retrieval-augmented generation, where external knowledge is dynamically incorporated into model responses. ...

Downloads: 0 This Week

Last Update: 2026-04-28
21

OpenFreeMap

Free and open-source map hosting solution with custom styles

OpenFreeMap is a free and open-source map hosting platform that allows developers to display customizable maps in websites and applications without relying on commercial providers. It uses OpenStreetMap data and modern vector tile technologies to deliver high-quality maps with flexible styling options. The platform can be self-hosted or accessed through a public instance, offering full control or convenience depending on user needs. It removes common barriers such as API keys, usage limits, and tracking mechanisms, emphasizing privacy and accessibility. ...

Downloads: 1 This Week

Last Update: 2026-05-10
22

Agents 2.0

An Open-source Framework for Data-centric Language Agents

...During training, the system performs a forward execution where the agent completes a task and records the trajectory of prompts, outputs, and tool usage. A prompt-based loss function is then applied to evaluate the quality of the outcome, generating language-based gradients that guide improvements to the agent pipeline.

Downloads: 1 This Week

Last Update: 2026-03-04
23

prompts.chat

Share, discover, and collect prompts

prompts.chat, also known as Awesome ChatGPT Prompts, is an open-source community project that curates high-quality prompt examples for modern AI chat models. The repository functions as a centralized library where users can browse, share, and collect prompt templates designed to improve the usefulness and creativity of AI interactions. Originally built around ChatGPT use cases, the prompts are broadly compatible with many contemporary large language models, making the resource flexible...

Downloads: 4 This Week

Last Update: 3 days ago
24

LangWatch

The platform for LLM evaluations and AI agent testing

...The platform provides tools for tracking model interactions, analyzing prompt behavior, and identifying issues such as hallucinations, latency problems, or unexpected responses. By collecting telemetry data from AI applications, LangWatch allows developers to understand how their systems perform in real-world usage scenarios. The platform includes dashboards that visualize model behavior, enabling teams to monitor trends in response quality and reliability over time. It also provides evaluation tools that allow developers to test prompts and compare outputs across different models or configurations. ...

Downloads: 0 This Week

Last Update: 2026-06-19
25

Uncertainty Baselines

High-quality implementations of standard and SOTA methods

Uncertainty Baselines is a collection of strong, well-documented training pipelines that make it straightforward to evaluate predictive uncertainty in modern machine learning models. Rather than offering toy scripts, it provides end-to-end recipes—data input, model architectures, training loops, evaluation metrics, and logging—so results are comparable across runs and research groups. The library spans canonical modalities and tasks, from image classification and NLP to tabular problems,...
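One metric commonly used when evaluating predictive uncertainty is expected calibration error (ECE): predictions are grouped into confidence bins, and the gap between average confidence and accuracy in each bin is averaged, weighted by bin size. The sketch below is a plain-Python illustration of that metric with equal-width bins; it is not the library's own implementation, and binning conventions vary between codebases.

```python
def expected_calibration_error(confidences: list[float], correct: list[bool], num_bins: int = 10) -> float:
    n = len(confidences)
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(num_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * num_bins), num_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)  # weight each gap by bin size
    return ece


# Overconfident model: says 0.9 but is right only 3 times out of 4.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False]))
```

A perfectly calibrated model scores 0; the example above scores roughly 0.15, the gap between 90% confidence and 75% accuracy.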

Downloads: 0 This Week

Last Update: 2026-03-24