Showing 888 open source projects for "data quality"

  • 1
    SDGym

    Benchmarking synthetic data generation methods

    ...You can also customize the process to include your own work. Select any of the publicly available datasets from the SDV project, or input your own data. Choose from any of the SDV synthesizers and baselines, or write your own custom machine learning model. In addition to performance and memory usage, you can also measure synthetic data quality and privacy through a variety of metrics. Install SDGym using pip or conda. We recommend using a virtual environment to avoid conflicts with other software on your device.
    Downloads: 0 This Week
  • 2
    Middleware

    Open-source DORA metrics platform for engineering teams

    Bring more visibility to your engineering pipeline, get the right data & actionable insights to unclog bottlenecks, ensuring smooth software delivery.
    Downloads: 15 This Week
  • 3
    ydata-profiling

    Create HTML profiling reports from pandas DataFrame objects

    The primary goal of ydata-profiling is to provide a one-line Exploratory Data Analysis (EDA) experience in a consistent and fast solution. Like the handy pandas df.describe() function, ydata-profiling delivers an extended analysis of a DataFrame, and it lets the analysis be exported in different formats such as HTML and JSON.
    Downloads: 0 This Week
  • 4
    whylogs

    The open standard for data logging

    whylogs is an open-source library for logging any kind of data. With whylogs, users can generate summaries of their datasets (called whylogs profiles), which they can use to track changes in their dataset, create data constraints to know whether their data looks the way it should, and quickly visualize key summary statistics about their datasets. whylogs profiles are the core of the whylogs library. They capture key statistical properties of data, such as the distribution (far beyond...
    Downloads: 0 This Week
  • 5
    Gyroflow

    Video stabilization using gyroscope data

    Gyroflow is an advanced open-source video stabilization application that uses gyroscope and motion sensor data to produce highly accurate and cinematic stabilization results. Instead of relying solely on visual estimation like traditional software stabilizers, it processes real motion data recorded by cameras or external sensors to achieve more precise compensation. This approach allows it to correct complex camera movement, rolling shutter distortion, and lens artifacts while preserving image quality.
    Downloads: 11 This Week
  • 6
    Clustering.jl

    A Julia package for data clustering

    Methods for data clustering and evaluation of clustering quality.
    Downloads: 0 This Week
  • 7
    CS-Ebook

    Curated list of classic, high-quality computer science books

    CS-Ebook is a curated repository that compiles high-quality and classic computer science books across a wide range of software-related fields. It focuses on depth over volume, selecting only well-regarded titles that support structured learning and long-term skill development. It spans core areas such as computer fundamentals, programming languages, software engineering, mathematics, data science, and artificial intelligence, making it suitable for learners at different stages. ...
    Downloads: 1 This Week
  • 8
    Bespoke Curator

    Synthetic data curation for post-training and data extraction

    ...Curator includes tools for monitoring data generation processes and managing dataset quality while large batches of examples are being created. The framework also integrates with multiple inference systems and APIs, allowing users to generate data using different model providers or open-source inference engines.
    Downloads: 1 This Week
  • 9
    Zingg

    Scalable master data management and identity resolution

    Zingg is an open-source entity resolution and master data management platform for finding duplicate, related, or matching records across large datasets. It uses machine learning to learn how records should be compared, reducing the need for brittle hand-written matching rules. The project is designed for data engineering and analytics teams working on customer 360, supplier 360, deduplication, fuzzy matching, data quality, and golden record workflows. ...
    Downloads: 0 This Week
  • 10
    MinerU

    A high-quality tool for converting PDF to Markdown and JSON

    MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.
    Downloads: 18 This Week
  • 11
    WebP Codec

    Library to encode and decode images in WebP format

    libwebp is the reference codec library for Google’s WebP image format, providing both encoding and decoding along with command-line tools. It supplies cwebp to compress images into WebP and dwebp to decompress them back, making it easy to test quality/size trade-offs across presets and tuning parameters. The GitHub repository is a mirror; the canonical source of truth lives on Chromium’s git, and developer docs are hosted on WebP’s portal. The project underpins WebP support across browsers, imaging libraries, and many native apps thanks to its stable C API. Additional companion repos host test data and demos, including JavaScript builds and timing tests for various platforms. ...
    Downloads: 31 This Week
  • 12
    Panda-Helper

    Panda-Helper: Data profiling utility for Pandas DataFrames and Series

    Panda-Helper is a simple data-profiling utility for Pandas DataFrames and Series. Assess data quality and usefulness with minimal effort. Quickly perform initial data exploration, so you can move on to more in-depth analysis.
    Downloads: 1 This Week
  • 13
    Apache Airflow Provider

    Great Expectations Airflow operator

    Due to the removal of the apply_default decorator, this version of the provider requires Airflow 2.1.0+. If your Airflow version is earlier than 2.1.0 and you want to install this provider version, first upgrade Airflow to at least version 2.1.0. Otherwise, your Airflow package version will be upgraded automatically, and you will have to manually run airflow upgrade db to complete the migration. This operator currently works with the Great Expectations V3 Batch Request API only. If you would like to use the...
    Downloads: 0 This Week
  • 14
    gt R

    Easily generate information-rich, publication-quality tables from R

    With the gt package, anyone can make wonderful-looking tables using the R programming language. The gt philosophy: we can construct a wide variety of useful tables with a cohesive set of table parts. These include the table header, the stub, the column labels and spanner column labels, the table body, and the table footer. It all begins with table data (be it a tibble or a data frame). You then decide how to compose your gt table with the elements and formatting you need for the task at...
    Downloads: 0 This Week
  • 15
    Elementary

    Open-source data observability for analytics engineers

    Elementary is an open-source data observability solution for data & analytics engineers. Monitor your dbt project and data in minutes, and be the first to know of data issues. Gain immediate visibility, detect data issues, send actionable alerts, and understand the impact and root cause. Generate a data observability report, host it or share with your team. Monitoring of data quality metrics, freshness, volume and schema changes, including anomaly detection. ...
    Downloads: 2 This Week
  • 16
    pointblank

    Data quality assessment and metadata reporting for data frames

    With the pointblank package it’s really easy to methodically validate your data whether in the form of data frames or as database tables. On top of the validation toolset, the package gives you the means to provide and keep up-to-date with the information that defines your tables. For table validation, the agent object works with a large collection of simple (yet powerful!) validation functions. We can enable much more sophisticated validation checks by using custom expressions, segmenting...
    Downloads: 0 This Week
  • 17
    syslog-ng

    Log management solution that improves the performance of SIEM

    syslog-ng is the log management solution that improves the performance of your SIEM solution by reducing the amount and improving the quality of data feeding your SIEM. With syslog-ng Store Box, you can find the answer. Search billions of logs in seconds using full text queries with Boolean operators to pinpoint critical logs. syslog-ng Store Box provides secure, tamper-proof storage and custom reporting to demonstrate compliance. syslog-ng can deliver data from a wide variety of sources to Hadoop, Elasticsearch, MongoDB, and Kafka as well as many others. syslog-ng flexibly routes log data from X sources to Y destinations. ...
    Downloads: 6 This Week
  • 18
    Internet Pi

    Raspberry Pi config for all things Internet

    Internet Pi is an Ansible playbook and configuration framework for a Raspberry Pi (or similar single-board computer) that transforms it into a network-infrastructure monitoring and ad-blocking unit. The typical use case is: plug the Pi into your home network, install Pi-hole for DNS-level ad-blocking and privacy, and deploy Prometheus + Grafana to monitor internet connection quality, ping latency, speed tests, and uptime trends. The intention is both practical (reduce ads, manage DNS) and...
    Downloads: 1 This Week
  • 19
    Datacap

    DataCap is integrated software for data transformation

    Datacap is an open-source data catalog and governance tool that helps organizations manage and document their data assets. It provides metadata management, lineage tracking, and collaboration features to ensure data transparency and quality. Datacap is designed for teams that need a lightweight, self-hosted solution to organize and govern their data ecosystems.
    Downloads: 0 This Week
  • 20
    Datumaro

    Dataset Management Framework, a Python library and a CLI tool to build

    ...It supports importing and exporting annotations and images across a wide variety of standards like COCO, PASCAL VOC, YOLO, ImageNet, Cityscapes, and many more, enabling easy integration with different training pipelines and tools. Datumaro makes it easy to merge datasets, split them into training/validation/test subsets, filter or transform annotations, and validate annotation quality — all while preserving metadata and supporting detailed statistics. It’s especially useful when you’re dealing with heterogeneous data sources or need to prepare complex datasets for machine learning workflows, freeing you from writing custom scripts for every format conversion.
    Downloads: 3 This Week
  • 21
    Ember.js

    A JavaScript framework for creating ambitious web applications

    ...It's designed to make building web applications a whole lot easier, with everything you need to build rich UIs that work on any device available to you right out of the box. Ember helps keep you at your most productive with its solid CLI, built-in router, fully-featured data access library called Ember Data, and many other great features. Ember also comes with a Glimmer rendering engine, one of the fastest rendering technologies on the market today. Ember has got everything that modern JS has to offer, and if you want more you can always turn to Ember's high-quality, curated community Addons to supercharge your application.
    Downloads: 1 This Week
  • 22
    InfiniteYou

    Flexible Photo Recrafting While Preserving Your Identity

    ...Using an architecture built around diffusion transformers (DiTs), InfiniteYou introduces a component called InfuseNet that injects identity features derived from reference images into the generation process through residual connections, so that the output closely matches the person’s identity without sacrificing visual quality or text-image alignment. The team uses a multi-stage training strategy with synthetic multi-sample data per identity to fine-tune for both identity consistency and aesthetic quality. Compared to prior methods, InfiniteYou significantly improves identity similarity, text-prompt adherence, and overall image quality, and it avoids common problems such as face copy-pasting artifacts.
    Downloads: 0 This Week
  • 23
    AG Grid

    The best JavaScript Data Table for building enterprise applications

    The performance, feature set and quality of AG Grid have not been seen before in a JavaScript data grid. Many features in AG Grid are unique to AG Grid, and simply put AG Grid into a class of its own, without compromising on quality or performance. Over 600,000 monthly downloads of AG Grid Community and over 50% of the Fortune 500 using AG Grid Enterprise. AG Grid has become the JavaScript Datagrid of choice for Enterprise JavaScript developers.
    Downloads: 1 This Week
  • 24
    Evil Seed

    A Gem for creating partial anonymized dumps of your database

    Evil Seed is a Ruby tool for seeding databases with realistic, localized, and structured test data. It integrates with Rails and uses Faker, but allows more advanced customization like data relationships and repeatable sequences. It’s ideal for developers who need high-quality sample data for testing or demos.
    Downloads: 0 This Week
  • 25
    DevilutionX

    Diablo build for modern operating systems

    DevilutionX is a port of Diablo and Hellfire that strives to make it simple to run the game while providing engine improvements, bugfixes, and some optional quality-of-life features. Check out the manual for what features are available and how best to take advantage of them. You'll need access to the data from the original game. If you don't have an original CD then you can buy Diablo from GoG. Alternatively you can use spawn.mpq from the shareware version, in place of DIABDAT.MPQ, to play the shareware portion of the game. ...
    Downloads: 7 This Week