Showing 195 open source projects for "data quality"

  • 1
    pointblank

    Data quality assessment and metadata reporting for data frames

    With the pointblank package it’s really easy to methodically validate your data, whether in the form of data frames or database tables. On top of the validation toolset, the package gives you the means to record, and keep up to date, the information that defines your tables. For table validation, the agent object works with a large collection of simple (yet powerful!) validation functions. We can enable much more sophisticated validation checks by using custom expressions, segmenting...
    Downloads: 0 This Week
  • 2
    Union Pandera

    Light-weight, flexible, expressive statistical data testing library

    ...Validate the functions that produce your data by automatically generating test cases for them. Integrate seamlessly with the Python ecosystem. Overcome the initial hurdle of defining a schema by inferring one from clean data, then refine it over time. Identify the critical points in your data pipeline, and validate data going in and out of them. Build confidence in the quality of your data by defining schemas for complex data objects.
    Downloads: 3 This Week
  • 3
    Best-of Python

    A ranked list of awesome Python open-source libraries

    ...Correctly generate plurals, ordinals, indefinite articles; convert numbers. Libraries for loading, collecting, and extracting data from a variety of data sources and formats. Libraries for data batch- and stream-processing, workflow automation, job scheduling, and other data pipeline tasks.
    Downloads: 3 This Week
  • 4
    Datumaro

    Dataset Management Framework, a Python library and a CLI tool to build

    ...It supports importing and exporting annotations and images across a wide variety of standards like COCO, PASCAL VOC, YOLO, ImageNet, Cityscapes, and many more, enabling easy integration with different training pipelines and tools. Datumaro makes it easy to merge datasets, split them into training/validation/test subsets, filter or transform annotations, and validate annotation quality — all while preserving metadata and supporting detailed statistics. It’s especially useful when you’re dealing with heterogeneous data sources or need to prepare complex datasets for machine learning workflows, freeing you from writing custom scripts for every format conversion.
    Downloads: 3 This Week
  • 5
    ggpubr

    'ggplot2' Based Publication Ready Plots

    ggpubr is an R package that provides easy-to-use wrapper functions around ggplot2 for creating publication-ready visualizations with minimal code. It streamlines plot creation for researchers and analysts, supporting statistical annotation, theme customization, and plot arrangement in fewer lines of code.
    Downloads: 0 This Week
  • 6
    Compose.jl

    Declarative vector graphics

    Compose is a declarative vector graphics library written in Julia. It is designed to simplify the creation of complex graphics and forms the basis of Gadfly, a statistical graphics and data visualization package.
    Downloads: 0 This Week
  • 7
    GeoNode

    GeoNode is an open source platform for geospatial data

    ...Social features such as user profiles, commenting, and rating systems help communities form around each platform, facilitating the use, management, and quality control of the data the GeoNode instance contains. It is also designed to be a flexible platform that software developers can extend, modify, or integrate against to meet the requirements of their own applications.
    Downloads: 2 This Week
  • 8
    Matplot++

    Matplot++: A C++ Graphics Library for Data Visualization

    Data visualization can help programmers and scientists identify trends in their data and communicate these results efficiently to their peers. Modern C++ is used for a variety of scientific applications, and this environment can benefit considerably from graphics libraries that address the typical design goals of scientific data visualization.
    Downloads: 2 This Week
  • 9
    Apache InLong

    Apache InLong - a one-stop integration framework for massive data

    ...InLong was originally built at Tencent, where it has served online businesses for more than 8 years, supporting massive-data reporting services (more than 80 trillion pieces of data per day) in big data scenarios. The platform integrates 5 modules (Ingestion, Convergence, Caching, Sorting, and Management), so a business only needs to provide its data sources, data service quality, data landing clusters, and data landing formats.
    Downloads: 0 This Week
  • 10
    Matplotlib

    matplotlib: plotting with Python

    Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible. Matplotlib ships with several add-on toolkits, including 3D plotting with mplot3d, axes helpers in axes_grid1 and axis helpers in axisartist. A large number of third party packages extend and build on Matplotlib functionality, including several higher-level plotting interfaces (seaborn, HoloViews, ggplot, ...), and a...
    Downloads: 7 This Week
  • 11
    gramm

    Gramm is a complete data visualization toolbox for Matlab

    Gramm is a MATLAB toolbox that enables the rapid creation of complex, publication-quality figures. Its design philosophy focuses on a declarative approach, where users specify the desired end result, as opposed to the traditional imperative method involving for loops, if/else statements, etc. The MATLAB implementation of gramm is inspired by the "grammar of graphics" principles (Wilkinson 1999) and the ggplot2 library for R by Hadley Wickham. As a reference to this inspiration, gramm stands...
    Downloads: 0 This Week
  • 12
    Backstage

    Backstage is an open platform for building developer portals

    ...Instead of restricting autonomy, standardization frees your engineers from infrastructure complexity. So you can return to building and scaling, quickly and safely. Every team can see all the services they own and related resources (deployments, data pipelines, pull request status, etc.)
    Downloads: 2 This Week
  • 13
    ColorSchemes.jl

    colorschemes, colormaps, gradients, and palettes

    Color schemes, colormaps, gradients, and palettes. Choose ColorSchemes with care; for more information, refer to Peter Kovesi's PerceptualColourMaps package or Fabio Crameri's Scientific Colour Maps. To make more advanced ColorSchemes using linear-segment dictionaries, indexed lists, or functions that generate color values, see the make_colorscheme() function in the ColorSchemeTools.jl package.
    Downloads: 0 This Week
  • 14
    PGFPlotsX.jl

    Plots in Julia using the PGFPlots LaTeX package

    PGFPlotsX is a Julia package for generating publication-quality figures with the LaTeX library PGFPlots. It is similar in spirit to PGFPlots.jl, but it aims for a very close mapping to the PGFPlots API and a minimal number of dependencies. Because the syntax is similar to the TeX version, examples from Stack Overflow and the PGFPlots manual can easily be incorporated into Julia code.
    Downloads: 0 This Week
  • 15
    CocoIndex

    ETL framework to index data for AI, such as RAG

    ...It lets users index and retrieve content based on meaning rather than keywords, making it ideal for modern AI-based search applications. CocoIndex leverages vector embeddings and integrates with various models and frameworks, including OpenAI and Hugging Face, to provide high-quality semantic understanding. It’s built for transparency, ease of use, and local control over your search data, distinguishing itself from closed, black-box systems. The tool is suitable for developers working on personal knowledge bases, AI search interfaces, or private LLM applications.
    Downloads: 1 This Week
  • 16
    awesome-single-cell

    Community-curated list of software packages and data resources

    Community-curated list of software packages (and the people developing these methods) and data resources for single-cell data analysis, including RNA-seq, ATAC-seq, etc. Rapid, accurate and memory-frugal preprocessing of single-cell and single-nucleus RNA-seq data. Find bimodal, unimodal, and multimodal features in your data.
    Downloads: 0 This Week
  • 17
    DynamicalSystems.jl

    Award-winning software library for nonlinear dynamics and time series

    DynamicalSystems.jl is an award-winning Julia software library for nonlinear dynamics and nonlinear time series analysis. To install DynamicalSystems.jl, run import Pkg; Pkg.add("DynamicalSystems"). To learn how to use it and see its contents, visit the documentation, which you can either find online or build locally by running the docs/make.jl file. DynamicalSystems.jl is part of JuliaDynamics, an organization dedicated to creating high-quality scientific software. All implemented algorithms...
    Downloads: 0 This Week
  • 18
    plotly.js

    JavaScript charting library behind Plotly and Dash

    Plotly JavaScript Open Source Graphing Library. Built on top of d3.js and stack.gl, plotly.js is a high-level, declarative charting library. plotly.js ships with over 40 chart types, including 3D charts, statistical graphs, and SVG maps. plotly.js is free and open source, and you can view the source, report issues, or contribute on GitHub. For plotly.js to build with Webpack, you will need to install ify-loader@v1.1.0+ and add it to your webpack.config.js. This adds Browserify transform...
    Downloads: 5 This Week
  • 19
    tsfresh

    Automatic extraction of relevant features from time series

    tsfresh is a Python package that automatically calculates a large number of time series characteristics, the so-called features. Without tsfresh, you would have to calculate all of these characteristics by hand; with tsfresh, the process is automated. Furthermore, tsfresh is compatible with Python's pandas and scikit-learn APIs, two important packages for data science endeavours in Python...
    Downloads: 0 This Week
  • 20
    TIGRE

    TIGRE: Tomographic Iterative GPU-based Reconstruction Toolbox

    TIGRE is an open-source toolbox for fast and accurate 3D tomographic reconstruction for any geometry. Its focus is on iterative algorithms for improved image quality that have all been optimized to run on GPUs (including multi-GPUs) for improved speed. It combines the higher-level abstraction of MATLAB or Python with the performance of CUDA at a lower level in order to make it both fast and easy to use. TIGRE is free to download and distribute: use it, modify it, add to it, and share it. Our...
    Downloads: 0 This Week
  • 21
    data-diff

    Efficiently diff rows across two different databases

    ...Replicating data at scale, across hundreds of tables, with low latency and at a reasonable infrastructure cost is a hard problem, and most data teams we’ve talked to have faced data quality issues in their replication processes. The hard truth is that the quality of the replication is the quality of the data. Since copying entire datasets in batch is often infeasible at modern data scale, businesses rely on the Change Data Capture (CDC) approach, replicating data through a continuous stream of updates.
    Downloads: 0 This Week
  • 22
    Encord Active

    The toolkit to test, validate, and evaluate your models and surface

    Encord Active is an open-source toolkit to test, validate, and evaluate your models and to surface, curate, and prioritize the most valuable data for labeling to supercharge model performance. Encord Active has been designed as an all-in-one open-source toolkit for improving your data quality and model performance. Use the intuitive UI to explore your data, or access all the functionality programmatically. Discover errors, outliers, and edge cases within your data, all in one open-source toolkit. ...
    Downloads: 0 This Week
  • 23
    iTop - IT Service Management & CMDB

    An easy, extensible web based IT service management platform

    Whether you’re an infrastructure manager handling complex systems, a service support leader striving for customer satisfaction, or a decision-maker focused on ROI and compliance, iTop adapts to your processes to simplify your tasks, streamline operations, and enhance service quality. iTop (IT Operations Portal) by Combodo is an all-in-one, open-source ITSM platform designed to streamline IT operations. iTop offers a highly customizable, low-code Configuration Management Database (CMDB),...
    Downloads: 791 This Week
  • 24
    QUAST

    Quality Assessment Tool for Genome Assemblies

    QUAST performs fast and convenient quality evaluation and comparison of genome assemblies. It is maintained by the Gurevich lab at HIPS (https://helmholtz-hips.de/en/hmsb). For the most up-to-date description, please visit http://quast.sf.net. Below are just some highlights. QUAST computes several well-known metrics, including contig accuracy, the number of genes discovered, N50, and others, as well as introducing new ones, like NA50 (see details in the paper and manual). A...
    Downloads: 16 This Week
  • 25
    Qair

    An R package. 'Qair' gives access to databases used by air quality monitoring associations in France. It also contains functions to manipulate air quality data.
    Downloads: 1 This Week
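QUAST (entry 24 above) lists N50 among the assembly metrics it computes. N50 is the length L such that contigs of length L or longer together cover at least half of the total assembly length. The following is a minimal Python sketch of that textbook definition; the function name `n50` is hypothetical and this is not QUAST's code, which may apply extra rules (for example, ignoring contigs below a minimum length).

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    # Walk the contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # The first contig at which the cumulative length reaches half
        # of the total assembly length gives the N50. Comparing
        # running * 2 >= total keeps the check in integer arithmetic.
        if running * 2 >= total:
            return length
    return 0  # empty input: no contigs, no N50


if __name__ == "__main__":
    contigs = [100, 80, 60, 40, 20]
    print(n50(contigs))  # prints 80
```

In the usage example the total length is 300; the two longest contigs (100 + 80 = 180) are the first to reach half of it, so N50 is 80.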