Data Quality Tools for Mac

View 23 business solutions

Browse free open source Data Quality tools and projects for Mac below. Use the toggles on the left to filter open source Data Quality tools by OS, license, language, programming language, and project status.

  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 1
    iTop - IT Service Management & CMDB

    iTop - IT Service Management & CMDB

    An easy, extensible web based IT service management platform

    Whether you’re an infrastructure manager handling complex systems, a service support leader striving for customer satisfaction, or a decision-maker focused on ROI and compliance, iTop adapts to your processes to simplify your tasks, streamline operations, and enhance service quality. iTop (IT Operations Portal) by Combodo is an all-in-one, open-source ITSM platform designed to streamline IT operations. iTop offers a highly customizable, low-code Configuration Management Database (CMDB), along with advanced tools for handling requests, incidents, problems, changes, and service management. iTop is ITIL-compliant, making it ideal for organizations looking for standardized and scalable IT processes. Trusted by organizations worldwide, iTop provides a flexible, extensible solution. The platform’s source code is openly available on GitHub [https://github.com/Combodo/iTop].
    Leader badge
    Downloads: 757 This Week
    Last Update:
    See Project
  • 2
    CSV Lint

    CSV Lint

    CSV Lint plug-in for Notepad++ for syntax highlighting

    CSV Lint plug-in for Notepad++ for syntax highlighting, csv validation, automatic column and datatype detecting fixed width datasets, change datetime format, decimal separator, sort data, count unique values, convert to xml, json, sql etc. A plugin for data cleaning and working with messy data files. Use CSV Lint for metadata discovery, technical data validation, and reformatting on tabular data files. It is not meant to be a replacement for spreadsheet programs like Excel or SPSS, but rather it's a quality control tool to examine, verify or polish up a dataset before further processing.
    Downloads: 22 This Week
    Last Update:
    See Project
  • 3
    lakeFS

    lakeFS

    lakeFS - Git-like capabilities for your object storage

    Increase data quality and reduce the painful cost of errors. Data engineering best practices using git-like operations on data. lakeFS is an open-source data version control for data lakes. It enables zero-copy Dev / Test isolated environments, continuous quality validation, atomic rollback on bad data, reproducibility, and more. Data is dynamic, it changes over time. Dealing with that without a data version control system is error-prone and labor-intensive. With lakeFS, your data lake is version controlled and you can easily time-travel between consistent snapshots of the lake. Easier ETL testing - test your ETLs on top of production data, in isolation, without copying anything. Safely experiment and test on full production data. Easily Collaborate on production data with your team. Automate data quality checks within data pipelines.
    Downloads: 15 This Week
    Last Update:
    See Project
  • 4
    Diffgram

    Diffgram

    Training data (data labeling, annotation, workflow) for all data types

    From ingesting data to exploring it, annotating it, and managing workflows. Diffgram is a single application that will improve your data labeling and bring all aspects of training data under a single roof. Diffgram is world’s first truly open source training data platform that focuses on giving its users an unlimited experience. This is aimed to reduce your data labeling bills and increase your Training Data Quality. Training Data is the art of supervising machines through data. This includes the activities of annotation, which produces structured data; ready to be consumed by a machine learning model. Annotation is required because raw media is considered to be unstructured and not usable without it. That’s why training data is required for many modern machine learning use cases including computer vision, natural language processing and speech recognition.
    Downloads: 4 This Week
    Last Update:
    See Project
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 5
    ODD Platform

    ODD Platform

    First open-source data discovery and observability platform

    Unlock the power of big data with OpenDataDiscovery Platform. Experience seamless end-to-end insights, powered by unprecedented observability and trust - from ingestion to production - while building your ideal tech stack! Democratize data and accelerate insights. Find data that fits your use case and discover hints left by your peers to leverage existing knowledge. Explore tags, ownership details, links to other sources and other information to shorten and simplify data discovery phase. Forget unnerved stakeholders and wasting too much time on digging the root cause of data issues when it fails. With ODD’s automatic company-wide ingestion-to-product lineage you’ll have answers in just seconds and stakeholders won’t need to wait. Sleep well, knowing all your data is in check. Forget manual testing, days of debugging, and weeks of worrying. Know the impact of each code change with automatic testing. Enjoy lineage and alerts powered with data quality information.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 6
    Arize Phoenix

    Arize Phoenix

    Uncover insights, surface problems, monitor, and fine tune your LLM

    Phoenix provides ML insights at lightning speed with zero-config observability for model drift, performance, and data quality. Phoenix is an Open Source ML Observability library designed for the Notebook. The toolset is designed to ingest model inference data for LLMs, CV, NLP and tabular datasets. It allows Data Scientists to quickly visualize their model data, monitor performance, track down issues & insights, and easily export to improve. Deep Learning Models (CV, LLM, and Generative) are an amazing technology that will power many of future ML use cases. A large set of these technologies are being deployed into businesses (the real world) in what we consider a production setting.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 7
    FiftyOne

    FiftyOne

    The open-source tool for building high-quality datasets

    The open-source tool for building high-quality datasets and computer vision models. Nothing hinders the success of machine learning systems more than poor-quality data. And without the right tools, improving a model can be time-consuming and inefficient. FiftyOne supercharges your machine learning workflows by enabling you to visualize datasets and interpret models faster and more effectively. Improving data quality and understanding your model’s failure modes are the most impactful ways to boost the performance of your model. FiftyOne provides the building blocks for optimizing your dataset analysis pipeline. Use it to get hands-on with your data, including visualizing complex labels, evaluating your models, exploring scenarios of interest, identifying failure modes, finding annotation mistakes, and much more! Surveys show that machine learning engineers spend over half of their time wrangling data, but it doesn't have to be that way.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    Qualitis

    Qualitis

    Qualitis is a one-stop data quality management platform

    Qualitis is a data quality management platform that supports quality verification, notification, and management for various datasource. It is used to solve various data quality problems caused by data processing. Based on Spring Boot, Qualitis submits quality model task to Linkis platform. It provides functions such as data quality model construction, data quality model execution, data quality verification, reports of data quality generation and so on. At the same time, Qualitis provides enterprise-level features of financial-level resource isolation, management and access control. It is also guaranteed working well under high-concurrency, high-performance and high-availability scenarios.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 9
    SDGym

    SDGym

    Benchmarking synthetic data generation methods

    The Synthetic Data Gym (SDGym) is a benchmarking framework for modeling and generating synthetic data. Measure performance and memory usage across different synthetic data modeling techniques – classical statistics, deep learning and more! The SDGym library integrates with the Synthetic Data Vault ecosystem. You can use any of its synthesizers, datasets or metrics for benchmarking. You also customize the process to include your own work. Select any of the publicly available datasets from the SDV project, or input your own data. Choose from any of the SDV synthesizers and baselines. Or write your own custom machine learning model. In addition to performance and memory usage, you can also measure synthetic data quality and privacy through a variety of metrics. Install SDGym using pip or conda. We recommend using a virtual environment to avoid conflicts with other software on your device.
    Downloads: 2 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 10
    CleanVision

    CleanVision

    Automatically find issues in image datasets

    CleanVision automatically detects potential issues in image datasets like images that are: blurry, under/over-exposed, (near) duplicates, etc. This data-centric AI package is a quick first step for any computer vision project to find problems in the dataset, which you want to address before applying machine learning. CleanVision is super simple -- run the same couple lines of Python code to audit any image dataset! The quality of machine learning models hinges on the quality of the data used to train them, but it is hard to manually identify all of the low-quality data in a big dataset. CleanVision helps you automatically identify common types of data issues lurking in image datasets. This package currently detects issues in the raw images themselves, making it a useful tool for any computer vision task such as: classification, segmentation, object detection, pose estimation, keypoint detection, generative modeling, etc.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    Dagster

    Dagster

    An orchestration platform for the development, production

    Dagster is an orchestration platform for the development, production, and observation of data assets. Dagster as a productivity platform: With Dagster, you can focus on running tasks, or you can identify the key assets you need to create using a declarative approach. Embrace CI/CD best practices from the get-go: build reusable components, spot data quality issues, and flag bugs early. Dagster as a robust orchestration engine: Put your pipelines into production with a robust multi-tenant, multi-tool engine that scales technically and organizationally. Dagster as a unified control plane: The ‘single plane of glass’ data teams love to use. Rein in the chaos and maintain control over your data as the complexity scales. Centralize your metadata in one tool with built-in observability, diagnostics, cataloging, and lineage. Spot any issues and identify performance improvement opportunities.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    DataCleaner

    DataCleaner

    Data quality analysis, profiling, cleansing, duplicate detection +more

    DataCleaner is a data quality analysis application and a solution platform for DQ solutions. It's core is a strong data profiling engine, which is extensible and thereby adds data cleansing, transformations, enrichment, deduplication, matching and merging. Website: http://datacleaner.github.io
    Downloads: 6 This Week
    Last Update:
    See Project
  • 13
    Open Source Data Quality and Profiling

    Open Source Data Quality and Profiling

    World's first open source data quality & data preparation project

    This project is dedicated to open source data quality and data preparation solutions. Data Quality includes profiling, filtering, governance, similarity check, data enrichment alteration, real time alerting, basket analysis, bubble chart Warehouse validation, single customer view etc. defined by Strategy. This tool is developing high performance integrated data management platform which will seamlessly do Data Integration, Data Profiling, Data Quality, Data Preparation, Dummy Data Creation, Meta Data Discovery, Anomaly Discovery, Data Cleansing, Reporting and Analytic. It also had Hadoop ( Big data ) support to move files to/from Hadoop Grid, Create, Load and Profile Hive Tables. This project is also known as "Aggregate Profiler" Resful API for this project is getting built as (Beta Version) https://sourceforge.net/projects/restful-api-for-osdq/ apache spark based data quality is getting built at https://sourceforge.net/projects/apache-spark-osdq/
    Downloads: 2 This Week
    Last Update:
    See Project
  • 14
    COBOL Data Definitions
    Parse, analyze and -- most importantly -- use COBOL data definitions. This gives you access to COBOL data from Python programs. Write data analyzers, one-time data conversion utilities and Python programs that are part of COBOL systems. Really.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 15
    Medrechaincode

    Medrechaincode

    Lifetime medical records in decentralized chain code-network

    Medrechaincode uses a blockchain technology to store patient life time medical records in decentralized node with single true version of patient medical records” Medrechaincode provides a consensus based access of patient medical record to different healthcare professionals like R&D labs, doctors, hospital & pharmacist. This platform will also help the healthcare professional to use integrated data quality tool to remove duplicity of patient records , profiling of medical records, metadata discovery , data cleansing , classification of medical records, bucketization, anomaly discovery to find out the anomalous trend of medical records. m This platform will help patient to reduced their consultation time, monetize their medical records, for labs to get the historical records & come up with new medicine and make a clinical trial.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    MentDB Projects

    MentDB Projects

    Generalized Interoperability and Strong AI

    MentDB is an open-source platform driving research into next-generation AI and universal data exchange. Our architecture is built around the revolutionary Mentalese Query Language (MQL). MentDB Weak (Generalized Interoperability): A unified data layer enabling seamless data exchange and application integration (SOA, ETL, Data Quality). We eliminate data silos through a single, generalized data language. MentDB Strong (Strong AI / AGI): The framework for exploring and building Machine Consciousness, free will, and advanced ethical reasoning systems. Based on new-generation AI algorithms.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    SolexaQA is a software to calculate quality statistics and visual representations of data quality for second-generation sequencing data.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    Apache Airflow Provider

    Apache Airflow Provider

    Great Expectations Airflow operator

    Due to apply_default decorator removal, this version of the provider requires Airflow 2.1.0+. If your Airflow version is 2.1.0, and you want to install this provider version, first upgrade Airflow to at least version 2.1.0. Otherwise, your Airflow package version will be upgraded automatically, and you will have to manually run airflow upgrade db to complete the migration. This operator currently works with the Great Expectations V3 Batch Request API only. If you would like to use the operator in conjunction with the V2 Batch Kwargs API, you must use a version below 0.1.0. This operator uses Great Expectations Checkpoints instead of the former ValidationOperators. Because of the above, this operator requires Great Expectations >=v0.13.9, which is pinned in the requirements.txt starting with release 0.0.5.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19

    Arthropod Easy Capture

    An arthropod specific, specimen level data capture application

    Arthropod Easy Capture (AEC) is an arthropod specific, open-source solution for handling specimen level host interactions. Developed in conjunction with the Plant Bugs Planetary Biodiversity, Tri-Tropic Interactions TCN, and Bee Database projects, AEC is designed for rapid, and accurate data capture, utilizing controlled vocabularies to maintain data quality. The application is Web-based, allowing for collaboration from multiple partners to a centralized, easily maintainable, database. The AEC community, as part of the Advancing Digitization of Biodiversity Collections (ADBC) program, is already tasked to develop the appropriate Web service for sharing data with the iDigBio data portal. This includes the application of Globally Unique Identifiers (GUIDs) for specimens, and mapping relevant data fields to DarwinCore. iDigBio is the aggregator for all of the ADBC Thematic Collection Network projects, including trophic level interaction data from the Tri-Tropic Interactions TCN.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20

    BMDExpress Data Viewer

    A Visualization Tool to Analyze BMDExpress Datasets

    Regulatory agencies increasingly apply benchmark dose (BMD) modeling to determine points of departure for risk assessment. BMDExpress applies BMD modeling to transcriptomic datasets to identify transcriptional BMDs. However, graphing and analytical capabilities within BMDExpress are limited, and the analysis of output files is challenging. We developed a web-based application, BMDExpress Data Viewer for visualizing and graphing BMDExpress output files. BMDExpress Data Viewer is a useful tool to visualize, explore and analyze BMDExpress output files. Visualizing the data in this manner enables rapid assessment of data quality, model fit, doses of peak activity, most sensitive pathway perturbations and other metrics that will be useful in applying toxicogenomics in risk assessment. Tool Link: http://apps.sciome.com:8082/BMDX_Viewer/ Publication Link: http://onlinelibrary.wiley.com/doi/10.1002/jat.3265/abstract
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Web-GUI for a benchmarking database which is based on the EFQM Framework for Corporate Data Quality Management.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Cleanlab

    Cleanlab

    The standard data-centric AI package for data quality and ML

    cleanlab helps you clean data and labels by automatically detecting issues in a ML dataset. To facilitate machine learning with messy, real-world data, this data-centric AI package uses your existing models to estimate dataset problems that can be fixed to train even better models. cleanlab cleans your data's labels via state-of-the-art confident learning algorithms, published in this paper and blog. See some of the datasets cleaned with cleanlab at labelerrors.com. This package helps you find label issues and other data issues, so you can train reliable ML models. All features of cleanlab work with any dataset and any model. Yes, any model: PyTorch, Tensorflow, Keras, JAX, HuggingFace, OpenAI, XGBoost, scikit-learn, etc. If you use a sklearn-compatible classifier, all cleanlab methods work out-of-the-box.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    CloverDX

    CloverDX

    Design, automate, operate and publish data pipelines at scale

    Please, visit www.cloverdx.com for latest product versions. Data integration platform; can be used to transform/map/manipulate data in batch and near-realtime modes. Suppors various input/output formats (CSV,FIXLEN,Excel,XML,JSON,Parquet, Avro,EDI/X12,HL7,COBOL,LOTUS, etc.). Connects to RDBMS/JMS/Kafka/SOAP/Rest/LDAP/S3/HTTP/FTP/ZIP/TAR. CloverDX offers 100+ specialized components which can be further extended by creation of "macros" - subgraphs - and libraries, shareable with 3rd parties. Simple data manipulation jobs can be created visually. More complex business logic can be implemented using Clover's domain-specific-language CTL, in Java or languages like Python or JavaScript. Through its DataServices functionality, it allows to quickly turn data pipelines into REST API endpoints. The platform allows to easily scale your data job across multiple cores or nodes/machines. Supports Docker/Kubernetes deployments and offers AWS/Azure images in their respective marketplace
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    DataQualityDashboard

    DataQualityDashboard

    A tool to help improve data quality standards in data science

    The goal of the Data Quality Dashboard (DQD) project is to design and develop an open-source tool to expose and evaluate observational data quality. This package will run a series of data quality checks against an OMOP CDM instance (currently supports v5.4, v5.3 and v5.2). It systematically runs the checks, evaluates the checks against some pre-specified threshold, and then communicates what was done in a transparent and easily understandable way. The quality checks were organized according to the Kahn Framework1 which uses a system of categories and contexts that represent strategies for assessing data quality. Using this framework, the Data Quality Dashboard takes a systematic-based approach to running data quality checks. Instead of writing thousands of individual checks, we use “data quality check types”. These “check types” are more general, parameterized data quality checks into which OMOP tables, fields, and concepts can be substituted to represent a singular data quality idea.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    EPRI Open PQ Dashboard

    EPRI Open PQ Dashboard

    Demos new techniques for extracting information from PQ data files

    Open PQ Dashboard version 1.0 provides visual displays to quickly convey the status and location of power quality (PQ) anomalies throughout the electrical power system. Summary displays starts with the choice of a geospatial map-view or annunciator panel, both with unique visualizations for across-the-room visualizations fit for a PQ operations center. Drill-downs are in place for various statistics and guide users all the way down to the waveform level. This version consist of a few proof-of-concept applications of applying event severity and trend values to heatmap displays—giving the PQ engineers a wide-area status of PQ for quick interpretation. Data quality has been added so users can quickly see when meters are providing incomplete or invalid data. This dashboard currently accepts power quality data from COMTRADE and PQDIF standard file formats. Other proprietary software interfaces have been added. See the installation manual for more details.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB