Java Data Quality Tools

Browse free open source Java Data Quality Tools and projects below. Use the toggles on the left to filter open source Java Data Quality Tools by OS, license, language, programming language, and project status.

  • 1
    DataCleaner

    Data quality analysis, profiling, cleansing, duplicate detection +more

    DataCleaner is a data quality analysis application and a solution platform for DQ solutions. Its core is a strong data profiling engine, which is extensible and thereby adds data cleansing, transformations, enrichment, deduplication, matching and merging. Website: http://datacleaner.github.io
    Downloads: 163 This Week
  • 2
    Open Source Data Quality and Profiling

    World's first open source data quality & data preparation project

    This project is dedicated to open source data quality and data preparation solutions. Data quality includes profiling, filtering, governance, similarity checks, data enrichment and alteration, real-time alerting, basket analysis, bubble chart, warehouse validation, single customer view, etc., as defined by strategy. The tool is being developed into a high-performance integrated data management platform intended to seamlessly handle data integration, data profiling, data quality, data preparation, dummy data creation, metadata discovery, anomaly discovery, data cleansing, reporting, and analytics. It also has Hadoop (big data) support to move files to and from a Hadoop grid and to create, load, and profile Hive tables. This project is also known as "Aggregate Profiler". A RESTful API for this project is being built (beta version) at https://sourceforge.net/projects/restful-api-for-osdq/ and Apache Spark based data quality is being built at https://sourceforge.net/projects/apache-spark-osdq/
    Downloads: 51 This Week
  • 3
    CloverDX

    Design, automate, operate and publish data pipelines at scale

    Please visit www.cloverdx.com for the latest product versions. CloverDX is a data integration platform that can be used to transform, map, and manipulate data in batch and near-real-time modes. It supports various input/output formats (CSV, FIXLEN, Excel, XML, JSON, Parquet, Avro, EDI/X12, HL7, COBOL, LOTUS, etc.) and connects to RDBMS/JMS/Kafka/SOAP/REST/LDAP/S3/HTTP/FTP/ZIP/TAR. CloverDX offers 100+ specialized components, which can be further extended by creating "macros" (subgraphs) and libraries, shareable with third parties. Simple data manipulation jobs can be created visually. More complex business logic can be implemented using Clover's domain-specific language, CTL, in Java, or in languages like Python or JavaScript. Through its DataServices functionality, it lets you quickly turn data pipelines into REST API endpoints. The platform lets you easily scale your data jobs across multiple cores or nodes/machines. It supports Docker/Kubernetes deployments and offers AWS/Azure images in their respective marketplaces.
    Downloads: 30 This Week
  • 4
    Toolsverse ETL Framework

    Open source Extract Transform Load engine written in Java

    ETL Framework is a standalone Extract Transform Load engine written in Java. It includes executables for all major platforms and can be easily integrated into other applications.

    Key Features:
    * embeddable, open source and free
    * fast and scalable
    * uses target database features to do transformations and loads
    * manual and automatic data mapping
    * data streaming
    * bulk data loads
    * data quality features using SQL, JavaScript and regex
    * data transformations

    Requirements:
    * Java 1.6 and up
    * At least 4 MB of RAM

    New in 3.2 (01/18/2013):
    * Improved auto-update functionality
    * Bug fixes
    Downloads: 4 This Week
  • 5

    MOIRAI

    Simple Scientific Workflow System for CAGE Analysis

    Cap analysis of gene expression (CAGE) is a sequencing-based technology to capture the 5’ ends of RNAs in a biological sample. After mapping, a CAGE peak on the genome indicates the position of an active transcriptional start site (TSS), and the number of reads corresponds to its expression level. CAGE is prominently used in both the FANTOM and ENCODE projects. MOIRAI is a compact yet flexible workflow system designed to carry out the main steps in data processing and analysis of CAGE data. MOIRAI has a graphical interface allowing wet-lab researchers to create, modify and run analysis workflows. Embedded within the workflows are graphical quality control indicators allowing users to assess data quality and to quickly spot potential problems. The MOIRAI package comes with three main workflows allowing users to map, annotate and perform an expression analysis over multiple samples.
    Downloads: 1 This Week
  • 6
    AMB Data Profiling Data Quality
    AMB New Generation Data Empowerment offers a comprehensive approach to data governance needs, with groundbreaking features to locate, identify, discover, manage and protect your overall data infrastructure. Repeatable Process/Exposed Repository.
    Downloads: 0 This Week
  • 7
    A simple little engine to do fuzzy name & address searching. Helps improve data quality and avoids duplicate data entry.
    Downloads: 0 This Week
  • 8
    MentDB Weak

    Mentalese Database Engine

    Welcome to MentDB (Mentalese Database). The platform provides tools for AI, SOA, ETL, ESB, database, web application, data quality, predictive analytics, chatbot, and more, in a revolutionary data language (MQL). The server is based on a new generation of AI algorithms and on an innovative SOA layer to reach the WWD. Mentalese is the language of thought structuring the human brain. This language is able to accommodate different common languages and allows autonomy in a machine. WWD literally means 'World Wide Data'. It is a global strategy: a concept of widespread standardization of the exchange of data or intelligence between companies and software around the world.
    Downloads: 0 This Week
  • 9
    ODD Platform

    First open-source data discovery and observability platform

    Unlock the power of big data with OpenDataDiscovery Platform. Experience seamless end-to-end insights, powered by unprecedented observability and trust, from ingestion to production, while building your ideal tech stack. Democratize data and accelerate insights. Find data that fits your use case and discover hints left by your peers to leverage existing knowledge. Explore tags, ownership details, links to other sources and other information to shorten and simplify the data discovery phase. Forget unnerved stakeholders and time wasted digging for the root cause of data issues when something fails. With ODD’s automatic company-wide ingestion-to-product lineage you’ll have answers in just seconds and stakeholders won’t need to wait. Sleep well, knowing all your data is in check. Forget manual testing, days of debugging, and weeks of worrying. Know the impact of each code change with automatic testing. Enjoy lineage and alerts powered by data quality information.
    Downloads: 0 This Week
  • 10
    Open Data Profiler is an open source, extensible data profiler that enables users to automatically analyze and gather data quality facts on data sources in various formats (XML, JDBC or CSV).
    Downloads: 0 This Week
  • 11
    Qualitis

    Qualitis is a one-stop data quality management platform

    Qualitis is a data quality management platform that supports quality verification, notification, and management for various data sources. It is used to solve various data quality problems caused by data processing. Based on Spring Boot, Qualitis submits quality model tasks to the Linkis platform. It provides functions such as data quality model construction, data quality model execution, data quality verification, and data quality report generation. Qualitis also provides the enterprise-level features of financial-level resource isolation, management and access control. It is also guaranteed to work well under high-concurrency, high-performance and high-availability scenarios.
    Downloads: 0 This Week
  • 12
    Restful APIs for Data Cleansing

    This is a sister project of osDQ that provides RESTful APIs

    (Beta Version) This is a sister project of https://sourceforge.net/projects/dataquality/ . It provides RESTful APIs for data quality and data preparation features. It will help projects that want to embed data quality and data preparation features in their project or UI using RESTful calls.

    Data Cleansing APIs Dockerfile:

        # Pull base image
        FROM frnde/jetty-9.4.2-jre8-alpine-cet
        ADD osdq-v0.0.1.war /var/lib/jetty/webapps/osdq.war
        EXPOSE 8080

    Docker Image: https://hub.docker.com/r/vreddym/osdq-web/tags
    Downloads: 0 This Week
  • 13
    WhyLogs Java Library

    Profile and monitor your ML data pipeline end-to-end

    This is a Java implementation of WhyLogs, with support for Apache Spark integration for large scale datasets. Understanding the properties of data as it moves through applications is essential to keeping your ML/AI pipeline stable and improving your user experience, whether your pipeline is built for production or experimentation. WhyLogs is an open source statistical logging library that allows data science and ML teams to effortlessly profile ML/AI pipelines and applications, producing log files that can be used for monitoring, alerts, analytics, and error analysis. WhyLogs calculates approximate statistics for datasets of any size up to TB-scale, making it easy for users to identify changes in the statistical properties of a model's inputs or outputs. Using approximate statistics allows the package to run on minimal infrastructure and monitor an entire dataset, rather than miss outliers and other anomalies by only using a sample of the data to calculate statistics.
    Downloads: 0 This Week
  • 14
    apache spark data pipeline osDQ

    osDQ project dedicated to creating Apache Spark based data pipelines using JSON

    This is an offshoot of the open source data quality (osDQ) project https://sourceforge.net/projects/dataquality/ . This subproject will create Apache Spark based data pipelines in which JSON-based metadata (a file) is used to run data processing, data pipeline, data quality, data preparation, and data modeling features for big data. It uses the Java API of Apache Spark and can also run in local mode. Get a JSON example at https://github.com/arrahtech/osdq-spark

    How to run: unzip the zip file, then run the command for your platform.

    Windows:
        java -cp .\lib\*;osdq-spark-0.0.1.jar org.arrah.framework.spark.run.TransformRunner -c .\example\samplerun.json

    Mac/UNIX:
        java -cp ./lib/*:./osdq-spark-0.0.1.jar org.arrah.framework.spark.run.TransformRunner -c ./example/samplerun.json

    On Windows, you need a Hadoop distribution unzipped on a local drive and HADOOP_HOME set. Also copy winutils.exe from here into HADOOP_HOME\bin
    Downloads: 0 This Week
  • 15
    A Data profiling project
    Downloads: 0 This Week
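
Several projects in this list describe themselves as data profilers that gather "data quality facts" about a data source. As a rough illustration of what such facts can look like for a single column, here is a minimal Java sketch. It is not taken from any listed project: the class `ColumnProfiler`, its `Profile` type, and the choice to treat null or blank strings as missing are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of any project listed above. */
public final class ColumnProfiler {

    /** Summary facts for one column of string values. */
    public static final class Profile {
        public final int rowCount;      // total number of values seen
        public final int missingCount;  // null or blank values (assumed definition)
        public final int distinctCount; // distinct non-missing values, after trimming
        public final int minLength;     // shortest non-missing value, -1 if none
        public final int maxLength;     // longest non-missing value, -1 if none

        Profile(int rowCount, int missingCount, int distinctCount,
                int minLength, int maxLength) {
            this.rowCount = rowCount;
            this.missingCount = missingCount;
            this.distinctCount = distinctCount;
            this.minLength = minLength;
            this.maxLength = maxLength;
        }
    }

    private ColumnProfiler() { }

    /** Computes the profile in a single pass over the column. */
    public static Profile profile(List<String> values) {
        int missing = 0;
        int min = Integer.MAX_VALUE;
        int max = -1;
        Set<String> distinct = new HashSet<>();
        for (String value : values) {
            if (value == null || value.trim().isEmpty()) {
                missing++;
                continue;
            }
            String trimmed = value.trim();
            distinct.add(trimmed);
            min = Math.min(min, trimmed.length());
            max = Math.max(max, trimmed.length());
        }
        if (distinct.isEmpty()) {
            min = -1; // no non-missing values were seen
        }
        return new Profile(values.size(), missing, distinct.size(), min, max);
    }

    public static void main(String[] args) {
        // Arrays.asList is used because the sample column contains a null.
        Profile p = profile(Arrays.asList("Alice", "bob", "", null, "Alice"));
        System.out.println(p.rowCount + " rows, " + p.missingCount
                + " missing, " + p.distinctCount + " distinct");
    }
}
```

A real profiler would typically add type inference, pattern frequencies, and numeric statistics, and would read from a source such as CSV or JDBC rather than an in-memory list.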