Compare the Top Data Cleansing Software that integrates with Hadoop as of September 2025

This a list of Data Cleansing software that integrates with Hadoop. Use the filters on the left to add additional filters for products that have integrations with Hadoop. View the products that work with Hadoop in the table below.

What is Data Cleansing Software for Hadoop?

Data cleansing software uses specific algorithms in order to search for anomalies across data sets with the purpose of correcting them. Compare and read user reviews of the best Data Cleansing software for Hadoop currently available using the table below. This list is updated regularly.

  • 1
    Composable DataOps Platform

    Composable DataOps Platform

    Composable Analytics

    Composable is an enterprise-grade DataOps platform built for business users that want to architect data intelligence solutions and deliver operational data-driven products leveraging disparate data sources, live feeds, and event data regardless of the format or structure of the data. With a modern, intuitive dataflow visual designer, built-in services to facilitate data engineering, and a composable architecture that enables abstraction and integration of any software or analytical approach, Composable is the leading integrated development environment to discover, manage, transform and analyze enterprise data.
    Starting Price: $8/hr - pay-as-you-go
  • 2
    Zuar Runner

    Zuar Runner

    Zuar, Inc.

    Utilizing the data that's spread across your organization shouldn't be so difficult! With Zuar Runner you can automate the flow of data from hundreds of potential sources into a single destination. Collect, transform, model, warehouse, report, monitor and distribute: it's all managed by Zuar Runner. Pull data from Amazon/AWS products, Google products, Microsoft products, Avionte, Backblaze, BioTrackTHC, Box, Centro, Citrix, Coupa, DigitalOcean, Dropbox, CSV, Eventbrite, Facebook Ads, FTP, Firebase, Fullstory, GitHub, Hadoop, Hubic, Hubspot, IMAP, Jenzabar, Jira, JSON, Koofr, LeafLogix, Mailchimp, MariaDB, Marketo, MEGA, Metrc, OneDrive, MongoDB, MySQL, Netsuite, OpenDrive, Oracle, Paycom, pCloud, Pipedrive, PostgreSQL, put.io, Quickbooks, RingCentral, Salesforce, Seafile, Shopify, Skybox, Snowflake, Sugar CRM, SugarSync, Tableau, Tamarac, Tardigrade, Treez, Wurk, XML Tables, Yandex Disk, Zendesk, Zoho, and more!
  • 3
    Ataccama ONE
    Ataccama reinvents the way data is managed to create value on an enterprise scale. Unifying Data Governance, Data Quality, and Master Data Management into a single, AI-powered fabric across hybrid and Cloud environments, Ataccama gives your business and data teams the ability to innovate with unprecedented speed while maintaining trust, security, and governance of your data.
  • 4
    IRI Voracity

    IRI Voracity

    IRI, The CoSort Company

    Voracity is the only high-performance, all-in-one data management platform accelerating AND consolidating the key activities of data discovery, integration, migration, governance, and analytics. Voracity helps you control your data in every stage of the lifecycle, and extract maximum value from it. Only in Voracity can you: 1) CLASSIFY, profile and diagram enterprise data sources 2) Speed or LEAVE legacy sort and ETL tools 3) MIGRATE data to modernize and WRANGLE data to analyze 4) FIND PII everywhere and consistently MASK it for referential integrity 5) Score re-ID risk and ANONYMIZE quasi-identifiers 6) Create and manage DB subsets or intelligently synthesize TEST data 7) Package, protect and provision BIG data 8) Validate, scrub, enrich and unify data to improve its QUALITY 9) Manage metadata and MASTER data. Use Voracity to comply with data privacy laws, de-muck and govern the data lake, improve the reliability of your analytics, and create safe, smart test data
  • 5
    Informatica MDM

    Informatica MDM

    Informatica

    Our market-leading, multidomain solution supports any master data domain, implementation style, and use case, in the cloud or on premises. Integrates best-in-class data integration, data quality, business process management, and data privacy. Tackle complex issues head-on with trusted views of business-critical master data. Automatically link master, transaction, and interaction data relationships across master data domains. Increase accuracy of data records with contact data verification, B2B, and B2C enrichment services. Update multiple master data records, dynamic data models, and collaborative workflows with one click. Reduce maintenance costs and speed deployment with AI-powered match tuning and rule recommendations. Increase productivity using search and pre-configured, highly granular charts and dashboards. Create high-quality data that helps you improve business outcomes with trusted, relevant information.
  • 6
    Talend Data Fabric
    Talend Data Fabric’s suite of cloud services efficiently handles all your integration and integrity challenges — on-premises or in the cloud, any source, any endpoint. Deliver trusted data at the moment you need it — for every user, every time. Ingest and integrate data, applications, files, events and APIs from any source or endpoint to any location, on-premise and in the cloud, easier and faster with an intuitive interface and no coding. Embed quality into data management and guarantee ironclad regulatory compliance with a thoroughly collaborative, pervasive and cohesive approach to data governance. Make the most informed decisions based on high quality, trustworthy data derived from batch and real-time processing and bolstered with market-leading data cleaning and enrichment tools. Get more value from your data by making it available internally and externally. Extensive self-service capabilities make building APIs easy— improve customer engagement.
  • 7
    matchit

    matchit

    360Science

    The foundation of our matching software, matchit® is designed specifically to deliver results that mirror human-like perception, at scale and without preprocessing. Using Artificial Intelligence, a proprietary phonetic algorithm, lexicons, and a contextual scoring engine, matchit defeats the errors, inconsistencies, and challenges commonly found in contact and business data. Conventional matching solutions require a user to define matching logic, which is a combination of functions and off-the-shelf fuzzy algorithms, used to produce an alphanumeric value. This alphanumeric value, or ‘match key’, forms the basis for comparing two records together and ultimately finding matches. Unlike conventional matching solutions, matchit doesn’t rely on a single comparison between match keys to find a match. Instead, matchit evaluates records contextually, running a variety of comparisons and scoring them individually to grade similarity between all the relevant elements that make up your data.
  • Previous
  • You're on page 1
  • Next