Compare the Top Data De-Identification Tools that integrate with Hadoop as of September 2025

This a list of Data De-Identification tools that integrate with Hadoop. Use the filters on the left to add additional filters for products that have integrations with Hadoop. View the products that work with Hadoop in the table below.

What are Data De-Identification Tools for Hadoop?

Data de-identification tools are designed to remove potentially identifiable information from datasets. These tools can be used to ensure that data is anonymized and compliant with data privacy regulations, such as GDPR. Data de-identification methods typically involve techniques like suppressing or masking of certain pieces of data. Other methods like pseudonymization, tokenization, and randomization may also be used in order to completely obfuscate the original data while still allowing analysis of the remaining dataset. Furthermore, some advanced data de-identification software includes additional features for monitoring access and preventing unauthorized use of sensitive personal information. In summary, data de-identification tools provide organizations with ways to ensure compliance by removing personally identifiable information from their datasets before sharing or publishing them publicly. Compare and read user reviews of the best Data De-Identification tools for Hadoop currently available using the table below. This list is updated regularly.

  • 1
    PHEMI Health DataLab
    The PHEMI Trustworthy Health DataLab is a unique, cloud-based, integrated big data management system that allows healthcare organizations to enhance innovation and generate value from healthcare data by simplifying the ingestion and de-identification of data with NSA/military-grade governance, privacy, and security built-in. Conventional products simply lock down data, PHEMI goes further, solving privacy and security challenges and addressing the urgent need to secure, govern, curate, and control access to privacy-sensitive personal healthcare information (PHI). This improves data sharing and collaboration inside and outside of an enterprise—without compromising the privacy of sensitive information or increasing administrative burden. PHEMI Trustworthy Health DataLab can scale to any size of organization, is easy to deploy and manage, connects to hundreds of data sources, and integrates with popular data science and business analysis tools.
  • 2
    Informatica Persistent Data Masking
    Retain context, form, and integrity while preserving privacy. Enhance data protection by de-sensitizing and de-identifying sensitive data, and pseudonymize data for privacy compliance and analytics. Obscured data retains context and referential integrity remain consistent, so the masked data can be used in testing, analytics, or support environments. As a highly scalable, high-performance data masking solution, Informatica Persistent Data Masking shields confidential data—such as credit card numbers, addresses, and phone numbers—from unintended exposure by creating realistic, de-identified data that can be shared safely internally or externally. It also allows you to reduce the risk of data breaches in nonproduction environments, produce higher-quality test data and streamline development projects, and ensure compliance with data-privacy mandates and regulations.
  • 3
    IBM InfoSphere Optim Data Privacy
    IBM InfoSphere® Optim™ Data Privacy provides extensive capabilities to effectively mask sensitive data across non-production environments, such as development, testing, QA or training. To protect confidential data this single offering provides a variety of transformation techniques that substitute sensitive information with realistic, fully functional masked data. Examples of masking techniques include substrings, arithmetic expressions, random or sequential number generation, date aging, and concatenation. The contextually accurate masking capabilities help masked data retain a similar format to the original information. Apply a range of masking techniques on-demand to transform personally-identifying information and confidential corporate data in applications, databases and reports. Data masking features help you to prevent misuse of information by masking, obfuscating, and privatizing personal information that is disseminated across non-production environments.
  • 4
    Informatica Dynamic Data Masking
    Your IT organization can apply sophisticated masking to limit sensitive data access with flexible data masking rules based on a user’s authentication level. Blocking, auditing, and alerting your users, IT personnel, and outsourced teams who access sensitive information, it ensures compliance with your security policies and industry and civil privacy regulations. Easily customize data-masking solutions for different regulatory or business requirements. Protect personal and sensitive information while supporting offshoring, outsourcing, and cloud-based initiatives. Secure big data by dynamically masking sensitive data in Hadoop.
  • 5
    Mage Platform

    Mage Platform

    Mage Data

    Mage Data™ is the leading solutions provider of data security and data privacy software for global enterprises. Built upon a patented and award-winning solution, the Mage platform enables organizations to stay on top of privacy regulations while ensuring security and privacy of data. Top Swiss Banks, Fortune 10 organizations, Ivy League Universities, and Industry Leaders in the financial and healthcare businesses protect their sensitive data with the Mage platform for Data Privacy and Security. Deploying state-of-the-art privacy enhancing technologies for securing data, Mage Data™ delivers robust data security while ensuring privacy of individuals. Visit the website to explore the company’s solutions.
  • Previous
  • You're on page 1
  • Next