Best Data De-Identification Tools for Apache Spark

Compare the Top Data De-Identification Tools that integrate with Apache Spark as of December 2025

This is a list of Data De-Identification tools that integrate with Apache Spark. Use the filters on the left to narrow the results further. View the products that work with Apache Spark in the table below.

What are Data De-Identification Tools for Apache Spark?

Data de-identification tools are designed to remove potentially identifying information from datasets. These tools can help ensure that data is anonymized and compliant with data privacy regulations, such as GDPR. De-identification typically involves suppressing or masking certain pieces of data. Other methods, such as pseudonymization, tokenization, and randomization, may also be used to obscure the original values while still allowing analysis of the remaining dataset. Some advanced data de-identification software also includes features for monitoring access and preventing unauthorized use of sensitive personal information. In summary, data de-identification tools give organizations ways to support compliance by removing personally identifiable information from their datasets before sharing or publishing them. Compare and read user reviews of the best Data De-Identification tools for Apache Spark currently available using the table below. This list is updated regularly.
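The techniques named above (suppression, masking, and pseudonymization) can be illustrated with a minimal Python sketch. It is not taken from any of the listed products: the field names, the `SECRET_KEY` value, and the helper functions are hypothetical, and keyed hashing with HMAC-SHA256 is just one common way to pseudonymize a value.

```python
import hashlib
import hmac

# Hypothetical secret for illustration only; a real deployment would
# load the key from a key-management system, not from source code.
SECRET_KEY = b"example-key-not-for-production"


def mask_email(email: str) -> str:
    """Masking: hide the local part of an email, keep the domain."""
    local, _, domain = email.partition("@")
    return "*" * len(local) + "@" + domain


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Pseudonymization: map a value to a stable keyed-hash token.

    The same input and key always give the same token, so records can
    still be joined or counted without exposing the original value.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record: dict) -> dict:
    """Apply suppression, masking, and pseudonymization to one record."""
    out = dict(record)
    out.pop("name", None)                                   # suppression
    out["email"] = mask_email(record["email"])              # masking
    out["patient_id"] = pseudonymize(record["patient_id"])  # pseudonymization
    return out


# Hypothetical sample record.
record = {
    "name": "Ada Smith",
    "email": "ada@example.com",
    "patient_id": "P-1001",
    "age": 42,
}
safe = deidentify(record)
```

Here the name is dropped, the email keeps only its domain, and the patient ID becomes a stable token, while non-identifying fields such as `age` pass through unchanged. In a Spark job, the same per-field logic would typically be applied column by column rather than record by record.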

  • 1
    Protegrity

    Our platform allows businesses to use data, including its application in advanced analytics, machine learning, and AI, to do great things without worrying about putting customers, employees, or intellectual property at risk. The Protegrity Data Protection Platform doesn't just secure data; it also classifies and discovers data while protecting it. You can't protect what you don't know you have. Our platform first classifies data, allowing users to categorize the type of data that can mostly be in the public domain. With those classifications established, the platform then leverages machine learning algorithms to discover that type of data. Classification and discovery find the data that needs to be protected. Whether encrypting, tokenizing, or applying privacy methods, the platform secures the data behind the many operational systems that drive the day-to-day functions of business, as well as the analytical systems behind decision-making.
  • 2
    PHEMI Health DataLab
    The PHEMI Trustworthy Health DataLab is a cloud-based, integrated big data management system that allows healthcare organizations to enhance innovation and generate value from healthcare data by simplifying the ingestion and de-identification of data with NSA/military-grade governance, privacy, and security built in. Conventional products simply lock down data. PHEMI goes further, solving privacy and security challenges and addressing the urgent need to secure, govern, curate, and control access to privacy-sensitive personal healthcare information (PHI). This improves data sharing and collaboration inside and outside of an enterprise without compromising the privacy of sensitive information or increasing administrative burden. PHEMI Trustworthy Health DataLab can scale to any size of organization, is easy to deploy and manage, connects to hundreds of data sources, and integrates with popular data science and business analysis tools.
  • 3
    Tonic

    Tonic automatically creates mock data that preserves key characteristics of secure datasets so that developers, data scientists, and salespeople can work conveniently without breaching privacy. Tonic mimics your production data to create de-identified, realistic, and safe data for your test environments. With Tonic, your data is modeled from your production data to help you tell an identical story in your testing environments. The result is safe, useful data created to mimic your real-world data, at scale. Generate data that looks, acts, and feels just like your production data and safely share it across teams, businesses, and international borders. Features include PII/PHI identification, obfuscation, and transformation. Proactively protect your sensitive data with automatic scanning, alerts, de-identification, and mathematical guarantees of data privacy. Advanced subsetting works across diverse database types. Collaboration, compliance, and data workflows are automated.
  • 4
    Mage Platform

    Mage Data

    Mage Data™ is the leading solutions provider of data security and data privacy software for global enterprises. Built upon a patented and award-winning solution, the Mage platform enables organizations to stay on top of privacy regulations while ensuring security and privacy of data. Top Swiss Banks, Fortune 10 organizations, Ivy League Universities, and Industry Leaders in the financial and healthcare businesses protect their sensitive data with the Mage platform for Data Privacy and Security. Deploying state-of-the-art privacy enhancing technologies for securing data, Mage Data™ delivers robust data security while ensuring privacy of individuals. Visit the website to explore the company’s solutions.