OpenAI Privacy Filter is an open-weight machine learning model that detects and masks personally identifiable information (PII) in text with high efficiency and contextual awareness. It operates as a bidirectional token classification system: rather than generating text sequentially, it labels sensitive tokens in a single forward pass, enabling fast processing of large datasets.

The model supports long-context inputs, so it can analyze extensive documents without chunking, which improves consistency in redaction tasks. It can run locally on standard hardware, ensuring that sensitive information never leaves the user’s environment and supporting privacy-first workflows. The system is fine-tunable, enabling adaptation to specific datasets or compliance requirements across industries. It identifies multiple categories of sensitive data, such as names, emails, and credentials, replacing them with placeholders that preserve document structure.
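The masking step described above can be sketched as follows. This is a minimal illustration, not the model's actual inference code: it assumes the classifier has already emitted one label per token, and the label names, the `"O"` (non-sensitive) tag, and the bracketed placeholder format are all hypothetical.

```python
# Hypothetical placeholder map; the real model's categories may differ.
PLACEHOLDERS = {"NAME": "[NAME]", "EMAIL": "[EMAIL]", "CREDENTIAL": "[CREDENTIAL]"}

def mask(tokens, labels):
    """Replace labeled tokens with category placeholders, collapsing
    consecutive tokens of the same label into a single placeholder
    so the surrounding text structure is preserved."""
    out, prev = [], "O"
    for tok, lab in zip(tokens, labels):
        if lab == "O":
            out.append(tok)           # non-sensitive token passes through
        elif lab != prev:
            out.append(PLACEHOLDERS[lab])  # start of a new sensitive span
        prev = lab
    return " ".join(out)

tokens = ["Contact", "Jane", "Doe", "at", "jane@example.com", "."]
labels = ["O", "NAME", "NAME", "O", "EMAIL", "O"]
print(mask(tokens, labels))  # Contact [NAME] at [EMAIL] .
```

Because every token is labeled in one forward pass, masking reduces to this single linear sweep over the output, which is what makes the approach fast on large datasets.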
## Features
- Detection and masking of multiple categories of sensitive information
- Bidirectional token classification for fast single-pass processing
- Support for long-context inputs, allowing large documents to be processed without chunking
- Local execution for privacy-preserving workflows
- Fine-tuning capabilities for domain-specific adaptation
- Configurable precision and recall for redaction control
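The last feature, configurable precision and recall, typically comes down to a confidence threshold on the classifier's per-token scores. The sketch below is an assumption about how such a control could work, not the filter's documented API: the `(token, label, confidence)` triples and threshold values are illustrative. Raising the threshold redacts only high-confidence detections (higher precision, fewer false redactions); lowering it redacts borderline cases too (higher recall, fewer misses).

```python
def redact(scored_tokens, threshold=0.5):
    """Redact tokens whose predicted label is sensitive ("O" means
    non-sensitive) and whose confidence meets the threshold.

    scored_tokens: list of (token, label, confidence) triples, as a
    token classifier might emit (hypothetical format).
    """
    return [
        f"[{label}]" if label != "O" and score >= threshold else token
        for token, label, score in scored_tokens
    ]

scored = [
    ("Alice", "NAME", 0.95),         # confident detection
    ("maybe-an-id", "CREDENTIAL", 0.40),  # borderline detection
    ("hello", "O", 0.99),            # non-sensitive
]
print(redact(scored, threshold=0.5))  # ['[NAME]', 'maybe-an-id', 'hello']
print(redact(scored, threshold=0.3))  # ['[NAME]', '[CREDENTIAL]', 'hello']
```

A strict threshold suits compliance settings where over-redaction is costly; a loose one suits high-risk data where a missed credential is worse than an extra placeholder.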