
OpenMed v1.3.0 is the privacy and anonymization release.

This release turns PII handling into a more complete cross-platform workflow: Faker-backed obfuscation, deterministic surrogates, a canonical PII label taxonomy, unified Privacy Filter routing across MLX and PyTorch, Nemotron-PII Privacy Filter artifacts, and a new interactive Privacy Filter Studio.

The headline: OpenMed can now detect, mask, remove, hash, date-shift, or realistically replace identifiers with locale-aware surrogates, while using the same extract_pii() / deidentify() API on Apple Silicon, Linux, Windows, and service deployments.

Highlights

  • Added a Faker-backed anonymization engine for method="replace".
  • Added deterministic, locale-aware, and format-preserving surrogate generation.
  • Added a canonical PII label taxonomy that normalizes English, Portuguese, and BIOES-tagged Privacy Filter labels into one stable label set.
  • Added unified Privacy Filter backend routing: MLX on Apple Silicon, PyTorch everywhere else.
  • Added PyTorch support for the OpenAI Privacy Filter family through PrivacyFilterTorchPipeline.
  • Added Nemotron-PII Privacy Filter artifacts for PyTorch and MLX.
  • Added family-aware fallback so Nemotron MLX model names resolve to the Nemotron PyTorch checkpoint on non-MLX hosts.
  • Added shared BIOES/Viterbi decoding and span refinement utilities used by both MLX and PyTorch Privacy Filter paths.
  • Added Portuguese (pt) support to the REST API schemas.
  • Added Privacy Filter Studio, an interactive FastAPI/static web demo for masking and deterministic randomization.
  • Added Python and Swift Privacy Filter classifier-head bias support for Nemotron-PII artifacts.

Why This Release Matters

PII de-identification is only useful when the output is both safe and usable.

Simple masking is sometimes enough, but many clinical, operational, and demo workflows need text that still looks realistic: names that look like names, phone numbers that keep their separators, dates that keep their local ordering, and repeated mentions that resolve to the same fake person.

OpenMed v1.3.0 moves beyond static replacement lists. It gives developers a single privacy API that can:

  • run locally on Apple Silicon through MLX
  • run on CPU or CUDA through Transformers/PyTorch
  • preserve downstream-friendly formats
  • generate locale-appropriate fake identifiers
  • behave deterministically for reproducible tests and demos
  • route OpenAI and Nemotron Privacy Filter checkpoints through the same code

That makes OpenMed more practical for clinical prototypes, privacy demos, evaluation harnesses, and local-first healthcare applications.

Faker-Backed Anonymization

method="replace" now uses openmed.core.anonymizer.Anonymizer.

The anonymizer supports:

  • cached per-locale Faker instances
  • deterministic seeding with hashlib.blake2b
  • label-keyed generator dispatch
  • format-preserving phones, dates, emails, and generic IDs
  • locale overrides such as locale="pt_BR" or locale="en_GB"
  • custom label generators through register_label_generator()
  • custom clinical Faker providers through register_clinical_provider()
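The deterministic-seeding idea can be sketched in plain Python. This is a simplified illustration of the mechanism, not OpenMed's actual implementation; only `hashlib.blake2b` and the (label, original value, seed) keying are taken from the notes above, and the surrogate pool is invented for the example:

```python
import hashlib

# Illustrative surrogate pool; the real engine draws from Faker.
FAKE_NAMES = ["Ana Souza", "Bruno Lima", "Carla Nunes", "Diego Alves"]

def surrogate(label: str, value: str, seed: int) -> str:
    # Hash the (label, original value, seed) triple so the same input
    # always selects the same fake value within a seeded run.
    key = f"{label}|{value}|{seed}".encode("utf-8")
    digest = hashlib.blake2b(key, digest_size=8).digest()
    index = int.from_bytes(digest, "big") % len(FAKE_NAMES)
    return FAKE_NAMES[index]

# Repeated mentions of the same original resolve to the same surrogate,
# and a fixed seed keeps the mapping stable across runs.
assert surrogate("name", "Pedro Almeida", seed=42) == surrogate("name", "Pedro Almeida", seed=42)
```

Because the mapping is a pure function of label, value, and seed, no replacement table has to be stored to get consistency.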

Example:

:::python
from openmed import deidentify

text = "Patient Pedro Almeida, CPF 123.456.789-09, phone +351 912 345 678."

result = deidentify(
    text,
    method="replace",
    lang="pt",
    locale="pt_BR",
    consistent=True,
    seed=42,
)

print(result.deidentified_text)

Deterministic mode means the same (label, original value) pair maps to the same surrogate within a call. Passing seed= makes the output reproducible across runs.

Clinical And National IDs

OpenMed now includes custom Faker providers for clinical and national ID shapes where Faker's built-ins are missing or insufficient:

  • Aadhaar with Verhoeff checksum
  • German Steuer-ID
  • medical record numbers
  • US National Provider Identifier (NPI)
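For context, the Verhoeff checksum that Aadhaar numbers use can be sketched as follows. This is the generic textbook algorithm, not OpenMed's provider code:

```python
# Verhoeff checksum: multiplication (D), permutation (P), and inverse tables.
D = [
    [0,1,2,3,4,5,6,7,8,9],[1,2,3,4,0,6,7,8,9,5],[2,3,4,0,1,7,8,9,5,6],
    [3,4,0,1,2,8,9,5,6,7],[4,0,1,2,3,9,5,6,7,8],[5,9,8,7,6,0,4,3,2,1],
    [6,5,9,8,7,1,0,4,3,2],[7,6,5,9,8,2,1,0,4,3],[8,7,6,5,9,3,2,1,0,4],
    [9,8,7,6,5,4,3,2,1,0],
]
P = [
    [0,1,2,3,4,5,6,7,8,9],[1,5,7,6,2,8,3,0,9,4],[5,8,0,3,7,9,6,1,4,2],
    [8,9,1,6,0,4,3,5,2,7],[9,4,5,3,1,2,6,8,7,0],[4,2,8,6,5,7,3,9,0,1],
    [2,7,9,3,8,0,6,4,1,5],[7,0,4,6,9,1,3,2,5,8],
]
INV = [0,4,3,2,1,5,6,7,8,9]

def verhoeff_check_digit(number: str) -> str:
    """Compute the check digit to append to `number`."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[(i + 1) % 8][int(ch)]]
    return str(INV[c])

def verhoeff_validate(number: str) -> bool:
    """A number with its check digit appended reduces to c == 0."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0
```

A surrogate Aadhaar generated this way passes the same checksum validation as a real one, which is what makes the fake value usable downstream.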

It also reuses Faker's locale-specific built-ins where they already validate against OpenMed's checksum logic:

  • pt_BR.cpf and pt_BR.cnpj
  • nl_NL.ssn for BSN
  • fr_FR.ssn for NIR
  • it_IT.ssn for Codice Fiscale
  • es_ES.nie

Canonical Label Taxonomy

openmed.core.labels introduces CANONICAL_LABELS and normalize_label().

This gives downstream code one stable label vocabulary even when models emit different naming schemes:

  • English lowercase labels such as first_name
  • Portuguese uppercase labels such as FIRSTNAME
  • Privacy Filter BIOES labels such as B-NAME, I-EMAIL, or S-PHONE
  • mixed-case or separator variants
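A minimal version of such a normalization layer might look like this. The alias table and canonical names here are illustrative; OpenMed's actual CANONICAL_LABELS vocabulary is larger:

```python
# Fold naming variants onto one canonical lowercase label set (sketch).
ALIASES = {"firstname": "first_name", "phone_number": "phone"}

def normalize_label(raw: str) -> str:
    label = raw.strip()
    # Strip BIOES prefixes such as B-NAME, I-EMAIL, or S-PHONE.
    if len(label) > 2 and label[1] == "-" and label[0] in "BIOES":
        label = label[2:]
    # Lowercase and unify separators (FIRSTNAME, First-Name, first name, ...).
    label = label.lower().replace("-", "_").replace(" ", "_")
    return ALIASES.get(label, label)
```

Everything downstream then branches on one vocabulary instead of per-model tag schemes.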

The anonymizer, replacement mapping, and Privacy Filter routes now use this normalization layer to reduce model-family-specific branching.

Privacy Filter Family

OpenMed v1.3.0 exposes two Privacy Filter checkpoint families through the same public API:

| Variant | PyTorch | MLX full | MLX 8-bit |
| --- | --- | --- | --- |
| OpenAI Privacy Filter | openai/privacy-filter | OpenMed/privacy-filter-mlx | OpenMed/privacy-filter-mlx-8bit |
| Nemotron-PII fine-tune | OpenMed/privacy-filter-nemotron | OpenMed/privacy-filter-nemotron-mlx | OpenMed/privacy-filter-nemotron-mlx-8bit |

Both families share the OpenAI Privacy Filter architecture; the Nemotron-PII artifacts are fine-tuned on the Nemotron PII dataset and run through the same Privacy Filter pipeline.

Use the same API everywhere:

:::python
from openmed import extract_pii, deidentify

text = "Patient Sarah Connor, DOB 03/15/1985, MRN 4471882."

entities = extract_pii(
    text,
    model_name="OpenMed/privacy-filter-nemotron-mlx-8bit",
)

safe = deidentify(
    text,
    model_name="OpenMed/privacy-filter-nemotron-mlx-8bit",
    method="replace",
    consistent=True,
    seed=42,
)

On Apple Silicon with MLX available, MLX artifacts run through PrivacyFilterMLXPipeline. On other hosts, MLX-only model names are automatically substituted with the matching PyTorch checkpoint:

  • OpenMed/privacy-filter-mlx* -> openai/privacy-filter
  • OpenMed/privacy-filter-nemotron-mlx* -> OpenMed/privacy-filter-nemotron

A one-time UserWarning explains the substitution.
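The substitution logic reduces to something like the sketch below. It is a simplified model of the routing described above, with a module-level flag standing in for the one-time warning; the function name and signature are assumptions:

```python
import warnings

# MLX-only name prefixes and their PyTorch fallbacks, per the mapping above.
FALLBACKS = {
    "OpenMed/privacy-filter-nemotron-mlx": "OpenMed/privacy-filter-nemotron",
    "OpenMed/privacy-filter-mlx": "openai/privacy-filter",
}
_warned = False

def resolve_model_name(name: str, mlx_available: bool) -> str:
    global _warned
    if mlx_available:
        return name
    # Check the nemotron family first so its names never match the base prefix.
    for prefix, fallback in FALLBACKS.items():
        if name.startswith(prefix):
            if not _warned:
                warnings.warn(
                    f"MLX unavailable: substituting {fallback} for {name}",
                    UserWarning,
                )
                _warned = True
            return fallback
    return name
```

Callers keep using one model name everywhere; the backend decision happens at load time.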

PyTorch Privacy Filter

openmed.torch.PrivacyFilterTorchPipeline loads the Privacy Filter family via Transformers:

  • auto-selects CUDA when available, otherwise CPU
  • supports compatible fine-tunes
  • emits the same entity dictionary shape as the MLX pipeline
  • uses trust_remote_code=True by default for the OpenAI Privacy Filter family

Install:

:::bash
pip install -U "openmed[hf]"

Run:

:::python
from openmed import extract_pii

result = extract_pii(
    "Alice Smith emailed alice@example.com.",
    model_name="openai/privacy-filter",
)

MLX Privacy Filter Updates

The Python MLX Privacy Filter runtime now shares decoding utilities with the PyTorch path:

  • TokenLabelInfo
  • build_label_info
  • viterbi_decode
  • labels_to_token_spans
  • trim_span_whitespace
  • refine_privacy_filter_span

This keeps BIOES/Viterbi decoding consistent across backends.
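As a rough illustration of what BIOES span construction does, here is a toy decoder. It is unrelated to the shared utilities' actual signatures and skips the Viterbi transition constraints:

```python
def bioes_to_spans(labels):
    """Collect (entity, start, end) token spans from BIOES tags."""
    spans, start, current = [], None, None
    for i, tag in enumerate(labels):
        if tag == "O":
            start = current = None
            continue
        prefix, _, entity = tag.partition("-")
        if prefix == "S":                          # single-token entity
            spans.append((entity, i, i))
            start = current = None
        elif prefix == "B":                        # entity begins
            start, current = i, entity
        elif prefix == "E" and current == entity:  # entity ends
            spans.append((entity, start, i))
            start = current = None
        # "I" simply continues an open span in this sketch.
    return spans
```

The real paths additionally run Viterbi decoding so the model can never emit an ill-formed tag sequence in the first place.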

The MLX model class also now honors classifier_bias / unembedding_bias in artifact configs. This keeps the original OpenAI Privacy Filter bias-less by default while allowing Nemotron-PII artifacts to load their biased classifier head correctly.

Swift And OpenMedKit

OpenMedKit also gained Privacy Filter classifier-head bias support.

The native MLX artifact loader now decodes classifier_bias / unembedding_bias and builds the Privacy Filter head with a learned bias when Nemotron-PII artifacts require it, while preserving the bias-less baseline path.

The OpenMed Scan Demo privacy-filter option now points at OpenMed/privacy-filter-nemotron-mlx-8bit and labels the engine as "OpenAI Nemotron Privacy Filter" throughout the picker, download events, and README.

Privacy Filter Studio

This release adds examples/privacy_filter_studio/, an interactive two-pane web demo for PII de-identification.

It includes:

  • sample clinical and operational notes
  • mask and deterministic randomize modes
  • highlighted detected entities
  • per-entity labels and category colors
  • model/backend status
  • latency and entity counters
  • a first-run download toggle
  • cache-only model loading unless downloads are explicitly allowed

Run:

:::bash
pip install -U "openmed[mlx]"        # or "openmed[hf]" off Apple Silicon
uvicorn examples.privacy_filter_studio.app:app --reload --port 8770

Open:

:::text
http://127.0.0.1:8770

Override the model:

:::bash
OPENMED_STUDIO_MODEL=OpenMed/privacy-filter-nemotron-mlx-8bit \
  uvicorn examples.privacy_filter_studio.app:app --port 8770

Documentation And Examples

New and updated docs/examples:

  • docs/anonymization.md
  • examples/obfuscation_demo.py
  • examples/privacy_filter_unified.py
  • examples/privacy_filter_studio/
  • examples/privacy_filter_book/app.py

The anonymization guide covers deterministic surrogates, locale resolution, format preservation, custom generators, clinical ID providers, and the Privacy Filter routing model.

Breaking Changes

  • faker>=22.0 is now a required core dependency.
  • method="replace" no longer uses the old small static fake-data lists. Downstream tests that asserted exact prior replacement strings should be updated.
  • Privacy Filter routing through extract_pii() skips regex smart-merging by design, because the model already performs Viterbi-constrained BIOES span construction.

Other de-identification methods are unchanged:

  • mask
  • remove
  • hash
  • shift_dates
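For reference, the hash and shift_dates ideas reduce to something like this sketch (illustrative only; function and parameter names are assumed, not OpenMed's API):

```python
import hashlib
from datetime import date, timedelta

def hash_pii(value: str, *, digest_size: int = 8) -> str:
    # Replace an identifier with a stable, irreversible hex digest.
    return hashlib.blake2b(value.encode("utf-8"), digest_size=digest_size).hexdigest()

def shift_date(d: date, *, days: int) -> date:
    # Shift every date by the same per-document offset so intervals
    # between events are preserved while absolute dates are obscured.
    return d + timedelta(days=days)
```

The key property of date shifting is that relative timing survives: two events ten days apart stay ten days apart after the shift.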

Upgrade Notes

Install or upgrade:

:::bash
pip install -U openmed

For PyTorch Privacy Filter support:

:::bash
pip install -U "openmed[hf]"

For Apple Silicon MLX support:

:::bash
pip install -U "openmed[mlx]"

Recommended checks for application upgrades:

  • If you assert exact method="replace" outputs, switch to seeded deterministic expectations or assert that originals are removed.
  • If you use MLX Privacy Filter model names on non-MLX hosts, expect a one-time warning and an automatic PyTorch substitution.
  • If you package Nemotron-PII MLX artifacts, keep classifier_bias or unembedding_bias in the artifact config when the classifier head has bias.
  • If you expose downloads in a UI, use explicit user control like Privacy Filter Studio's download toggle.

Validation

Release-prep validation included:

:::bash
git diff --check
.venv/bin/python -m compileall -q examples/privacy_filter_studio openmed/mlx/models/privacy_filter.py
.venv/bin/python -m pytest tests/unit/mlx/test_privacy_filter_mlx.py tests/unit/test_privacy_filter_routing.py
.venv/bin/python -m pytest tests/unit/core/test_anonymizer.py tests/unit/core/test_labels.py tests/unit/test_pii.py tests/unit/test_privacy_filter_routing.py tests/unit/test_pii_multilingual_regression.py tests/unit/mlx/test_privacy_filter_mlx.py tests/unit/service/test_api.py

Results captured during release prep:

  • Studio and MLX model compile check: passed
  • Studio FastAPI smoke test: passed
  • Privacy Filter routing/MLX subset: 20 passed, 8 skipped
  • Focused privacy/anonymization suite: 471 passed, 1 skipped, 11 warnings

The warnings are pre-existing span-validation warnings from multilingual PII regression fixtures.

Thank You

OpenMed v1.3.0 is about making privacy work feel less like a demo trick and more like an actual developer surface: local when possible, portable when needed, deterministic when tests demand it, and realistic enough for useful clinical workflows.

Thank you to everyone testing the Privacy Filter artifacts, poking at de-identification edge cases, trying the OpenMedKit paths, and helping OpenMed move toward a more practical open-source healthcare AI stack.

What's Changed

Full Changelog: https://github.com/maziyarpanahi/openmed/compare/v1.2.0...v1.3.0

Source: README.md, updated 2026-04-29