Name                                 Modified     Size
director_ai-3.9.4-py3-none-any.whl   2026-03-20   394.9 kB
director_ai-3.9.4.tar.gz             2026-03-20   576.6 kB
sbom.json                            2026-03-20   72.8 kB
README.md                            2026-03-20   1.9 kB
v3.9.4 source code.tar.gz            2026-03-20   10.9 MB
v3.9.4 source code.zip               2026-03-20   11.3 MB
Totals: 6 items, 23.3 MB

v3.9.4 — Verified Claims Release

Every claim in README, docs, and notebooks is now backed by measured data.

Domain Profile Thresholds (measured 2026-03-20, GTX 1060 6GB, NLI on CUDA)

Profile   Old Threshold   New Threshold   Basis
medical   0.75            0.30            PubMedQA, 500 samples: F1=59.9%, catch=77.3%
finance   0.70            0.30            FinanceBench, 150 samples: 0% FPR at t≤0.30
legal     0.68            0.30            Aligned (CUAD OOM on 6GB; needs ≥16GB GPU)

CoherenceScorer produces scores in [0.25, 0.55] regardless of domain. The old thresholds (0.68 to 0.75) sat above that entire range, so they rejected everything.
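A minimal sketch of why a threshold above the score range rejects every output. It assumes an output passes when its score is at or above the threshold; `approval_rate` and `example_scores` are hypothetical names for illustration and are not part of director-ai, and the scores are made-up values spanning the reported range, not measured data.

```python
def approval_rate(scores, threshold):
    """Return the fraction of scores at or above the threshold.

    Assumption: an output is approved when score >= threshold.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(s >= threshold for s in scores) / len(scores)


# Illustrative values covering the reported [0.25, 0.55] range (not measured data).
example_scores = [0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55]

# Old per-profile thresholds and the new shared threshold from the table above.
old_thresholds = {"medical": 0.75, "finance": 0.70, "legal": 0.68}
new_threshold = 0.30

for profile, old in old_thresholds.items():
    old_rate = approval_rate(example_scores, old)
    new_rate = approval_rate(example_scores, new_threshold)
    print(f"{profile}: old={old_rate:.2f} new={new_rate:.2f}")
```

With every old threshold above 0.55, the old approval rate is zero for any score in the range, while 0.30 lets most of the range through.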

Provider Integrations Tested

  • guard() + OpenAI (gpt-4o-mini): ✅
  • guard() + Anthropic (claude-haiku): ✅ (correctly rejects hallucinations)
  • score() standalone: ✅

Docker

  • CPU Dockerfile builds and runs (health, review, source endpoints verified)
  • GPU Dockerfile included (not tested locally — needs NVIDIA runtime)

Documentation (25 files updated)

  • Model attribution: FactCG-DeBERTa-v3-Large credited with paper link
  • Domain presets: "tuned" → "preset" (starting points, not validated)
  • Docker: removed dead Docker Hub links, added local build instructions
  • HF Spaces: "Live Demo" → "Demo" (space sleeps due to inactivity)
  • Summarization FPR: 2.0% → 10.5% (matching measured data)
  • Version references: all updated to 3.9.4
  • Threshold inconsistency documented: guard()=0.3 vs DirectorConfig=0.6
  • All cookbooks, guides, notebooks aligned to measured thresholds

Known Limitations (honest)

  • Domain profiles are starting points — tune on your own data
  • Long documents (legal contracts, SEC filings) OOM on 6GB VRAM
  • NLI-only E2E catch rate: 46.7%; hybrid mode is needed to reach 90.7%
  • Summarization FPR: 10.5% (not 2.0% as previously claimed)
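For tuning a profile on your own data, the sketch below shows one way to compute catch rate, FPR, and F1 at a candidate threshold. It assumes that an output is flagged when its score falls below the threshold and that "catch rate" means recall on hallucinated samples; `rates_at_threshold` is a hypothetical helper, not a director-ai function, and the sample scores and labels are invented for illustration.

```python
def rates_at_threshold(scores, is_hallucination, threshold):
    """Compute catch rate (recall), FPR, and F1 at one threshold.

    Assumption: an output is flagged as a hallucination when score < threshold.
    """
    tp = fp = fn = tn = 0
    for score, bad in zip(scores, is_hallucination):
        flagged = score < threshold
        if flagged and bad:
            tp += 1  # hallucination caught
        elif flagged and not bad:
            fp += 1  # faithful output wrongly flagged
        elif not flagged and bad:
            fn += 1  # hallucination missed
        else:
            tn += 1  # faithful output passed
    catch_rate = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + catch_rate
    f1 = 2 * precision * catch_rate / denom if denom else 0.0
    return {"catch_rate": catch_rate, "fpr": fpr, "f1": f1}


# Invented toy data: scores plus ground-truth labels (True = hallucination).
toy_scores = [0.26, 0.28, 0.33, 0.41, 0.29, 0.47, 0.52, 0.38]
toy_labels = [True, True, True, False, False, False, False, False]

# Sweep a few candidate thresholds inside the observed score range.
for t in (0.30, 0.35, 0.40):
    print(t, rates_at_threshold(toy_scores, toy_labels, t))
```

Sweeping thresholds this way on labeled samples from your own domain makes the trade-off between catch rate and FPR visible before a preset is adopted.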

Full Changelog: https://github.com/anulum/director-ai/compare/v3.9.2...v3.9.4

Source: README.md, updated 2026-03-20