| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| director_ai-3.9.4-py3-none-any.whl | 2026-03-20 | 394.9 kB | |
| director_ai-3.9.4.tar.gz | 2026-03-20 | 576.6 kB | |
| sbom.json | 2026-03-20 | 72.8 kB | |
| README.md | 2026-03-20 | 1.9 kB | |
| v3.9.4 source code.tar.gz | 2026-03-20 | 10.9 MB | |
| v3.9.4 source code.zip | 2026-03-20 | 11.3 MB | |
| Totals: 6 Items | | 23.3 MB | 0 |
v3.9.4 — Verified Claims Release
Every claim in README, docs, and notebooks is now backed by measured data.
Domain Profile Thresholds (measured 2026-03-20, GTX 1060 6GB, NLI on CUDA)
| Profile | Old Threshold | New Threshold | Basis |
|---|---|---|---|
| medical | 0.75 | 0.30 | PubMedQA 500 samples: F1=59.9%, catch=77.3% |
| finance | 0.70 | 0.30 | FinanceBench 150 samples: 0% FPR at t≤0.30 |
| legal | 0.68 | 0.30 | Aligned with the other profiles; not measured (CUAD OOM on 6GB, needs ≥16GB GPU) |
CoherenceScorer produces scores in [0.25, 0.55] regardless of domain. The old thresholds (0.68 to 0.75) all sat above that range, so they rejected everything.
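The effect of a threshold that sits above the score range can be illustrated with a minimal sketch. It assumes an output is approved when its score is at or above the threshold, which these notes imply but do not state. `approval_rate` and the sample scores are hypothetical and are not part of the director_ai API.

```python
# Hypothetical illustration; does not use the director_ai API.
# Assumption: an output is approved when score >= threshold.

def approval_rate(scores, threshold):
    """Return the fraction of scores that meet or exceed the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    approved = sum(1 for s in scores if s >= threshold)
    return approved / len(scores)

# Made-up scores spread across the reported [0.25, 0.55] range.
sample_scores = [0.25, 0.31, 0.38, 0.44, 0.50, 0.55]

old_thresholds = {"medical": 0.75, "finance": 0.70, "legal": 0.68}
new_threshold = 0.30

for profile, old in old_thresholds.items():
    # Every old threshold exceeds the maximum score, so nothing is approved.
    print(profile,
          "old:", approval_rate(sample_scores, old),
          "new:", approval_rate(sample_scores, new_threshold))
```

With any threshold above 0.55 the approval rate is zero no matter how faithful the output is, which is why the presets were moved inside the observed range.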
Provider Integrations Tested
- guard() + OpenAI (gpt-4o-mini): ✅
- guard() + Anthropic (claude-haiku): ✅ (correctly rejects hallucinations)
- score() standalone: ✅
Docker
- CPU Dockerfile builds and runs (health, review, source endpoints verified)
- GPU Dockerfile included (not tested locally — needs NVIDIA runtime)
Documentation (25 files updated)
- Model attribution: FactCG-DeBERTa-v3-Large credited with paper link
- Domain presets: "tuned" → "preset" (starting points, not validated)
- Docker: removed dead Docker Hub links; added local build instructions
- HF Spaces: "Live Demo" → "Demo" (space sleeps due to inactivity)
- FPR: 2.0% → 10.5% (matching measured data)
- Version references: all updated to 3.9.4
- Threshold inconsistency documented: guard()=0.3 vs DirectorConfig=0.6
- All cookbooks, guides, notebooks aligned to measured thresholds
Known Limitations (honest)
- Domain profiles are starting points — tune on your own data
- Long documents (legal contracts, SEC filings) OOM on 6GB VRAM
- NLI-only E2E catch rate: 46.7% — hybrid mode needed for 90.7%
- Summarization FPR: 10.5% (not 2.0% as previously claimed)
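Since the profiles are starting points, one way to tune a threshold on your own data is to sweep candidate values over labelled scores and keep the one with the best catch rate under an acceptable false positive rate. The sketch below is a hypothetical helper, not part of director_ai. It assumes an output is flagged when its score falls below the threshold, that catch rate means the share of real hallucinations flagged, and that FPR means the share of faithful outputs flagged. The scores and labels are made up.

```python
# Hypothetical tuning sketch; names and data are illustrative, not director_ai API.
# Assumptions: an output is flagged when score < threshold;
# a label of True means the output is a real hallucination.

def rates(scores, is_hallucination, threshold):
    """Return (catch_rate, false_positive_rate) at the given threshold."""
    tp = fp = pos = neg = 0
    for s, bad in zip(scores, is_hallucination):
        flagged = s < threshold
        if bad:
            pos += 1
            tp += flagged
        else:
            neg += 1
            fp += flagged
    if pos == 0 or neg == 0:
        raise ValueError("need at least one example of each class")
    return tp / pos, fp / neg

def pick_threshold(scores, is_hallucination, candidates, max_fpr):
    """Return (threshold, catch_rate, fpr) with the highest catch rate
    among candidates whose FPR <= max_fpr, or None if none qualify."""
    best = None
    for t in candidates:
        catch, fpr = rates(scores, is_hallucination, t)
        if fpr <= max_fpr and (best is None or catch > best[1]):
            best = (t, catch, fpr)
    return best

# Made-up labelled scores for illustration only.
scores = [0.26, 0.29, 0.33, 0.36, 0.41, 0.45, 0.48, 0.52]
labels = [True, True, True, False, True, False, False, False]
candidates = [0.30, 0.35, 0.40, 0.45]

print(pick_threshold(scores, labels, candidates, max_fpr=0.25))
```

Raising the threshold flags more outputs, so catch rate and FPR rise together; the `max_fpr` cap makes that trade-off explicit for your own domain.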
Full Changelog: https://github.com/anulum/director-ai/compare/v3.9.2...v3.9.4