This release merges two lines that had not yet shipped to PyPI: the 0.11.0 architectural rewrite and the 0.12.0 SkillManager hardening. There is no separate v0.11.0 tag; v0.12.0 supersets it.
## 0.11.0 — Architectural rewrite
- `RecursiveAgent` core abstraction extracted from RR (`ace/core/recursive_agent.py`). Generic recursive PydanticAI agent with sandbox, microcompaction, default tool set, depth-aware sub-agent registration.
- RR collapsed into a single `RRStep`. Orchestrator/worker split, batch machinery, and `AttachInsightSourcesStep` removed. RR is now a true recursive loop.
- Skillbook v2 — full schema rewrite, section-grouped storage (`context`/`harness`), richer `InsightSource` provenance, BM25-backed retrieval (`rank-bm25` runtime dep; see the retrieval sketch after this list). `Skillbook.as_prompt()` now returns markdown; `python-toon` dropped.
- Agentic SkillManager (first cut) — tool-calling loop (`ace/implementations/sm_tools.py`) with atomic mutation tools (`add_skill`, `update_skill`, `remove_skill`, `tag_skill`) and read-only tools (`search_skills`, `read_skill`).
- Reflector skillbook tools — Reflector can introspect and propose updates from inside the recursive loop.
- Anthropic prompt caching enabled by default for RR; `cache_read_tokens`/`cache_write_tokens` forwarded in run metadata.
- Logfire spans around recursive agent sessions.
- Online / offline mode in the ACE runner.
- `record_observation` renamed to `think`.
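As a rough picture of the BM25-backed retrieval mentioned above, here is a minimal sketch built on the `rank-bm25` dependency. The skill strings and whitespace tokenization are placeholders for illustration only, not the actual Skillbook v2 schema or tokenizer.

```python
from rank_bm25 import BM25Okapi  # the rank-bm25 runtime dependency named above

# Toy skill snippets standing in for Skillbook v2 entries (illustrative only).
skills = [
    "Before cancelling an order, confirm the order id and the user's identity.",
    "Escalate refunds that exceed policy limits instead of issuing them directly.",
    "Offer an exchange rather than a refund when the user wants a replacement item.",
]

# Naive whitespace tokenization; the real pipeline may tokenize differently.
tokenized_skills = [s.lower().split() for s in skills]
bm25 = BM25Okapi(tokenized_skills)

query = "customer wants to cancel an order".lower().split()
print(bm25.get_scores(query))              # per-skill relevance scores
print(bm25.get_top_n(query, skills, n=2))  # the two most relevant skill texts
```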
## 0.12.0 — SM hardening
- Cross-trace generalization gate (four criteria: ≥3 instances across ≥2 domains, a named slot, no API-specific params in the action, a verifiable runtime trigger); see the gate sketch after this list. Backed by skill_generalization.md (github.com) (14 cited sources).
- Action-equivalence rule — splits on action, not trigger surface.
- Atomicity rule for `insight` — one trigger + one action; explicit good/bad shape examples.
- ICL-grounded insight format drawn from icl_skill_formatting.md (github.com): 15-50 word cap, imperative voice, positive framing by default.
- Evidence-only tagging — SM no longer iterates `injected_skill_ids`; it tags only skills the reflection actually implicates.
- Broaden-via-comparison for UPDATE — same root cause in different niches → broaden the `issue`, don't duplicate.
- Prompt caching for SM via `CachePoint(ttl="5m")`, mirroring RR.
- Hard removal cap removed — `harmful_count >= 3` no longer auto-REMOVEs skills.
- `update_skill` signature: `source` is optional; `SkillbookView` dropped from parameters.
- Skillbook v1 legacy aliases removed — v2 is the only schema.
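The generalization gate and the insight word cap are easy to picture as plain predicate checks. The sketch below is hypothetical: the four criteria and the 15-50 word cap come from the notes above, but the `SkillCandidate` fields and helper names are illustrative, not the actual SkillManager code.

```python
from dataclasses import dataclass

@dataclass
class SkillCandidate:
    """Illustrative stand-in for whatever the SM actually passes around."""
    insight: str                  # one trigger + one action, per the atomicity rule
    trace_ids: list[str]          # traces where the pattern was observed
    domains: set[str]             # domains those traces come from
    has_named_slot: bool          # the insight names a reusable slot
    action_has_api_params: bool   # action text hard-codes API-specific parameters
    has_runtime_trigger: bool     # trigger can be verified at runtime

def passes_generalization_gate(c: SkillCandidate) -> bool:
    """Four-criterion cross-trace gate described in the notes above."""
    return (
        len(c.trace_ids) >= 3            # >=3 instances...
        and len(c.domains) >= 2          # ...across >=2 domains
        and c.has_named_slot             # named slot
        and not c.action_has_api_params  # no API-specific params in the action
        and c.has_runtime_trigger        # verifiable runtime trigger
    )

def within_word_cap(c: SkillCandidate, low: int = 15, high: int = 50) -> bool:
    """ICL-grounded formatting: insights stay within the 15-50 word cap."""
    return low <= len(c.insight.split()) <= high
```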
## End-to-end retail result (Haiku 4.5)
| Metric | Value |
|---|---|
| Baseline pass@1 | 45.0% |
| With learned skillbook | 67.5% |
| Δ pass@1 | +22.5 pp (12 improved, 3 regressed) |
| Skillbook size | 35 skills |
## Tau-bench fix
`evaluation_type=ALL_WITH_NL_ASSERTIONS` is now set at both the `run_task` and `run_tasks` call sites in `ace-eval/src/ace_eval/e2e/benchmarks/tau_bench.py`. Retail, and any future benchmark with `NL_ASSERTION` in its `reward_basis`, now produces real reward numbers instead of crashing during reward computation.
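For concreteness, the shape of the fix looks roughly like the sketch below. Only `evaluation_type`, `ALL_WITH_NL_ASSERTIONS`, `run_task`, and `run_tasks` come from the note above; the stub enum, stub runners, and task ids are placeholders so the example runs on its own, not the real ace-eval or tau-bench APIs.

```python
from enum import Enum, auto

class EvaluationType(Enum):
    # Stand-in enum; only the ALL_WITH_NL_ASSERTIONS member comes from the note.
    ALL_WITH_NL_ASSERTIONS = auto()

def run_task(task_id: str, evaluation_type: EvaluationType) -> float:
    """Placeholder single-task runner returning a dummy reward."""
    assert evaluation_type is EvaluationType.ALL_WITH_NL_ASSERTIONS
    return 1.0

def run_tasks(task_ids: list[str], evaluation_type: EvaluationType) -> list[float]:
    """Placeholder batch runner; forwards the same evaluation_type per task."""
    return [run_task(t, evaluation_type=evaluation_type) for t in task_ids]

# Both call sites now pass the evaluation type explicitly, which is the fix:
# NL-assertion-based rewards are computed instead of crashing.
rewards = run_tasks(
    ["retail_001", "retail_002"],
    evaluation_type=EvaluationType.ALL_WITH_NL_ASSERTIONS,
)
print(rewards)
```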
See CHANGELOG.md (github.com) for full details.