
This release merges two release lines that had not yet shipped to PyPI: the 0.11.0 architectural rewrite and the 0.12.0 SkillManager hardening. There is no separate v0.11.0 tag, since v0.12.0 is a superset of it.

0.11.0 — Architectural rewrite

  • RecursiveAgent core abstraction extracted from RR (ace/core/recursive_agent.py). Generic recursive PydanticAI agent with sandbox, microcompaction, default tool set, depth-aware sub-agent registration.
  • RR collapsed into a single RRStep. Orchestrator/worker split, batch machinery, and AttachInsightSourcesStep removed. RR is now a true recursive loop.
  • Skillbook v2 — full schema rewrite, section-grouped storage (context / harness), richer InsightSource provenance, BM25-backed retrieval (rank-bm25 runtime dep). Skillbook.as_prompt() now returns markdown; python-toon dropped.
  • Agentic SkillManager (first cut) — tool-calling loop (ace/implementations/sm_tools.py) with atomic mutation tools (add_skill, update_skill, remove_skill, tag_skill) and read-only tools (search_skills, read_skill).
  • Reflector skillbook tools — Reflector can introspect / propose updates from inside the recursive loop.
  • Anthropic prompt caching enabled by default for RR; cache_read_tokens / cache_write_tokens forwarded in run metadata.
  • Logfire spans around recursive agent sessions.
  • Online / offline mode in the ACE runner.
  • record_observation renamed to think.
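
As a rough illustration of the Skillbook v2 item above, the following sketch shows section-grouped storage, BM25-ranked lookup, and a markdown prompt rendering. It is not the ACE implementation: `ToySkillbook`, `Skill`, and every method body here are hypothetical, and the scoring is a small hand-written Okapi BM25 so the example runs without the rank-bm25 dependency that the release itself uses.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Skill:
    skill_id: str
    section: str  # "context" or "harness", the two groups named in the notes
    insight: str


@dataclass
class ToySkillbook:
    """Hypothetical stand-in for Skillbook v2; not the real ACE class."""
    skills: dict[str, Skill] = field(default_factory=dict)

    def add_skill(self, skill: Skill) -> None:
        self.skills[skill.skill_id] = skill

    def search_skills(self, query: str, k: int = 3,
                      k1: float = 1.5, b: float = 0.75) -> list[Skill]:
        """Rank skills against the query with plain Okapi BM25."""
        docs = {sid: s.insight.lower().split() for sid, s in self.skills.items()}
        if not docs:
            return []
        n = len(docs)
        avg_len = sum(len(tokens) for tokens in docs.values()) / n
        # Document frequency: in how many insights each term appears.
        df = Counter(term for tokens in docs.values() for term in set(tokens))
        scores: dict[str, float] = {}
        for sid, tokens in docs.items():
            tf = Counter(tokens)
            score = 0.0
            for term in query.lower().split():
                if term not in tf:
                    continue
                idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
                norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
                score += idf * tf[term] * (k1 + 1) / norm
            scores[sid] = score
        ranked = sorted(scores, key=lambda sid: scores[sid], reverse=True)
        # Drop skills that share no term with the query.
        return [self.skills[sid] for sid in ranked[:k] if scores[sid] > 0]

    def as_prompt(self) -> str:
        """Render the skills as markdown, one heading per section."""
        by_section: dict[str, list[Skill]] = {}
        for skill in self.skills.values():
            by_section.setdefault(skill.section, []).append(skill)
        lines: list[str] = []
        for section in sorted(by_section):
            lines.append(f"## {section}")
            lines.extend(f"- [{s.skill_id}] {s.insight}" for s in by_section[section])
        return "\n".join(lines)


book = ToySkillbook()
book.add_skill(Skill("s1", "context", "Confirm the order id before issuing a refund"))
book.add_skill(Skill("s2", "harness", "Retry the tool call once when the response times out"))
book.add_skill(Skill("s3", "context", "Ask for the shipping address before changing delivery"))

top = book.search_skills("refund order")
prompt = book.as_prompt()
```

Here `top` contains only the refund skill, because the other two insights share no term with the query, and `prompt` groups the skills under one markdown heading per section.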

0.12.0 — SM hardening

  • Cross-trace generalization gate (four-criterion: ≥3 instances across ≥2 domains, named slot, no API-specific params in action, verifiable runtime trigger). Backed by skill_generalization.md (github.com) (14 cited sources).
  • Action-equivalence rule — splits on action, not trigger surface.
  • Atomicity rule for insight — one trigger + one action; explicit good/bad shape examples.
  • ICL-grounded insight format drawn from icl_skill_formatting.md (github.com): 15-50 word cap, imperative voice, positive framing default.
  • Evidence-only tagging — SM no longer iterates injected_skill_ids; tags only skills the reflection actually implicates.
  • Broaden-via-comparison for UPDATE — same root cause in different niches → broaden issue, don't duplicate.
  • Prompt caching for SM via CachePoint(ttl="5m"), mirroring RR.
  • Hard removal cap removed: harmful_count >= 3 no longer auto-REMOVES skills.
  • update_skills signature: source is optional; SkillbookView dropped from parameters.
  • Skillbook v1 legacy aliases removed — v2 is the only schema.
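
The generalization gate and the insight word cap listed above can be read as simple predicates. The sketch below is one possible encoding, not the SkillManager's actual logic (which the notes describe as prompt-level rules): the `Candidate` record, its field names, and both function names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for a proposed skill; field names are illustrative."""
    insight: str
    instance_domains: list[str]   # one entry per supporting trace instance
    has_named_slot: bool          # e.g. a placeholder such as {order_id}
    action_has_api_params: bool   # True if the action names API-specific params
    has_runtime_trigger: bool     # True if the trigger can be verified at runtime


def passes_generalization_gate(c: Candidate) -> bool:
    """All four criteria from the release notes must hold."""
    return (
        len(c.instance_domains) >= 3             # at least 3 instances ...
        and len(set(c.instance_domains)) >= 2    # ... across at least 2 domains
        and c.has_named_slot
        and not c.action_has_api_params
        and c.has_runtime_trigger
    )


def within_word_cap(insight: str, low: int = 15, high: int = 50) -> bool:
    """Check the 15-50 word cap by whitespace-separated word count."""
    return low <= len(insight.split()) <= high


candidate = Candidate(
    insight=("When the customer requests a refund, confirm the {order_id} and "
             "the payment method before calling any refund tool."),
    instance_domains=["retail", "retail", "airline"],
    has_named_slot=True,
    action_has_api_params=False,
    has_runtime_trigger=True,
)

accepted = passes_generalization_gate(candidate) and within_word_cap(candidate.insight)
```

With three instances spread over two domains and an 18-word insight, `accepted` is true. Changing `instance_domains` to three entries from a single domain would fail the gate.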

End-to-end retail result (Haiku 4.5)

Metric                   Value
Baseline pass@1          45.0%
With learned skillbook   67.5%
Δ pass@1                 +22.5 pp (12 improved, 3 regressed)
Skillbook size           35 skills
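
The figures in the table can be cross-checked with a little arithmetic. The sketch below assumes each task is attempted once, so that every improved or regressed task shifts pass@1 by exactly one task's share. The task count is not stated in the notes; the value derived here is an inference from the reported numbers, not a reported result.

```python
baseline_pct = 45.0
learned_pct = 67.5
improved, regressed = 12, 3

# Difference in percentage points between the two runs.
delta_pp = learned_pct - baseline_pct

# Net number of tasks that flipped from fail to pass.
net_tasks = improved - regressed

# If each net task accounts for an equal share of the delta,
# the implied evaluation set size follows directly.
implied_tasks = net_tasks * 100 / delta_pp

# Pass counts implied by that set size.
baseline_passes = baseline_pct * implied_tasks / 100
learned_passes = learned_pct * implied_tasks / 100
```

Under that assumption the numbers are mutually consistent: a delta of 22.5 pp with a net gain of 9 tasks implies 40 tasks, with 18 passing at baseline and 27 passing with the learned skillbook.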

Tau-bench fix

evaluation_type=ALL_WITH_NL_ASSERTIONS is now set at both the run_task and run_tasks call sites in ace-eval/src/ace_eval/e2e/benchmarks/tau_bench.py. Retail, and any future benchmark with NL_ASSERTION in reward_basis, now produce real reward numbers instead of crashing in reward computation.

See CHANGELOG.md (github.com) for full details.

Source: README.md, updated 2026-05-07