Name                            Modified     Size
smallcode-Linux-X64.tar.gz      2026-05-27   11.2 MB
smallcode-macOS-ARM64.tar.gz    2026-05-27   11.1 MB
smallcode-Windows-X64.tar.gz    2026-05-27   10.9 MB
README.md                       2026-05-27   1.4 kB
v1.2.3 source code.tar.gz       2026-05-27   11.5 MB
v1.2.3 source code.zip          2026-05-27   11.6 MB
Totals: 6 items, 56.3 MB

[1.2.3] - 2026-05-27

fix: SMALLCODE_CACHE_SPLIT now defaults to true — fixes llama.cpp KV-cache invalidation loop

Root cause: buildCompactSystemPrompt() injected dynamic content (memory, knowledge, skills) into the system prompt on every turn. llama.cpp uses LCP (Longest Common Prefix) similarity to reuse its KV-cache between requests. Because the system prompt sits at the very start of each request, changing it every turn leaves almost no common prefix to reuse, so llama.cpp discards all context checkpoints and re-processes the full prompt from scratch. This produced the infinite "erased invalidated context checkpoint" loop and made every turn as slow as the first.
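The effect of a changing prefix can be illustrated with a toy model. The sketch below is not llama.cpp's actual logic: it splits prompts on whitespace instead of using real token ids, the prompt strings are made up, and the helper names (`tokenize`, `commonPrefixLength`, `reusableFraction`) are hypothetical. It only shows why content that varies near the front of a prompt leaves little to reuse.

```typescript
// Toy tokenizer: whitespace-separated words stand in for token ids.
function tokenize(text: string): string[] {
  return text.split(/\s+/).filter((t) => t.length > 0);
}

// Number of leading tokens that two prompts have in common.
function commonPrefixLength(a: string[], b: string[]): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a[i] === b[i]) i++;
  return i;
}

// Fraction of the next prompt that is covered by the shared prefix.
function reusableFraction(previous: string, next: string): number {
  const cur = tokenize(next);
  if (cur.length === 0) return 0;
  return commonPrefixLength(tokenize(previous), cur) / cur.length;
}

const history = "user: list files assistant: done";

// Dynamic memory inside the system prompt: the prompts diverge early.
const oldTurn1 = `system: you are a coding agent memory: A ${history}`;
const oldTurn2 = `system: you are a coding agent memory: B ${history} user: next`;

// Constant system prompt, dynamic content at the end: long shared prefix.
const newTurn1 = `system: you are a coding agent ${history}`;
const newTurn2 = `system: you are a coding agent ${history} user: memory: B next`;

const oldShared = commonPrefixLength(tokenize(oldTurn1), tokenize(oldTurn2)); // 7 of 15 tokens
const newShared = commonPrefixLength(tokenize(newTurn1), tokenize(newTurn2)); // 11 of 15 tokens
```

In the first layout everything after the changed memory token has to be recomputed, including the whole conversation history. In the second layout only the newly appended tokens fall outside the shared prefix.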

Fix: SMALLCODE_CACHE_SPLIT now defaults to true (was false). Dynamic context (memory, knowledge, skills) moves to a <sc:context> block prepended to the latest user message instead of the system prompt. The system prompt stays identical across turns → llama.cpp can cache it → checkpoints are preserved → subsequent turns are fast.
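A minimal sketch of this message layout follows. It is not SmallCode's implementation: the `Message` type, the `withSplitContext` helper, the closing `</sc:context>` tag, and the sample strings are all assumptions made for illustration. Only the idea of prepending an `<sc:context>` block to the latest user message comes from the notes above.

```typescript
type Role = "system" | "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

// Hypothetical helper: returns a copy of the message list in which the
// dynamic context is wrapped in an <sc:context> block and prepended to the
// latest user message. The system message is never modified.
function withSplitContext(messages: Message[], dynamicContext: string): Message[] {
  // Find the index of the last user message (-1 if there is none).
  let lastUser = -1;
  messages.forEach((m, i) => {
    if (m.role === "user") lastUser = i;
  });

  // Nothing to inject, or nowhere to inject it: return plain copies.
  if (lastUser === -1 || dynamicContext.trim() === "") {
    return messages.map((m) => ({ ...m }));
  }

  return messages.map((m, i) =>
    i === lastUser
      ? { ...m, content: `<sc:context>\n${dynamicContext}\n</sc:context>\n\n${m.content}` }
      : { ...m },
  );
}

const turn: Message[] = [
  { role: "system", content: "You are a coding agent." },
  { role: "user", content: "Rename foo to bar." },
];

const sent = withSplitContext(turn, "memory: project uses pnpm");
```

Because the helper copies messages instead of mutating them, the stored conversation stays free of injected context, and the system message is byte-for-byte the same on every request.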

This also benefits cloud providers (OpenAI, Anthropic) that cache prompt prefixes on their side: a stable system prompt gets more cache hits.

Set SMALLCODE_CACHE_SPLIT=false in .env to revert to the old behaviour.
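As a rough sketch of a default-on flag like this one, the function below treats an unset variable as enabled. The function name `cacheSplitEnabled` is hypothetical, and the parsing rule (only the literal `false`, case-insensitive, disables the split) is an assumption; the notes above do not say how SmallCode parses the value.

```typescript
// Hypothetical reader for the flag. The environment is passed in as a plain
// object so the function stays pure and easy to test.
function cacheSplitEnabled(env: Record<string, string | undefined>): boolean {
  const raw = env["SMALLCODE_CACHE_SPLIT"];
  // Unset means "use the default", which is true as of 1.2.3.
  if (raw === undefined) return true;
  // Assumption: only the literal "false" turns the split off.
  return raw.trim().toLowerCase() !== "false";
}
```

In a Node.js program this could be called as `cacheSplitEnabled(process.env)` once the `.env` file has been loaded.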

llama.cpp server flags that also help (from ggml-org/llama.cpp#19977):

--checkpoint-every-n-tokens 2048 --ctx-checkpoints 64

Verification

  • 90/90 unit tests pass (npm test)

Source: README.md, updated 2026-05-27