Name	Modified	Size	Downloads / Week
README.md	2026-05-05	1.9 kB	0
v0.5.3 -- 2026-05-05 source code.tar.gz	2026-05-05	64.3 MB	0
v0.5.3 -- 2026-05-05 source code.zip	2026-05-05	69.1 MB	0
Totals: 3 Items		133.5 MB	0

Release notes — 2026-05-05

Bug fixes

Inaccessible audio recording URLs now surface a clear error. When an audio recording URL returns 403 or is otherwise unreachable, evaluators previously fell back to treating the URL as plain text and produced meaningless results. The system now raises a user-facing error pointing to the inaccessible recording. (#225 / #216)
Eval template list now reflects the actual default version. The list view was hardcoding V1 for every template, so promoting a non-V1 version to default never showed up in the outer evals list. Added bulk version-metadata lookup so the list reflects the real default. (#229)
Eval-task usage reasons are no longer truncated. The per-log eval explanation in /tracer/eval-task/get_usage/ was being capped at 200 characters with a trailing …, making the full reason unrecoverable in the UI. The full string is now returned. (#229)
Test detail drawer no longer flashes the wrong width on reload. Reloading with the test detail drawer open briefly rendered a 90vw skeleton before collapsing to the 50vw voice drawer. The drawer now waits for store data to populate before sliding in, so it opens at the correct width with real content. (#210)
Long task labels no longer overflow. Replaced the overflowing label rendering with a custom tooltip. (#213)
Beta tag removed from chat simulation. (#226)
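
As an illustration of the inaccessible-recording fix listed above, the following is a minimal sketch of failing fast instead of falling back to plain text. It is not the project's actual code: the names InaccessibleRecordingError, http_status, and ensure_recording_accessible are hypothetical, and the use of a HEAD request is an assumption.

```python
import urllib.error
import urllib.request
from typing import Callable


class InaccessibleRecordingError(Exception):
    """Hypothetical user-facing error for a recording URL that cannot be fetched."""


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request; raises OSError if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses arrive as exceptions but still carry a status code.
        return exc.code


def ensure_recording_accessible(
    url: str, get_status: Callable[[str], int] = http_status
) -> None:
    """Raise a clear error rather than letting the URL be evaluated as text."""
    try:
        status = get_status(url)
    except OSError as exc:  # URLError and timeouts are OSError subclasses
        raise InaccessibleRecordingError(
            f"Audio recording is unreachable: {url}"
        ) from exc
    if status >= 400:
        raise InaccessibleRecordingError(
            f"Audio recording is inaccessible (HTTP {status}): {url}"
        )
```

Passing the status lookup in as a parameter keeps the check testable without network access; a caller would run ensure_recording_accessible(url) before handing the recording to an evaluator.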

Fixed worker OOMs on high-volume eval tasks. The eval-task dispatcher was hydrating full ObservationSpan instances, including large attribute and I/O payloads, before enqueuing evaluations. Span IDs are now fetched via .only("id") / values_list("id", flat=True), sampling reuses the IDs returned by the random-sample query, and the cnt cap is pushed to Postgres via slicing. Behavior is unchanged: the same set of span IDs is enqueued. (#207)
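
The idea behind the OOM fix can be sketched without Django. The example below uses the standard-library sqlite3 module as a stand-in for Postgres and the ORM; the table name, column names, and the sample_span_ids helper are assumptions for illustration only. It selects just the id column and lets the database apply both the random sampling and the cnt cap, so the large payload columns never reach the worker.

```python
import sqlite3


def sample_span_ids(conn: sqlite3.Connection, cnt: int) -> list[int]:
    """Return up to `cnt` randomly sampled span IDs, fetching only the id column."""
    rows = conn.execute(
        # The cap is applied by the database (LIMIT), not in Python.
        "SELECT id FROM observation_span ORDER BY RANDOM() LIMIT ?",
        (cnt,),
    ).fetchall()
    return [row[0] for row in rows]


# Build a throwaway table whose rows carry large payloads (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observation_span ("
    "id INTEGER PRIMARY KEY, attributes TEXT, io_payload TEXT)"
)
conn.executemany(
    "INSERT INTO observation_span (attributes, io_payload) VALUES (?, ?)",
    [("{}", "x" * 10_000) for _ in range(500)],
)

# Only 50 integers are loaded; the same IDs are then reused for enqueuing.
span_ids = sample_span_ids(conn, cnt=50)
print(len(span_ids))
```

In Django terms this would presumably correspond to a sliced values_list("id", flat=True) query, though the exact queryset used in the project is not shown in these notes.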

Source: README.md, updated 2026-05-05