LMCache - Browse /v0.4.7 at SourceForge.net

Name	Modified	Size	Downloads / Week
lmcache-0.4.7-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl	2026-06-13	13.2 MB	0
lmcache-0.4.7-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl	2026-06-13	13.3 MB	0
lmcache-0.4.7-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl	2026-06-13	13.3 MB	0
lmcache-0.4.7-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl	2026-06-13	13.3 MB	0
lmcache-0.4.7.tar.gz	2026-06-13	6.2 MB	0
README.md	2026-06-13	9.7 kB	0
v0.4.7 source code.tar.gz	2026-06-13	6.2 MB	0
v0.4.7 source code.zip	2026-06-13	7.3 MB	0
Totals: 8 Items		72.7 MB	0

LMCache v0.4.7 Release

Interface / Config / CLI / Build Changes

Breaking / behavior changes (action may be needed)

python_ops_fallback now requires completion recorder ops (added missing ops)
LMCacheGroupView renamed to EngineGroupInfo
report_status is now per-kernel-group
Per-group tokens_per_chunk / slots_per_chunk now used instead of inferring from cache_config.block_size
goblin is deprecated (documented)
Blend v2 CI removed; CacheBlend now uses Blend v3

New / additive (opt-in)

New mp_transfer_mode config option
New SHM-based data transfer path for GPUs/CPU/Accelerators (POSIX SHM infra for CPU KV-cache IPC)
New hybrid memory allocator (HMA) support, with per-group block sizes and Mamba/GDN hybrid model (Qwen3.5) support
New MP coordinator backbone: server registration, coordinator CLI, L2 quota/usage/eviction, global CacheBlend fingerprint directory
New CLI quota management commands (set/get/list/delete)
New runtime DAX hotplug HTTP API (MP)
New --mode cpu and --transfer-mode options in server_bench
New backends: NIXL DOCA_MEMOS (NVIDIA CMX), Cloud Bigtable remote storage, Moore Threads MUSA support, multipath KV-cache offloading in NIXL backend
New multi_layer_block_kv_transfer unified MP transfer primitive
LMCache startup banner now printed in CLI and vLLM connectors
vLLM CPU 2-fused KV layout support
Token-level matching for non-block-aligned KV reuse (CacheBlend)

MP (Multi-Process Mode)

[#3245] Retain CUDA IPC events in MP adapter
[#3359] SHM-based data transfer path for GPUs/CPU/Accelerators
[#3382] Fix GPU block exhaustion deadlock at high concurrency with chunked KV loading
[#3488] Add mp coordinator backbone
[#3513] Add mp_transfer_mode config option
[#3516] Register MP servers with the coordinator
[#3522] Add coordinator CLI and mp server registration
[#3531] Introduce create_cache_context factory
[#3557] Refactor LMCache layer group for better compat with hybrid models
[#3608] Introduce object_group_id into the ObjectKey
[#3352] Add SHM-based NonGpuContext (server-side copy)
[#3612] Implement interface for multi-object group and sliding window support (HMA)
[#3630] Coordinator L2 Quota, Usage, Eviction
[#3597] Global CacheBlend fingerprint directory on the MP coordinator
[#3264] Add runtime DAX hotplug HTTP API
[#3477] Add l2_evicted_object, add cachesalt to L1/L2 metrics
[#3478] Consolidate ParallelStrategy construction in vllm_multi_process_adapter
[#3558] Align MP server id with OTel service.instance.id
[#3508] Add multi_layer_block_kv_transfer Python fallback as unified MP transfer primitive
[#3563] Add POSIX SHM infra for CPU KV-cache IPC
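To illustrate the general idea behind a shared-memory transfer path (one process writes bytes into a named segment, another attaches by name and reads them without a socket copy), here is a minimal sketch using only Python's standard `multiprocessing.shared_memory` module. It is not LMCache's implementation; the helper names `write_chunk` and `read_chunk` are hypothetical and chosen for illustration only.

```python
from multiprocessing import shared_memory


def write_chunk(payload: bytes) -> shared_memory.SharedMemory:
    """Create a new named segment and copy the payload into it.

    Hypothetical helper: the caller owns the returned handle and must
    call close() and unlink() when the segment is no longer needed.
    """
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[: len(payload)] = payload
    return shm


def read_chunk(name: str, size: int) -> bytes:
    """Attach to an existing segment by name and copy out `size` bytes.

    The size is passed explicitly because the OS may round the segment
    up to a page boundary, so shm.size can exceed the payload length.
    """
    shm = shared_memory.SharedMemory(name=name)
    try:
        return bytes(shm.buf[:size])
    finally:
        shm.close()  # detach only; the producer is responsible for unlink()
```

In a real multi-process setup the segment name and payload size would be sent to the consumer over a control channel, and only the producer would call `unlink()`.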

Core / HMA

[#3419] Add support for hybrid memory allocator
[#3491] Bitmap-based prefetch result + pluggable TrimPolicy
[#3503] Native bulk-set: build found bitmap via batched_set + gather
[#3492] Sparse prefetch via TrimPolicy.SPARSE + covered_keys
[#3521] Support different block size for different groups
[#3613] Support Mamba/GDN hybrid models (Qwen3.5)
[#3616] Per-group tokens_per_chunk and slots_per_chunk
[#3635] Optimize DSV4 store/load size
[#3589] Add GDS L1 slab-file tier (cuFile DMA) for MP mode
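As a rough illustration of what a bitmap-shaped lookup result with a pluggable trimming step could look like, the sketch below encodes per-key hits as bits of an integer and then applies one of two policies. Everything here is an assumption made for illustration: the names `Trim`, `found_bitmap`, and `apply_trim` are hypothetical, and the prefix/sparse semantics shown are a guess at what such policies might mean, not a description of LMCache's `TrimPolicy`.

```python
from enum import Enum


class Trim(Enum):
    """Hypothetical trimming policies (not LMCache's TrimPolicy)."""

    PREFIX = "prefix"  # assumed: keep only the leading run of hits
    SPARSE = "sparse"  # assumed: keep every hit, gaps included


def found_bitmap(keys, stored):
    """Return an int whose bit i is set when keys[i] is present in `stored`."""
    bitmap = 0
    for i, key in enumerate(keys):
        if key in stored:
            bitmap |= 1 << i
    return bitmap


def apply_trim(bitmap, num_keys, policy):
    """Apply the chosen policy to a found-bitmap."""
    if policy is Trim.SPARSE:
        return bitmap
    trimmed = 0
    for i in range(num_keys):
        if not (bitmap >> i) & 1:
            break  # first miss ends the contiguous prefix
        trimmed |= 1 << i
    return trimmed
```

With keys `a, b, c, d` and only `a, b, d` stored, the prefix policy keeps bits 0 and 1, while the sparse policy also keeps bit 3.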

CacheBlend

[#3364] Blend v3
[#3582] Token-level matching + per-token slot scatter for non-block-aligned KV reuse
[#3629] Reuse gpu_transfer.cache_contexts; drop CB GPU-context mirror
[#3541] Clean up/remove Blend v2 CI

Storage / Backends

[#3486] NIXL DOCA_MEMOS storage backend (NVIDIA CMX)
[#3453] nixl_storage: use LocalCPUBackend if nixl_buffer_device=cpu
[#3263] Added HFbucket MP
[#2418] Add multipath KV-cache offloading support in LMCache NIXL backend
[#3404] Integrate native Cloud Bigtable remote storage connector
[#3483] Add Moore Threads MUSA support for LMCache v1
[#3568] nixl: create storage directory if it doesn't exist
[#3274] Add missing io_uring changes and NVMe io_uring_cmd passthrough

Observability

[#3384] Add NVTX annotations to LocalDiskBackend disk read path
[#3607] Blend server trace sub-spans + V3 hit-rate breakdown

Operator

[#3543] CacheBlend: CacheBlendEngine CRD + injection webhook
[#3647] Emit --engine-type blend for CacheBlend engine
[#3646] Install cert-manager in e2e smoke suite

CLI

[#3611] Print LMCache startup banner in CLI and vLLM connectors
[#3625] Refactor query and trace CLI
[#3623] Add quota management commands (set/get/list/delete)

XPU / Accelerators

[#3360] Add SYCL CacheGen + RoPE kernels and in-process blender XPU tests

Bugfixes

[#3327] gds: use parse_cache_key to handle LayerCacheEngineKey on restart
[#3441] Drop EngineArgs+asdict to fix vLLM 0.20+ pydantic error
[#3189] Fix LocalCPUBackend recovery when pinned CPU chunks block eviction
[#3469] Add missing completion recorder ops to python_ops_fallback
[#3463] Prevent stale prefetches and registry memory leaks by purging unregistered KV layouts
[#3410] Prevent negative pin count on unpinned remote memory objects
[#3278] Restore pin=True in PD sync backend dedup path
[#3525] Resolve AttributeError in test_execute_calls_run_http_server
[#3602] Handle NL_X_NB_NH_BS_TWO_HS in get_group_data_ptrs
[#3606] Add missing enum to GPUKVFormat
[#3325] Graceful skip on slot_mapping/token_ids desync in wait_for_save (fixes [#3318])
[#3648] Correct retrieve log label prefix -> non_shifted

Performance / Optimization

[#3413] Avoid redundant PCIe transfer on leader rank during retrieve
[#3591] Optimize Python fallback path for block transfer operations

Refactor / Cleanup

[#3460] Move serializer registry + encoder/decoder helpers to end of custom_types.py
[#3445] Simplify redundant conditions in RawBlockCore
[#3216] Put lmcache_frontend into lmcache repo
[#3514] Add set_shape_desc_dtype helper to avoid scattered try/except
[#3545] Normalize block_ids to tolerate legacy vLLM connectors
[#3577] Normalize flat/nested block_ids in flat_block_ids and connector str
[#3567] Support vLLM CPU 2-fused KV layout
[#3598] Rename LMCacheGroupView to EngineGroupInfo
[#3599] Change report_status to be per-kernel-group in LMCache
[#3581] Remove unnecessary global statement in cuda_extension
[#3600] Utilize multi_layer_block_kv_transfer ops for data transfer path
[#3524] Add transfer timing logs to non-GPU path similar to CUDA path
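The block_ids normalization items above concern accepting both a flat list of block IDs and a nested, per-group list. A minimal sketch of that kind of normalization follows. The helper name `normalize_block_ids` is hypothetical and this is not the code from the referenced PRs; it only shows one straightforward way to tolerate both shapes.

```python
from typing import Sequence, Union

# Either a flat sequence of ints or a sequence of per-group sequences.
BlockIds = Union[Sequence[int], Sequence[Sequence[int]]]


def normalize_block_ids(block_ids: BlockIds) -> list[int]:
    """Flatten one level of nesting so callers always see a flat list.

    Hypothetical helper: flat input is returned as a new list, nested
    input is concatenated in group order.
    """
    flat: list[int] = []
    for item in block_ids:
        if isinstance(item, (list, tuple)):
            flat.extend(int(b) for b in item)
        else:
            flat.append(int(item))
    return flat
```

Flattening loses the group boundaries, so a caller that needs per-group information would have to keep the nested form alongside the flat one.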

Benchmarking

[#3283] Support benchmark fs and hf3fs backend via storage_backend_io_benchmark
[#3528] server_bench supports --mode cpu and --transfer-mode
[#3603] Support aligned L1 buffers for L2 adapters

CI/CD & Build

[#3456] Add http_api e2e test for MP HTTP server endpoints and CLI commands
[#3498] Relax timeout to reduce flakiness of some CI/CD tests
[#3502] Force vLLM Model Runner V1 in the PD comprehensive test
[#3489] Add pickle/shm vLLM + LMCache e2e validation on CPU
[#3538] Hot fix for the CPU test in multiprocess mode CI
[#3507] Add parity test between c_ops and python_ops_fallback
[#3321] Add unit tests for v1/utils/bloom_filter
[#3556] Improve CI stability: gemma-4 test & serde test
[#3614] Reduce CI CPU e2e test memory request
[#3621] cu129 images: pin vllm to the cu129 index (drop unsafe-best-match)
[#3590] Add CPU e2e test (vLLM and bench server)

Docs

[#3457] Update and restructure CLI reference
[#3481] Fix Docker examples and build metadata
[#3501] Combined doc drift updates May 27-Jun 2
[#3504] KV Cache Size Calculator: add hybrid SWA, DSA, placeholders for Mamba / Linear
[#3506] Add recipe for Gemma 3
[#3518] Deprecate goblin in doc
[#3461] Update README.md
[#3433] Auto-select model in CPU-offloading example to fit GPU
[#3534] Add filesystem connector backend guide
[#3645] Recipe update for Qwen 3.6 27B and general guideline for mamba models
[#2834] kv_cache_calculator: add Hunyuan & DeepSeek models, fix head_dim/CLA, add i18n UI

Chinese Translation

[#3386] Update Chinese documentation translations
[#3482] Update Chinese documentation translations
[#3588] Update Chinese documentation translations
[#3592] Correct machine translation errors in documentation

Chore / Maintenance

[#3443] Convert loglevel_api f-strings to %-format
[#3447] Convert internal_api_server f-string log calls to %-format
[#3136] Bump go.opentelemetry.io/otel from 1.36.0 to 1.41.0 in /operator
[#3596] Bump sphinxcontrib-mermaid from 1.2.2 to 2.0.2
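The two f-string to %-format conversions above follow a common Python logging practice: with %-style arguments, the message is only formatted if the record is actually emitted, whereas an f-string is always evaluated first. The sketch below demonstrates the difference with the standard `logging` module; the `Expensive` class and `demo` function are illustrative names, not LMCache code.

```python
import logging

logger = logging.getLogger("format_demo")


class Expensive:
    """Counts how many times it is converted to a string."""

    def __init__(self) -> None:
        self.calls = 0

    def __str__(self) -> str:
        self.calls += 1
        return "expensive"


def demo() -> tuple:
    """Log at DEBUG while the logger is set to WARNING, both ways."""
    logger.setLevel(logging.WARNING)  # DEBUG records are dropped
    lazy, eager = Expensive(), Expensive()
    logger.debug("value=%s", lazy)   # formatting deferred, never happens
    logger.debug(f"value={eager}")   # f-string evaluated before the call
    return lazy.calls, eager.calls
```

Calling `demo()` returns `(0, 1)`: the %-style call never stringifies its argument, while the f-string call pays the formatting cost even though nothing is logged.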

New Contributors

@ChiragB254 made their first contribution in [#3443]
@Alorun made their first contribution in [#3447]
@JinuJeong made their first contribution in [#3327]
@catyion made their first contribution in [#3460]
@3xdevv made their first contribution in [#3481]
@nayeonikim made their first contribution in [#3445]
@sihara made their first contribution in [#3384]
@XuanCS made their first contribution in [#3278]
@Lyj1007 made their first contribution in [#3507]
@kirklandsign made their first contribution in [#3321]
@superleo made their first contribution in [#3483]
@feixiangpeng made their first contribution in [#3263]
@sonimwang made their first contribution in [#3592]
@KimmoZAG made their first contribution in [#2834]
@Chris-Sigopt made their first contribution in [#3606]
@ekaynar made their first contribution in [#2418]
@dhruvatr made their first contribution in [#3581]
@Kushagra963-lab made their first contribution in [#3534]

Source: README.md, updated 2026-06-13
