
Name	Modified	Size	Downloads / Week
lmcache_cli-0.4.6.dev0-py3-none-any.whl	< 21 hours ago	1.5 MB	0
lmcache-0.4.5-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl	< 24 hours ago	12.5 MB	0
lmcache-0.4.5-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl	< 24 hours ago	12.6 MB	0
lmcache-0.4.5-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl	< 24 hours ago	12.6 MB	0
lmcache-0.4.5-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl	< 24 hours ago	12.6 MB	0
lmcache-0.4.5.tar.gz	< 24 hours ago	10.1 MB	0
README.md	2026-05-15	7.4 kB	0
v0.4.5 source code.tar.gz	2026-05-15	10.1 MB	0
v0.4.5 source code.zip	2026-05-15	10.9 MB	0
Totals: 9 Items		83.0 MB	0

LMCache v0.4.5

⚠️ Important: Default CUDA Wheel Changed to cu13

🆕 New Model Support

#3171 [MP][Feat] Support DeepSeek V4 by @liuyumoye
#3073 [Chore] Bump transformers to >= 5.4 for GLM-5.1 support by @sammshen
#3122 [Docs] Replace removed Qwen3-8B-Instruct in quickstart by @cr7258
#3199 [Docs] Add Recipes section with MiniMax-M2 as first entry by @sammshen

🆕 New Framework / Integration Support

#3165 [Core] TRT-LLM Integration by @sammshen
#3182 [Docs] Move TensorRT-LLM into quickstart as a tab by @sammshen
#3224 [MP] Add new mp connector snapshot for vllm 0.20.1 by @chunxiaozheng
#3235 [MP] Add the lmcache_mp_connector for dev by @chunxiaozheng

🆕 New Hardware / Device Support

#3287 fix(hpu): implement device-specific initialize_kvcaches_ptr for HPU connector by @hlin99
#3101 [ROCm] Add Dockerfiles for AMD Instinct GPUs by @andyluo7
#3211 [operator] Add gpuVendor field to support AMD GPUs by @elliotz-ai
#3091 feat(infra): Global abstraction of torch.device for multi-device support by @hlin99

Multi-Process (MP) Mode — Core & Features

[#3017] [MP] Refactor http server to make it extensible
[#3128] [MP] Introduce common http apis
[#3144] [MP] Introduce EventNotifier to replace direct os.eventfd usage
[#3142] [MP] Use unified c_ops backend import and fix gpu_kv_format_name property
[#3164] [MP] Disable Prometheus HTTP server for http_server entrypoint
[#3137] [MP] Add IsolatedLRU eviction policy + per-cache_salt quotas
[#3119] [MP] Add raw_block MP L2 adapter support via shared RawBlockCore
[#3161] [MP] daxbackend mp l2 support
[#3208] [MP] Make vLLM be able to reconnect after LMCache restarts
[#3185] [MP] Remove the middle /api/ in all endpoints of http_server
[#3013] [MP] Add test-cache CLI command for GPU mode
[#3111] [Chore][Revert] MP adapter signature shim from [#3100]

MP Observability & Metrics

[#3045] Centralize L2 adapter byte accounting via AdapterUsage
[#3116] Attach service.instance.id to OTel Resource
[#3103] Use monotonic clock for CUDA-host-callback event timestamps
[#3098] Add L0↔L1 throughput metrics (store/load GB/s)
[#3094] Add L1+L2 token-level cache hit rate metric
[#3112] Add L1/L2 failure health monitoring metrics
[#3124] Per-adapter L1↔L2 throughput metrics
[#3150] Add L1/L2 state metrics for MP mode
[#3139] num_chunks_loaded counter
[#3114] Real-reuse gap metrics
[#3167] Add EventBus self-monitoring metrics
[#3175] Per-request hit rate attributes on root OTel spans
[#3194] Verify full MP observability surface in long_doc_qa_l2
[#3253] Update grafana observability example panel
[#3257] Drop inflated L2 store throughput on fast-path
[#3205] Update dashboard metrics
[#3196] Expose blend token-level hit-rate counters
[#3232] Surface adapter type in NativeConnectorL2Adapter.report_status
[#3233] Surface CB-registered GPU contexts in /api/statusreport

Core Engine & Refactoring

[#3074] Make sure submit_store_task sees key from the same model
[#3088] Implement advance_request for store/prefetch controller
[#3078] Single source of truth for Layer Group Metadata
[#3162] Unify discover+normalize and add EngineType to REGISTER
[#3140] Implement serialization / deserialization
[#2634] Encoder Caching Support

Storage Backends & Adapters

[#3018] RDMA L1 memory preregistration for MooncakeStore L2 adapter
[#3064] Add S3 L2 adapter for MP mode
[#3170] S3L2Adapter: support container credentials via boto3 delegate
[#3188] Fix S3 L2 adapter listener race
[#2631] Update MooncakestoreConnector to new store.setup(dict) api
[#3172] Add batch operations to Mooncake L2 adapter
[#3227] Per-operation dedicated worker pools to Mooncake L2 connector
[#3060] Add Hugging Face Buckets as built-in remote storage backend
[#3160] Add support for AZURE_BLOB NIXL backend
[#3152] Harden nixl storage backend transfer and config handling
[#2966] Add batched_contains() override for NixlDynamicStorageBackend
[#2989] fix: use unique device_id per OBJ register/deregister cycle
[#2861] Add nixl_endpoint_list for per-worker object-storage endpoint distribution
[#2635] io_uring support for Rust raw block backend
[#3169] Add option to skip raw block checkpoint load
[#3120] Support 3FS storage backend via Usrbio (native) APIs

PD Backend / Disaggregation

[#2972] Add bidirectional NIXL cache probe
[#3038] Fully async PD backend

CacheBlend

[#3062] Per-request root OTel span and SpanRegistry for CB server tracing
[#3179] Fix CB lookup correctness, thread safety, and store-complete race
[#3234] Dedup overlapping matches in cb_lookup_pre_computed
[#3092] [ROCm] Triton block-sparse attention backend for CacheBlend
[#3254] [CLI][CB] propagate CLI kvcache clear to CB fingerprint table
[#3276] Revert "[CLI][CB] propagate CLI kvcache clear to CB fingerprint table"

Bug Fixes

[#3149] fix(disk): defer cache-policy hit update until load succeeds in get_blocking()
[#3197] fix: missing lock when clearing metadata cache
[#3146] fix(#3104): per-instance FastAPI app to fix 503 on cache endpoints in TP=1 non-MP
[#3244] fix: the missing /api/ purge
[#3002] [Bug] Missing validate() in env-only config path of lmcache_get_or_create_config
[#3178] [Chore][HotFix] EC UT
[#3176] Exclude hyphenated tags from setuptools_scm git describe

CI/CD

[#3109] Fix nightly build image job GHA
[#3123] fix egress block and invalid PEP 440 version from variant tags
[#3117] change blend CI cuda version and nightly tags
[#3113] split logs in blend CI test
[#3125] Update github workflow to avoid duplicated UTs and artifact builds
[#3126] Use existing NGC base image tags for cu12.9 and cu13.0 builds
[#3127] Add k3 build pipeline for unit tests
[#3133] Add missing egress endpoint to nightly Docker build
[#3135] Skip flaky test_p2p_backend_with_controller
[#3141] Fix nightly Docker build broken by nightly-cu13 tag
[#3134] Skip test PyPI publish on stable release
[#3222] Fix flaky comprehensive tests
[#3239] Route unit job to k8s queue
[#3240] Expose prefiller/decoder/proxy logs as artifacts
[#3256] Pin blend test vLLM to cu129 channel to avoid CUDA-13 PyPI wheel
[#3275] Swap default to cu13 wheels and cu12.9 secondary
[#3295] Allow PyTorch/NVIDIA egress endpoints in release workflows
[#2969] Revert "[CI]: add full tag selectively"

CLI / Benchmarking

[#3195] Add prefix-suffix-tuner workload for tiered + Blending KV-cache
[#3220] Count requests as successful when stream has no content but usage reports tokens
[#3279] Fix slim install imports and smoke-test wheel in CI

Tests

[#3157] Add Comprehensive test for GDS backend

Documentation

[#3110] Add MP mode to the vLLM quickstart tab
[#3107] Update benchmarking guide to use lmcache bench engine CLI
[#3131] Add docs for MP HTTP endpoints descriptions
[#3173] Add docs goblin easter egg
[#3204] Add doc for fs native connector
[#3209] Update LMCache Recipes

Chore / Misc

[#3095] Update CODEOWNERS for mp_observability
[#3129] Add .gitignore to rust/raw_block

Source: README.md, updated 2026-05-15

LMCache Files

Supercharge Your LLM with the Fastest KV Cache Layer
