| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| lmcache_cli-0.4.6.dev0-py3-none-any.whl | < 21 hours ago | 1.5 MB | |
| lmcache-0.4.5-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl | < 24 hours ago | 12.5 MB | |
| lmcache-0.4.5-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl | < 24 hours ago | 12.6 MB | |
| lmcache-0.4.5-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl | < 24 hours ago | 12.6 MB | |
| lmcache-0.4.5-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl | < 24 hours ago | 12.6 MB | |
| lmcache-0.4.5.tar.gz | < 24 hours ago | 10.1 MB | |
| README.md | 2026-05-15 | 7.4 kB | |
| v0.4.5 source code.tar.gz | 2026-05-15 | 10.1 MB | |
| v0.4.5 source code.zip | 2026-05-15 | 10.9 MB | |
| Totals: 9 Items | | 83.0 MB | 0 |
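
The wheel names in the table follow the standard wheel filename layout (`name-version-python tag-abi tag-platform tag.whl`), so the right file for a given interpreter can be read straight off the name. The snippet below is a minimal sketch of that lookup; `parse_wheel` and `pick_wheel` are hypothetical helper names, not part of LMCache, and in practice `pip` performs this selection on its own.

```python
import sys

# Filenames copied from the file table above.
WHEELS = [
    "lmcache-0.4.5-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
    "lmcache-0.4.5-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
    "lmcache-0.4.5-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
    "lmcache-0.4.5-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl",
]


def parse_wheel(filename):
    """Split a wheel filename into its tags (hypothetical helper)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
    elif len(parts) == 6:
        # An optional build tag sits between the version and the python tag.
        name, version, _build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel name: {filename}")
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        # Several platform tags may be joined with "." in one filename.
        "platforms": platform_tag.split("."),
    }


def pick_wheel(filenames, version_info=sys.version_info):
    """Return the first wheel built for the given CPython version, or None."""
    wanted = f"cp{version_info[0]}{version_info[1]}"
    for filename in filenames:
        if parse_wheel(filename)["python"] == wanted:
            return filename
    return None


if __name__ == "__main__":
    print(pick_wheel(WHEELS))
```

All four binary wheels carry `manylinux` x86_64 platform tags, so on other platforms the source archive is the only option in this listing.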
LMCache v0.4.5
⚠️ Important: Default CUDA Wheel Changed to cu13
🆕 New Model Support
- #3171 [MP][Feat] Support DeepSeek V4 by @liuyumoye
- #3073 [Chore] Bump transformers to >= 5.4 for GLM-5.1 support by @sammshen
- #3122 [Docs] Replace removed Qwen3-8B-Instruct in quickstart by @cr7258
- #3199 [Docs] Add Recipes section with MiniMax-M2 as first entry by @sammshen
🆕 New Framework / Integration Support
- #3165 [Core] TRT-LLM Integration by @sammshen
- #3182 [Docs] Move TensorRT-LLM into quickstart as a tab by @sammshen
- #3224 [MP] Add new mp connector snapshot for vllm 0.20.1 by @chunxiaozheng
- #3235 [MP] Add the lmcache_mp_connector for dev by @chunxiaozheng
🆕 New Hardware / Device Support
- #3287 fix(hpu): implement device-specific initialize_kvcaches_ptr for HPU connector by @hlin99
- #3101 [ROCm] Add Dockerfiles for AMD Instinct GPUs by @andyluo7
- #3211 [operator] Add gpuVendor field to support AMD GPUs by @elliotz-ai
- #3091 feat(infra): Global abstraction of torch.device for multi-device support by @hlin99
Multi-Process (MP) Mode — Core & Features
- [#3017] [MP] Refactor http server to make it extensible
- [#3128] [MP] Introduce common http apis
- [#3144] [MP] Introduce EventNotifier to replace direct os.eventfd usage
- [#3142] [MP] Use unified c_ops backend import and fix gpu_kv_format_name property
- [#3164] [MP] Disable Prometheus HTTP server for http_server entrypoint
- [#3137] [MP] Add IsolatedLRU eviction policy + per-cache_salt quotas
- [#3119] [MP] Add raw_block MP L2 adapter support via shared RawBlockCore
- [#3161] [MP] daxbackend mp l2 support
- [#3208] [MP] Make vLLM be able to reconnect after LMCache restarts
- [#3185] [MP] Remove the middle `/api/` in all endpoints of http_server
- [#3013] [MP] Add test-cache CLI command for GPU mode
- [#3111] [Chore][Revert] MP adapter signature shim from [#3100]
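
To illustrate the idea behind an isolated LRU with per-`cache_salt` quotas (#3137), here is a minimal sketch. It is not LMCache's implementation: the class name, method names, and byte-based quota are all assumptions made for the example. Each salt gets its own LRU list and its own budget, so one salt filling its quota only evicts its own entries.

```python
from collections import OrderedDict


class IsolatedLRUSketch:
    """Hypothetical per-salt LRU; not the LMCache class."""

    def __init__(self, quotas, default_quota):
        self._quotas = dict(quotas)      # salt -> max bytes (assumed unit)
        self._default_quota = default_quota
        self._lrus = {}                  # salt -> OrderedDict[key, size]
        self._used = {}                  # salt -> bytes currently held

    def put(self, salt, key, size):
        """Insert an entry and return the keys evicted from the same salt."""
        quota = self._quotas.get(salt, self._default_quota)
        if size > quota:
            raise ValueError("entry is larger than the salt's quota")
        lru = self._lrus.setdefault(salt, OrderedDict())
        used = self._used.get(salt, 0)
        if key in lru:
            used -= lru.pop(key)         # replacing an existing entry
        evicted = []
        while used + size > quota:
            old_key, old_size = lru.popitem(last=False)  # least recently used
            used -= old_size
            evicted.append(old_key)
        lru[key] = size
        self._used[salt] = used + size
        return evicted

    def touch(self, salt, key):
        """Mark an entry as recently used; return False if it is absent."""
        lru = self._lrus.get(salt)
        if lru is None or key not in lru:
            return False
        lru.move_to_end(key)
        return True
```

In this sketch a busy salt can never push out another salt's entries, which is the isolation property the policy name suggests.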
MP Observability & Metrics
- [#3045] Centralize L2 adapter byte accounting via AdapterUsage
- [#3116] Attach service.instance.id to OTel Resource
- [#3103] Use monotonic clock for CUDA-host-callback event timestamps
- [#3098] Add L0↔L1 throughput metrics (store/load GB/s)
- [#3094] Add L1+L2 token-level cache hit rate metric
- [#3112] Add L1/L2 failure health monitoring metrics
- [#3124] Per-adapter L1↔L2 throughput metrics
- [#3150] Add L1/L2 state metrics for MP mode
- [#3139] num_chunks_loaded counter
- [#3114] Real-reuse gap metrics
- [#3167] Add EventBus self-monitoring metrics
- [#3175] Per-request hit rate attributes on root OTel spans
- [#3194] Verify full MP observability surface in long_doc_qa_l2
- [#3253] Update grafana observability example panel
- [#3257] Drop inflated L2 store throughput on fast-path
- [#3205] Update dashboard metrics
- [#3196] Expose blend token-level hit-rate counters
- [#3232] Surface adapter type in NativeConnectorL2Adapter.report_status
- [#3233] Surface CB-registered GPU contexts in /api/statusreport
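
Several of the entries above concern throughput in GB/s, token-level hit rate, and the use of a monotonic clock for timestamps. The sketch below shows how such numbers are commonly derived. The function names are hypothetical, a GB is assumed to be 10^9 bytes, and none of this is taken from LMCache's metric code.

```python
import time


def throughput_gbps(num_bytes, elapsed_s):
    """Bytes moved per second, expressed in GB/s (1 GB = 1e9 bytes, assumed)."""
    if elapsed_s <= 0:
        return 0.0
    return num_bytes / elapsed_s / 1e9


def token_hit_rate(hit_tokens, requested_tokens):
    """Fraction of requested tokens that were served from cache."""
    if requested_tokens == 0:
        return 0.0
    return hit_tokens / requested_tokens


def timed_copy(src):
    """Copy a buffer and time it with a clock that never goes backwards."""
    start = time.monotonic()
    dst = bytearray(src)
    elapsed = time.monotonic() - start
    return dst, elapsed


if __name__ == "__main__":
    data = bytes(64 * 1024 * 1024)
    _, elapsed = timed_copy(data)
    print(f"{throughput_gbps(len(data), elapsed):.2f} GB/s")
    print(f"hit rate: {token_hit_rate(768, 1024):.2%}")
```

`time.monotonic()` is used rather than `time.time()` because wall-clock adjustments (for example NTP corrections) can make a wall-clock interval negative or inflated, which would distort a derived rate.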
Core Engine & Refactoring
- [#3074] Make sure `submit_store_task` sees key from the same model
- [#3088] Implement advance_request for store/prefetch controller
- [#3078] Single source of truth for Layer Group Metadata
- [#3162] Unify discover+normalize and add EngineType to REGISTER
- [#3140] Implement serialization / deserialization
- [#2634] Encoder Caching Support
Storage Backends & Adapters
- [#3018] RDMA L1 memory preregistration for MooncakeStore L2 adapter
- [#3064] Add S3 L2 adapter for MP mode
- [#3170] S3L2Adapter: support container credentials via boto3 delegate
- [#3188] Fix S3 L2 adapter listener race
- [#2631] Update MooncakestoreConnector to new store.setup(dict) api
- [#3172] Add batch operations to Mooncake L2 adapter
- [#3227] Per-operation dedicated worker pools to Mooncake L2 connector
- [#3060] Add Hugging Face Buckets as built-in remote storage backend
- [#3160] Add support for `AZURE_BLOB` NIXL backend
- [#3152] Harden nixl storage backend transfer and config handling
- [#2966] Add batched_contains() override for NixlDynamicStorageBackend
- [#2989] fix: use unique device_id per OBJ register/deregister cycle
- [#2861] Add nixl_endpoint_list for per-worker object-storage endpoint distribution
- [#2635] io_uring support for Rust raw block backend
- [#3169] Add option to skip raw block checkpoint load
- [#3120] Support 3FS storage backend via Usrbio (native) APIs
PD Backend / Disaggregation
- [#2972] Add bidirectional NIXL cache probe
- [#3038] Fully async PD backend
CacheBlend
- [#3062] Per-request root OTel span and SpanRegistry for CB server tracing
- [#3179] Fix CB lookup correctness, thread safety, and store-complete race
- [#3234] Dedup overlapping matches in cb_lookup_pre_computed
- [#3092] [ROCm] Triton block-sparse attention backend for CacheBlend
- [#3254] [CLI][CB] propagate CLI kvcache clear to CB fingerprint table
- [#3276] Revert "[CLI][CB] propagate CLI kvcache clear to CB fingerprint table"
Bug Fixes
- [#3149] fix(disk): defer cache-policy hit update until load succeeds in get_blocking()
- [#3197] fix: missing lock when clearing metadata cache
- [#3146] fix(#3104): per-instance FastAPI app to fix 503 on cache endpoints in TP=1 non-MP
- [#3244] fix: the missing /api/ purge
- [#3002] [Bug] Missing validate() in env-only config path of lmcache_get_or_create_config
- [#3178] [Chore][HotFix] EC UT
- [#3176] Exclude hyphenated tags from setuptools_scm git describe
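
The last fix above concerns git tags that cannot be turned into a valid package version. As a rough illustration of why hyphenated tags are a problem, the sketch below checks tags against the canonical-form regular expression published in PEP 440. The helper names and the `v0.4.5-cu129` tag are hypothetical, and this is not how the project configures `setuptools_scm`.

```python
import re

# Canonical-form pattern from PEP 440 (Appendix B of the PEP).
PEP440_CANONICAL = re.compile(
    r"^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*"
    r"((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?"
    r"(\.dev(0|[1-9][0-9]*))?$"
)


def is_canonical(version):
    """True if the string is a PEP 440 version in canonical form."""
    return PEP440_CANONICAL.match(version) is not None


def release_tags(tags):
    """Keep tags that map to a canonical version after dropping a leading 'v'."""
    kept = []
    for tag in tags:
        if "-" in tag:
            continue  # variant-style tags are skipped outright
        version = tag[1:] if tag.startswith("v") else tag
        if is_canonical(version):
            kept.append(tag)
    return kept


if __name__ == "__main__":
    # "v0.4.5-cu129" is a made-up example of a variant tag.
    print(release_tags(["v0.4.5", "v0.4.5-cu129", "nightly-cu13"]))
```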
CI/CD
- [#3109] Fix nightly build image job GHA
- [#3123] fix egress block and invalid PEP 440 version from variant tags
- [#3117] change blend CI cuda version and nightly tags
- [#3113] split logs in blend CI test
- [#3125] Update github workflow to avoid duplicated UTs and artifact builds
- [#3126] Use existing NGC base image tags for cu12.9 and cu13.0 builds
- [#3127] Add k3 build pipeline for unit tests
- [#3133] Add missing egress endpoint to nightly Docker build
- [#3135] Skip flaky test_p2p_backend_with_controller
- [#3141] Fix nightly Docker build broken by nightly-cu13 tag
- [#3134] Skip test PyPI publish on stable release
- [#3222] Fix flaky comprehensive tests
- [#3239] Route unit job to k8s queue
- [#3240] Expose prefiller/decoder/proxy logs as artifacts
- [#3256] Pin blend test vLLM to cu129 channel to avoid CUDA-13 PyPI wheel
- [#3275] Swap default to cu13 wheels and cu12.9 secondary
- [#3295] Allow PyTorch/NVIDIA egress endpoints in release workflows
- [#2969] Revert "[CI]: add full tag selectively"
CLI / Benchmarking
- [#3195] Add prefix-suffix-tuner workload for tiered + Blending KV-cache
- [#3220] Count requests as successful when stream has no content but usage reports tokens
- [#3279] Fix slim install imports and smoke-test wheel in CI
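
The success-counting change (#3220) can be pictured as a small predicate over a finished stream. The sketch below is only an interpretation of the entry's title: the function name is hypothetical, and the `completion_tokens` field is an assumption modeled on OpenAI-style usage payloads rather than something taken from the benchmark code.

```python
def is_successful(content_chunks, usage):
    """Decide whether a streamed request should count as successful.

    content_chunks: text pieces received over the stream (may be empty).
    usage: final usage dict, or None if the server sent none.
    """
    has_content = any(chunk for chunk in content_chunks)
    # Field name is an assumption; adjust to the server's actual payload.
    reported_tokens = (usage or {}).get("completion_tokens", 0)
    return has_content or reported_tokens > 0


if __name__ == "__main__":
    # No visible content, but the server reports generated tokens.
    print(is_successful([], {"completion_tokens": 12}))
```

Under this reading, a stream with neither content nor reported tokens still counts as a failure.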
Tests
- [#3157] Add Comprehensive test for GDS backend
Documentation
- [#3110] Add MP mode to the vLLM quickstart tab
- [#3107] Update benchmarking guide to use lmcache bench engine CLI
- [#3131] Add docs for MP HTTP endpoints descriptions
- [#3173] Add docs goblin easter egg
- [#3204] Add doc for fs native connector
- [#3209] Update LMCache Recipes
Chore / Misc
- [#3095] Update CODEOWNERS for mp_observability
- [#3129] Add .gitignore to rust/raw_block