| Name | Modified | Size |
| --- | --- | --- |
| lmcache_cli-0.4.6.dev0-py3-none-any.whl | < 21 hours ago | 1.5 MB |
| lmcache-0.4.5-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl | < 24 hours ago | 12.5 MB |
| lmcache-0.4.5-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl | < 24 hours ago | 12.6 MB |
| lmcache-0.4.5-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl | < 24 hours ago | 12.6 MB |
| lmcache-0.4.5-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl | < 24 hours ago | 12.6 MB |
| lmcache-0.4.5.tar.gz | < 24 hours ago | 10.1 MB |
| README.md | 2026-05-15 | 7.4 kB |
| v0.4.5 source code.tar.gz | 2026-05-15 | 10.1 MB |
| v0.4.5 source code.zip | 2026-05-15 | 10.9 MB |

Totals: 9 items, 83.0 MB, 0 downloads / week

LMCache v0.4.5

⚠️ Important: Default CUDA Wheel Changed to cu13

🆕 New Model Support

  • #3171 [MP][Feat] Support DeepSeek V4 by @liuyumoye
  • #3073 [Chore] Bump transformers to >= 5.4 for GLM-5.1 support by @sammshen
  • #3122 [Docs] Replace removed Qwen3-8B-Instruct in quickstart by @cr7258
  • #3199 [Docs] Add Recipes section with MiniMax-M2 as first entry by @sammshen

🆕 New Framework / Integration Support

  • #3165 [Core] TRT-LLM Integration by @sammshen
  • #3182 [Docs] Move TensorRT-LLM into quickstart as a tab by @sammshen
  • #3224 [MP] Add new mp connector snapshot for vllm 0.20.1 by @chunxiaozheng
  • #3235 [MP] Add the lmcache_mp_connector for dev by @chunxiaozheng

🆕 New Hardware / Device Support

  • #3287 fix(hpu): implement device-specific initialize_kvcaches_ptr for HPU connector by @hlin99
  • #3101 [ROCm] Add Dockerfiles for AMD Instinct GPUs by @andyluo7
  • #3211 [operator] Add gpuVendor field to support AMD GPUs by @elliotz-ai
  • #3091 feat(infra): Global abstraction of torch.device for multi-device support by @hlin99

Multi-Process (MP) Mode — Core & Features

  • [#3017] [MP] Refactor http server to make it extensible
  • [#3128] [MP] Introduce common http apis
  • [#3144] [MP] Introduce EventNotifier to replace direct os.eventfd usage
  • [#3142] [MP] Use unified c_ops backend import and fix gpu_kv_format_name property
  • [#3164] [MP] Disable Prometheus HTTP server for http_server entrypoint
  • [#3137] [MP] Add IsolatedLRU eviction policy + per-cache_salt quotas
  • [#3119] [MP] Add raw_block MP L2 adapter support via shared RawBlockCore
  • [#3161] [MP] DAX backend MP L2 support
  • [#3208] [MP] Make vLLM be able to reconnect after LMCache restarts
  • [#3185] [MP] Remove the middle /api/ in all endpoints of http_server
  • [#3013] [MP] Add test-cache CLI command for GPU mode
  • [#3111] [Chore][Revert] MP adapter signature shim from [#3100]
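
As an illustration of the idea behind #3137 (an LRU policy isolated per cache_salt, with a quota for each salt), here is a minimal sketch. It is not LMCache's implementation: the class name `SaltIsolatedLRU`, its methods, and the byte-based quota are all assumptions made for this example. The point it shows is that inserting under one salt can only evict entries that belong to the same salt.

```python
from collections import OrderedDict
from typing import Dict, List, Optional


class SaltIsolatedLRU:
    """Hypothetical sketch: one LRU queue per cache_salt, each with a byte quota."""

    def __init__(self, default_quota_bytes: int,
                 quotas: Optional[Dict[str, int]] = None):
        self.default_quota_bytes = default_quota_bytes
        self.quotas = dict(quotas or {})
        # salt -> OrderedDict(key -> size_bytes), least recently used first
        self._entries: Dict[str, "OrderedDict[str, int]"] = {}
        self._used: Dict[str, int] = {}

    def touch(self, salt: str, key: str) -> bool:
        """Mark a key as recently used; return False if it is not cached."""
        entries = self._entries.get(salt)
        if entries is None or key not in entries:
            return False
        entries.move_to_end(key)
        return True

    def put(self, salt: str, key: str, size_bytes: int) -> List[str]:
        """Insert a key and return the keys evicted from the same salt only."""
        quota = self.quotas.get(salt, self.default_quota_bytes)
        if size_bytes > quota:
            raise ValueError("entry is larger than the quota for this salt")
        entries = self._entries.setdefault(salt, OrderedDict())
        used = self._used.get(salt, 0)
        if key in entries:
            # Re-inserting an existing key replaces its old size.
            used -= entries.pop(key)
        evicted: List[str] = []
        while used + size_bytes > quota:
            old_key, old_size = entries.popitem(last=False)  # evict the oldest
            used -= old_size
            evicted.append(old_key)
        entries[key] = size_bytes
        self._used[salt] = used + size_bytes
        return evicted

    def used_bytes(self, salt: str) -> int:
        return self._used.get(salt, 0)


policy = SaltIsolatedLRU(default_quota_bytes=100)
policy.put("tenant-a", "k1", 60)
policy.put("tenant-b", "k2", 60)            # separate quota, nothing evicted
evicted = policy.put("tenant-a", "k3", 60)  # only tenant-a entries can be evicted
```

Keeping a separate queue per salt means a noisy tenant cannot push another tenant's entries out, at the cost of possibly leaving some capacity unused.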

MP Observability & Metrics

  • [#3045] Centralize L2 adapter byte accounting via AdapterUsage
  • [#3116] Attach service.instance.id to OTel Resource
  • [#3103] Use monotonic clock for CUDA-host-callback event timestamps
  • [#3098] Add L0↔L1 throughput metrics (store/load GB/s)
  • [#3094] Add L1+L2 token-level cache hit rate metric
  • [#3112] Add L1/L2 failure health monitoring metrics
  • [#3124] Per-adapter L1↔L2 throughput metrics
  • [#3150] Add L1/L2 state metrics for MP mode
  • [#3139] num_chunks_loaded counter
  • [#3114] Real-reuse gap metrics
  • [#3167] Add EventBus self-monitoring metrics
  • [#3175] Per-request hit rate attributes on root OTel spans
  • [#3194] Verify full MP observability surface in long_doc_qa_l2
  • [#3253] Update grafana observability example panel
  • [#3257] Drop inflated L2 store throughput on fast-path
  • [#3205] Update dashboard metrics
  • [#3196] Expose blend token-level hit-rate counters
  • [#3232] Surface adapter type in NativeConnectorL2Adapter.report_status
  • [#3233] Surface CB-registered GPU contexts in /api/statusreport
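
For readers unfamiliar with the quantities named above, the following sketch shows one plausible way to compute a token-level hit rate and a GB/s throughput from a monotonic clock. The function names, the definition of the combined L1+L2 hit rate as (L1 hits + L2 hits) / requested tokens, and the choice of 1 GB = 1e9 bytes are assumptions for illustration; the actual metric definitions in LMCache may differ.

```python
import time
from typing import Tuple


def token_hit_rate(l1_hit_tokens: int, l2_hit_tokens: int,
                   requested_tokens: int) -> float:
    """Fraction of requested tokens served from L1 or L2 (assumed definition)."""
    if requested_tokens <= 0:
        return 0.0
    return (l1_hit_tokens + l2_hit_tokens) / requested_tokens


def throughput_gbps(num_bytes: int, elapsed_seconds: float) -> float:
    """Throughput in GB/s, taking 1 GB as 1e9 bytes (an assumption)."""
    if elapsed_seconds <= 0:
        return 0.0
    return num_bytes / 1e9 / elapsed_seconds


def timed_copy(src: bytes) -> Tuple[bytearray, float]:
    """Copy a buffer and time it with a monotonic clock.

    time.monotonic() never goes backwards, so the elapsed time cannot be
    negative even if the wall clock is adjusted during the copy.
    """
    start = time.monotonic()
    dst = bytearray(src)  # bytearray() makes a real copy of the data
    elapsed = time.monotonic() - start
    return dst, elapsed


rate = token_hit_rate(l1_hit_tokens=500, l2_hit_tokens=250, requested_tokens=1000)
gbps = throughput_gbps(num_bytes=2_000_000_000, elapsed_seconds=0.5)
copy, elapsed = timed_copy(b"\x00" * 1024)
```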

Core Engine & Refactoring

  • [#3074] Make sure submit_store_task sees key from the same model
  • [#3088] Implement advance_request for store/prefetch controller
  • [#3078] Single source of truth for Layer Group Metadata
  • [#3162] Unify discover+normalize and add EngineType to REGISTER
  • [#3140] Implement serialization / deserialization
  • [#2634] Encoder Caching Support

Storage Backends & Adapters

  • [#3018] RDMA L1 memory preregistration for MooncakeStore L2 adapter
  • [#3064] Add S3 L2 adapter for MP mode
  • [#3170] S3L2Adapter: support container credentials via boto3 delegate
  • [#3188] Fix S3 L2 adapter listener race
  • [#2631] Update MooncakestoreConnector to new store.setup(dict) api
  • [#3172] Add batch operations to Mooncake L2 adapter
  • [#3227] Per-operation dedicated worker pools to Mooncake L2 connector
  • [#3060] Add Hugging Face Buckets as built-in remote storage backend
  • [#3160] Add support for AZURE_BLOB NIXL backend
  • [#3152] Harden nixl storage backend transfer and config handling
  • [#2966] Add batched_contains() override for NixlDynamicStorageBackend
  • [#2989] fix: use unique device_id per OBJ register/deregister cycle
  • [#2861] Add nixl_endpoint_list for per-worker object-storage endpoint distribution
  • [#2635] io_uring support for Rust raw block backend
  • [#3169] Add option to skip raw block checkpoint load
  • [#3120] Support 3FS storage backend via Usrbio (native) APIs

PD Backend / Disaggregation

  • [#2972] Add bidirectional NIXL cache probe
  • [#3038] Fully async PD backend

CacheBlend

  • [#3062] Per-request root OTel span and SpanRegistry for CB server tracing
  • [#3179] Fix CB lookup correctness, thread safety, and store-complete race
  • [#3234] Dedup overlapping matches in cb_lookup_pre_computed
  • [#3092] [ROCm] Triton block-sparse attention backend for CacheBlend
  • [#3254] [CLI][CB] propagate CLI kvcache clear to CB fingerprint table
  • [#3276] Revert "[CLI][CB] propagate CLI kvcache clear to CB fingerprint table"

Bug Fixes

  • [#3149] fix(disk): defer cache-policy hit update until load succeeds in get_blocking()
  • [#3197] fix: missing lock when clearing metadata cache
  • [#3146] fix(#3104): per-instance FastAPI app to fix 503 on cache endpoints in TP=1 non-MP
  • [#3244] fix: the missing /api/ purge
  • [#3002] [Bug] Missing validate() in env-only config path of lmcache_get_or_create_config
  • [#3178] [Chore][HotFix] EC UT
  • [#3176] Exclude hyphenated tags from setuptools_scm git describe
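
The pattern described in #3149 (update the cache policy's hit bookkeeping only after the load has succeeded) can be sketched as follows. This is a simplified stand-in, not the LMCache disk backend: `DiskCacheSketch`, its `loader` callback, and the use of `OSError` as the failure signal are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, List, Optional


class DiskCacheSketch:
    """Hypothetical cache whose LRU order is updated only on successful loads."""

    def __init__(self, loader: Callable[[str], bytes]):
        self._loader = loader
        # Keys in eviction order: least recently used first.
        self._lru: "OrderedDict[str, None]" = OrderedDict()

    def register(self, key: str) -> None:
        self._lru[key] = None

    def get_blocking(self, key: str) -> Optional[bytes]:
        if key not in self._lru:
            return None
        try:
            data = self._loader(key)
        except OSError:
            # The load failed, so the entry is NOT counted as a hit and its
            # position in the eviction order stays unchanged.
            return None
        # Only now, with the data in hand, record the hit.
        self._lru.move_to_end(key)
        return data

    def eviction_order(self) -> List[str]:
        return list(self._lru)


def fake_loader(key: str) -> bytes:
    # Stand-in for a disk read; "bad" simulates a corrupted or missing file.
    if key == "bad":
        raise OSError("simulated read failure")
    return key.encode()


cache = DiskCacheSketch(fake_loader)
for k in ("bad", "good", "other"):
    cache.register(k)

failed = cache.get_blocking("bad")   # load fails, order unchanged
order_after_failure = cache.eviction_order()
loaded = cache.get_blocking("good")  # load succeeds, "good" becomes most recent
order_after_success = cache.eviction_order()
```

If the hit were recorded before the load, a key whose file cannot be read would keep being promoted and would never become an eviction candidate.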

CI/CD

  • [#3109] Fix nightly build image job GHA
  • [#3123] Fix egress block and invalid PEP 440 version from variant tags
  • [#3117] Change blend CI CUDA version and nightly tags
  • [#3113] Split logs in blend CI test
  • [#3125] Update github workflow to avoid duplicated UTs and artifact builds
  • [#3126] Use existing NGC base image tags for cu12.9 and cu13.0 builds
  • [#3127] Add k3 build pipeline for unit tests
  • [#3133] Add missing egress endpoint to nightly Docker build
  • [#3135] Skip flaky test_p2p_backend_with_controller
  • [#3141] Fix nightly Docker build broken by nightly-cu13 tag
  • [#3134] Skip test PyPI publish on stable release
  • [#3222] Fix flaky comprehensive tests
  • [#3239] Route unit job to k8s queue
  • [#3240] Expose prefiller/decoder/proxy logs as artifacts
  • [#3256] Pin blend test vLLM to cu129 channel to avoid CUDA-13 PyPI wheel
  • [#3275] Swap default to cu13 wheels and cu12.9 secondary
  • [#3295] Allow PyTorch/NVIDIA egress endpoints in release workflows
  • [#2969] Revert "[CI]: add full tag selectively"

CLI / Benchmarking

  • [#3195] Add prefix-suffix-tuner workload for tiered + Blending KV-cache
  • [#3220] Count requests as successful when stream has no content but usage reports tokens
  • [#3279] Fix slim install imports and smoke-test wheel in CI

Tests

  • [#3157] Add comprehensive test for GDS backend

Documentation

  • [#3110] Add MP mode to the vLLM quickstart tab
  • [#3107] Update benchmarking guide to use lmcache bench engine CLI
  • [#3131] Add docs for MP HTTP endpoints descriptions
  • [#3173] Add docs goblin easter egg
  • [#3204] Add doc for fs native connector
  • [#3209] Update LMCache Recipes

Chore / Misc

  • [#3095] Update CODEOWNERS for mp_observability
  • [#3129] Add .gitignore to rust/raw_block
Source: README.md, updated 2026-05-15