| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| flashinfer_jit_cache-0.6.12+cu130-cp39-abi3-manylinux_2_28_aarch64.whl | 2026-05-29 | 1.8 GB | |
| flashinfer_jit_cache-0.6.12+cu130-cp39-abi3-manylinux_2_28_x86_64.whl | 2026-05-29 | 1.5 GB | |
| flashinfer_jit_cache-0.6.12+cu129-cp39-abi3-manylinux_2_28_aarch64.whl | 2026-05-29 | 1.9 GB | |
| flashinfer_jit_cache-0.6.12+cu129-cp39-abi3-manylinux_2_28_x86_64.whl | 2026-05-29 | 1.9 GB | |
| flashinfer_jit_cache-0.6.12+cu128-cp39-abi3-manylinux_2_28_aarch64.whl | 2026-05-29 | 1.3 GB | |
| flashinfer_jit_cache-0.6.12+cu128-cp39-abi3-manylinux_2_28_x86_64.whl | 2026-05-29 | 1.3 GB | |
| flashinfer_cubin-0.6.12-py3-none-any.whl | 2026-05-29 | 447.5 MB | |
| flashinfer_python-0.6.12-py3-none-any.whl | 2026-05-29 | 14.0 MB | |
| flashinfer_python-0.6.12.tar.gz | 2026-05-29 | 9.5 MB | |
| README.md | 2026-05-29 | 8.5 kB | |
| Release v0.6.12 source code.tar.gz | 2026-05-29 | 4.3 MB | |
| Release v0.6.12 source code.zip | 2026-05-29 | 5.8 MB | |
| Totals: 12 items | | 10.2 GB | 10 |
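The `flashinfer_jit_cache` wheels above differ only by CUDA local version label (`+cu128`, `+cu129`, `+cu130`) and CPU architecture (`aarch64`, `x86_64`). Each carries a `cp39-abi3` tag, so a single wheel covers CPython 3.9 and later. The sketch below picks the matching file from this listing. It is not part of FlashInfer. `parse_wheel`, `cuda_tag_from_version` and `pick_jit_cache_wheel` are hypothetical helpers. The sketch assumes you already know which CUDA version you are targeting, for example `"12.9"`, and that filenames follow the PEP 427 layout without a build tag.

```python
import platform
from typing import Optional

# Wheel filenames copied from the 0.6.12 listing above.
WHEELS = [
    "flashinfer_jit_cache-0.6.12+cu130-cp39-abi3-manylinux_2_28_aarch64.whl",
    "flashinfer_jit_cache-0.6.12+cu130-cp39-abi3-manylinux_2_28_x86_64.whl",
    "flashinfer_jit_cache-0.6.12+cu129-cp39-abi3-manylinux_2_28_aarch64.whl",
    "flashinfer_jit_cache-0.6.12+cu129-cp39-abi3-manylinux_2_28_x86_64.whl",
    "flashinfer_jit_cache-0.6.12+cu128-cp39-abi3-manylinux_2_28_aarch64.whl",
    "flashinfer_jit_cache-0.6.12+cu128-cp39-abi3-manylinux_2_28_x86_64.whl",
    "flashinfer_cubin-0.6.12-py3-none-any.whl",
    "flashinfer_python-0.6.12-py3-none-any.whl",
    "flashinfer_python-0.6.12.tar.gz",
]


def parse_wheel(filename: str) -> dict:
    """Split name-version-pythontag-abitag-platformtag.whl into its fields.

    Assumes no optional build tag, which holds for every wheel listed above.
    """
    stem = filename[: -len(".whl")]
    name, version, py_tag, abi_tag, plat_tag = stem.split("-")
    public, _, local = version.partition("+")  # "0.6.12+cu129" -> "0.6.12", "cu129"
    return {
        "name": name,
        "version": public,
        "local": local,
        "python": py_tag,
        "abi": abi_tag,
        "platform": plat_tag,
    }


def cuda_tag_from_version(cuda_version: str) -> str:
    """Hypothetical mapping from a CUDA version like "12.9" to the "cu129" label."""
    return "cu" + cuda_version.replace(".", "")


def pick_jit_cache_wheel(
    filenames, cuda_tag: str, machine: Optional[str] = None
) -> Optional[str]:
    """Return the jit-cache wheel matching cuda_tag and the CPU architecture, or None."""
    machine = machine or platform.machine()  # e.g. "x86_64" or "aarch64" on Linux
    for fn in filenames:
        if not fn.endswith(".whl"):
            continue  # skip sdists and source archives
        info = parse_wheel(fn)
        if (
            info["name"] == "flashinfer_jit_cache"
            and info["local"] == cuda_tag
            and info["platform"].endswith("_" + machine)
        ):
            return fn
    return None


if __name__ == "__main__":
    print(pick_jit_cache_wheel(WHEELS, cuda_tag_from_version("12.9"), "x86_64"))
```

Matching on the local version label keeps the choice explicit. A CUDA combination that is not in this release, such as `cu126`, returns `None` rather than silently falling back to another build.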
What's Changed
- Loosened trtllm_ragged_attention_deepseek shape assertion by @nvjullin in https://github.com/flashinfer-ai/flashinfer/pull/3064
- Update moe gemm by @IwakuraRein in https://github.com/flashinfer-ai/flashinfer/pull/3239
- perf: optimize per-token nvfp4 quantization kernel. by @IwakuraRein in https://github.com/flashinfer-ai/flashinfer/pull/3237
- build: add sccache-backed jit-cache builds and AOT diagnostics by @dierksen in https://github.com/flashinfer-ai/flashinfer/pull/3205
- non-override tactic control by @yanqinz2 in https://github.com/flashinfer-ai/flashinfer/pull/3260
- ci(jit-cache): limit sm110 builds to aarch64 by @dierksen in https://github.com/flashinfer-ai/flashinfer/pull/3275
- feat(moe): add SM120 W4A16 b12x kernels by @lukealonso in https://github.com/flashinfer-ai/flashinfer/pull/3271
- Add dynamic tokens-per-page TRTLLM-GEN GQA kernels by @PerkzZheng in https://github.com/flashinfer-ai/flashinfer/pull/3259
- fix(cute_dsl/moe): unbias autotuner profiling for tile_size enumeration by @leejnau in https://github.com/flashinfer-ai/flashinfer/pull/3252
- Support Kimi K2.5 H64 CuTe DSL MLA decode by @saltyminty in https://github.com/flashinfer-ai/flashinfer/pull/3235
- feat: FP8 output support for CUTLASS MLA paged attention by @carlyou in https://github.com/flashinfer-ai/flashinfer/pull/2779
- fix(jit): propagate -DNDEBUG to host-side cflags by @arpera in https://github.com/flashinfer-ai/flashinfer/pull/3278
- feat: add SM120 fmha_v2 kernels to AOT pip wheel builds by @blake-snc in https://github.com/flashinfer-ai/flashinfer/pull/2885
- bench(moe_deepseek): fix moe benchmark (supersedes #2886) by @leejnau in https://github.com/flashinfer-ai/flashinfer/pull/3292
- fix(gdn_decode): widen pool indices to Int64 to prevent int32 element-offset overflow by @vadiklyutiy in https://github.com/flashinfer-ai/flashinfer/pull/3230
- [chore] Add guard to blackwell GDN prefill by @jiahanc in https://github.com/flashinfer-ai/flashinfer/pull/3267
- fix: remove over-strict K%4 assert in get_shuffle_matrix_sf_a_row_indices by @jimmyzho in https://github.com/flashinfer-ai/flashinfer/pull/3163
- ci: isolate nightly package tests from source tree by @dierksen in https://github.com/flashinfer-ai/flashinfer/pull/3274
- Fix [Spark unit test CI]: defer torch._dynamo.disable to avoid import-time crash in CI by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/3290
- bench(moe_deepseek): scope autotune(True) to pre-warm only by @leejnau in https://github.com/flashinfer-ai/flashinfer/pull/3301
- Improved `simple` mamba SSU kernel by @ishovkun in https://github.com/flashinfer-ai/flashinfer/pull/2962
- add cuda tile dependency for cuda 13.0 by @nv-yunzheq in https://github.com/flashinfer-ai/flashinfer/pull/3305
- [Fix] Fix XQA V tile reading from wrong page when nbVItersPerXIter > 1 by @qsang-nv in https://github.com/flashinfer-ai/flashinfer/pull/3022
- fix: MNNVL Allreduce uses bitwise sentinel checking to avoid subnormal value issue (#3053) by @timlee0212 in https://github.com/flashinfer-ai/flashinfer/pull/3304
- Fix: remove nvfp4 llama4 blocker by @IwakuraRein in https://github.com/flashinfer-ai/flashinfer/pull/3313
- [chore] add mamba codeowners list by @jimmyzho in https://github.com/flashinfer-ai/flashinfer/pull/3318
- Modify release deletion command in workflow by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/3307
- Add to code owners by @dhiraj113 in https://github.com/flashinfer-ai/flashinfer/pull/3326
- feat: Add CuTe DSL grouped-gemm + combine fusion support by @nvcastet in https://github.com/flashinfer-ai/flashinfer/pull/2944
- fix(gdn): allow importing gdn_decode without a CUDA device by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/3293
- feat: enable glm5 router gemm by @b8zhong in https://github.com/flashinfer-ai/flashinfer/pull/3185
- fix(fmha_v2): fix FP8 V-scratch pipeline and varlen scheduler on SM90 by @jimmyzho in https://github.com/flashinfer-ai/flashinfer/pull/3276
- fix typo llama routing issue in trtllm-gen moe by @IwakuraRein in https://github.com/flashinfer-ai/flashinfer/pull/3303
- feat(logging,trace): cuda-graph-compatible level-5/10 logging + fi_trace template additions/fixes by @yyihuang in https://github.com/flashinfer-ai/flashinfer/pull/3172
- Use cudnn 9.23 new API to query workspace with override shape by @yanqinz2 in https://github.com/flashinfer-ai/flashinfer/pull/3291
- feat: Expose unpacked topk weights for routed moe (fp4) by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/2425
- Reland support lse in trtllm paged attn kernels by @murphymatt in https://github.com/flashinfer-ai/flashinfer/pull/3116
- fix(CI unit tests, cute_dsl, spark): set USER env var before torch._dynamo import for unmapped UIDs by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/3314
- feat(trace): embed runnable init() in every TraceTemplate by @yyihuang in https://github.com/flashinfer-ai/flashinfer/pull/3221
- feat(cute_dsl/moe): deterministic balanced autotune profile inputs by @leejnau in https://github.com/flashinfer-ai/flashinfer/pull/3286
- feat(cute_dsl/moe): add `moe_output_memset_inplace` dense memset wrapper by @leejnau in https://github.com/flashinfer-ai/flashinfer/pull/3328
- Fix/3170 dense blockscaled sm12x by @leonardHONG in https://github.com/flashinfer-ai/flashinfer/pull/3180
- test: enable bmm_mxfp8 cutlass backend coverage on SM12x by @leonardHONG in https://github.com/flashinfer-ai/flashinfer/pull/3183
- Ep api design - Build Infra dependencies by @Anerudhan in https://github.com/flashinfer-ai/flashinfer/pull/3315
- [feat] Add gemma RMS AR fusion by @jiahanc in https://github.com/flashinfer-ai/flashinfer/pull/3322
- checkpointing_ssu kernel: fused replay + conditional state-write for Mamba2 by @ishovkun in https://github.com/flashinfer-ai/flashinfer/pull/3324
- Ameyn/gdn bf16 dispatcher and 4d pool by @ameynaik-hub in https://github.com/flashinfer-ai/flashinfer/pull/3268
- Update trtllm FMHA cubins by @djmmoss in https://github.com/flashinfer-ai/flashinfer/pull/3317
- fix(trace): repair TGV and XQA MLA reference tests by @yyihuang in https://github.com/flashinfer-ai/flashinfer/pull/3365
- feat: Add 8x4 swizzle layout support to MXFP4 and MXFP8 CuTe-DSL kernels by @bkryu in https://github.com/flashinfer-ai/flashinfer/pull/3357
- Add AGENTS.md shim by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/3342
- Add list_api script by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/3341
- Support 4over6 nvfp4 for quantizer and fused MoE by @zianglih in https://github.com/flashinfer-ai/flashinfer/pull/3264
- Add DeepSeek V4 sparse MLA TRTLLM-GEN kernels by @PerkzZheng in https://github.com/flashinfer-ai/flashinfer/pull/3269
- Reject EP configurations in b12x MoE with a clear error by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/3302
- fix(cute_dsl): avoid MoE wrapper runner reference cycle by @leejnau in https://github.com/flashinfer-ai/flashinfer/pull/3340
- feat: Add support for LoRa delta in MOE mxint4 x bf16, MXFP8 & BF16 to trtllm backend by @djns99 in https://github.com/flashinfer-ai/flashinfer/pull/3153
- Restore monolithic CuTe-DSL MLA decode alongside modular, gated by cute_dsl_impl= by @pgera in https://github.com/flashinfer-ai/flashinfer/pull/3296
- feat: RMSNorm + RoPE fusion for WAN: flashinfer.diffusion_ops.fused_qk_rmsnorm_rope by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/3148
- fix deprecation warnings from cute-dsl by @b8zhong in https://github.com/flashinfer-ai/flashinfer/pull/3333
- feat(cute_dsl/moe): re-enable use_cold_l2_cache in CuteDslMoEWrapper TuningConfig by @leejnau in https://github.com/flashinfer-ai/flashinfer/pull/3384
- Add torch.compile-compatible custom op for fp4_quantize by @Kh4L in https://github.com/flashinfer-ai/flashinfer/pull/3081
- Replace SM120 W4A16 MoE kernels by @lukealonso in https://github.com/flashinfer-ai/flashinfer/pull/3336
- bump version to 0.6.12 by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/3388
New Contributors
- @carlyou made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/2779
- @nvcastet made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/2944
- @Kh4L made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/3081
Full Changelog: https://github.com/flashinfer-ai/flashinfer/compare/v0.6.11rc1...v0.6.12