FlashInfer - Browse /v0.6.12 at SourceForge.net

Name	Modified	Size	Downloads / Week
flashinfer_jit_cache-0.6.12+cu130-cp39-abi3-manylinux_2_28_aarch64.whl	2026-05-29	1.8 GB	0
flashinfer_jit_cache-0.6.12+cu130-cp39-abi3-manylinux_2_28_x86_64.whl	2026-05-29	1.5 GB	2
flashinfer_jit_cache-0.6.12+cu129-cp39-abi3-manylinux_2_28_aarch64.whl	2026-05-29	1.9 GB	1
flashinfer_jit_cache-0.6.12+cu129-cp39-abi3-manylinux_2_28_x86_64.whl	2026-05-29	1.9 GB	1
flashinfer_jit_cache-0.6.12+cu128-cp39-abi3-manylinux_2_28_aarch64.whl	2026-05-29	1.3 GB	1
flashinfer_jit_cache-0.6.12+cu128-cp39-abi3-manylinux_2_28_x86_64.whl	2026-05-29	1.3 GB	1
flashinfer_cubin-0.6.12-py3-none-any.whl	2026-05-29	447.5 MB	2
flashinfer_python-0.6.12-py3-none-any.whl	2026-05-29	14.0 MB	1
flashinfer_python-0.6.12.tar.gz	2026-05-29	9.5 MB	1
README.md	2026-05-29	8.5 kB	0
Release v0.6.12 source code.tar.gz	2026-05-29	4.3 MB	0
Release v0.6.12 source code.zip	2026-05-29	5.8 MB	0
Totals: 12 Items		10.2 GB	10
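
The wheel names above follow the standard wheel filename layout (distribution, version, Python tag, ABI tag, platform tag), with the CUDA toolkit encoded as a local version suffix such as `+cu130`. As a minimal sketch, the helper below picks the `flashinfer_jit_cache` wheel that matches a given CUDA tag and CPU architecture. The names `parse_wheel_filename` and `pick_jit_cache_wheel` are hypothetical and are not part of FlashInfer or pip.

```python
from typing import Iterable, NamedTuple, Optional


class WheelInfo(NamedTuple):
    distribution: str
    version: str            # public version, e.g. "0.6.12"
    local: Optional[str]    # local version suffix, e.g. "cu130", or None
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelInfo:
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:      # an optional build tag sits after the version
        dist, version, _build, py, abi, plat = parts
    elif len(parts) == 5:
        dist, version, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel name: {filename}")
    public, _, local = version.partition("+")
    return WheelInfo(dist, public, local or None, py, abi, plat)


def pick_jit_cache_wheel(
    filenames: Iterable[str], cuda_tag: str, machine: str
) -> Optional[str]:
    """Return the first jit-cache wheel matching the CUDA tag and architecture."""
    for name in filenames:
        if not name.endswith(".whl"):
            continue  # skip source archives and README.md
        info = parse_wheel_filename(name)
        if (
            info.distribution == "flashinfer_jit_cache"
            and info.local == cuda_tag
            and info.platform_tag.endswith("_" + machine)
        ):
            return name
    return None


# A subset of the files listed in the table above.
FILES = [
    "flashinfer_jit_cache-0.6.12+cu130-cp39-abi3-manylinux_2_28_aarch64.whl",
    "flashinfer_jit_cache-0.6.12+cu129-cp39-abi3-manylinux_2_28_x86_64.whl",
    "flashinfer_cubin-0.6.12-py3-none-any.whl",
    "flashinfer_python-0.6.12.tar.gz",
]

chosen = pick_jit_cache_wheel(FILES, cuda_tag="cu129", machine="x86_64")
```

The `cuda_tag` and `machine` values would normally come from the local environment (for example `platform.machine()` for the architecture). They are passed in explicitly here to keep the sketch deterministic.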

What's Changed

Loosened trtllm_ragged_attention_deepseek shape assertion by @nvjullin in https://github.com/flashinfer-ai/flashinfer/pull/3064
Update moe gemm by @IwakuraRein in https://github.com/flashinfer-ai/flashinfer/pull/3239
perf: optimize per-token nvfp4 quantization kernel. by @IwakuraRein in https://github.com/flashinfer-ai/flashinfer/pull/3237
build: add sccache-backed jit-cache builds and AOT diagnostics by @dierksen in https://github.com/flashinfer-ai/flashinfer/pull/3205
non-override tactic control by @yanqinz2 in https://github.com/flashinfer-ai/flashinfer/pull/3260
ci(jit-cache): limit sm110 builds to aarch64 by @dierksen in https://github.com/flashinfer-ai/flashinfer/pull/3275
feat(moe): add SM120 W4A16 b12x kernels by @lukealonso in https://github.com/flashinfer-ai/flashinfer/pull/3271
Add dynamic tokens-per-page TRTLLM-GEN GQA kernels by @PerkzZheng in https://github.com/flashinfer-ai/flashinfer/pull/3259
fix(cute_dsl/moe): unbias autotuner profiling for tile_size enumeration by @leejnau in https://github.com/flashinfer-ai/flashinfer/pull/3252
Support Kimi K2.5 H64 CuTe DSL MLA decode by @saltyminty in https://github.com/flashinfer-ai/flashinfer/pull/3235
feat: FP8 output support for CUTLASS MLA paged attention by @carlyou in https://github.com/flashinfer-ai/flashinfer/pull/2779
fix(jit): propagate -DNDEBUG to host-side cflags by @arpera in https://github.com/flashinfer-ai/flashinfer/pull/3278
feat: add SM120 fmha_v2 kernels to AOT pip wheel builds by @blake-snc in https://github.com/flashinfer-ai/flashinfer/pull/2885
bench(moe_deepseek): fix moe benchmark (supersedes [#2886]) by @leejnau in https://github.com/flashinfer-ai/flashinfer/pull/3292
fix(gdn_decode): widen pool indices to Int64 to prevent int32 element-offset overflow by @vadiklyutiy in https://github.com/flashinfer-ai/flashinfer/pull/3230
[chore] Add guard to blackwell GDN prefill by @jiahanc in https://github.com/flashinfer-ai/flashinfer/pull/3267
fix: remove over-strict K%4 assert in get_shuffle_matrix_sf_a_row_indices by @jimmyzho in https://github.com/flashinfer-ai/flashinfer/pull/3163
ci: isolate nightly package tests from source tree by @dierksen in https://github.com/flashinfer-ai/flashinfer/pull/3274
Fix [Spark unit test CI]: defer torch._dynamo.disable to avoid import-time crash in CI by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/3290
bench(moe_deepseek): scope autotune(True) to pre-warm only by @leejnau in https://github.com/flashinfer-ai/flashinfer/pull/3301
Improved simple mamba SSU kernel by @ishovkun in https://github.com/flashinfer-ai/flashinfer/pull/2962
add cuda tile dependency for cuda 13.0 by @nv-yunzheq in https://github.com/flashinfer-ai/flashinfer/pull/3305
[Fix] Fix XQA V tile reading from wrong page when nbVItersPerXIter > 1 by @qsang-nv in https://github.com/flashinfer-ai/flashinfer/pull/3022
fix: MNNVL Allreduce uses bitwise sentinel checking to avoid subnormal value issue (#3053) by @timlee0212 in https://github.com/flashinfer-ai/flashinfer/pull/3304
Fix: remove nvfp4 llama4 blocker by @IwakuraRein in https://github.com/flashinfer-ai/flashinfer/pull/3313
[chore] add mamba codeowners list by @jimmyzho in https://github.com/flashinfer-ai/flashinfer/pull/3318
Modify release deletion command in workflow by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/3307
Add to code owners by @dhiraj113 in https://github.com/flashinfer-ai/flashinfer/pull/3326
feat: Add CuTe DSL grouped-gemm + combine fusion support by @nvcastet in https://github.com/flashinfer-ai/flashinfer/pull/2944
fix(gdn): allow importing gdn_decode without a CUDA device by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/3293
feat: enable glm5 router gemm by @b8zhong in https://github.com/flashinfer-ai/flashinfer/pull/3185
fix(fmha_v2): fix FP8 V-scratch pipeline and varlen scheduler on SM90 by @jimmyzho in https://github.com/flashinfer-ai/flashinfer/pull/3276
fix typo llama routing issue in trtllm-gen moe by @IwakuraRein in https://github.com/flashinfer-ai/flashinfer/pull/3303
feat(logging,trace): cuda-graph-compatible level-5/10 logging + fi_trace template additions/fixes by @yyihuang in https://github.com/flashinfer-ai/flashinfer/pull/3172
Use cudnn 9.23 new API to query workspace with override shape by @yanqinz2 in https://github.com/flashinfer-ai/flashinfer/pull/3291
feat: Expose unpacked topk weights for routed moe (fp4) by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/2425
Reland support lse in trtllm paged attn kernels by @murphymatt in https://github.com/flashinfer-ai/flashinfer/pull/3116
fix(CI unit tests, cute_dsl, spark): set USER env var before torch._dynamo import for unmapped UIDs by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/3314
feat(trace): embed runnable init() in every TraceTemplate by @yyihuang in https://github.com/flashinfer-ai/flashinfer/pull/3221
feat(cute_dsl/moe): deterministic balanced autotune profile inputs by @leejnau in https://github.com/flashinfer-ai/flashinfer/pull/3286
feat(cute_dsl/moe): add moe_output_memset_inplace dense memset wrapper by @leejnau in https://github.com/flashinfer-ai/flashinfer/pull/3328
Fix/3170 dense blockscaled sm12x by @leonardHONG in https://github.com/flashinfer-ai/flashinfer/pull/3180
test: enable bmm_mxfp8 cutlass backend coverage on SM12x by @leonardHONG in https://github.com/flashinfer-ai/flashinfer/pull/3183
Ep api design - Build Infra dependencies by @Anerudhan in https://github.com/flashinfer-ai/flashinfer/pull/3315
[feat] Add gemma RMS AR fusion by @jiahanc in https://github.com/flashinfer-ai/flashinfer/pull/3322
checkpointing_ssu kernel: fused replay + conditional state-write for Mamba2 by @ishovkun in https://github.com/flashinfer-ai/flashinfer/pull/3324
Ameyn/gdn bf16 dispatcher and 4d pool by @ameynaik-hub in https://github.com/flashinfer-ai/flashinfer/pull/3268
Update trtllm FMHA cubins by @djmmoss in https://github.com/flashinfer-ai/flashinfer/pull/3317
fix(trace): repair TGV and XQA MLA reference tests by @yyihuang in https://github.com/flashinfer-ai/flashinfer/pull/3365
feat: Add 8x4 swizzle layout support to MXFP4 and MXFP8 CuTe-DSL kernels by @bkryu in https://github.com/flashinfer-ai/flashinfer/pull/3357
Add AGENTS.md shim by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/3342
Add list_api script by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/3341
Support 4over6 nvfp4 for quantizer and fused MoE by @zianglih in https://github.com/flashinfer-ai/flashinfer/pull/3264
Add DeepSeek V4 sparse MLA TRTLLM-GEN kernels by @PerkzZheng in https://github.com/flashinfer-ai/flashinfer/pull/3269
Reject EP configurations in b12x MoE with a clear error by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/3302
fix(cute_dsl): avoid MoE wrapper runner reference cycle by @leejnau in https://github.com/flashinfer-ai/flashinfer/pull/3340
feat: Add support for LoRa delta in MOE mxint4 x bf16, MXFP8 & BF16 to trtllm backend by @djns99 in https://github.com/flashinfer-ai/flashinfer/pull/3153
Restore monolithic CuTe-DSL MLA decode alongside modular, gated by cute_dsl_impl= by @pgera in https://github.com/flashinfer-ai/flashinfer/pull/3296
feat: RMSNorm + RoPE fusion for WAN: flashinfer.diffusion_ops.fused_qk_rmsnorm_rope by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/3148
fix deprecation warnings from cute-dsl by @b8zhong in https://github.com/flashinfer-ai/flashinfer/pull/3333
feat(cute_dsl/moe): re-enable use_cold_l2_cache in CuteDslMoEWrapper TuningConfig by @leejnau in https://github.com/flashinfer-ai/flashinfer/pull/3384
Add torch.compile-compatible custom op for fp4_quantize by @Kh4L in https://github.com/flashinfer-ai/flashinfer/pull/3081
Replace SM120 W4A16 MoE kernels by @lukealonso in https://github.com/flashinfer-ai/flashinfer/pull/3336
bump version to 0.6.12 by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/3388
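
One entry above (pull/3230) widens pool indices to Int64 to prevent an int32 element-offset overflow. The sketch below illustrates that general failure mode in plain Python by emulating 32-bit wraparound. It is not the FlashInfer kernel code, and the slot size and helper names are hypothetical values chosen only to make the arithmetic visible.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Reduce an integer to the two's-complement int32 range, as 32-bit arithmetic would."""
    return (value - INT32_MIN) % 2**32 + INT32_MIN


def element_offset_int32(pool_index: int, elements_per_slot: int) -> int:
    """Offset computed as if both operands and the product were int32."""
    return wrap_int32(pool_index * elements_per_slot)


def element_offset_int64(pool_index: int, elements_per_slot: int) -> int:
    """Offset computed with a wide index (Python ints stand in for int64 here)."""
    return pool_index * elements_per_slot


# Hypothetical state slot: 128 x 128 x 32 elements = 2**19 elements per slot.
ELEMENTS_PER_SLOT = 128 * 128 * 32

last_safe = element_offset_int32(4095, ELEMENTS_PER_SLOT)   # still fits in int32
wrapped = element_offset_int32(4096, ELEMENTS_PER_SLOT)     # 2**31 wraps to INT32_MIN
correct = element_offset_int64(4096, ELEMENTS_PER_SLOT)     # 2**31, as intended
```

With these assumed sizes, the first slot index whose offset no longer fits is 4096. A negative wrapped offset would address memory before the start of the pool, which is why the index is widened before the multiplication rather than after it.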

New Contributors

@carlyou made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/2779
@nvcastet made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/2944
@Kh4L made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/3081

Full Changelog: https://github.com/flashinfer-ai/flashinfer/compare/v0.6.11rc1...v0.6.12

Source: README.md, updated 2026-05-29

FlashInfer: Kernel Library for LLM Serving
