| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| flashinfer_jit_cache-0.6.7+cu130-cp39-abi3-manylinux_2_28_aarch64.whl | 2026-03-25 | 2.0 GB | |
| flashinfer_jit_cache-0.6.7+cu130-cp39-abi3-manylinux_2_28_x86_64.whl | 2026-03-25 | 2.1 GB | |
| flashinfer_jit_cache-0.6.7+cu129-cp39-abi3-manylinux_2_28_aarch64.whl | 2026-03-25 | 1.9 GB | |
| flashinfer_jit_cache-0.6.7+cu129-cp39-abi3-manylinux_2_28_x86_64.whl | 2026-03-25 | 1.9 GB | |
| flashinfer_jit_cache-0.6.7+cu128-cp39-abi3-manylinux_2_28_aarch64.whl | 2026-03-25 | 1.3 GB | |
| flashinfer_jit_cache-0.6.7+cu128-cp39-abi3-manylinux_2_28_x86_64.whl | 2026-03-25 | 1.3 GB | |
| flashinfer_cubin-0.6.7-py3-none-any.whl | 2026-03-25 | 295.1 MB | |
| flashinfer_python-0.6.7-py3-none-any.whl | 2026-03-25 | 9.2 MB | |
| flashinfer_python-0.6.7.tar.gz | 2026-03-25 | 6.5 MB | |
| README.md | 2026-03-25 | 8.7 kB | |
| Release v0.6.7 source code.tar.gz | 2026-03-25 | 3.1 MB | |
| Release v0.6.7 source code.zip | 2026-03-25 | 4.1 MB | |
| Totals: 12 items | | 10.8 GB | 0 |
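The wheel filenames in the table above follow PEP 427 naming, `{distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl`, with a `+cuXYZ` local version segment identifying the CUDA toolkit the wheel was built against. A minimal sketch (standard library only; the helper name is illustrative, not part of flashinfer) that decodes one of the listed filenames:

```python
def parse_wheel_name(filename: str) -> dict:
    # Strip the .whl suffix, then split on '-' per PEP 427:
    # {distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl
    stem = filename[: -len(".whl")]
    dist, version, py_tag, abi_tag, plat_tag = stem.split("-")
    # A '+cuXYZ' local version segment, if present, names the CUDA build.
    base_version, _, cuda = version.partition("+")
    return {
        "distribution": dist,
        "version": base_version,
        "cuda": cuda or None,       # e.g. 'cu130', or None for pure wheels
        "python": py_tag,           # 'cp39' = CPython 3.9+, 'py3' = any Python 3
        "abi": abi_tag,             # 'abi3' = stable ABI, 'none' = pure Python
        "platform": plat_tag,       # e.g. 'manylinux_2_28_x86_64' or 'any'
    }

info = parse_wheel_name(
    "flashinfer_jit_cache-0.6.7+cu130-cp39-abi3-manylinux_2_28_x86_64.whl"
)
```

So the 2.1 GB `cu130` x86_64 wheel, for example, is a CPython 3.9+ stable-ABI build for glibc 2.28 Linux, while `flashinfer_cubin-0.6.7-py3-none-any.whl` is a pure-Python wheel installable on any platform.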
## What's Changed
- perf(gdn): optimize MTP kernel with ILP rows and SMEM v caching by @ameynaik-hub in https://github.com/flashinfer-ai/flashinfer/pull/2618
- Feat/gdn decode pooled by @xutizhou in https://github.com/flashinfer-ai/flashinfer/pull/2521
- fix(jit): GEMM kernels produce NaN under concurrency — missing GDC flags cause PDL synchronization barriers to compile as no-ops by @voipmonitor in https://github.com/flashinfer-ai/flashinfer/pull/2716
- Support NVFP4 KV cache decode on SM120 by @Tom-Zheng in https://github.com/flashinfer-ai/flashinfer/pull/2520
- feat: Add TRTLLM fmha_v2 library for SM90 attention with Skip-Softmax by @jimmyzho in https://github.com/flashinfer-ai/flashinfer/pull/2446
- bump version to 0.6.6 by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/2724
- [benchmark] Add All Reduce benchmark by @jiahanc in https://github.com/flashinfer-ai/flashinfer/pull/2696
- Revert "fix(jit): GEMM kernels produce NaN under concurrency — missing GDC flags cause PDL synchronization barriers to compile as no-ops" by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/2737
- refactor: refactoring cuda code to cute-dsl (part 1) by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/2428
- Added missing padding by @nvjullin in https://github.com/flashinfer-ai/flashinfer/pull/2726
- docker: add CUDA 13.1 Dockerfiles with cuda-tile by @yongwww in https://github.com/flashinfer-ai/flashinfer/pull/2774
- [BugFix] guard against uint32 underflow in multi-CTA TopK chunk calculation by @LopezCastroRoberto in https://github.com/flashinfer-ai/flashinfer/pull/2592
- fix: guard CUTLASS FMHA against SM12x and fix fmha_v2 SM121a check by @blake-snc in https://github.com/flashinfer-ai/flashinfer/pull/2560
- fix: fix illegal memory access for NaN input in sampling kernels by @zack041 in https://github.com/flashinfer-ai/flashinfer/pull/2456
- Add cuda-tile to package dependencies by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/2758
- tests: skip sliding window + fp8 to prevent hang in fmha_v2 unit tests by @jimmyzho in https://github.com/flashinfer-ai/flashinfer/pull/2781
- feat: Add autotuner config caching, thread safety, and documentation by @bkryu in https://github.com/flashinfer-ai/flashinfer/pull/2554
- fix: block PR merge when CI is skipped due to pending authorization by @yongwww in https://github.com/flashinfer-ai/flashinfer/pull/2761
- [feat] Add air top-p algorithm by @qsang-nv in https://github.com/flashinfer-ai/flashinfer/pull/2752
- [chore] Add jiahanc to moe related code owner by @jiahanc in https://github.com/flashinfer-ai/flashinfer/pull/2748
- fix: Fix cute dsl moe failure with nvidia-cutlass-dsl >= 4.4.0 by @nv-yunzheq in https://github.com/flashinfer-ai/flashinfer/pull/2735
- [Spark unit test debugging] Fix for tests/attention/test_trtllm_gen_mla.py by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/2750
- [Spark unit test debugging] Fix for tests/gemm/test_groupwise_scaled_gemm_fp8.py by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/2751
- [feat] Add 2048 experts and 32 Top K by @jiahanc in https://github.com/flashinfer-ai/flashinfer/pull/2744
- perf: Performance tune cute dsl RMSNorm variants by @bkryu in https://github.com/flashinfer-ai/flashinfer/pull/2777
- feat: Add FP4 KV cache quant/dequant kernels by @samuellees in https://github.com/flashinfer-ai/flashinfer/pull/2757
- Add cute-dsl backends to mxfp[8,4]_quantization for future refactor by @bkryu in https://github.com/flashinfer-ai/flashinfer/pull/2443
- feat: FP32 dtype output for BF16 matmuls (CUTLASS & cuDNN) by @raayandhar in https://github.com/flashinfer-ai/flashinfer/pull/2644
- Create separate cuDNN handle per GPU by @dhiraj113 in https://github.com/flashinfer-ai/flashinfer/pull/2688
- CuteDSL MoE fix redundant output buffer zeroing by @leejnau in https://github.com/flashinfer-ai/flashinfer/pull/2811
- Add NVFP4 KV cache quantization support for SM100 by @sychen52 in https://github.com/flashinfer-ai/flashinfer/pull/2702
- [fix] Bugfix 1367: fix VariableBlockSparseAttention buffer overflow by dynamically resizing kv_lens_buffer by @qsang-nv in https://github.com/flashinfer-ai/flashinfer/pull/2802
- fix: Workaround org teams perm issue for approval purposes by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/2816
- Implement override shape support for cuDNN GEMM operations by @yanqinz2 in https://github.com/flashinfer-ai/flashinfer/pull/2790
- feat: Add support for TRTLLM MXFP8 non-gated MoE with ReLU2 by @danisereb in https://github.com/flashinfer-ai/flashinfer/pull/2707
- Upgrade cutlass 4.2.1 -> 4.4.2 by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/2798
- chore: cute dsl nvfp4 moe clean up by @nv-yunzheq in https://github.com/flashinfer-ai/flashinfer/pull/2775
- fix: Add SM120 (RTX Blackwell desktop) support for NVFP4 MoE kernels by @brandonmmusic-max in https://github.com/flashinfer-ai/flashinfer/pull/2725
- Protect against null clusterUuid in mnnvl.py by @akshaver in https://github.com/flashinfer-ai/flashinfer/pull/2626
- Deprecation for gated_delta_rule_mtp's intermediate_states_buffer=True by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/2730
- fix: Autotuner _find_nearest_profile non-power-of-2 num_tokens, create launchers for all supported tileN in trtllm fused MoE by @amitz-nv in https://github.com/flashinfer-ai/flashinfer/pull/2821
- fix(jit): enable GDC for CUTLASS GEMM PDL — SM100 flag only by @voipmonitor in https://github.com/flashinfer-ai/flashinfer/pull/2780
- [Fmha] Sparse MLA decode kernel selection heuristics by @PerkzZheng in https://github.com/flashinfer-ai/flashinfer/pull/2836
- fix: add missing re-exports for rmsnorm quant and fused_add_rmsnorm q… by @DevashishLal-CB in https://github.com/flashinfer-ai/flashinfer/pull/2783
- Add varlen and speculative decoding support to selective state update by @roikoren755 in https://github.com/flashinfer-ai/flashinfer/pull/2700
- [feat] trtllm-gen mxfp8 gemm by @IwakuraRein in https://github.com/flashinfer-ai/flashinfer/pull/2653
- [Spark bug] Fix arch 12.1 -> "sm120a" flag for Spark, CUDA 12.9 by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/2839
- skip per-pr for draft PRs by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/2831
- feat(gdn): add padding index guard for bf16 decode kernel by @kaixih in https://github.com/flashinfer-ai/flashinfer/pull/2810
- docker: Add CUDA 13.2 Docker containers by @bkryu in https://github.com/flashinfer-ai/flashinfer/pull/2843
- [fix] bugfix 1419: Add batch size shape validation in decode and prefill run() APIs by @qsang-nv in https://github.com/flashinfer-ai/flashinfer/pull/2801
- Update Docker CI tags to 20260322-ff86ea0 by @flashinfer-bot in https://github.com/flashinfer-ai/flashinfer/pull/2854
- feat: Expose TRT-LLM FMHA style paged KV Cache and page table layout by @DomBrown in https://github.com/flashinfer-ai/flashinfer/pull/2770
- [Spark unit test] Adjust tolerance for test_xqa, test_logits_processor by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/2828
- Mamba2 SSD Combined Forward Pass (Blackwell CuTe DSL Kernel) by @ishovkun in https://github.com/flashinfer-ai/flashinfer/pull/2709
- bump version to 0.6.7 & fix api breaking changes by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/2832
- [Spark unit test debugging] Fix for tests/autotuner/test_autotuner_core.py by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/2867
- fix: use current CUDA device instead of tp_rank for SymmDeviceMemory allocation by @fzyzcjy in https://github.com/flashinfer-ai/flashinfer/pull/2662
## New Contributors
- @voipmonitor made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/2716
- @dhiraj113 made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/2688
- @leejnau made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/2811
- @sychen52 made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/2702
- @yanqinz2 made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/2790
- @brandonmmusic-max made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/2725
- @akshaver made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/2626
- @DevashishLal-CB made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/2783
- @roikoren755 made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/2700
**Full Changelog**: https://github.com/flashinfer-ai/flashinfer/compare/v0.6.6...v0.6.7