| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| flashinfer_jit_cache-0.6.7+cu130-cp39-abi3-manylinux_2_28_aarch64.whl | 2026-03-25 | 2.0 GB | |
| flashinfer_jit_cache-0.6.7+cu130-cp39-abi3-manylinux_2_28_x86_64.whl | 2026-03-25 | 2.1 GB | |
| flashinfer_jit_cache-0.6.7+cu129-cp39-abi3-manylinux_2_28_aarch64.whl | 2026-03-25 | 1.9 GB | |
| flashinfer_jit_cache-0.6.7+cu129-cp39-abi3-manylinux_2_28_x86_64.whl | 2026-03-25 | 1.9 GB | |
| flashinfer_jit_cache-0.6.7+cu128-cp39-abi3-manylinux_2_28_aarch64.whl | 2026-03-25 | 1.3 GB | |
| flashinfer_jit_cache-0.6.7+cu128-cp39-abi3-manylinux_2_28_x86_64.whl | 2026-03-25 | 1.3 GB | |
| flashinfer_cubin-0.6.7-py3-none-any.whl | 2026-03-25 | 295.1 MB | |
| flashinfer_python-0.6.7-py3-none-any.whl | 2026-03-25 | 9.2 MB | |
| flashinfer_python-0.6.7.tar.gz | 2026-03-25 | 6.5 MB | |
| README.md | 2026-03-25 | 8.7 kB | |
| Release v0.6.7 source code.tar.gz | 2026-03-25 | 3.1 MB | |
| Release v0.6.7 source code.zip | 2026-03-25 | 4.1 MB | |
| Totals: 12 items | | 10.8 GB | 0 |
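The wheel filenames in the table above follow PEP 427 naming, `{distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl`, with a `+cuXYZ` local version segment identifying the CUDA toolkit the wheel was built against. A minimal sketch (standard library only; the helper name is illustrative, not part of flashinfer) that decodes one of the listed filenames:

```python
def parse_wheel_name(filename: str) -> dict:
    # Strip the .whl suffix, then split on '-' per PEP 427:
    # {distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl
    stem = filename[: -len(".whl")]
    dist, version, py_tag, abi_tag, plat_tag = stem.split("-")
    # A '+cuXYZ' local version segment, if present, names the CUDA build.
    base_version, _, cuda = version.partition("+")
    return {
        "distribution": dist,
        "version": base_version,
        "cuda": cuda or None,       # e.g. 'cu130', or None for pure wheels
        "python": py_tag,           # 'cp39' = CPython 3.9+, 'py3' = any Python 3
        "abi": abi_tag,             # 'abi3' = stable ABI, 'none' = pure Python
        "platform": plat_tag,       # e.g. 'manylinux_2_28_x86_64' or 'any'
    }

info = parse_wheel_name(
    "flashinfer_jit_cache-0.6.7+cu130-cp39-abi3-manylinux_2_28_x86_64.whl"
)
```

So the 2.1 GB `cu130` x86_64 wheel, for example, is a CPython 3.9+ stable-ABI build for glibc 2.28 Linux, while `flashinfer_cubin-0.6.7-py3-none-any.whl` is a pure-Python wheel installable on any platform.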
## What's Changed
- perf(gdn): optimize MTP kernel with ILP rows and SMEM v caching by @ameynaik-hub in https://github.com/flashinfer-ai/flashinfer/pull/2618
- Feat/gdn decode pooled by @xutizhou in https://github.com/flashinfer-ai/flashinfer/pull/2521
- fix(jit): GEMM kernels produce NaN under concurrency — missing GDC flags cause PDL synchronization barriers to compile as no-ops by @voipmonitor in https://github.com/flashinfer-ai/flashinfer/pull/2716
- Support NVFP4 KV cache decode on SM120 by @Tom-Zheng in https://github.com/flashinfer-ai/flashinfer/pull/2520
- feat: Add TRTLLM fmha_v2 library for SM90 attention with Skip-Softmax by @jimmyzho in https://github.com/flashinfer-ai/flashinfer/pull/2446
- bump version to 0.6.6 by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/2724
- [benchmark] Add All Reduce benchmark by @jiahanc in https://github.com/flashinfer-ai/flashinfer/pull/2696
- Revert "fix(jit): GEMM kernels produce NaN under concurrency — missing GDC flags cause PDL synchronization barriers to compile as no-ops" by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/2737
- refactor: refactoring cuda code to cute-dsl (part 1) by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/2428
- Added missing padding by @nvjullin in https://github.com/flashinfer-ai/flashinfer/pull/2726
- docker: add CUDA 13.1 Dockerfiles with cuda-tile by @yongwww in https://github.com/flashinfer-ai/flashinfer/pull/2774
- [BugFix] guard against uint32 underflow in multi-CTA TopK chunk calculation by @LopezCastroRoberto in https://github.com/flashinfer-ai/flashinfer/pull/2592
- fix: guard CUTLASS FMHA against SM12x and fix fmha_v2 SM121a check by @blake-snc in https://github.com/flashinfer-ai/flashinfer/pull/2560
- fix: fix illegal memory access for NaN input in sampling kernels by @zack041 in https://github.com/flashinfer-ai/flashinfer/pull/2456
- Add cuda-tile to package dependencies by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/2758
- tests: skip sliding window + fp8 to prevent hang in fmha_v2 unit tests by @jimmyzho in https://github.com/flashinfer-ai/flashinfer/pull/2781
- feat: Add autotuner config caching, thread safety, and documentation by @bkryu in https://github.com/flashinfer-ai/flashinfer/pull/2554
- fix: block PR merge when CI is skipped due to pending authorization by @yongwww in https://github.com/flashinfer-ai/flashinfer/pull/2761
- [feat] Add air top-p algorithm by @qsang-nv in https://github.com/flashinfer-ai/flashinfer/pull/2752
- [chore] Add jiahanc to moe related code owner by @jiahanc in https://github.com/flashinfer-ai/flashinfer/pull/2748
- fix: Fix cute dsl moe failure with nvidia-cutlass-dsl >= 4.4.0 by @nv-yunzheq in https://github.com/flashinfer-ai/flashinfer/pull/2735
- [Spark unit test debugging] Fix for tests/attention/test_trtllm_gen_mla.py by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/2750
- [Spark unit test debugging] Fix for tests/gemm/test_groupwise_scaled_gemm_fp8.py by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/2751
- [feat] Add 2048 experts and 32 Top K by @jiahanc in https://github.com/flashinfer-ai/flashinfer/pull/2744
- perf: Performance tune cute dsl RMSNorm variants by @bkryu in https://github.com/flashinfer-ai/flashinfer/pull/2777
- feat: Add FP4 KV cache quant/dequant kernels by @samuellees in https://github.com/flashinfer-ai/flashinfer/pull/2757
- Add cute-dsl backends to mxfp[8,4]_quantization for future refactor by @bkryu in https://github.com/flashinfer-ai/flashinfer/pull/2443
- feat: FP32 dtype output for BF16 matmuls (CUTLASS & cuDNN) by @raayandhar in https://github.com/flashinfer-ai/flashinfer/pull/2644
- Create separate cuDNN handle per GPU by @dhiraj113 in https://github.com/flashinfer-ai/flashinfer/pull/2688
- CuteDSL MoE fix redundant output buffer zeroing by @leejnau in https://github.com/flashinfer-ai/flashinfer/pull/2811
- Add NVFP4 KV cache quantization support for SM100 by @sychen52 in https://github.com/flashinfer-ai/flashinfer/pull/2702
- [fix] Bugfix 1367: fix VariableBlockSparseAttention buffer overflow by dynamically resizing kv_lens_buffer by @qsang-nv in https://github.com/flashinfer-ai/flashinfer/pull/2802
- fix: Workaround org teams perm issue for approval purposes by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/2816
- Implement override shape support for cuDNN GEMM operations by @yanqinz2 in https://github.com/flashinfer-ai/flashinfer/pull/2790
- feat: Add support for TRTLLM MXFP8 non-gated MoE with ReLU2 by @danisereb in https://github.com/flashinfer-ai/flashinfer/pull/2707
- Upgrade cutlass 4.2.1 -> 4.4.2 by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/2798
- chore: cute dsl nvfp4 moe clean up by @nv-yunzheq in https://github.com/flashinfer-ai/flashinfer/pull/2775
- fix: Add SM120 (RTX Blackwell desktop) support for NVFP4 MoE kernels by @brandonmmusic-max in https://github.com/flashinfer-ai/flashinfer/pull/2725
- Protect against null clusterUuid in mnnvl.py by @akshaver in https://github.com/flashinfer-ai/flashinfer/pull/2626
- Deprecation for gated_delta_rule_mtp's intermediate_states_buffer=True by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/2730
- fix: Autotuner _find_nearest_profile non-power-of-2 num_tokens, create launchers for all supported tileN in trtllm fused MoE by @amitz-nv in https://github.com/flashinfer-ai/flashinfer/pull/2821
- fix(jit): enable GDC for CUTLASS GEMM PDL — SM100 flag only by @voipmonitor in https://github.com/flashinfer-ai/flashinfer/pull/2780
- [Fmha] Sparse MLA decode kernel selection heuristics by @PerkzZheng in https://github.com/flashinfer-ai/flashinfer/pull/2836
- fix: add missing re-exports for rmsnorm quant and fused_add_rmsnorm q… by @DevashishLal-CB in https://github.com/flashinfer-ai/flashinfer/pull/2783
- Add varlen and speculative decoding support to selective state update by @roikoren755 in https://github.com/flashinfer-ai/flashinfer/pull/2700
- [feat] trtllm-gen mxfp8 gemm by @IwakuraRein in https://github.com/flashinfer-ai/flashinfer/pull/2653
- [Spark bug] Fix arch 12.1 -> "sm120a" flag for Spark, CUDA 12.9 by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/2839
- skip per-pr for draft PRs by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/2831
- feat(gdn): add padding index guard for bf16 decode kernel by @kaixih in https://github.com/flashinfer-ai/flashinfer/pull/2810
- docker: Add CUDA 13.2 Docker containers by @bkryu in https://github.com/flashinfer-ai/flashinfer/pull/2843
- [fix] bugfix 1419: Add batch size shape validation in decode and prefill run() APIs by @qsang-nv in https://github.com/flashinfer-ai/flashinfer/pull/2801
- Update Docker CI tags to 20260322-ff86ea0 by @flashinfer-bot in https://github.com/flashinfer-ai/flashinfer/pull/2854
- feat: Expose TRT-LLM FMHA style paged KV Cache and page table layout by @DomBrown in https://github.com/flashinfer-ai/flashinfer/pull/2770
- [Spark unit test] Adjust tolerance for test_xqa, test_logits_processor by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/2828
- Mamba2 SSD Combined Forward Pass (Blackwell CuTe DSL Kernel) by @ishovkun in https://github.com/flashinfer-ai/flashinfer/pull/2709
- bump version to 0.6.7 & fix api breaking changes by @aleozlx in https://github.com/flashinfer-ai/flashinfer/pull/2832
- [Spark unit test debugging] Fix for tests/autotuner/test_autotuner_core.py by @kahyunnam in https://github.com/flashinfer-ai/flashinfer/pull/2867
- fix: use current CUDA device instead of tp_rank for SymmDeviceMemory allocation by @fzyzcjy in https://github.com/flashinfer-ai/flashinfer/pull/2662
## New Contributors
- @voipmonitor made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/2716
- @dhiraj113 made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/2688
- @leejnau made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/2811
- @sychen52 made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/2702
- @yanqinz2 made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/2790
- @brandonmmusic-max made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/2725
- @akshaver made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/2626
- @DevashishLal-CB made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/2783
- @roikoren755 made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/2700
**Full Changelog**: https://github.com/flashinfer-ai/flashinfer/compare/v0.6.6...v0.6.7