| Name | Modified | Size |
|---|---|---|
| README.md | 2025-06-06 | 10.0 kB |
| v0.2.6 source code.tar.gz | 2025-06-06 | 1.1 MB |
| v0.2.6 source code.zip | 2025-06-06 | 1.5 MB |
## What's Changed
- ci: select 2_28 manylinux builder for new torch+cuda versions by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1000
- misc: update README.md by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1003
- bugfix: Fix illegal memory access due to custom mask ptr by @yongchaoding in https://github.com/flashinfer-ai/flashinfer/pull/1008
- misc: fix kv-layout doc references by @Edenzzzz in https://github.com/flashinfer-ai/flashinfer/pull/1009
- misc: more benchmark scripts in Python by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1010
- misc: fix instrument code for mla profiler by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1014
- bugfix: import wrapper of mla decode by @dhy2000 in https://github.com/flashinfer-ai/flashinfer/pull/1013
- feat: update decode attention APIs by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1007
- doc: use latest protobuf for profiler by @xslingcn in https://github.com/flashinfer-ai/flashinfer/pull/1021
- feat: SM-constraint Communication Kernels by @yyihuang in https://github.com/flashinfer-ai/flashinfer/pull/994
- feat: ragged tensor padding kernel for blackwell kernel alignment by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1025
- bugfix: fix custom mask not being reset after converting custom mask into causal or non-causal by @yongchaoding in https://github.com/flashinfer-ai/flashinfer/pull/1028
- fix: add zero init for KV tiled copy by @happierpig in https://github.com/flashinfer-ai/flashinfer/pull/1029
- [NVIDIA] Add Cutlass MLA backend by @kaixih in https://github.com/flashinfer-ai/flashinfer/pull/1031
- Add workflow to build aarch64 wheel by @yongwww in https://github.com/flashinfer-ai/flashinfer/pull/1036
- Non-blocking host-to-device copy in the ragged prefill wrapper by @nandor in https://github.com/flashinfer-ai/flashinfer/pull/1040
- fix: remove default ubuntu user in Lunar/Noble by @rickyfeng0119 in https://github.com/flashinfer-ai/flashinfer/pull/1042
- feat: Softmax free sampling by @kf-zhang in https://github.com/flashinfer-ai/flashinfer/pull/1035
- feat: add functional per-head FP8 quantization for FA3 by @happierpig in https://github.com/flashinfer-ai/flashinfer/pull/1033
- add multi-item scoring by @arde171 in https://github.com/flashinfer-ai/flashinfer/pull/1015
- [nvidia] cutlass fp8 blockwise/groupwise gemm support by @cyx-6 in https://github.com/flashinfer-ai/flashinfer/pull/1045
- [nvidia] cutlass fp8 groupwise grouped gemm support by @cyx-6 in https://github.com/flashinfer-ai/flashinfer/pull/1047
- fix: top_k_mask_logits hangs on -inf inputs by @xslingcn in https://github.com/flashinfer-ai/flashinfer/pull/1050
- Benchmark: POD vs batched prefill by @Edenzzzz in https://github.com/flashinfer-ai/flashinfer/pull/1052
- [nvidia] initial support for blackwell kernels by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1039
- Fix KV chunking for POD. by @AKKamath in https://github.com/flashinfer-ai/flashinfer/pull/1054
- bugfix: temporarily disable split-kv in blackwell mla by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1055
- bugfix: remove device allocation by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1056
- Parameterize prefix mask call (needed by POD-Attention) by @AKKamath in https://github.com/flashinfer-ai/flashinfer/pull/1059
- bugfix: move `cum_m` calculation inside kernels by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1060
- misc: add pull request template by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1062
- bugfix: Cast build paths to str before setuputils Extension by @farnasirim in https://github.com/flashinfer-ai/flashinfer/pull/1058
- Add PyTorch 2.7.0 build by @huydhn in https://github.com/flashinfer-ai/flashinfer/pull/1063
- bugfix: adding lse output to blackwell fmha kernels by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1071
- bugfix: follow user-specified sm_scale for blackwell cutlass fmha by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1072
- misc: jit: Introduce JitSpec and Generate ninja file by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/1065
- fix: fix a typo in docs by @acelyc111 in https://github.com/flashinfer-ai/flashinfer/pull/1077
- misc: jit: Deprecate `load_cuda_ops()` by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/1066
- misc: jit: fix missing _get_glibcxx_abi_build_flags by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/1080
- misc: jit: Refactor gen JitSpec out of get_xxx_module by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/1069
- misc: jit: Replace parallel_load_modules() with build_jit_specs() by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/1070
- misc: jit: Import jit_env as a module by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/1073
- misc: aot: Add script to build all AOT ops by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/1067
- misc: aot: Refactor AOT packaging by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/1075
- misc: aot: Remove has_prebuilt_ops by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/1076
- ci: upgrade docker ci image by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1082
- bugfix: fix custom allreduce compilation in AOT mode by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1083
- perf: accelerate blackwell grouped gemm by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1086
- misc: update pull request template by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1088
- Fix Cutlass grouped GEMM stride by @cyx-6 in https://github.com/flashinfer-ai/flashinfer/pull/1081
- bugfix: fix fp8 attention kernels aot compilation issue by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1087
- comm: refactor and initialize `flashinfer.comm` module by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1089
- misc: cleanup by @b8zhong in https://github.com/flashinfer-ai/flashinfer/pull/1092
- misc: followup by @b8zhong in https://github.com/flashinfer-ai/flashinfer/pull/1093
- [nvidia] Add Blackwell FMHA decode kernel from TRT-LLM by @joker-eph in https://github.com/flashinfer-ai/flashinfer/pull/1051
- bugfix: fix ninja generation rule for non-cuda input by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1097
- jit: Update TVM JIT binding with the latest FFI refactor by @MasterJH5574 in https://github.com/flashinfer-ai/flashinfer/pull/1100
- SM100 Groupwise GeMM K-Major Scale Supports by @cyx-6 in https://github.com/flashinfer-ai/flashinfer/pull/1102
- misc: aot: Add platform tag to wheel by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/1105
- feat: composable logits processor by @xslingcn in https://github.com/flashinfer-ai/flashinfer/pull/1099
- feat: add trtllm all-reduce (non-MoE) by @yyihuang in https://github.com/flashinfer-ai/flashinfer/pull/1096
- bugfix: host-precomputed plan function for blackwell fmha by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1106
- doc: fix LogitsPipe example by @xslingcn in https://github.com/flashinfer-ai/flashinfer/pull/1110
- bugfix: bugfix for blackwell mla split-k by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1109
- Add CUTLASS fused moe kernels from TensorRT-LLM. by @wenscarl in https://github.com/flashinfer-ai/flashinfer/pull/1113
- fix: initialize lamport buffer only once after creating new workspace by @yyihuang in https://github.com/flashinfer-ai/flashinfer/pull/1111
- hotfix: fix the blackwell fmha stream by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1116
- fix head_dim not defined if sm_scale is not None by @majian4work in https://github.com/flashinfer-ai/flashinfer/pull/1119
- doc: add Ask-AI widget by @xslingcn in https://github.com/flashinfer-ai/flashinfer/pull/1121
- bugfix: Fix test and output shape of fp4 quantize by @wenscarl in https://github.com/flashinfer-ai/flashinfer/pull/1114
- misc: update slack link by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1120
- release: bump version to v0.2.6 by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/1122
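
Several of the sampling entries above (e.g. the softmax-free sampling in #1035 and the `top_k_mask_logits` fix in #1050) revolve around drawing tokens from logits without materializing a full softmax. One classic technique in this family is the Gumbel-max trick; the sketch below illustrates the idea in plain Python and is not FlashInfer's actual API (the function name and signature here are hypothetical):

```python
import math
import random

def sample_from_logits(logits, rng=random):
    """Draw an index with probability softmax(logits)[i], softmax-free.

    Gumbel-max trick: adding independent Gumbel(0, 1) noise to each logit
    and taking the argmax is equivalent to sampling from the softmax
    distribution, so no normalization or exponentiation pass is needed.
    """
    # Gumbel(0, 1) sample: -log(-log(U)) for U ~ Uniform(0, 1).
    gumbels = [-math.log(-math.log(rng.random())) for _ in logits]
    return max(range(len(logits)), key=lambda i: logits[i] + gumbels[i])

# Empirical check: frequencies approach softmax([2, 1, 0]) over many draws.
rng = random.Random(0)
logits = [2.0, 1.0, 0.0]
counts = [0, 0, 0]
for _ in range(20000):
    counts[sample_from_logits(logits, rng)] += 1
```

The same argmax-over-perturbed-logits structure is why such kernels compose well with top-k masking: masked positions are simply set to a large negative value before the argmax.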
## New Contributors
- @yongchaoding made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1008
- @Edenzzzz made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1009
- @dhy2000 made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1013
- @kaixih made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1031
- @yongwww made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1036
- @rickyfeng0119 made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1042
- @kf-zhang made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1035
- @arde171 made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1015
- @farnasirim made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1058
- @huydhn made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1063
- @acelyc111 made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1077
- @b8zhong made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1092
- @joker-eph made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1051
- @wenscarl made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1113
- @majian4work made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/1119
**Full Changelog**: https://github.com/flashinfer-ai/flashinfer/compare/v0.2.5...v0.2.6