What's Changed
- ci: add option to skip nvbench build by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/390
- ci: build devel image with CUDA 12.8 for Blackwell by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/391
- kernel: added query packing support for attention by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/392
- refactor: rename attention to MHA to differentiate it from MLA by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/393
- kernel: added Triton AOT compiler by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/394
- kernel: generate smaller kernel instantiations by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/395
- kernel: fix register spilling issue for attention head_dim=256 by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/397
- upgrade libtorch to 2.6.0 and cutlass to 3.8.0 by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/398
- kernel: added simple MLA kernel by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/396
- kernel: added pipeline support for MLA by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/399
- kernel: added ping-pong rmem support for MLA by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/400
- kernel: revert experimental TiledMMA separation change by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/401
- kernel: always put query in registers for MHA by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/402
- kernel: use 8 warps to avoid register spilling for MLA with head_dim=512 by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/403
- kernel: revert MLA ping-pong rmem change by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/404
- kernel: refactor mask logic to avoid using hard-coded stride by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/405
- kernel: added causal mask for MLA kernel by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/406 (see the mask sketch after this list)
- kernel: added blk_n=16 for MLA to support sm_86/sm_89 with only 100KB smem by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/407
- kernel: fix mask bugs for MLA by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/408
- kernel: use different TiledMMAs for the QK and PV GEMMs by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/409
- kernel: added stage support for MLA kernel by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/410
- misc: upgrade CUDA version and add devcontainer for manylinux by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/412
- kernel: added Q and KV out-of-bounds (OOB) handling for MLA kernel by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/413
- kernel: optimize mask loop for MLA kernel by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/414
- kernel: added paged KV support for MLA kernel by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/415 (see the paged-KV sketch after this list)
- kernel: fix KV OOB issue and add more unit tests for paged MLA by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/416
- kernel: use FastDivmod in attention kernels by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/417 (see the FastDivmod sketch after this list)
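A few of the kernel changes above are easier to picture with short sketches. For the causal-mask work in #406 and #408, below is a minimal C++ sketch of the standard right-aligned causal mask used when cached KV tokens precede the query tokens; the function name and dense score layout are illustrative assumptions, not ScaleLLM's actual kernel code.

```cpp
#include <limits>
#include <vector>

// Illustrative sketch: apply a right-aligned causal mask to a q_len x kv_len
// score matrix. With a KV cache, the last query row is aligned with the last
// KV position, so row q_idx may attend to kv_idx <= q_idx + (kv_len - q_len).
void apply_causal_mask(std::vector<float>& scores, int q_len, int kv_len) {
  const float kNegInf = -std::numeric_limits<float>::infinity();
  const int diagonal_offset = kv_len - q_len;  // right-align the diagonal
  for (int q_idx = 0; q_idx < q_len; ++q_idx) {
    for (int kv_idx = 0; kv_idx < kv_len; ++kv_idx) {
      if (kv_idx > q_idx + diagonal_offset) {
        scores[q_idx * kv_len + kv_idx] = kNegInf;  // masked out before softmax
      }
    }
  }
}
```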
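For the paged-KV support in #415, the core idea is a block-table indirection: a sequence's logical KV positions are scattered across fixed-size physical blocks in a shared pool. Here is a minimal sketch of the usual address translation, with hypothetical names rather than ScaleLLM's own:

```cpp
#include <cstdint>

// Illustrative sketch: map a logical token position in one sequence's KV
// cache to a physical slot in the global KV-cache pool. block_table holds
// the sequence's logical-block -> physical-block mapping; block_size is the
// number of tokens per block.
inline int64_t kv_physical_slot(const int32_t* block_table,
                                int64_t token_idx,
                                int64_t block_size) {
  const int64_t logical_block = token_idx / block_size;  // which block
  const int64_t block_offset  = token_idx % block_size;  // offset within it
  return static_cast<int64_t>(block_table[logical_block]) * block_size +
         block_offset;
}
```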
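For #417, `cutlass::FastDivmod` (from `cutlass/fast_math.h`) precomputes a magic multiplier and shift for a fixed divisor so that later divide/modulo operations become cheap multiply-and-shift sequences, which matters on GPUs where hardware integer division is slow. A small usage sketch, assuming the common case of splitting a flattened (sequence, head) index; the surrounding names are hypothetical:

```cpp
#include <cutlass/fast_math.h>  // needs CUTLASS headers on the include path

// Illustrative sketch: split a flattened index into (seq_idx, head_idx)
// without a hardware integer division in the hot path.
void split_index(int flat_idx, int num_heads, int& seq_idx, int& head_idx) {
  cutlass::FastDivmod divmod(num_heads);  // one-time setup per divisor
  // seq_idx = flat_idx / num_heads, head_idx = flat_idx % num_heads
  divmod(seq_idx, head_idx, flat_idx);
}
```

In real kernels the `FastDivmod` object is typically constructed once on the host and passed to the kernel as an argument, so only the cheap multiply-and-shift runs per thread.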
Full Changelog: https://github.com/vectorch-ai/ScaleLLM/compare/v0.2.3...v0.2.4