ScaleLLM - Browse /v0.2.3 at SourceForge.net


Name	Modified	Size	Downloads / Week
scalellm-0.2.3+cu118torch2.4.1-cp38-cp38-linux_x86_64.whl	2025-01-26	51.9 MB	0
scalellm-0.2.3+cu118torch2.4.1-cp39-cp39-linux_x86_64.whl	2025-01-26	51.9 MB	0
scalellm-0.2.3+cu118torch2.4.1-cp310-cp310-linux_x86_64.whl	2025-01-26	51.9 MB	0
scalellm-0.2.3+cu118torch2.4.1-cp311-cp311-linux_x86_64.whl	2025-01-26	51.9 MB	0
scalellm-0.2.3+cu118torch2.4.1-cp312-cp312-linux_x86_64.whl	2025-01-26	51.9 MB	0
scalellm-0.2.3+cu118torch2.5.1-cp39-cp39-linux_x86_64.whl	2025-01-26	51.9 MB	0
scalellm-0.2.3+cu118torch2.5.1-cp310-cp310-linux_x86_64.whl	2025-01-26	51.9 MB	0
scalellm-0.2.3+cu118torch2.5.1-cp311-cp311-linux_x86_64.whl	2025-01-26	51.9 MB	0
scalellm-0.2.3+cu118torch2.5.1-cp312-cp312-linux_x86_64.whl	2025-01-26	51.9 MB	0
scalellm-0.2.3+cu121torch2.4.1-cp38-cp38-linux_x86_64.whl	2025-01-26	51.9 MB	0
scalellm-0.2.3+cu121torch2.4.1-cp39-cp39-linux_x86_64.whl	2025-01-26	51.9 MB	0
scalellm-0.2.3+cu121torch2.4.1-cp310-cp310-linux_x86_64.whl	2025-01-26	51.9 MB	0
scalellm-0.2.3+cu121torch2.4.1-cp311-cp311-linux_x86_64.whl	2025-01-26	51.9 MB	0
scalellm-0.2.3+cu121torch2.4.1-cp312-cp312-linux_x86_64.whl	2025-01-26	51.9 MB	0
scalellm-0.2.3+cu121torch2.5.1-cp39-cp39-linux_x86_64.whl	2025-01-26	51.9 MB	0
scalellm-0.2.3+cu121torch2.5.1-cp310-cp310-linux_x86_64.whl	2025-01-26	51.9 MB	0
scalellm-0.2.3+cu121torch2.5.1-cp311-cp311-linux_x86_64.whl	2025-01-26	51.9 MB	0
scalellm-0.2.3+cu121torch2.5.1-cp312-cp312-linux_x86_64.whl	2025-01-26	51.9 MB	0
scalellm-0.2.3+cu124torch2.4.1-cp38-cp38-linux_x86_64.whl	2025-01-26	52.4 MB	0
scalellm-0.2.3+cu124torch2.4.1-cp39-cp39-linux_x86_64.whl	2025-01-26	52.4 MB	0
scalellm-0.2.3+cu124torch2.4.1-cp310-cp310-linux_x86_64.whl	2025-01-26	52.4 MB	0
scalellm-0.2.3+cu124torch2.4.1-cp311-cp311-linux_x86_64.whl	2025-01-26	52.4 MB	0
scalellm-0.2.3+cu124torch2.4.1-cp312-cp312-linux_x86_64.whl	2025-01-26	52.4 MB	0
scalellm-0.2.3+cu124torch2.5.1-cp39-cp39-linux_x86_64.whl	2025-01-26	52.4 MB	0
scalellm-0.2.3+cu124torch2.5.1-cp310-cp310-linux_x86_64.whl	2025-01-26	52.4 MB	0
scalellm-0.2.3+cu124torch2.5.1-cp311-cp311-linux_x86_64.whl	2025-01-26	52.4 MB	0
scalellm-0.2.3+cu124torch2.5.1-cp312-cp312-linux_x86_64.whl	2025-01-26	52.4 MB	0
README.md	2025-01-26	4.3 kB	0
v0.2.3 source code.tar.gz	2025-01-26	8.0 MB	0
v0.2.3 source code.zip	2025-01-26	8.3 MB	0
Totals: 30 Items		1.4 GB	0
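
The wheel names above encode the CUDA build, the PyTorch version, and the CPython tag, so picking the right file can be scripted. The following is a minimal sketch, not part of ScaleLLM: `parse_wheel` and `pick_wheel` are hypothetical helper names, and reading `cu121` as CUDA 12.1 is an assumption based on the usual PyTorch naming convention.

```python
import re

# Field layout follows the standard wheel naming scheme
# (name-version-python-abi-platform); the "+cuXXXtorchY.Y.Y" local
# version suffix is what the listing above uses.
WHEEL_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-+]+)"
    r"\+cu(?P<cuda>\d+)torch(?P<torch>[\d.]+)"
    r"-(?P<python>cp\d+)-(?P<abi>cp\d+)-(?P<platform>.+)\.whl$"
)


def parse_wheel(filename):
    """Split a wheel filename from the listing into its fields."""
    match = WHEEL_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized wheel name: {filename}")
    return match.groupdict()


def pick_wheel(filenames, cuda, torch, python):
    """Return the first wheel matching the requested tags, or None."""
    for filename in filenames:
        try:
            info = parse_wheel(filename)
        except ValueError:
            continue  # skip README.md, source archives, and so on
        if (info["cuda"], info["torch"], info["python"]) == (cuda, torch, python):
            return filename
    return None
```

For example, `pick_wheel(names, "121", "2.5.1", "cp310")` would select the cu121, torch 2.5.1 build for CPython 3.10 if `names` holds the filenames listed above.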

What's Changed

misc: remove legacy logic to support quantization for other types. by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/350
upgrade pytorch to 2.5.1 by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/351
added cuda 12.6 build image by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/353
fix cmake version issue for manylinux image by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/354
kernel: added attention kernel for sm80 (Happy new year!) by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/355
ci: fix package test workflow by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/357
kernel: refactor attention kernel for readability by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/358
dev: config dev container with proper extensions by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/359
kernel: added attention bench for profiling before optimization by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/360
kernel: added logits soft cap support for attention by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/362
tools: added attention traits viewer by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/363
kernel: added swizzle for shared memory to avoid bank conflict by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/364
kernel: added causal, alibi, sliding window mask for attention by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/365
kernel: refactor attention kernel and add more unittests by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/366
kernel: added M/N OOB handling for attention by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/367
tools: update svg build to generate small file by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/368
kernel: Added attention params and tile for different input types. by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/369
kernel: added mqa and gqa support for attention by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/370
kernel: added var len and paged kv cache support for attention by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/371
kernel: added varlen and pagedkv unittests for attention by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/372
kernel: added attention kernel launch by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/373
kernel: added build script to generate kernel instantiations for attention by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/374
kernel: change attention input shape from [head, seq, dim] to [seq, head, dim] by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/375
kernel: added head_dim=96 support for attention by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/376
kernel: optimize attention kernel performance by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/377
upgrade cutlass to 3.7.0 by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/379
kernel: handle kv block range for attention kernel by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/382
kernel: use cp_async_zfill instead of cute::clear for oob handling by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/383
kernel: separate oob iterations for better performance. by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/384
refactor: remove batch_prefill interface by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/385
refactor: stop build flash_infer kernel by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/386
feat: integrate in-house scale attention and use it by default by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/380
kernel: only zfill k once to improve perf for attention by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/387
refactor: skip flash_attn build by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/388
refactor: clean up kv cache set/get apis and improve slot id calculation perf by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/389
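
Several entries above mention logits soft cap and causal, alibi, and sliding window masks. As a rough illustration of what those terms usually mean, here is a pure-Python reference for one attention head. It is a sketch under stated assumptions, not the ScaleLLM kernel: the function names, the order in which the cap and the ALiBi bias are applied, and the window convention (a query sees itself plus `window` earlier keys) are all assumptions, and queries and keys are assumed to have the same length.

```python
import math


def soft_cap_logit(score, cap):
    """Squash a logit smoothly into (-cap, cap) using tanh."""
    return cap * math.tanh(score / cap)


def attention_weights(q, k, scale, soft_cap=0.0, causal=True,
                      window=-1, alibi_slope=0.0):
    """Softmax attention weights for one head.

    q, k: lists of equal-length vectors, one per token (same token count).
    window < 0 disables the sliding window (assumed convention).
    """
    weights = []
    for i, q_vec in enumerate(q):
        row = []
        for j, k_vec in enumerate(k):
            score = scale * sum(a * b for a, b in zip(q_vec, k_vec))
            if soft_cap > 0:
                score = soft_cap_logit(score, soft_cap)
            # ALiBi: linear penalty on distance (assumed applied after the cap).
            score -= alibi_slope * (i - j)
            masked = (causal and j > i) or (window >= 0 and i - j > window)
            row.append(-math.inf if masked else score)
        # Numerically stable softmax; the diagonal is never masked here.
        row_max = max(row)
        exps = [math.exp(s - row_max) for s in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

With `causal=True` and `window=0`, each token attends only to itself, which makes a convenient sanity check for the masking logic.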

Full Changelog: https://github.com/vectorch-ai/ScaleLLM/compare/v0.2.2...v0.2.3
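
For readers unfamiliar with the paged kv cache, slot id, and mqa/gqa terms in the list above, the usual index arithmetic can be sketched as follows. This is an illustration of the common scheme, not code from ScaleLLM: `slot_id`, `kv_head_for`, and the `block_table` layout (one physical block id per logical block of a sequence) are assumptions.

```python
def slot_id(block_table, position, block_size):
    """Map a token position in a sequence to a flat slot in the paged cache.

    block_table[n] is the physical block holding logical block n
    (assumed layout).
    """
    block_idx, offset = divmod(position, block_size)
    return block_table[block_idx] * block_size + offset


def kv_head_for(q_head, n_q_heads, n_kv_heads):
    """Return the KV head shared by a query head under GQA.

    n_kv_heads == 1 gives MQA; n_kv_heads == n_q_heads gives plain
    multi-head attention.
    """
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size
```

For instance, with `block_size=4` and `block_table=[7, 2]`, token position 5 falls in logical block 1 at offset 1, so it maps to slot `2 * 4 + 1`.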

Source: README.md, updated 2025-01-26

ScaleLLM Files

A high-performance inference system for large language models
