What's Changed
- ci: fix wheel build script by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/418
- kernel: added attention combine kernel to support split kv by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/419
- kernel: refactor and added more unittests for attn combine kernel by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/420
- moe: added token dispatcher interface for MOE layer by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/421
- moe: added local token dispatcher pytorch implementation for testing by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/422
- nccl: added all2all for nccl process group by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/423
- moe: added all to all token dispatcher pytorch implementation by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/424
- upgrade cutlass to 3.9 by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/425
- kernel: added fused gate for moe by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/426
- chore: added pre-commit-config by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/427
- kernel: added moe permute kernels by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/428
- chore: clean up attn dependencies by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/429
- chore: clean up JinjaChatTemplate by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/430
- test: added different dtype unittests for moe permute kernels by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/431
- refactor: use __ldlu to load/store data and refactor code for moe permute kernels by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/432
- upgrade pytorch to 2.7 by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/434
- chore: build manylinux_2_28 builder image by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/435
- fix: fix manylinux_2_28 build by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/436
- upgrade vcpkg after switch to manylinux_2_28 by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/437
- chore: add option to install py module into scalellm folder by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/438
- chore: add script to install zsh for devbox by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/439
- ci: enable docker cache by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/441
- kernel: add kernel for moe permutation with mask map by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/433
- kernel: added align block permutation kernel for moe by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/442
- build: added build for blackwell by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/459
- chore: upgrade cutlass to v4.0 by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/460
- ci: change self-hosted runner tags by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/461
Full Changelog: https://github.com/vectorch-ai/ScaleLLM/compare/v0.2.4...v0.2.5