2.8.0
We are excited to announce the release of Intel® Extension for PyTorch* 2.8.0+cpu, which accompanies PyTorch 2.8. This release mainly brings new LLM model optimizations, including Qwen3 and Whisper large-v3, enhancements to the API for multi-LoRA inference kernels, and optimizations to the LLM generation sampler. This release also includes a set of bug fixes and smaller optimizations. We want to sincerely thank our dedicated community for your contributions.
Besides providing optimizations in Intel® Extension for PyTorch*, over the past years we have also upstreamed most of our features and optimizations for Intel® platforms into PyTorch*, and we will continue pushing the remaining ones into PyTorch* in the future. Moving forward, effective after the 2.8 release, we will change our working model to prioritize developing new features and optimizations directly in PyTorch* and de-prioritize development in Intel® Extension for PyTorch*. We will continue providing critical bug fixes and security patches as needed throughout the PyTorch* 2.9 timeframe to ensure a smooth transition for our partners and community.
Highlights
- Qwen3 support
Qwen3, the latest addition to the Qwen family of large language models, has recently been released. Intel® Extension for PyTorch* has supported Qwen3 since its launch date through an early release version, covering MoE models like Qwen3-30B and mid-size dense models like Qwen3-14B. The related optimizations are included in this official release.
- Whisper large-v3 support
Intel® Extension for PyTorch* provides optimizations for whisper-large-v3, a state-of-the-art model for automatic speech recognition (ASR) and speech translation. Key improvements include replacing the cross-attention mechanism with the Indirect Access Key-Value (IAKV) cache kernel, delivering good performance with weight-only INT8 quantization on Intel® Xeon® processors.
- General Large Language Model (LLM) optimization
Intel® Extension for PyTorch* adds SGMV (segmented gather matrix-vector multiplication) support to the API for multi-LoRA inference kernels used by LLM serving frameworks, and optimizes the LLM generation sampler. A full list of optimized models can be found in the LLM optimization documentation.
- Bug fixes and other optimizations
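The core idea behind an Indirect Access KV cache like the one mentioned for Whisper above can be illustrated in plain Python: instead of physically copying or reordering cached keys and values on every beam-search step, the cache stays put and a per-step beam-index table records which beam each entry descends from; attention then gathers through the indices. This is a minimal conceptual sketch only, not the actual Intel® Extension for PyTorch* kernel; all class and method names here are illustrative.

```python
# Simplified illustration of an indirect-access KV cache:
# cached entries are never moved; a beam-trace table records each
# beam's parent at every step, and history is reconstructed by
# walking the trace instead of copying/reordering cache memory.

class IndirectKVCache:
    def __init__(self, num_beams, max_steps):
        self.num_beams = num_beams
        # flat storage: one slot per (step, beam); never reordered
        self.keys = [[None] * num_beams for _ in range(max_steps)]
        # beam_trace[t][b] = beam at step t-1 that beam b continues
        self.beam_trace = []
        self.steps = 0

    def append(self, step_keys, parent_beams):
        """step_keys[b] is beam b's new cached key at this step;
        parent_beams[b] is the beam it was forked from (identity
        list when beam search did no reordering this step)."""
        self.keys[self.steps] = list(step_keys)
        self.beam_trace.append(list(parent_beams))
        self.steps += 1

    def gather(self, beam):
        """Reconstruct a beam's full key history by walking the
        trace backwards -- the 'indirect access' that replaces
        physical cache copies."""
        out, b = [], beam
        for t in range(self.steps - 1, -1, -1):
            out.append(self.keys[t][b])
            b = self.beam_trace[t][b]  # hop to the ancestor beam
        return out[::-1]
```

In a real kernel the gather happens inside attention over preallocated tensors, which avoids the per-step memory traffic of concatenating or reordering the cache.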
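For context on the generation-sampler optimization mentioned above: a typical LLM sampler applies steps such as softmax and top-p (nucleus) filtering to the logits before drawing a token. The following is a self-contained, pure-Python sketch of the top-p step for illustration; it is not the Intel® Extension for PyTorch* implementation, whose optimized sampler operates on tensors inside the generation loop.

```python
import math

def top_p_filter(logits, top_p=0.9):
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p; mask all other logits to -inf. Illustrative
    nucleus-sampling step of an LLM generation sampler."""
    # numerically stable softmax over the logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # token ids sorted by probability, descending
    order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    kept, cum = set(), 0.0
    for i in order:
        kept.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return [x if i in kept else float("-inf") for i, x in enumerate(logits)]
```

After filtering, a token is drawn from the renormalized surviving probabilities; fusing these steps is the kind of work a sampler optimization targets.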
Full Changelog: https://github.com/intel/intel-extension-for-pytorch/compare/v2.7.0+cpu...v2.8.0+cpu