| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| Intel(r) Neural Speed v0.1 Release source code.tar.gz | 2023-12-22 | 3.4 MB | |
| Intel(r) Neural Speed v0.1 Release source code.zip | 2023-12-22 | 3.6 MB | |
| README.md | 2023-12-22 | 1.2 kB | |
| Totals: 3 Items | | 6.9 MB | 0 |
## Highlights
- Created the Neural Speed project, spinning off from Intel Extension for Transformers
## Features
- Support GPTQ models
- Enable beam-search post-processing
- Add MX formats (FP8_E5M2, FP8_E4M3, FP4_E2M1, NF4)
- Refactor the Transformers Extension for Low-bit Inference Runtime on the latest Jblas
- Support tensor parallelism with Jblas and shared memory
- Improve performance on client CPUs
- Enable Streaming LLM for the runtime
- Enhance QLoRA on CPU with an optimized dropout operator
- Add a script for PPL (perplexity) evaluation
- Refine the Python API
- Allow CompileBF16 on GCC 11
- Multi-round chat with ChatGLM2
- Shift-RoPE-based Streaming LLM
- Enable MHA fusion for LLMs
- Support AVX_VNNI and AVX2
- Optimize the QBits backend
- Support GELU
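The feature list above mentions a script for PPL evaluation. As background, perplexity is the exponential of the average negative log-likelihood a model assigns to the ground-truth tokens of an evaluation text; lower is better. The sketch below illustrates only that formula and is independent of Neural Speed's actual evaluation script, whose interface is not shown here:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood.

    token_log_probs: natural-log probabilities a model assigned to
    each ground-truth token of the evaluation text (illustrative input,
    not Neural Speed's real data format).
    """
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model that assigns probability 0.25 to every token
# has perplexity exp(-ln 0.25) = 4.
print(perplexity([math.log(0.25)] * 8))
```

In practice the per-token log-probabilities come from running the model over a held-out corpus (e.g. WikiText-2) with a sliding window; the formula itself stays the same.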
## Examples
- Enable fine-tuning for Qwen-7B-Chat on CPU
- Enable the Whisper C++ API
- Apply the STS task to BAAI/BGE models
- Enable the Qwen graph
- Enable instruction-tuning Stable Diffusion examples
- Enable Mistral-7B
- Enable Falcon-180B
- Enable a Baichuan/Baichuan2 example
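The STS (semantic textual similarity) task listed above scores sentence pairs by comparing their embedding vectors, conventionally with cosine similarity. A minimal sketch of that scoring step, assuming the embeddings have already been produced by a BGE-style encoder (the short vectors below are made up for illustration; real BGE embeddings are hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors: 1.0 for
    identical directions, 0.0 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings" standing in for encoder output.
emb_1 = [0.2, 0.8, 0.1]
emb_2 = [0.2, 0.8, 0.1]
print(cosine_similarity(emb_1, emb_2))  # identical vectors score 1.0
```

On a real STS benchmark, these cosine scores are then correlated (e.g. Spearman) against human similarity judgments to evaluate the embedding model.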
## Validated Configurations
- Python 3.9, 3.10, 3.11
- GCC 11.1, 13.1
- CentOS 8.4, Ubuntu 20.04, Windows 10