| Name | Modified | Size | Downloads / Week |
|------|----------|------|------------------|
| Intel(r) Neural Speed v0.1 Release source code.tar.gz | 2023-12-22 | 3.4 MB | 0 |
| Intel(r) Neural Speed v0.1 Release source code.zip | 2023-12-22 | 3.6 MB | 0 |
| README.md | 2023-12-22 | 1.2 kB | 0 |
| Totals: 3 items | | 6.9 MB | 0 |


Highlights
- Created the Neural Speed project, spun off from Intel Extension for Transformers

Features
- Support GPTQ models
- Enable beam-search post-processing
- Add MX formats (FP8_E5M2, FP8_E4M3, FP4_E2M1, NF4)
- Refactor the Transformers Extension for Low-bit Inference Runtime on the latest Jblas
- Support tensor parallelism with Jblas and shared memory
- Improve performance on client CPUs
- Enable streaming LLM for the runtime
- Enhance QLoRA on CPU with an optimized dropout operator
- Add a script for PPL (perplexity) evaluation
- Refine the Python API (see the sketch after this list)
- Allow CompileBF16 on GCC 11
- Support multi-round chat with ChatGLM2
- Add Shift-RoPE-based streaming LLM
- Enable MHA fusion for LLMs
- Support AVX_VNNI and AVX2
- Optimize the QBits backend
- Add GELU support
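
Several of the runtime changes above (low-bit weights, the streaming-LLM runtime, the refined Python API) surface through the Python entry point. Below is a minimal sketch of int4 weight-only inference with streamed output; the `Model` class, the `init()` keyword names (`weight_dtype`, `compute_dtype`), and the model id are assumptions made for illustration, not guarantees of this release.

```python
# Minimal sketch: int4 weight-only CPU inference with streaming output.
# The Model class, init() keywords, and the model id below are
# assumptions based on the project README, used here for illustration.
from transformers import AutoTokenizer, TextStreamer
from neural_speed import Model

model_name = "Intel/neural-chat-7b-v3-1"  # placeholder HF model id
prompt = "Once upon a time, there was a little girl"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)  # prints tokens as they are generated

model = Model()
# Quantize weights to int4 and compute in int8 on the CPU runtime.
model.init(model_name, weight_dtype="int4", compute_dtype="int8")
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=128)
```

With a `TextStreamer` attached, tokens are emitted as they are produced, which is the streaming behavior referenced in the feature list.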

Examples
- Enable fine-tuning for Qwen-7B-Chat on CPU
- Enable the Whisper C++ API
- Apply the STS task to BAAI/BGE models
- Enable the Qwen graph
- Enable instruction-tuning Stable Diffusion examples
- Enable Mistral-7B (see the sketch after this list)
- Enable Falcon-180B
- Enable the Baichuan/Baichuan2 example
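
For the newly enabled models, a common entry point is the transformers-like API of Intel Extension for Transformers, which uses this runtime for low-bit CPU inference. The sketch below is illustrative only: the `load_in_4bit` argument and the Mistral-7B model id are assumptions, not taken from these notes.

```python
# Sketch: 4-bit weight-only CPU inference for one of the newly enabled
# models (Mistral-7B) via the transformers-like API of Intel Extension
# for Transformers. The model id and the load_in_4bit keyword are
# illustrative assumptions.
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer("The meaning of life is", return_tensors="pt").input_ids

model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```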

Validated Configurations
- Python 3.9, 3.10, 3.11
- GCC 13.1, 11.1
- CentOS 8.4, Ubuntu 20.04, and Windows 10

Source: README.md, updated 2023-12-22