| Name | Modified | Size | Downloads / Week |
|------|----------|------|------------------|
| Intel(r) Neural Speed v1.0a Release source code.tar.gz | 2024-03-22 | 3.5 MB | 0 |
| Intel(r) Neural Speed v1.0a Release source code.zip | 2024-03-22 | 3.8 MB | 0 |
| README.md | 2024-03-22 | 2.7 kB | 0 |

Totals: 3 items, 7.3 MB, 0 downloads/week


Highlights
- Improve performance on CPU client
- Support batching and submit GPT-J results to MLPerf v4.0
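Batched serving in this release includes continuous batching (listed under Improvements below). As a conceptual illustration only, and not the Neural Speed API, the scheduling idea can be sketched as a toy loop in which finished sequences free their batch slots immediately and waiting requests fill them, instead of draining the whole batch first:

```python
from collections import deque

def continuous_batching(requests, max_batch=4):
    """Toy continuous-batching scheduler (illustrative, not the library's code).

    `requests` maps a request id to the number of tokens it will generate.
    Returns the batch contents at each decode step.
    """
    waiting = deque(requests.items())
    running = {}   # request id -> tokens still to generate
    timeline = []
    while waiting or running:
        # Admit waiting requests into any free batch slots.
        while waiting and len(running) < max_batch:
            rid, n = waiting.popleft()
            running[rid] = n
        timeline.append(sorted(running))
        # One decode step: every running sequence emits one token.
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:
                del running[rid]   # slot is freed immediately
    return timeline

steps = continuous_batching({"a": 1, "b": 3, "c": 2, "d": 2, "e": 1}, max_batch=2)
```

Here request "a" finishes after one step and "c" is admitted into its slot at step 2, so five short requests complete in five decode steps with a batch of two.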

Improvements
- Support continuous batching and beam search inference (7c2199)
- Improvements for the AVX2 platform (bc5ee16, aa4a8a, 35c6d10)
- Support FFN fusion for ChatGLM2 (96fadd)
- Enable loading models from ModelScope (ad3d19)
- Extend long input token length (eb41b9, e76a58e)
- [BesTLA] Improve RTN quantization accuracy for int4 and int3 (a90aea)
- [BesTLA] New thread pool and hybrid dispatcher (fd19a44)
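RTN (round-to-nearest) quantization, whose int4/int3 accuracy is improved above, maps each group of weights onto a small integer grid using a per-group scale. A minimal NumPy sketch of symmetric int4 group quantization, with illustrative function names rather than the BesTLA kernel API:

```python
import numpy as np

def rtn_quantize_int4(w, group_size=32):
    """Round-to-nearest symmetric int4 quantization, one scale per group.

    Conceptual sketch only; BesTLA's actual kernels are C++ and support
    asymmetric schemes and other bit widths as well.
    """
    w = w.reshape(-1, group_size)
    # Symmetric scale: the largest magnitude in each group maps to 7
    # (the int4 range is -8..7).
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0            # avoid division by zero for all-zero groups
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q, scales

def rtn_dequantize(q, scales):
    return (q.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(128).astype(np.float32)
q, s = rtn_quantize_int4(w)
max_err = np.abs(rtn_dequantize(q, s) - w).max()
```

The per-element error is bounded by half a quantization step (scale / 2), which is why finer groups and better scale selection, as in the improvement above, raise accuracy.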

Examples
- Enable Mixtral 8x7B (9bcb612)
- Enable Mistral-GPTQ (96dc55)
- Implement the YaRN RoPE scaling feature (6c36f54)
- Enable Qwen 1.5 (750b35)
- Support GPTQ & AWQ inference for Qwen v1, v1.5, and Mixtral-8x7B (a129213)
- Support GPTQ for Baichuan2-13B, Falcon 7B, and Phi-1.5 (eed9b3)
- Enable Baichuan-7B and refactor Baichuan-13B (8d5fe2d)
- Enable StableLM2-1.6B, StableLM2-Zephyr-1.6B, and StableLM-3B (872876)
- Enable ChatGLM3 (94e74d)
- Enable Gemma-2B (e4c5f71)
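RoPE scaling schemes such as YaRN extend a model's context window by remapping position indices before the rotary embedding is applied. As a hedged sketch of the underlying idea only (plain linear position interpolation, not YaRN's full frequency-dependent scheme or Neural Speed's implementation):

```python
import numpy as np

def rope(x, positions, base=10000.0, scale=1.0):
    """Apply rotary position embedding to x of shape [seq, dim].

    `scale` > 1 compresses positions so that longer sequences reuse the
    rotation angles the model saw during training. Illustrative only;
    YaRN additionally varies the interpolation per frequency band.
    """
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)        # per-pair rotation frequency
    angles = (positions[:, None] / scale) * freqs[None, :]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1_i, x2_i) pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=1)

x = np.random.default_rng(1).standard_normal((4, 8))
out = rope(x, np.arange(4))
out_scaled = rope(x, np.arange(4), scale=2.0)
```

Because RoPE is a pure rotation, it preserves vector norms, and scaling positions by 2 is exactly equivalent to halving the position indices.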

Bug Fixing
- Fix convert_quantized model bug (37d01f3)
- Fix AutoRound accuracy regression (991c35)
- Fix Qwen loading error (2309fbb)
- Fix the GGUF conversion issue (5293ffa)

Validated Configurations
- Python 3.9, 3.10, 3.11
- Ubuntu 22.04

Source: README.md, updated 2024-03-22