Name | Modified | Size
---|---|---
Intel(r) Neural Speed v1.0a Release source code.tar.gz | 2024-03-22 | 3.5 MB
Intel(r) Neural Speed v1.0a Release source code.zip | 2024-03-22 | 3.8 MB
README.md | 2024-03-22 | 2.7 kB
## Highlights
- Improve performance on client CPUs
- Support batching and submit GPT-J results to MLPerf v4.0 (see the batched-generation sketch below)
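As a quick illustration of the batching highlight, here is a minimal sketch using the library's Hugging Face-style Python API. It assumes a GPT-J checkpoint from the Hugging Face Hub and that `Model.generate` accepts padded, batched `input_ids`; the checkpoint ID and generation parameters are illustrative, not taken from these notes.

```python
from transformers import AutoTokenizer
from neural_speed import Model

# GPT-J is the model behind the MLPerf v4.0 submission mentioned above;
# this particular checkpoint ID is an assumption for illustration.
model_name = "EleutherAI/gpt-j-6b"

prompts = [
    "Once upon a time, there existed a little girl,",
    "The three most important inventions of the 20th century are",
]

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-J has no pad token by default
inputs = tokenizer(prompts, return_tensors="pt", padding=True).input_ids

# Quantize on the fly to int4 weights and run the whole batch in one call.
model = Model()
model.init(model_name, weight_dtype="int4", compute_dtype="int8")
outputs = model.generate(inputs, max_new_tokens=32)

for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```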
## Improvements
- Support continuous batching and beam search inference (7c2199)
- Improvements for the AVX2 platform (bc5ee16, aa4a8a, 35c6d10)
- Support FFN fusion for ChatGLM2 (96fadd)
- Enable loading models from ModelScope (ad3d19); see the sketch after this list
- Extend the supported input token length (eb41b9, e76a58e)
- [BesTLA] Improve RTN quantization accuracy for int4 and int3 (a90aea)
- [BesTLA] New thread pool and hybrid dispatcher (fd19a44)
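For the ModelScope path, a minimal sketch following the intel-extension-for-transformers loading pattern is shown below. Treat the model ID as illustrative, and note the assumptions: `model_hub="modelscope"` routes the download to ModelScope rather than the Hugging Face Hub, and the beam search added in this release is assumed to follow the transformers `generate()` convention (`num_beams`).

```python
from modelscope import AutoTokenizer
from transformers import TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "qwen/Qwen-7B"  # a ModelScope model ID, chosen here for illustration
prompt = "Once upon a time, there existed a little girl,"

# model_hub="modelscope" fetches the checkpoint from ModelScope;
# load_in_4bit triggers weight-only int4 quantization.
model = AutoModelForCausalLM.from_pretrained(
    model_name, load_in_4bit=True, model_hub="modelscope"
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

# Beam search (new in this release) would be requested with num_beams > 1,
# without the streamer; this is an assumption based on the transformers API.
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
```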
## Examples
- Enable Mixtral 8x7B (9bcb612)
- Enable Mistral-GPTQ (96dc55)
- Implement the YaRN RoPE scaling feature (6c36f54)
- Enable Qwen1.5 (750b35); see the sketch after this list
- Support GPTQ & AWQ inference for Qwen v1, v1.5, and Mixtral-8x7B (a129213)
- Support GPTQ for Baichuan2-13B, Falcon-7B, and Phi-1.5 (eed9b3)
- Enable Baichuan-7B and refactor Baichuan-13B (8d5fe2d)
- Enable StableLM2-1.6B, StableLM2-Zephyr-1.6B, and StableLM-3B (872876)
- Enable ChatGLM3 (94e74d)
- Enable Gemma-2B (e4c5f71)
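Newly enabled architectures run through the same Python flow as existing ones. A minimal sketch for one of them, assuming a Qwen1.5 chat checkpoint on the Hugging Face Hub (the exact checkpoint ID is illustrative):

```python
from transformers import AutoTokenizer, TextStreamer
from neural_speed import Model

model_name = "Qwen/Qwen1.5-7B-Chat"  # illustrative; other newly enabled models follow the same flow
prompt = "Once upon a time, there existed a little girl,"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

# init() converts the checkpoint and quantizes it to int4 weights
# with int8 compute before inference.
model = Model()
model.init(model_name, weight_dtype="int4", compute_dtype="int8")
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
```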
## Bug Fixes
- Fix a convert_quantized model bug (37d01f3)
- Fix an AutoRound accuracy regression (991c35)
- Fix a Qwen load error (2309fbb)
- Fix a GGUF conversion issue (5293ffa)
## Validated Configurations
- Python 3.9, 3.10, 3.11
- Ubuntu 22.04