| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| Intel(r) Neural Speed v0.2 Release source code.tar.gz | 2024-01-22 | 3.5 MB | |
| Intel(r) Neural Speed v0.2 Release source code.zip | 2024-01-22 | 3.7 MB | |
| README.md | 2024-01-22 | 2.5 kB | |
| Totals: 3 Items | | 7.1 MB | 0 |
## Highlights

- Support Q4_0, Q5_0, and Q8_0 GGUF models as well as AWQ (a hedged loading sketch follows this list)
- Enhance Tensor Parallelism with shared memory across multiple sockets in a single node
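As context for the new GGUF support, here is a minimal loading sketch modeled on the workflow in the Neural Speed README. The model repo, GGUF file name, tokenizer name, and the `model_file` argument are illustrative assumptions; adjust them to your environment.

```python
# Minimal sketch (assumption: GGUF models are loaded through the
# Transformers-like API backed by Neural Speed, as in the project README).
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "TheBloke/Llama-2-7B-Chat-GGUF"      # assumed Hugging Face repo id
model_file = "llama-2-7b-chat.Q4_0.gguf"          # a Q4_0 file; Q5_0/Q8_0 also supported
tokenizer_name = "meta-llama/Llama-2-7b-chat-hf"  # assumed matching tokenizer

prompt = "Once upon a time"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

# Load the quantized GGUF file and generate with streaming output.
model = AutoModelForCausalLM.from_pretrained(model_name, model_file=model_file)
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
```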
## Improvements

- Rename Bestla files and their usage (d5c26d4)
- Update the Python API and reorganize scripts (40663e)
- Enable AWQ with a Llama2 example (9be307f)
- Enable clang-tidy (227e89)
- Support multi-node Tensor Parallelism (6dbaa0)
- Support accuracy calculation for GPTQ models (7b124aa)
- Enable logging via NEURAL_SPEED_VERBOSE (a8d9e7); a usage sketch follows this list
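For the new NEURAL_SPEED_VERBOSE switch, a minimal sketch of enabling it from Python is shown below. The value `"1"` and its exact meaning are assumptions here; consult the project documentation for the supported log levels.

```python
# Minimal sketch: enable Neural Speed's verbose logging through the
# NEURAL_SPEED_VERBOSE environment variable before the runtime is used.
# The level "1" is an assumed value; check the docs for the definitive set.
import os

os.environ["NEURAL_SPEED_VERBOSE"] = "1"  # set before importing/initializing the model

# ... then import Neural Speed and run inference as usual; timing and
# evaluation logs should appear on the console.
```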
## Examples

- Add a Magicoder example (749caca)
- Enable a Whisper large example (24b270)
- Add a Dockerfile and README (f57d4e1)
- Support multi-batch ChatGLM-V1 inference (c9fb9d)
## Bug Fixing

- Fix avx512-s8-dequant and an asymmetric-quantization-related bug (fad80b14)
- Fix the warmup prompt length and add ns_log_level control (070b6b)
- Fix convert: remove AWQ hardcoding (7729bb)
- Fix the ChatGLM convert issue (7671467)
- Fix a Bestla Windows compile issue (760e5f)
## Validated Configurations

- Python 3.10
- Ubuntu 22.04