| Name | Modified | Size |
|------|----------|------|
| Intel(r) Neural Speed v0.2 Release source code.tar.gz | 2024-01-22 | 3.5 MB |
| Intel(r) Neural Speed v0.2 Release source code.zip | 2024-01-22 | 3.7 MB |
| README.md | 2024-01-22 | 2.5 kB |
| Totals: 3 items | | 7.1 MB |

Contents: Highlights, Improvements, Examples, Bug Fixing, Validated Configurations

Highlights
- Support Q4_0, Q5_0, and Q8_0 GGUF models and AWQ
- Enhance tensor parallelism with shared memory across multiple sockets in a single node

Improvements
- Rename Bestla files and their usage (d5c26d4)
- Update Python API and reorganize scripts (40663e)
- Enable AWQ with Llama2 example (9be307f)
- Enable clang-tidy (227e89)
- Support multi-node tensor parallelism (TP) (6dbaa0)
- Support accuracy calculation for GPTQ models (7b124aa)
- Enable logging with NEURAL_SPEED_VERBOSE (a8d9e7)
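
The NEURAL_SPEED_VERBOSE item above refers to an environment variable, so it can be set per process rather than globally. The following is a minimal sketch of one way to do that from Python. The helper name `run_with_verbose` and the value `1` are assumptions for illustration only; these release notes do not state which values the variable accepts, and the child command here is a stand-in that just echoes the variable back.

```python
import os
import subprocess
import sys


def run_with_verbose(cmd, level):
    """Run `cmd` with NEURAL_SPEED_VERBOSE set only for that child process.

    `level` is passed through unchanged as a string; which values are
    meaningful is an assumption to check against the project documentation.
    """
    # Copy the current environment so the parent process is left untouched.
    env = dict(os.environ, NEURAL_SPEED_VERBOSE=str(level))
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Stand-in child command: prints the variable it received.
    child = [sys.executable, "-c",
             "import os; print(os.environ['NEURAL_SPEED_VERBOSE'])"]
    result = run_with_verbose(child, 1)
    print(result.stdout.strip())
```

Scoping the variable to the child process keeps verbose logging from leaking into other runs started from the same shell or script.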

Examples
- Add Magicoder example (749caca)
- Enable Whisper large example (24b270)
- Add Dockerfile and README (f57d4e1)
- Support multi-batch ChatGLM-V1 inference (c9fb9d)

Bug Fixing
- Fix avx512-s8-dequant and asymmetric-related bugs (fad80b14)
- Fix warmup prompt length and add ns_log_level control (070b6b)
- Fix convert: remove hardcoded AWQ (7729bb)
- Fix the ChatGLM convert issue (7671467)
- Fix Bestla Windows compile issue (760e5f)

Validated Configurations
- Python 3.10
- Ubuntu 22.04

Source: README.md, updated 2024-01-22