| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| Intel(r) Neural Speed v0.2 Release source code.tar.gz | 2024-01-22 | 3.5 MB | |
| Intel(r) Neural Speed v0.2 Release source code.zip | 2024-01-22 | 3.7 MB | |
| README.md | 2024-01-22 | 2.5 kB | |
| Totals: 3 Items | | 7.1 MB | 0 |
## Highlights

- Support Q4_0, Q5_0, and Q8_0 GGUF models as well as AWQ (a hedged loading sketch follows this list)
- Enhance Tensor Parallelism with shared memory across multiple sockets in a single node
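As context for the new GGUF support, here is a minimal loading sketch modeled on the workflow in the Neural Speed README. The model repo, GGUF file name, tokenizer name, and the `model_file` argument are illustrative assumptions; adjust them to your environment.

```python
# Minimal sketch (assumption: GGUF models are loaded through the
# Transformers-like API backed by Neural Speed, as in the project README).
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "TheBloke/Llama-2-7B-Chat-GGUF"      # assumed Hugging Face repo id
model_file = "llama-2-7b-chat.Q4_0.gguf"          # a Q4_0 file; Q5_0/Q8_0 also supported
tokenizer_name = "meta-llama/Llama-2-7b-chat-hf"  # assumed matching tokenizer

prompt = "Once upon a time"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

# Load the quantized GGUF file and generate with streaming output.
model = AutoModelForCausalLM.from_pretrained(model_name, model_file=model_file)
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
```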
## Improvements

- Rename Bestla files and their usage (d5c26d4)
- Update the Python API and reorganize scripts (40663e)
- Enable AWQ with a Llama2 example (9be307f)
- Enable clang-tidy (227e89)
- Support multi-node Tensor Parallelism (6dbaa0)
- Support accuracy calculation for GPTQ models (7b124aa)
- Enable logging via NEURAL_SPEED_VERBOSE (a8d9e7); a usage sketch follows this list
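For the new NEURAL_SPEED_VERBOSE switch, a minimal sketch of enabling it from Python is shown below. The value `"1"` and its exact meaning are assumptions here; consult the project documentation for the supported log levels.

```python
# Minimal sketch: enable Neural Speed's verbose logging through the
# NEURAL_SPEED_VERBOSE environment variable before the runtime is used.
# The level "1" is an assumed value; check the docs for the definitive set.
import os

os.environ["NEURAL_SPEED_VERBOSE"] = "1"  # set before importing/initializing the model

# ... then import Neural Speed and run inference as usual; timing and
# evaluation logs should appear on the console.
```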
## Examples

- Add a Magicoder example (749caca)
- Enable a Whisper large example (24b270)
- Add a Dockerfile and README (f57d4e1)
- Support multi-batch ChatGLM-V1 inference (c9fb9d)
## Bug Fixing

- Fix avx512-s8-dequant and an asymmetric-quantization-related bug (fad80b14)
- Fix the warmup prompt length and add ns_log_level control (070b6b)
- Fix convert: remove AWQ hardcoding (7729bb)
- Fix the ChatGLM convert issue (7671467)
- Fix a Bestla Windows compile issue (760e5f)
## Validated Configurations

- Python 3.10
- Ubuntu 22.04