| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| Intel(r) Neural Speed v0.1 Release source code.tar.gz | 2023-12-22 | 3.4 MB | |
| Intel(r) Neural Speed v0.1 Release source code.zip | 2023-12-22 | 3.6 MB | |
| README.md | 2023-12-22 | 1.2 kB | |
| Totals: 3 Items | | 6.9 MB | 0 |
## Highlights
- Created the Neural Speed project, spinning off from Intel Extension for Transformers
## Features
- Support GPTQ models
- Enable beam-search post-processing
- Add MX formats (FP8_E5M2, FP8_E4M3, FP4_E2M1, NF4)
- Refactor the Transformers Extension for Low-bit Inference Runtime on the latest Jblas
- Support tensor parallelism with Jblas and shared memory
- Improve performance on client CPUs
- Enable Streaming LLM for the runtime
- Enhance QLoRA on CPU with an optimized dropout operator
- Add a script for PPL (perplexity) evaluation
- Refine the Python API
- Allow CompileBF16 on GCC 11
- Multi-round chat with ChatGLM2
- Shift-RoPE-based Streaming LLM
- Enable MHA fusion for LLMs
- Support AVX_VNNI and AVX2
- Optimize the QBits backend
- Support GELU
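The feature list above mentions a script for PPL evaluation. As background, perplexity is the exponential of the average negative log-likelihood a model assigns to the ground-truth tokens of an evaluation text; lower is better. The sketch below illustrates only that formula and is independent of Neural Speed's actual evaluation script, whose interface is not shown here:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood.

    token_log_probs: natural-log probabilities a model assigned to
    each ground-truth token of the evaluation text (illustrative input,
    not Neural Speed's real data format).
    """
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model that assigns probability 0.25 to every token
# has perplexity exp(-ln 0.25) = 4.
print(perplexity([math.log(0.25)] * 8))
```

In practice the per-token log-probabilities come from running the model over a held-out corpus (e.g. WikiText-2) with a sliding window; the formula itself stays the same.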
## Examples
- Enable fine-tuning for Qwen-7B-Chat on CPU
- Enable the Whisper C++ API
- Apply the STS task to BAAI/BGE models
- Enable the Qwen graph
- Enable instruction-tuning Stable Diffusion examples
- Enable Mistral-7B
- Enable Falcon-180B
- Enable a Baichuan/Baichuan2 example
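The STS (semantic textual similarity) task listed above scores sentence pairs by comparing their embedding vectors, conventionally with cosine similarity. A minimal sketch of that scoring step, assuming the embeddings have already been produced by a BGE-style encoder (the short vectors below are made up for illustration; real BGE embeddings are hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors: 1.0 for
    identical directions, 0.0 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings" standing in for encoder output.
emb_1 = [0.2, 0.8, 0.1]
emb_2 = [0.2, 0.8, 0.1]
print(cosine_similarity(emb_1, emb_2))  # identical vectors score 1.0
```

On a real STS benchmark, these cosine scores are then correlated (e.g. Spearman) against human similarity judgments to evaluate the embedding model.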
## Validated Configurations
- Python 3.9, 3.10, 3.11
- GCC 11.1, 13.1
- CentOS 8.4, Ubuntu 20.04, Windows 10