| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| Parent folder | | | |
| Intel(r) Neural Speed v0.3 Release source code.tar.gz | 2024-02-23 | 3.5 MB | |
| Intel(r) Neural Speed v0.3 Release source code.zip | 2024-02-23 | 3.7 MB | |
| README.md | 2024-02-23 | 4.0 kB | |
| Totals: 3 Items | | 7.2 MB | 0 |
## Highlights
- Contributed GPT-J inference to the MLPerf v4.0 submission (mlperf commits)
- Enabled 3-bit low-precision inference (ee40f28)
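The release notes do not spell out the 3-bit scheme, so the following is only a minimal illustrative sketch of symmetric round-to-nearest 3-bit weight quantization, not Neural Speed's actual kernel or packing format:

```python
# Illustrative only: symmetric 3-bit round-to-nearest weight quantization.
# Shows the arithmetic behind "3-bit low-precision inference"; the real
# implementation packs codes into bit-fields and runs optimized kernels.

def quantize_3bit(weights):
    """Map floats to signed 3-bit codes in [-4, 3] with a per-tensor scale."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 4.0  # 3 bits -> 8 levels, codes in [-4, 3]
    codes = [max(-4, min(3, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_3bit(codes, scale):
    """Recover approximate float weights from 3-bit codes."""
    return [c * scale for c in codes]

weights = [0.9, -0.5, 0.12, -1.0]
codes, scale = quantize_3bit(weights)
approx = dequantize_3bit(codes, scale)
```

Each weight is reconstructed to within one quantization step of its original value, at the cost of only 3 bits per weight plus one shared scale.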
## Improvements
- Optimized layer normalization (98ffee45)
- Updated the Qwen Python API (51088a)
- Load processed models automatically (662553)
- Support continuous batching in Offline and Server modes (66cb9f5)
- Support loading models directly from Hugging Face (bb80273)
- Support AutoRound (e2d3652)
- Enabled OpenMP in BesTLA (3afae427)
- Enabled logging via NEURAL_SPEED_VERBOSE (a8d9e7)
- Added the YaRN rope-scaling data structure (8c846d6)
- Windows-specific improvements (464239)
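The continuous-batching improvement above refers to a scheduling idea that the notes do not elaborate on. The sketch below is a generic, self-contained illustration of that idea, not Neural Speed's implementation: finished sequences leave the batch at every decode step and queued requests join immediately, instead of the whole static batch waiting for its slowest member.

```python
# Illustrative sketch of continuous batching (not Neural Speed's code).
from collections import deque

def continuous_batching(requests, max_batch=2):
    """requests: list of (request_id, num_decode_steps).
    Returns, per decode step, the sorted ids that decoded that step."""
    queue = deque(requests)
    active = {}  # request_id -> remaining decode steps
    trace = []
    while queue or active:
        # Admit waiting requests as soon as slots free up.
        while queue and len(active) < max_batch:
            rid, steps = queue.popleft()
            active[rid] = steps
        trace.append(sorted(active))  # who decodes this step
        for rid in list(active):      # one decode step for every active request
            active[rid] -= 1
            if active[rid] == 0:
                del active[rid]       # leaves mid-flight; its slot reopens
    return trace

print(continuous_batching([("a", 1), ("b", 3), ("c", 2)]))
```

With a static batch of two, request "c" could not start until both "a" and "b" finished; here it is admitted the step after the short request "a" completes.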
## Examples
- Enabled Qwen 1.8B (ea4b713)
- Enabled Phi-2, Phi-1.5, and Phi-1 (c212d8)
- Support 3-bit & 4-bit GPTQ for GPT-J 6B (4c9070)
- Support Solar 10.7B with GPTQ (26c68c7, 90f5cbd)
- Support Qwen GGUF inference (cd67b92)
## Bug Fixing
- Fixed a performance problem introduced by log levels (6833b2f, 6f85518f)
- Fixed straightforward-API issues (4c082b7)
- Fixed a blocker on Windows platforms (4adc15)
- Fixed the Whisper Python API (c97dbe)
- Fixed Qwen loading & Mistral GPTQ conversion (d47984c)
- Fixed clang-tidy issues (ad54a1f)
- Fixed Mistral online loading issues (0470b1f)
- Handle models that require a Hugging Face access token (33ffaf07)
- Fixed a GGUF conversion issue (5293ffa5)
- Fixed GPTQ & AWQ conversion issues (150e752)
## Validated Configurations
- Python 3.10
- Ubuntu 22.04