Intel(r) Neural Speed v0.3 Release


Highlights
- Contributed GPT-J inference to the MLPerf v4.0 submission (mlperf commits)
- Enabled 3-bit low-precision inference (ee40f28)
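To make "3-bit low-precision inference" concrete, here is a minimal sketch of symmetric round-to-nearest quantization of one weight group to 3-bit signed codes. It is a generic illustration, not the kernel or scheme Neural Speed actually uses; the function names `quantize_group` and `dequantize_group` and the per-group scale layout are assumptions made for this example.

```python
def quantize_group(weights, bits=3):
    """Quantize one group of floats to signed integer codes (hypothetical helper).

    For bits=3 the representable codes are -4..3. The scale maps the
    largest magnitude in the group onto the largest positive code.
    """
    qmax = 2 ** (bits - 1) - 1      # 3 for 3-bit
    qmin = -(2 ** (bits - 1))       # -4 for 3-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code, then clamp into the representable range.
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes, scale):
    """Reconstruct approximate floats from codes and the group scale."""
    return [c * scale for c in codes]


weights = [0.75, -0.25, 0.0, 0.5]
codes, scale = quantize_group(weights)
restored = dequantize_group(codes, scale)
```

Each weight is stored as a 3-bit code plus one shared scale per group, which is where the memory saving comes from; the reconstruction error of any single weight is at most half the scale.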

Improvements
- Optimize layer normalization (98ffee45)
- Update Qwen Python API (51088a)
- Load processed model automatically (662553)
- Support continuous batching in Offline and Server (66cb9f5)
- Support loading models directly from Hugging Face (HF) (bb80273)
- Support AutoRound (e2d3652)
- Enable OpenMP (OMP) in BesTLA (3afae427)
- Enable logging with NEURAL_SPEED_VERBOSE (a8d9e7)
- Add YaRN RoPE scaling data structure (8c846d6)
- Improvements targeting Windows (464239)
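Since logging is controlled through the `NEURAL_SPEED_VERBOSE` environment variable, one way to enable it for a single run is to build a modified environment and hand it to a child process. The sketch below assumes only the variable name from these notes; the helper `verbose_env`, the level value `1`, and the script name in the comment are hypothetical, and the accepted values and their meaning should be checked against the project documentation.

```python
import os


def verbose_env(level, base_env=None):
    """Return a copy of the environment with NEURAL_SPEED_VERBOSE set.

    `level` is passed through as a string; which levels the library
    accepts is an assumption to verify, not something stated here.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["NEURAL_SPEED_VERBOSE"] = str(level)
    return env


# Build an environment for one run without touching the current process.
env = verbose_env(1, base_env={"PATH": "/usr/bin"})

# Hypothetical usage: pass the environment to an inference script.
# import subprocess, sys
# subprocess.run([sys.executable, "run_inference.py"], env=env, check=True)
```

Copying the environment rather than mutating `os.environ` keeps the setting scoped to the one child process.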

Examples
- Enable Qwen 1.8B (ea4b713)
- Enable Phi-2, Phi-1.5 and Phi-1 (c212d8)
- Support 3-bit and 4-bit GPTQ for GPT-J 6B (4c9070)
- Support Solar 10.7B with GPTQ (26c68c7, 90f5cbd)
- Support Qwen GGUF inference (cd67b92)

Bug Fixing
- Fix performance problem introduced by log level (6833b2f, 6f85518f)
- Fix straightforward-API issues (4c082b7)
- Fix a blocker on Windows platforms (4adc15)
- Fix Whisper Python API (c97dbe)
- Fix Qwen loading and Mistral GPTQ conversion (d47984c)
- Fix clang-tidy issues (ad54a1f)
- Fix Mistral online loading issues (0470b1f)
- Handle models that require an HF access token (33ffaf07)
- Fix the GGUF conversion issue (5293ffa5)
- Fix GPTQ and AWQ conversion issue (150e752)

Validated Configurations
- Python 3.10
- Ubuntu 22.04

Source: README.md, updated 2024-02-23