| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| Parent folder | | | |
| Intel(r) Neural Speed v0.3 Release source code.tar.gz | 2024-02-23 | 3.5 MB | |
| Intel(r) Neural Speed v0.3 Release source code.zip | 2024-02-23 | 3.7 MB | |
| README.md | 2024-02-23 | 4.0 kB | |
| Totals: 3 Items | | 7.2 MB | 0 |
## Highlights
- Contributed GPT-J inference to the MLPerf v4.0 submission (mlperf commits)
- Enabled 3-bit low-precision inference (ee40f28)
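The release notes do not spell out the 3-bit scheme, so the following is only a minimal illustrative sketch of symmetric round-to-nearest 3-bit weight quantization, not Neural Speed's actual kernel or packing format:

```python
# Illustrative only: symmetric 3-bit round-to-nearest weight quantization.
# Shows the arithmetic behind "3-bit low-precision inference"; the real
# implementation packs codes into bit-fields and runs optimized kernels.

def quantize_3bit(weights):
    """Map floats to signed 3-bit codes in [-4, 3] with a per-tensor scale."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 4.0  # 3 bits -> 8 levels, codes in [-4, 3]
    codes = [max(-4, min(3, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_3bit(codes, scale):
    """Recover approximate float weights from 3-bit codes."""
    return [c * scale for c in codes]

weights = [0.9, -0.5, 0.12, -1.0]
codes, scale = quantize_3bit(weights)
approx = dequantize_3bit(codes, scale)
```

Each weight is reconstructed to within one quantization step of its original value, at the cost of only 3 bits per weight plus one shared scale.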
## Improvements
- Optimized layer normalization (98ffee45)
- Updated the Qwen Python API (51088a)
- Load processed models automatically (662553)
- Support continuous batching in Offline and Server modes (66cb9f5)
- Support loading models directly from Hugging Face (bb80273)
- Support AutoRound (e2d3652)
- Enabled OpenMP in BesTLA (3afae427)
- Enabled logging via NEURAL_SPEED_VERBOSE (a8d9e7)
- Added the YaRN rope-scaling data structure (8c846d6)
- Windows-specific improvements (464239)
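The continuous-batching improvement above refers to a scheduling idea that the notes do not elaborate on. The sketch below is a generic, self-contained illustration of that idea, not Neural Speed's implementation: finished sequences leave the batch at every decode step and queued requests join immediately, instead of the whole static batch waiting for its slowest member.

```python
# Illustrative sketch of continuous batching (not Neural Speed's code).
from collections import deque

def continuous_batching(requests, max_batch=2):
    """requests: list of (request_id, num_decode_steps).
    Returns, per decode step, the sorted ids that decoded that step."""
    queue = deque(requests)
    active = {}  # request_id -> remaining decode steps
    trace = []
    while queue or active:
        # Admit waiting requests as soon as slots free up.
        while queue and len(active) < max_batch:
            rid, steps = queue.popleft()
            active[rid] = steps
        trace.append(sorted(active))  # who decodes this step
        for rid in list(active):      # one decode step for every active request
            active[rid] -= 1
            if active[rid] == 0:
                del active[rid]       # leaves mid-flight; its slot reopens
    return trace

print(continuous_batching([("a", 1), ("b", 3), ("c", 2)]))
```

With a static batch of two, request "c" could not start until both "a" and "b" finished; here it is admitted the step after the short request "a" completes.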
## Examples
- Enabled Qwen 1.8B (ea4b713)
- Enabled Phi-2, Phi-1.5, and Phi-1 (c212d8)
- Support 3-bit & 4-bit GPTQ for GPT-J 6B (4c9070)
- Support Solar 10.7B with GPTQ (26c68c7, 90f5cbd)
- Support Qwen GGUF inference (cd67b92)
## Bug Fixing
- Fixed a performance problem introduced by log levels (6833b2f, 6f85518f)
- Fixed straightforward-API issues (4c082b7)
- Fixed a blocker on Windows platforms (4adc15)
- Fixed the Whisper Python API (c97dbe)
- Fixed Qwen loading & Mistral GPTQ conversion (d47984c)
- Fixed clang-tidy issues (ad54a1f)
- Fixed Mistral online loading issues (0470b1f)
- Handle models that require a Hugging Face access token (33ffaf07)
- Fixed a GGUF conversion issue (5293ffa5)
- Fixed GPTQ & AWQ conversion issues (150e752)
## Validated Configurations
- Python 3.10
- Ubuntu 22.04