| Name | Modified | Size |
|---|---|---|
| 0.15.4 source code.tar.gz | 2025-08-20 | 1.3 MB |
| 0.15.4 source code.zip | 2025-08-20 | 1.4 MB |
| README.md | 2025-08-20 | 421 Bytes |
| Totals: 3 Items | | 2.7 MB |

This version brings another speedup in Vulkan inference.
Prediction (`--steps 128`) on an RTX 3090 24GB with an AMD EPYC 7313 16-Core Processor (https://github.com/b4rtaz/distributed-llama/pull/252):

| Model | Tokens/s (0.15.1) | Tokens/s (0.15.2) | Tokens/s (0.15.3) | Tokens/s (0.15.4) |
|---|---|---|---|---|
| llama3_1_8b_instruct_q40 | 24.80 | 24.80 | 33.32 | 45.33 🚀 |
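
To put the numbers in perspective, the relative speedups can be computed directly from the table above. This is only a small arithmetic sketch: the dictionary and the `speedup` helper are hypothetical illustrations, not part of distributed-llama.

```python
# Tokens/s for llama3_1_8b_instruct_q40, copied from the benchmark table.
tokens_per_s = {
    "0.15.1": 24.80,
    "0.15.2": 24.80,
    "0.15.3": 33.32,
    "0.15.4": 45.33,
}


def speedup(new: str, old: str) -> float:
    """Throughput ratio between two versions (values above 1 mean faster)."""
    return tokens_per_s[new] / tokens_per_s[old]


vs_prev = speedup("0.15.4", "0.15.3")  # gain over the previous release
vs_base = speedup("0.15.4", "0.15.1")  # gain over 0.15.1

print(f"0.15.4 vs 0.15.3: {vs_prev:.2f}x")
print(f"0.15.4 vs 0.15.1: {vs_base:.2f}x")
```

This prints roughly 1.36x over 0.15.3 and 1.83x over 0.15.1 for this model and hardware.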