Name | Modified | Size | Downloads / Week |
---|---|---|---|
0.13.5 source code.tar.gz | 2025-04-19 | 1.3 MB | |
0.13.5 source code.zip | 2025-04-19 | 1.4 MB | |
README.md | 2025-04-19 | 657 Bytes | |
Totals: 3 Items | | 2.7 MB | 0 |
The Vulkan matmul shader (`matmul-forward-q80-q40-f32.comp`) was optimized.
Tested on an NVIDIA Tesla T4 16 GB using the `llama3_1_8b_instruct_q40` model with `--buffer-float-type q80`.
Before:
🔶 Pred 151 ms Sync 0 ms | Sent 0 kB Recv 0 kB | )
🔶 Pred 151 ms Sync 0 ms | Sent 0 kB Recv 0 kB | to
🔶 Pred 153 ms Sync 0 ms | Sent 0 kB Recv 0 kB | be
This version:
🔶 Pred 97 ms Sync 0 ms | Sent 0 kB Recv 0 kB | obtained
🔶 Pred 96 ms Sync 0 ms | Sent 0 kB Recv 0 kB | by
🔶 Pred 96 ms Sync 0 ms | Sent 0 kB Recv 0 kB | pert
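
To put a number on the improvement, the `Pred` times in the two log excerpts above can be averaged and compared. The snippet below is only a rough sketch: it assumes the log lines keep the `Pred <n> ms` shape shown here, the helper name `pred_times` is made up for illustration, and three samples per run are far too few for a rigorous benchmark.

```python
import re
from statistics import mean

# Matches the "Pred <n> ms" field in the log lines shown above.
PRED_RE = re.compile(r"Pred\s+(\d+)\s+ms")


def pred_times(log: str) -> list[int]:
    """Return every Pred time (in ms) found in a log excerpt."""
    return [int(m.group(1)) for m in PRED_RE.finditer(log)]


# Log excerpts copied from the release notes.
before_log = """
🔶 Pred 151 ms Sync 0 ms | Sent 0 kB Recv 0 kB | )
🔶 Pred 151 ms Sync 0 ms | Sent 0 kB Recv 0 kB | to
🔶 Pred 153 ms Sync 0 ms | Sent 0 kB Recv 0 kB | be
"""
after_log = """
🔶 Pred 97 ms Sync 0 ms | Sent 0 kB Recv 0 kB | obtained
🔶 Pred 96 ms Sync 0 ms | Sent 0 kB Recv 0 kB | by
🔶 Pred 96 ms Sync 0 ms | Sent 0 kB Recv 0 kB | pert
"""

before_ms = mean(pred_times(before_log))  # mean time per token before
after_ms = mean(pred_times(after_log))    # mean time per token in this version

speedup = before_ms / after_ms            # how many times faster
reduction = 1 - after_ms / before_ms      # fraction of time saved

print(f"before: {before_ms:.1f} ms/token ({1000 / before_ms:.2f} tokens/s)")
print(f"after:  {after_ms:.1f} ms/token ({1000 / after_ms:.2f} tokens/s)")
print(f"speedup: {speedup:.2f}x, time reduced by {reduction:.1%}")
```

On these six samples this works out to roughly a 1.57x speedup, or about 36% less prediction time per token.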