Distributed Llama v0.13.5

The Vulkan matmul shader (matmul-forward-q80-q40-f32.comp) was optimized.

Tested on NVIDIA Tesla T4 16 GB using the llama3_1_8b_instruct_q40 model with --buffer-float-type q80.

Before:

🔶 Pred  151 ms Sync    0 ms | Sent     0 kB Recv     0 kB | )
🔶 Pred  151 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  to
🔶 Pred  153 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  be

This version:

🔶 Pred   97 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  obtained
🔶 Pred   96 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  by
🔶 Pred   96 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  pert
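
To put a number on the change, the short Python sketch below parses the `Pred` values from the two log excerpts above and compares their averages. It is only an illustration: the regular expression and the helper name `mean_pred_ms` are assumptions made for this example and are not part of Distributed Llama, and three samples per run are too few for a rigorous benchmark.

```python
import re

# Log lines copied verbatim from the release notes above.
BEFORE = """\
🔶 Pred  151 ms Sync    0 ms | Sent     0 kB Recv     0 kB | )
🔶 Pred  151 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  to
🔶 Pred  153 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  be
"""

AFTER = """\
🔶 Pred   97 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  obtained
🔶 Pred   96 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  by
🔶 Pred   96 ms Sync    0 ms | Sent     0 kB Recv     0 kB |  pert
"""

# Assumed pattern: "Pred", whitespace, an integer, whitespace, "ms".
PRED_RE = re.compile(r"Pred\s+(\d+)\s+ms")


def mean_pred_ms(log: str) -> float:
    """Return the mean of all 'Pred N ms' values found in a log excerpt."""
    times = [int(value) for value in PRED_RE.findall(log)]
    if not times:
        raise ValueError("no 'Pred N ms' entries found in log")
    return sum(times) / len(times)


before_ms = mean_pred_ms(BEFORE)
after_ms = mean_pred_ms(AFTER)
speedup = before_ms / after_ms

print(f"before: {before_ms:.1f} ms, after: {after_ms:.1f} ms, speedup: {speedup:.2f}x")
```

On these samples the mean `Pred` time drops from roughly 152 ms to roughly 96 ms, a speedup of about 1.57x. If `Pred` is read as the time per generated token, that corresponds to going from about 6.6 to about 10.4 tokens per second.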
Source: README.md, updated 2025-04-19