Distributed Llama Files

Connect home devices into a powerful cluster to accelerate LLM inference

This is an exact mirror of the Distributed Llama project, hosted at https://github.com/b4rtaz/distributed-llama. SourceForge is not affiliated with Distributed Llama. For more information, see the SourceForge Open Source Mirror Directory.


Name	Modified	Size	Downloads / Week
0.15.4 source code.tar.gz	2025-08-20	1.3 MB	0
0.15.4 source code.zip	2025-08-20	1.4 MB	0
README.md	2025-08-20	421 Bytes	0
Totals: 3 Items		2.7 MB	0

This version brings another speedup in Vulkan inference.

Prediction (--steps 128)

Hardware: RTX 3090 24GB, AMD EPYC 7313 16-Core Processor (https://github.com/b4rtaz/distributed-llama/pull/252)

Model	Tokens/s (version 0.15.1)	Tokens/s (version 0.15.2)	Tokens/s (version 0.15.3)	Tokens/s (version 0.15.4, this version)
`llama3_1_8b_instruct_q40`	24.80	24.80	33.32	45.33 🚀
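
To put the table in relative terms, the speedup between versions can be worked out directly from the listed tokens/s figures. The following is a minimal sketch, not part of the project: the `tokens_per_second` dictionary and the `speedup` helper are illustrative names, and labelling the last column as 0.15.4 assumes that "this version" is the 0.15.4 release shown in the file list.

```python
# Tokens/s figures copied from the table above (llama3_1_8b_instruct_q40).
# Assumption: the "this version" column corresponds to release 0.15.4.
tokens_per_second = {
    "0.15.1": 24.80,
    "0.15.2": 24.80,
    "0.15.3": 33.32,
    "0.15.4": 45.33,
}


def speedup(new: str, old: str) -> float:
    """Return how many times faster version `new` is than version `old`."""
    return tokens_per_second[new] / tokens_per_second[old]


if __name__ == "__main__":
    for old in ("0.15.1", "0.15.3"):
        print(f"0.15.4 vs {old}: {speedup('0.15.4', old):.2f}x")
```

On these numbers, 0.15.4 is roughly 1.83x faster than 0.15.1 and roughly 1.36x faster than 0.15.3 on this hardware and model.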

Source: README.md, updated 2025-08-20


