shimmy
Python-free Rust inference server
The shimmy project is a lightweight local inference server designed to run large language models with minimal overhead. Written primarily in Rust, the tool provides a small standalone binary that exposes an API compatible with the OpenAI interface, allowing existing applications to interact with local models without significant code changes. This compatibility enables developers to replace remote AI services with locally hosted models while keeping their existing software architecture...