Lucebox
Fast LLM speculative inference server for consumer hardware
Lucebox is a local LLM inference server built for fast generation on consumer hardware. It focuses on custom kernels, speculative prefill, speculative decoding, and model-specific optimizations rather than a generic one-size-fits-all runtime. The project includes a native C++ HTTP server with an OpenAI-compatible API, making it usable with tools that already speak the Chat Completions format.