Lucebox is a local LLM inference server built for fast generation on consumer hardware. It focuses on custom kernels, speculative prefill, speculative decoding, and model-specific optimizations rather than a generic one-size-fits-all runtime. The project includes a native C++ HTTP server with an OpenAI-compatible API, making it usable with tools that already speak the Chat Completions format. It supports CUDA and ROCm workflows, with Docker images for NVIDIA and AMD GPU setups. The repository also includes harnesses for testing compatibility with clients such as Claude Code, Codex, OpenCode, Hermes, Pi, OpenClaw, and Open WebUI. It is most useful for developers and AI enthusiasts who want to run optimized local models with lower latency, faster token generation, and hardware-aware inference behavior.

Features

  • Local LLM inference server
  • OpenAI-compatible HTTP API
  • Speculative prefill and decoding
  • CUDA and ROCm GPU support
  • Docker deployment workflows
  • Client harness compatibility testing

Project Samples

Project Activity

See All Activity >

License

Apache License V2.0

Follow Lucebox

Lucebox Web Site

Other Useful Business Software
Atera - an All-in-one platform for IT management Icon
Atera - an All-in-one platform for IT management

Ideal for IT departments and MSPs (managed service providers)

Your IT essentials, integrated & elevated. Take your IT management from automated to autonomous, download Atera's agent to start your free trial!
Try Atera now
Rate This Project
Login To Rate This Project

User Reviews

Be the first to post a review of Lucebox!

Additional Project Details

Programming Language

C++

Related Categories

C++ Artificial Intelligence Software

Registered

21 hours ago