Lucebox is a local LLM inference server built for fast generation on consumer hardware. It focuses on custom kernels, speculative prefill, speculative decoding, and model-specific optimizations rather than a generic one-size-fits-all runtime. The project includes a native C++ HTTP server with an OpenAI-compatible API, making it usable with tools that already speak the Chat Completions format. It supports CUDA and ROCm workflows, with Docker images for NVIDIA and AMD GPU setups. The repository also includes harnesses for testing compatibility with clients such as Claude Code, Codex, OpenCode, Hermes, Pi, OpenClaw, and Open WebUI. It is most useful for developers and AI enthusiasts who want to run optimized local models with lower latency, faster token generation, and hardware-aware inference behavior.

Features

  • Local LLM inference server
  • OpenAI-compatible HTTP API
  • Speculative prefill and decoding
  • CUDA and ROCm GPU support
  • Docker deployment workflows
  • Client harness compatibility testing

Project Samples

Project Activity

See All Activity >

License

Apache License V2.0

Follow Lucebox

Lucebox Web Site

Other Useful Business Software
Compliant and Reliable File Transfers Backed by Top Security Certifications Icon
Compliant and Reliable File Transfers Backed by Top Security Certifications

Cerberus FTP Server delivers SOC 2 Type II certified security and FIPS 140-2 validated encryption.

Stop relying on non-certified, legacy file transfer tools that creak under the weight of modern security demands. Get full audit trails, advanced access controls and more supported by an award-winning team of experts. Start your free 25-day trial today.
Start Free Trial
Rate This Project
Login To Rate This Project

User Reviews

Be the first to post a review of Lucebox!

Additional Project Details

Programming Language

C++

Related Categories

C++ Artificial Intelligence Software

Registered

18 hours ago