Secret Llama is a privacy-first chatbot that runs large language models entirely inside your web browser: no backend server is required for inference, and your conversation data never leaves your device. It supports open-source model families such as Llama and Mistral, which are loaded directly into the client for fully local inference. Because everything runs in the browser, it keeps working offline once a model has been cached, which is useful in air-gapped environments or while traveling. The interface offers what you'd expect from a modern chat assistant (streaming responses, markdown rendering, and a clean layout), so gaining privacy doesn't mean giving up usability. Under the hood, a web-native inference engine uses WebGPU to run models on your GPU when one is available, keeping responses fast without a backend. This makes it a good fit for developers and teams who want to prototype assistants or work with sensitive text without sending prompts to external APIs.
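The description above doesn't show how the engine decides whether GPU acceleration is possible. As a minimal sketch, assuming an app gates local inference on the standard WebGPU entry point `navigator.gpu.requestAdapter()`, the check could look like this. The `detectBackend` helper and the `NavigatorLike`/`GpuLike` types are hypothetical names, not Secret Llama's code. The sketch also checks the adapter itself, because `requestAdapter()` can resolve to `null` even when `navigator.gpu` exists.

```typescript
// Hypothetical helper, not Secret Llama's actual code.
// Minimal structural types so the sketch compiles without @webgpu/types.
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

type Backend = "webgpu" | "unsupported";

// Returns "webgpu" only if the browser exposes navigator.gpu AND
// actually hands out an adapter (it may resolve to null, e.g. on
// blocklisted drivers or when no suitable GPU is present).
async function detectBackend(nav: NavigatorLike): Promise<Backend> {
  if (!nav.gpu) return "unsupported";
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "unsupported";
  } catch {
    // Treat any failure while requesting an adapter as "no WebGPU".
    return "unsupported";
  }
}
```

In a page you would call `detectBackend(navigator)` and show an explanatory message instead of starting a model download when it returns `"unsupported"`. Taking the navigator as a parameter keeps the check easy to test with a mock object.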
Features
- Fully local, in-browser inference with no server dependency
- Support for popular open-source LLMs and quantized variants
- Works offline once models are loaded into the browser cache
- Modern chat UI with streaming output and markdown rendering
- WebGPU-accelerated execution for faster responses on capable machines
- Simple import and configuration flow for swapping models or parameters
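The offline feature depends on model files already being in the browser cache. Secret Llama's storage layout isn't described here, so the following is only a sketch that assumes model files are stored through the browser's Cache Storage API (`CacheStorage.has`, `CacheStorage.open`, `Cache.match`). The `isModelCached` helper, the cache name, and the file URLs are hypothetical placeholders.

```typescript
// Assumption: model files live in Cache Storage. The helper, cache name,
// and URLs are hypothetical and do not come from Secret Llama's code.
// Minimal structural types mirroring the parts of the Cache Storage API used here.
interface CacheLike {
  match(request: string): Promise<unknown | undefined>;
}
interface CacheStorageLike {
  has(cacheName: string): Promise<boolean>;
  open(cacheName: string): Promise<CacheLike>;
}

// Returns true only if every listed model file is already in the named cache,
// i.e. the model could be loaded without touching the network.
async function isModelCached(
  storage: CacheStorageLike,
  cacheName: string,
  fileUrls: string[],
): Promise<boolean> {
  // Check has() first: with the real API, open() would create an empty cache.
  if (!(await storage.has(cacheName))) return false;
  const cache = await storage.open(cacheName);
  for (const url of fileUrls) {
    // Cache.match resolves to undefined when there is no stored response.
    if ((await cache.match(url)) === undefined) return false;
  }
  return true;
}
```

In a browser, `isModelCached(caches, cacheName, urls)` could drive an "available offline" indicator. Passing the storage object in, rather than reading the global `caches`, keeps the function testable outside a browser.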