FastKoko is a self-hosted text-to-speech server built around the Kokoro-82M model and exposed through a FastAPI backend. It is designed to be easy to deploy via Docker, with separate CPU and GPU images so that users can choose between pure CPU inference and NVIDIA GPU acceleration. The project exposes an OpenAI-compatible speech endpoint, which means existing code that talks to the OpenAI audio API can often be pointed at a Kokoro-FastAPI instance with minimal changes. It supports multiple languages and voicepacks and allows phoneme based generation for more accurate pronunciation and prosody. The server also offers per-word timestamped captions, which makes it useful for creating subtitles or aligning audio with text. A built in web UI, API documentation, and debug endpoints for monitoring system status help users explore voices, test requests, and integrate the service into larger systems.

Features

  • OpenAI compatible speech API for drop in replacement of cloud TTS
  • Docker images for both CPU and NVIDIA GPU with multi arch support
  • Web UI and interactive API docs hosted on the same FastAPI server
  • Multi language voicepacks with phoneme based synthesis and captions
  • Voice mixing with weighted combinations and reusable custom voicepacks
  • Simple quick start scripts and Docker Compose setups for local hosting

Project Samples

Project Activity

See All Activity >

Categories

Text to Speech

License

Apache License V2.0

Follow FastKoko

FastKoko Web Site

Other Useful Business Software
Fully Managed MySQL, PostgreSQL, and SQL Server Icon
Fully Managed MySQL, PostgreSQL, and SQL Server

Automatic backups, patching, replication, and failover. Focus on your app, not your database.

Cloud SQL handles your database ops end to end, so you can focus on your app.
Try Free
Rate This Project
Login To Rate This Project

User Reviews

Be the first to post a review of FastKoko!

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

Python

Related Categories

Python Text to Speech Software

Registered

2025-11-28