stt is a standalone speech recognition tool that locally converts spoken content in audio or video files into textual formats without requiring internet access, giving users control over their data and reducing reliance on external APIs. It leverages open-source speech models such as Faster-Whisper to recognize and transcribe human speech into plain text, structured JSON objects, or subtitle files with time codes, making it suitable for both personal and professional transcription tasks. The project is designed to be easy to deploy: you can run a local Python server that exposes an HTTP API for uploading audio/video files and retrieving transcriptions in different formats. It supports GPU acceleration if available, enabling faster processing on compatible hardware but still offers reliable performance on CPUs alone.

Features

  • Offline speech-to-text transcription
  • Outputs text, JSON, and SRT subtitle formats
  • Local HTTP API for easy integration
  • Supports multiple model sizes
  • Optional GPU acceleration
  • Standalone desktop deployment

Project Samples

Project Activity

See All Activity >

License

GNU General Public License version 3.0 (GPLv3)

Follow stt

stt Web Site

Other Useful Business Software
Go From AI Idea to AI App Fast Icon
Go From AI Idea to AI App Fast

One platform to build, fine-tune, and deploy ML models. No MLOps team required.

Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
Try Free
Rate This Project
Login To Rate This Project

User Reviews

Be the first to post a review of stt!

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

Python

Related Categories

Python Artificial Intelligence Software

Registered

2026-02-17