Speech to Text Software for Linux

View 11 business solutions

Browse free open source Speech to Text software and projects for Linux below. Use the toggles on the left to filter open source Speech to Text software by OS, license, language, programming language, and project status.

  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 1
    sherpa-onnx

    sherpa-onnx

    Speech-to-text, text-to-speech, and speaker recognition

    Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without an Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter.
    Downloads: 341 This Week
    Last Update:
    See Project
  • 2
    CMU Sphinx

    CMU Sphinx

    Speech Recognition Toolkit

    Thank you for visiting! ----> Maintenance and improvement work has MOVED to https://cmusphinx.github.io/ Please go there for the most recent software and documentation. <---- CMUSphinx is a speaker-independent large vocabulary continuous speech recognizer released under BSD style license. It is also a collection of open source tools and resources that allows researchers and developers to build speech recognition systems.
    Leader badge
    Downloads: 398 This Week
    Last Update:
    See Project
  • 3
    Handy STT

    Handy STT

    A free, open source, and extensible speech-to-text application

    Handy is a free, open-source, offline speech-to-text application built for privacy, accessibility, and extensibility. Developed using Tauri (Rust + React/TypeScript), it runs natively across Windows, macOS, and Linux while performing local speech recognition without sending any audio to cloud servers. Handy allows users to start transcription instantly using a configurable keyboard shortcut—press to record, release to transcribe—and automatically pastes the resulting text into any active text field. Its backend leverages OpenAI’s Whisper models for GPU-accelerated speech recognition and Parakeet V3 for efficient CPU-only transcription with automatic language detection. To further refine accuracy and responsiveness, Handy integrates Silero’s Voice Activity Detection (VAD) for silence filtering, ensuring only speech segments are processed.
    Downloads: 71 This Week
    Last Update:
    See Project
  • 4
    Whisper

    Whisper

    Robust Speech Recognition via Large-Scale Weak Supervision

    OpenAI Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification. A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
    Downloads: 50 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    SpeechRecognition

    SpeechRecognition

    Speech recognition module for Python

    Library for performing speech recognition, with support for several engines and APIs, online and offline. Recognize speech input from the microphone, transcribe an audio file, save audio data to an audio file. Show extended recognition results, calibrate the recognizer energy threshold for ambient noise levels (see recognizer_instance.energy_threshold for details). Listening to a microphone in the background, various other useful recognizer features. The easiest way to install this is using pip install SpeechRecognition. The first software requirement is Python 2.6, 2.7, or Python 3.3+. This is required to use the library. PyAudio is required if and only if you want to use microphone input (Microphone). PyAudio version 0.2.11+ is required, as earlier versions have known memory management bugs when recording from microphones in certain situations. To hack on this library, first make sure you have all the requirements listed in the "Requirements" section.
    Downloads: 19 This Week
    Last Update:
    See Project
  • 6
    TTS Voice Wizard

    TTS Voice Wizard

    Speech to Text to Speech, sends text as OSC messages

    Speech to Text to Speech. Song now playing. Sends text as OSC messages to VRChat to display on avatar. (STTTS) (Speech to TTS) (VRC STT System) Use TTS Voice Wizard's accessibility features to improve your VRChat experience (it works outside of VRChat too!) You can convert your Speech-to-Text and back to Speech through various Speech Recognition and Text-to-Speech methods. You can send what you say as OSC messages to VRChat to be displayed on your avatar using KillFrenzyAvatarText or VRChats Chatbox. The app can translate your speech from one language to over 20 other support languages. There are 100+ different voices with various customization options so you can pick a voice that best suits you. Display the current song you are listening to on Spotify or via your browser. Display tracker and controller battery life in conjunction with XSOverlay. Use in conjunction with HRtoVRChat_OSC to enable you to display your heartrate in VRChat's Chatbox.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 7
    Whishper

    Whishper

    Transcribe any audio to text, translate and edit subtitles 100% locall

    Open-source, local-first audio transcription and subtitling suite with a simple web UI. Thanks to open-source technologies, Whishper can run 100% offline. Your data never leaves your computer. Whishper allows you to translate your transcriptions to and from more than 60 languages thanks to Argos Translate and LibreTranslate. Download the transcriptions in many formats (json, txt, vtt, srt). Easily edit your subtitles right in the Web-UI.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 8
    Vibe

    Vibe

    Transcribe on your own

    Vibe is an open-source project by thewh1teagle designed to deliver a collaborative and interactive social application experience, though its specifics depend on its evolving community scope; its development often focuses on connecting users through dynamic features that can include chat, shared spaces, and immersive interactions. The repository typically includes backend logic, frontend integration, and real-time communication stacks to support live user engagement, performance optimizations, and modular features that adapt to social workflows. Because open-source social platforms benefit from transparency and community contribution, Vibe’s codebase allows developers to experiment with new social features, customize existing components, and build integrations with popular services for authentication, media sharing, and notifications. Projects like Vibe often emphasize scalability, responsive design, and extensibility so that communities of users can grow without major rewrites.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 9
    Coqui STT

    Coqui STT

    The deep learning toolkit for speech-to-text

    Coqui STT is a fast, open-source, multi-platform, deep-learning toolkit for training and deploying speech-to-text models. Coqui STT is battle-tested in both production and research. Multiple possible transcripts, each with an associated confidence score. Experience the immediacy of script-to-performance. With Coqui text-to-speech, production times go from months to minutes. With Coqui, the post is a pleasure. Effortlessly clone the voices of your talent and have the clone handle the problems in post. With Coqui, dubbing is a delight. Effortlessly clone the voice of your talent into another language and let the clone do the dub. With text-to-speech, experience the immediacy of script-to-performance. Cast from a wide selection of high-quality, directable, emotive voices or clone a voice to suit your needs. With Coqui text-to-speech, production times go from months to minutes.
    Downloads: 5 This Week
    Last Update:
    See Project
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 10
    Faster Whisper

    Faster Whisper

    Faster Whisper transcription with CTranslate2

    Faster Whisper is an optimized implementation of the Whisper speech recognition model designed to deliver significantly faster inference while maintaining comparable accuracy. It leverages efficient inference engines and optimized computation strategies to reduce latency and resource consumption. The system is particularly useful for real-time or large-scale transcription tasks where performance is critical. It supports multiple model sizes, allowing users to balance speed and accuracy based on their needs. The architecture is designed to run efficiently on both CPUs and GPUs, making it accessible across different environments. It also includes support for streaming and batch processing, enabling flexible deployment scenarios. Overall, faster-whisper makes state-of-the-art speech recognition more practical for production use cases by improving speed and efficiency without sacrificing quality.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 11
    Translate-Subtitle-File

    Translate-Subtitle-File

    Subtitle Creation Assistant

    Subtitle group machine translation assistant - [Function 1: Translate subtitle file] .srt .ass .vtt [Function 2: Voice to text] (Drag in video or audio to recognize subtitles) (The latest version v4.1.0 Update time 2021 2 May 23) 12 translation service providers can be configured, such as Google, Baidu, Tencent, Caiyun, IBM, Azure, Amazon, etc. (6 voice service providers can be configured: Alibaba Cloud, Xunfei, Tencent Cloud, IBM, Azure, Amazon ) Advantages: 1. You can use multiple service providers, 2. You can configure your own API Key to use your own account's free quota, such as Tencent's free translation quota of 5 million characters per month, IBM's 500-minute speech-to-text free quota (tern. best The domain name has expired and I don't want to renew it.) Azure speech-to-text and DeepL free version have problems, it is normal to not use it, please wait for the next version to fix. Machine translation of subtitle files, use machine translation to process files.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 12
    DeepSpeech

    DeepSpeech

    Open source embedded speech-to-text engine

    DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers. DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow to make the implementation easier. A pre-trained English model is available for use and can be downloaded following the instructions in the usage docs. If you want to use the pre-trained English model for performing speech-to-text, you can download it (along with other important inference material) from the DeepSpeech releases page.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    WhisperX

    WhisperX

    Automatic Speech Recognition with Word-level Timestamps

    WhisperX is an advanced speech recognition system built on top of OpenAI’s Whisper model, designed to improve transcription accuracy and timing precision for long-form audio. It addresses key limitations of standard Whisper implementations by introducing voice activity detection and forced alignment techniques to produce word-level timestamps. The system enables batched inference, significantly increasing transcription speed while maintaining high accuracy. It is particularly effective for long recordings, where traditional approaches may suffer from drift, repetition, or misalignment. whisperx also supports speaker diarization, allowing identification of different speakers within a conversation. Its architecture combines multiple components to enhance both performance and usability in real-world transcription tasks. Overall, whisperx provides a more robust and scalable solution for high-quality speech-to-text applications.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    Speech Recognition in English & Polish

    Speech Recognition in English & Polish

    Speech recognition software for English & Polish languages

    Software for speech recognition in English & Polish languages. Basic versions of SkryBot: 1. SkryBot Home Speech (English Language) - https://sourceforge.net/projects/skrybotdomowy/files/ReleasesEnglish/InstalatorSkryBotHomeSpeechDemo-2.6.9.18117.exe/download 2. SkryBot DoMowy (Polish Language) - https://sourceforge.net/projects/skrybotdomowy/files/ReleasesPolish/InstalatorSkryBotDoMowyDemo-2.4.9.18117.exe/download More help: https://sourceforge.net/p/skrybotdomowy/wiki/ Domain advanced versions (Polish Language) 1. SkryBot Prawo - for judicial professionals. 2. SkryBot Administracyjny - for civil and government administration. 3. SkryBot Medycyna Rodzinna - for physicians Professional version of SkryBot (commercial) offers you: 1. Audio conversion and cutting sound files into smaller ones. 2. Searching for words or phrases in sound files (recognized by SkryBot). 3. Editing sound files and automatic cutting off long silence parts in audio file.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    Anthromorphic Scribe

    Anthromorphic Scribe

    Provides speech to text gui to sphinx4

    It provides an interactive speech to text application that uses sphinx 4. With this you can use pre-recorded audio, record your own voice and convert incompatible audio/video to be compatible with sphinx 4. It currently supports U.S English by using hub4 acoustic and language model.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Conversations

    Conversations

    App in java for chatting to a generative A.I. (involving tts and stt)

    Java application for chatting to generative AI Llama3. * The user can speak into the microphone (speechToText), edit the recognized text and send it to the AI. * The AI ​​responds and the server returns that response in real time, and the sentences converted to audio (textToSpeech), and the application broadcasts them through the speaker. The application is prepared so that only one user occupies the server's resources, so if the server is busy, in theory it will not let you connect. There is a demo video that shows how it works: https://frojasg1.com:8443/resource_counter/resourceCounter?operation=countAndForward&url=https%3A%2F%2Ffrojasg1.com%2Fdemos%2Faplicaciones%2Fchat%2F20240815.Demo.Chat.mp4%3Forigin%3Dsourceforge&origin=web More about it at this web site: https://www.frojasg1.com:8443/downloads_web/web/html/conversaciones.html?origin=sourceforge
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    GoodByeCatpcha

    GoodByeCatpcha

    Solver ReCaptcha v2 Free

    An async Python library to automate solving ReCAPTCHA v2 by images/audio using Mozilla's DeepSpeech, PocketSphinx, Microsoft Azure’s, Google Speech and Amazon's Transcribe Speech-to-Text API. Also image recognition to detect the object suggested in the captcha. Built with Pyppeteer for Chrome automation framework and similarities to Puppeteer, PyDub for easily converting MP3 files into WAV, aiohttp for async minimalistic web-server, and Python’s built-in AsyncIO for convenience.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    ILA - teachable voice assistant

    ILA - teachable voice assistant

    ILA is a fully customizable and teachable voice assistant for Java

    ILA stands for (kind of) intelligent, learning assistant and is a speech recognition system aka voice assistant very similar to Siri, Google Now and Cortana. ILA is fully customizable and you can teach her/him/it new things by yourself like executing system commands, opening web pages, programs and apps or just some basic conversation :-) ILA runs on Java und thus is compatible to Windows, Mac and Linux. It is designed to integrate with your home enviroment and for example build up your own, free and open Amazon Echo replacement ;-) Right now the key components of ILA are the open source speech recognition CMU Sphinx-4, Google (Speech Recognition/Text-To-Speech) and MaryTTS (Text-To-Speech). The goal is to make ILA completely free of Google by improving all aspects of the open source systems. Since version 3.3 users can also write own add-ons to extend ILA. ILA's successor is the SEPIA Framework: https://sepia-framework.github.io/ Hope you enjoy ILA - Florian
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    kabook audiobook mp3 player, text to speech block text editor. Bookmark & link to searchable associated text. Resume play where quit. For all audio learners, dyslectic, blind, and barrier free linux accessibility . Foot pedal transcription of dictation.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Mice MX OS speech to text Voice Control

    Mice MX OS speech to text Voice Control

    Mice speech to text with MX Cinnamon OS ISO

    Note about this image This image contains a system based on Linux MX, which was created to improve accessibility within the Linux environment. The distribution uses the Cinnamon desktop interface, which is configured to be operated using voice commands and outputs. The user interface and the control of your own devices and home automation systems can be customized and extended. The voice control program MiceStTM.py was developed to enable easy adaptation to other languages. However, only German settings are currently implemented. category: System commands comment: Screen grid trigger: Display screen (Ras.*|Grid)* terminal_command: /opt/micesttm/read-aloud/screen_grid.py & sleep 1 && xdotool search --name "screen grid" windowactivate intern_command: tts: Screen grid for the mouse click was selected.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21

    Mice TTM

    mice stt tts

    Dieses Tool wird speziell für die Barrierefreiheit unter Linux entwickelt. Es ermöglicht das umwandeln/konvertieren/parsen von Texten die aus einer Spracherkennung stammen, in Diktate sowie das Ausführen von Makros. Dies funktioniert ohne Internet, da die Spracherkennung auf dem PC selbst erfolgt. Mausbewegungen auf benannte Wörter und dann entsprechend auswählen oder per Sprachbefehl klicken. Außerdem können Textpassagen z.B. unter Libreoffice Wirter per Sprachbefehl entsprechend ausgewählt und bearbeitet werden. Springe zum Satzende und so weiter. Hausautomatisierungen können realiesiert werden. Nach jedem Befehl kann ein Feedback per Sprachausgabe erfolgen.(tts) Makros können das Licht einschalten oder den Fernseher per Broadlink steuern. Je nach dem welche Idee man hat.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    RealtimeSTT

    RealtimeSTT

    A robust, efficient, low-latency speech-to-text library

    RealtimeSTT is a Python-based realtime speech-to-text engine emphasizing low latency, wake-word detection, voice activity detection, and automatic speech segmentation. It provides asynchronous callbacks, nanosecond-precision timestamps, and CLI tools, suitable for building voice assistants, meeting transcribers, or live caption systems.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23

    Sluh

    creating user interface for converting speech to text and voice contro

    Проект по изучению пригодности\простоты к использованию CMUSphinx, pocketsphinx в пользовательских целях. Попытка создания программного интерфейса по распознаванию русской речи (преобразованию в текст) на базе CMUSphinx, pocketsphinx для голосового управления ПО и прочее.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    This software convert speech to text using Java and Android application. With this software you can also search for text in Google. You can use offline speech to text with this application if you don't have Internet, you can find the steps in guide file. How to use: ----------------- 1- Install a software to convert the PC as router (EX: My Wifi Router) then connect your mobile with PC via wifi. 2- Install Smart Text to Speech.apk file on your phone. 3- Open "Smart Speech to Text.jar" java application on PC. 4- Launch Smart Speech to Text on your phone. 5- Click on "Speak Now" button in java application. 6- After you speak click on red circle button on your phone to stop speaking and to convert it to text or you can wait few seconds notice: --------- Speech that will converted relied on the language that installed on your phone. © Copyright Khalid Kareem 2014, All Rights Reserved E-mail: khaled.atroshy@gmail.com Enjoy!
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Speechalyzer

    Speechalyzer

    Process large speech data wrt transcription, labeling and annotation

    Speechalyzer: a tool for the daily work of a 'speech worker' It is optimized to process large speech data sets with respect to transcription, labeling and annotation. It is implemented as a client server based framework in Java and interfaces software for speech recognition, synthesis, speech classification and quality evaluation. The application is mainly the processing of training data for speech recognition and classification models and performing benchmarking tests on speech-to-text, text-to-speech and speech classification software systems.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB