Search Results for "open source speech to text software"

Showing 268 open source projects for "open source speech to text software"

Active filters: Multimedia, Linux

1

PersonaPlex

PersonaPlex code

PersonaPlex is an open-source real-time conversational speech AI model that goes beyond traditional text chat by providing full-duplex speech-to-speech interaction, meaning it can listen and talk at the same time instead of waiting for you to finish speaking before responding. This architectural approach eliminates awkward pauses and makes conversations feel much more human-like, with natural behaviors such as overlapping speech, interruptions, and fluent turn-taking, traits that traditional AI assistants typically lack. ...

Downloads: 2 This Week

Last Update: 2026-03-02
2

Omilo - a text to speech application

Omilo is a simple text to speech application

Omilo is a simple text to speech application for Windows and Linux using Festival, Flite, MaryTTS and Piper voices.

3 Reviews

Downloads: 199 This Week

Last Update: 2024-09-20
3

Translate-Subtitle-File

Subtitle Creation Assistant

Machine translation assistant for subtitle groups. Function 1: translate subtitle files (.srt, .ass, .vtt). Function 2: voice to text (drag in a video or audio file to recognize subtitles). Latest version: v4.1.0 (update time 2021 2 May 23). Up to 12 translation service providers can be configured, such as Google, Baidu, Tencent, Caiyun, IBM, Azure, and Amazon; 6 voice service providers can be configured: Alibaba Cloud, Xunfei, Tencent Cloud, IBM, Azure, and Amazon. Advantages: 1. You can use multiple service...

Downloads: 16 This Week

Last Update: 3 days ago
4

RHVoice

Free open source speech synthesizer for Russian and other languages

RHVoice is a free and open-source multilingual speech synthesizer. Its developers hope to give more visually impaired people the ability to use a good free synthesis voice reading in their native language with their screen reader. We are especially interested in supporting those languages for which there are currently no good voices that could be used with a screen reader. The creator of RHVoice, Olga Yakovleva, is blind herself. Many of the contributors to the RHVoice project, both...

Downloads: 45 This Week

Last Update: 2026-03-31
5

eGuideDog free software for the blind

eGuideDog project develops free software for the blind. Currently, we focus on WebSpeech, Ekho TTS and WebAnywhere.

16 Reviews

Downloads: 219 This Week

Last Update: 1 day ago
6

Speakr

Speakr is a personal, self-hosted web application

Speakr is an open-source, real-time text-to-speech (TTS) web application that allows users to convert written text into natural-sounding speech in just a few clicks. It provides a clean, user-friendly interface where users can input text, choose a voice style or language, and immediately hear the output, making it ideal for accessibility, content creation, and learning applications.

Downloads: 2 This Week

Last Update: 11 hours ago
7

Koodo Reader

A modern ebook manager and reader with sync and backup

Koodo Reader is an all-in-one ebook reader that can help you better manage and study your ebooks. It's free and open-source. Save your data to Dropbox or Webdav. Customize the source folder and synchronize among multiple devices using OneDrive, iCloud, Dropbox, etc. Single-column, two-column, or continuous scrolling layouts. Text-to-speech, translation, progress slider, touch screen support, batch import. Add bookmarks, notes, highlights to your books. ...

Downloads: 20 This Week

Last Update: 2026-04-19
8

Moshi

A speech-text foundation model for real time dialogue

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec. Mimi processes 24 kHz audio, down to a 12.5 Hz representation with a bandwidth of 1.1 kbps, in a fully streaming manner (latency of 80 ms, the frame size), yet performs better than existing non-streaming codecs like SpeechTokenizer (50 Hz, 4 kbps) or SemantiCodec (50 Hz, 1.3 kbps). Moshi models two streams of audio: one corresponds to Moshi, and...

Downloads: 0 This Week

Last Update: 2024-11-05
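
The Mimi figures quoted in the Moshi description above are mutually consistent, which a few lines of arithmetic can confirm. This is only a sketch based on the numbers in the listing: the function names are hypothetical, and treating the quoted 1.1 kbps as exactly 1100 bit/s is an assumption.

```python
# Figures quoted in the listing for the Mimi codec.
SAMPLE_RATE_HZ = 24_000   # input audio sample rate (24 kHz)
FRAME_RATE_HZ = 12.5      # rate of the compressed representation
BITRATE_BPS = 1_100       # 1.1 kbps, assumed to mean exactly 1100 bit/s


def frame_duration_ms(frame_rate_hz: float) -> float:
    """Duration of one codec frame in milliseconds."""
    return 1000.0 / frame_rate_hz


def samples_per_frame(sample_rate_hz: float, frame_rate_hz: float) -> float:
    """Number of input audio samples covered by one codec frame."""
    return sample_rate_hz / frame_rate_hz


def bits_per_frame(bitrate_bps: float, frame_rate_hz: float) -> float:
    """Bit budget available to encode one codec frame."""
    return bitrate_bps / frame_rate_hz


if __name__ == "__main__":
    print(frame_duration_ms(FRAME_RATE_HZ))                  # 80.0
    print(samples_per_frame(SAMPLE_RATE_HZ, FRAME_RATE_HZ))  # 1920.0
    print(bits_per_frame(BITRATE_BPS, FRAME_RATE_HZ))        # 88.0
```

A 12.5 Hz frame rate gives 80 ms frames, matching the quoted latency; each frame spans 1920 input samples and, under the stated assumption, is encoded in 88 bits.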
9

annyang!

Speech recognition for your site

annyang is a tiny JavaScript library that lets your visitors control your site with voice commands. annyang supports multiple languages, has no dependencies, weighs just 2 KB and is free to use. annyang understands commands with named variables, splats, and optional words. Use named variables for one-word arguments in your command. Use splats to capture multi-word text at the end of your command (greedy). Use optional words or phrases to define a part of the command as optional. annyang plays...

Downloads: 0 This Week

Last Update: 2026-03-11
10

SCAIL

Towards Studio-Grade Character Animation via In-Context Learning of 3D

SCAIL is a project developed by the ZAI Organization, focusing on AI-driven research initiatives. While specific documentation about SCAIL’s exact goals and implementation is limited from the repository context alone, the project appears to be part of a collection of machine learning and AI research tools that facilitate scalable model development, evaluation, or application workflows. Given its listing alongside other ZAI projects like speech recognition and text-to-speech systems, SCAIL...

Downloads: 0 This Week

Last Update: 2026-01-30
11

Podcastfy.ai

Transforming Multimodal Content into Captivating Multilingual Audio

Podcastfy is an open-source Python package that transforms multi-modal content (text, images) into engaging, multi-lingual audio conversations using GenAI. Input content includes websites, PDFs, and YouTube videos, as well as images. Unlike UI-based tools focused primarily on note-taking or research synthesis (e.g. NotebookLM), Podcastfy focuses on the programmatic and bespoke generation of engaging, conversational transcripts and audio from a multitude of multi-modal sources enabling...

Downloads: 3 This Week

Last Update: 2024-11-16
12

Mermaid

Diagram and flowchart generation from text similar to markdown

Mermaid is a JavaScript-based diagram and flowchart generating tool that uses markdown-inspired text for fast and easy generation of diagrams and charts. Forget about using heavy tools to explain your code. Mermaid greatly simplifies documentation with its simple markdown-like script language, and offers a great range of diagram and chart options. The latest version of Mermaid comes with a number of bug fixes and enhancements, as well as a new diagram type, entity relationship diagrams....

Downloads: 91 This Week

Last Update: 2026-04-01
13

Anx Reader

Featuring powerful AI capabilities and supporting e-book formats

Anx Reader is a polished, feature-rich e-book reader built with Flutter, designed for seamless reading across mobile and desktop. It supports major formats (EPUB, MOBI, AZW3, FB2, TXT) and integrates powerful AI tools for summarizing and intelligent navigation via OpenAI, Claude, Gemini, and DeepSeek. Anx also syncs progress, notes, and highlights over WebDAV, and offers rich analytics—including heatmaps and exportable reading summaries. UI customization and text-to-speech enhance the...

Downloads: 22 This Week

Last Update: 2026-03-19
14

rich

Rich is a Python library for rich text and beautiful formatting

The Rich API makes it easy to add color and style to terminal output. Rich can also render pretty tables, progress bars, markdown, syntax highlighted source code, tracebacks, and more, out of the box. Rich is a Python library for rich text and beautiful formatting in the terminal. Rich works with Linux, OSX, and Windows. True color/emoji works with new Windows Terminal, classic terminal is limited to 16 colors. Rich requires Python 3.7 or later. Effortlessly add rich output to your...

Downloads: 4 This Week

Last Update: 2026-04-12
15

D2

D2 is a modern diagram scripting language that turns text to diagrams

D2 is a diagram scripting language that turns text to diagrams. It stands for Declarative Diagramming. Declarative, as in, you describe what you want diagrammed, it generates the image. As well, the functioning of the install script is described in detail to alleviate any concern of its use. We recommend using your OS's package manager directly instead for improved security but the install script is by no means insecure. D2 includes a variety of official themes to style your diagrams...

Downloads: 12 This Week

Last Update: 2025-08-19
16

CadQuery

A python parametric CAD scripting framework based on OCCT

CadQuery is an intuitive, easy-to-use Python library for building parametric 3D CAD models. It has several goals. Build models with scripts that are as close as possible to how you’d describe the object to a human, using a standard, already established programming language. Create parametric models that can be very easily customized by end users. Output high-quality CAD formats like STEP and AMF in addition to traditional STL. Provide a non-proprietary, plain text model format that can be...

Downloads: 42 This Week

Last Update: 2026-02-13
17

ESP8266Audio

Arduino library to play MOD, WAV, FLAC, MIDI, RTTTL, MP3

Arduino library for parsing and decoding MOD, WAV, MP3, FLAC, MIDI, AAC, and RTTTL files and playing them on an I2S DAC or even using a software-simulated delta-sigma DAC with dynamic 32x-128x oversampling. ESP8266 is fully supported and most mature, but ESP32 is also mostly there with built-in DAC as well as external ones. For real-time, autonomous speech synthesis, check out ESP8266SAM, a library that uses this one and a port of an ancient formant-based synthesis program to allow your...

Downloads: 4 This Week

Last Update: 2025-10-23
18

AudioCraft

Audiocraft is a library for audio processing and generation

AudioCraft is a PyTorch library for text-to-audio and text-to-music generation, packaging research models and tooling for training and inference. It includes MusicGen for music generation conditioned on text (and optionally melody) and AudioGen for text-conditioned sound effects and environmental audio. Both models operate over discrete audio tokens produced by a neural codec (EnCodec), which acts like a tokenizer for waveforms and enables efficient sequence modeling. The repo provides...

Downloads: 1 This Week

Last Update: 2025-10-13
19

Flameshot

Powerful yet simple to use screenshot software 🖥️ 📸

Flameshot is a powerful yet simple-to-use open-source screenshot software designed for efficiency and flexibility. It is a free and open-source, cross-platform tool that helps users capture screenshots with ease. Licensed under GPL v3, Flameshot provides a wide range of built-in features that save time during screen capturing and editing. The software offers a clean, straightforward interface that makes taking and annotating screenshots quick and intuitive. Users can customize the...

1 Review

Downloads: 30 This Week

Last Update: 2025-10-29
20

PhotoEditor

A Photo Editor library with simple, easy support for image editing

A Photo Editor library with simple, easy support for image editing using Paints, Text, Filters, Emoji and Sticker like stories. Drawing on the image with the option to change its Brush's Color, Size, Opacity, Erasing and basic shapes. Apply Filter Effect on the image using MediaEffect. Adding/Editing Text with the option to change its Color with Custom Fonts. Adding Emoji with Custom Emoji Fonts. Adding Images/Stickers. Pinch to Scale and Rotate views. Undo and Redo for Brush and Views....

Downloads: 1 This Week

Last Update: 2026-03-12
21

Typed.js

A JavaScript typing animation library

Typed.js is a library that types. Enter in any string, and watch it type at the speed you've set, backspace what it's typed, and begin a new sentence for however many strings you've set. Rather than using the strings array to insert strings, you can place an HTML div on the page and read from it. This allows bots and search engines, as well as users with JavaScript disabled, to see your text on the page. You can pause in the middle of a string for a given amount of time by including an...

Downloads: 3 This Week

Last Update: 2026-01-24
22

Fabric.js

JavaScript Canvas Library and SVG-to-Canvas Parser

Fabric.js is a simple yet powerful JavaScript HTML5 canvas library that allows you to easily work with the HTML5 canvas element in various ways. It is also an SVG-to-canvas (and vice versa) parser. Fabric provides an interactive object model on top of the canvas element, so you can create and populate objects on canvas; manipulate the size, position and rotation of these objects; modify properties such as color, transparency and more. You could also group these objects together with just a simple...

Downloads: 9 This Week

Last Update: 2026-04-28
23

Emoji

A library to add Emoji support to your Android / JVM Application

A Kotlin Multiplatform library to add Emoji support to your Android App / JVM Backend. Check out the sample jvm module for text parsing/searching functionality. PopupWindow which overlays over the soft keyboard. Normal view which is used by EmojiPopup and can also be used as a standalone to select emojis via categories. The library has 4 different sprites providers to choose from (iOS, Google, Facebook & Twitter). The emojis are packaged as pictures and loaded at runtime. If you want to use...

Downloads: 1 This Week

Last Update: 2026-03-23
24

evol-colorpicker

jQuery UI widget for color picking with web colors, theme colors, etc.

evol-colorpicker is a web color picker that looks like the one in Microsoft Office 2010. It can be used inline or as a popup bound to a text box. It comes with several color palettes, can track selection history, and supports "transparent" colors. It is a full jQuery UI widget, supporting various configurations and themes.

Downloads: 0 This Week

Last Update: 2024-12-13
25

Radegast

Lightweight client for connecting to Second Life and OpenSim

Radegast is a virtual world client compatible with Second Life and OpenSimulator. Its main purpose is to provide an alternative client to Linden Lab-derived virtual world viewers. There is a strong focus on accessibility and non-3D interaction. Given the current nature of changes in Second Life, I felt it was prudent to take on another abandoned text-focused viewer. Introducing MEGAbolt, a fork of the METAbolt viewer which was abandoned by its author almost eight years ago. Keep in mind,...

3 Reviews

Downloads: 16 This Week

Last Update: 2026-03-04