Showing 268 open source projects for "open source speech to text software"

  • 1
    PersonaPlex

    PersonaPlex code

    PersonaPlex is an open-source real-time conversational speech AI model that goes beyond traditional text chat by providing full-duplex speech-to-speech interaction, meaning it can listen and talk at the same time instead of waiting for you to finish speaking before responding. This architectural approach eliminates awkward pauses and makes conversations feel much more human-like, with natural behaviors such as overlapping speech, interruptions, and fluent turn-taking, traits that traditional AI assistants typically lack. ...
    Downloads: 2 This Week
  • 2
    Omilo - a text to speech application

    Omilo is a simple text to speech application

    Omilo is a simple text to speech application for Windows and Linux using Festival, Flite, MaryTTS and Piper voices.
    Downloads: 199 This Week
  • 3
    Translate-Subtitle-File

    Subtitle Creation Assistant

    A machine translation assistant for subtitle groups. Function 1: translate subtitle files (.srt, .ass, .vtt). Function 2: voice to text (drag in a video or audio file to recognize subtitles). Latest version: v4.1.0 (update time 2021 2 May 23). 12 translation service providers can be configured, such as Google, Baidu, Tencent, Caiyun, IBM, Azure, Amazon, etc. 6 voice service providers can be configured: Alibaba Cloud, Xunfei, Tencent Cloud, IBM, Azure, Amazon. Advantages: 1. You can use multiple service...
    Downloads: 16 This Week
  • 4
    RHVoice

    Free open source speech synthesizer for Russian and other languages

    RHVoice is a free and open-source multilingual speech synthesizer. Its developers hope to give more visually impaired people the ability to use a good free synthesis voice reading in their native language with their screen reader. We are especially interested in supporting those languages for which there are currently no good voices that could be used with a screen reader. The creator of RHVoice, Olga Yakovleva, is blind herself. Many of the contributors to the RHVoice project, both...
    Downloads: 45 This Week
  • 5
    eGuideDog free software for the blind
    eGuideDog project develops free software for the blind. Currently, we focus on WebSpeech, Ekho TTS and WebAnywhere.
    Downloads: 219 This Week
  • 6
    Speakr

    Speakr is a personal, self-hosted web application

    Speakr is an open-source, real-time text-to-speech (TTS) web application that allows users to convert written text into natural-sounding speech in just a few clicks. It provides a clean, user-friendly interface where users can input text, choose a voice style or language, and immediately hear the output, making it ideal for accessibility, content creation, and learning applications.
    Downloads: 2 This Week
  • 7
    Koodo Reader

    A modern ebook manager and reader with sync and backup

    Koodo Reader is an all-in-one ebook reader that can help you better manage and study your ebooks. It's free and open-source. Save your data to Dropbox or WebDAV. Customize the source folder and synchronize among multiple devices using OneDrive, iCloud, Dropbox, etc. Single-column, two-column, or continuous scrolling layouts. Text-to-speech, translation, progress slider, touch screen support, batch import. Add bookmarks, notes, highlights to your books. ...
    Downloads: 20 This Week
  • 8
    Moshi

    A speech-text foundation model for real time dialogue

    Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec. Mimi compresses 24 kHz audio down to a 12.5 Hz representation with a bandwidth of 1.1 kbps, in a fully streaming manner (latency of 80 ms, the frame size), yet performs better than existing non-streaming codecs like SpeechTokenizer (50 Hz, 4 kbps) or SemantiCodec (50 Hz, 1.3 kbps). Moshi models two streams of audio: one corresponds to Moshi, and...
    Downloads: 0 This Week
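The codec figures quoted in the Moshi entry can be cross-checked with simple arithmetic. The sketch below is not code from the Moshi project; `CodecSpec` and its members are hypothetical names introduced for illustration, and it assumes 1 kbps = 1000 bits per second.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecSpec:
    """Frame rate and bitrate as quoted in the listing (illustrative type)."""
    name: str
    frame_rate_hz: float
    bitrate_bps: float  # assumes 1 kbps = 1000 bits per second

    @property
    def frame_duration_ms(self) -> float:
        # One frame spans 1 / frame_rate seconds.
        return 1000.0 / self.frame_rate_hz

    @property
    def bits_per_frame(self) -> float:
        # Bit budget available to encode a single frame.
        return self.bitrate_bps / self.frame_rate_hz

    def samples_per_frame(self, sample_rate_hz: float) -> float:
        # Number of input audio samples summarized by one frame.
        return sample_rate_hz / self.frame_rate_hz


# Figures taken from the description above.
MIMI = CodecSpec("Mimi", 12.5, 1_100.0)
SPEECH_TOKENIZER = CodecSpec("SpeechTokenizer", 50.0, 4_000.0)
SEMANTI_CODEC = CodecSpec("SemantiCodec", 50.0, 1_300.0)

if __name__ == "__main__":
    for spec in (MIMI, SPEECH_TOKENIZER, SEMANTI_CODEC):
        print(
            f"{spec.name}: {spec.frame_duration_ms:g} ms per frame, "
            f"{spec.bits_per_frame:g} bits per frame"
        )
    print(f"Mimi samples per frame at 24 kHz: {MIMI.samples_per_frame(24_000):g}")
```

At 12.5 Hz each Mimi frame lasts 80 ms, which agrees with the quoted latency, covers 1920 samples of 24 kHz audio, and has a budget of 88 bits at 1.1 kbps.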
  • 9
    annyang!

    Speech recognition for your site

    annyang is a tiny JavaScript library that lets your visitors control your site with voice commands. annyang supports multiple languages, has no dependencies, weighs just 2 KB and is free to use. annyang understands commands with named variables, splats, and optional words. Use named variables for one-word arguments in your command. Use splats to capture multi-word text at the end of your command (greedy). Use optional words or phrases to define a part of the command as optional. annyang plays...
    Downloads: 0 This Week
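The three command features described in the annyang entry (named variables, splats, optional phrases) can be illustrated with a small matcher. This is a Python sketch of the idea only, not annyang's JavaScript implementation; `compile_command` and `match_command` are hypothetical helpers, and the `:name`, `*name`, and `(optional words)` markers are assumed to follow the syntax annyang documents.

```python
import re
from typing import Dict, Optional

# A pattern element is either "(optional phrase)" or a whitespace-free word.
_ELEMENT = re.compile(r"\(([^)]*)\)|(\S+)")


def compile_command(pattern: str) -> "re.Pattern":
    """Translate a command pattern into a regular expression.

    Every element is matched with its leading whitespace, so skipping an
    optional phrase does not leave a stray space behind.
    """
    parts = []
    for optional, word in _ELEMENT.findall(pattern):
        if optional:
            # Optional phrase: all of its words, or nothing at all.
            words = "".join(r"\s+" + re.escape(w) for w in optional.split())
            parts.append("(?:" + words + ")?")
        elif word.startswith(":") and word[1:].isidentifier():
            # Named variable: captures exactly one word.
            parts.append(r"\s+(?P<%s>\S+)" % word[1:])
        elif word.startswith("*") and word[1:].isidentifier():
            # Splat: greedily captures the rest of the utterance.
            parts.append(r"\s+(?P<%s>.+)" % word[1:])
        elif word:
            # Literal word.
            parts.append(r"\s+" + re.escape(word))
    return re.compile("^" + "".join(parts) + r"\s*$", re.IGNORECASE)


def match_command(pattern: str, utterance: str) -> Optional[Dict[str, str]]:
    """Return the captured arguments, or None if the utterance does not match."""
    # A leading space lets the first element use the same "\s+" prefix.
    m = compile_command(pattern).match(" " + utterance.strip())
    return m.groupdict() if m else None


if __name__ == "__main__":
    print(match_command("show me :type", "show me cats"))
    print(match_command("search for *query", "search for open source speech tools"))
    print(match_command("say hello (to my little) friend", "say hello friend"))
```

A pattern without variables returns an empty dict on success, so callers should compare the result against `None` rather than test its truthiness.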
  • 10
    SCAIL

    Towards Studio-Grade Character Animation via In-Context Learning of 3D

    SCAIL is a project developed by the ZAI Organization, focusing on AI-driven research initiatives. While specific documentation about SCAIL’s exact goals and implementation is limited from the repository context alone, the project appears to be part of a collection of machine learning and AI research tools that facilitate scalable model development, evaluation, or application workflows. Given its listing alongside other ZAI projects like speech recognition and text-to-speech systems, SCAIL...
    Downloads: 0 This Week
  • 11
    Podcastfy.ai

    Transforming Multimodal Content into Captivating Multilingual Audio

    Podcastfy is an open-source Python package that transforms multi-modal content (text, images) into engaging, multi-lingual audio conversations using GenAI. Input content includes websites, PDFs, and YouTube videos as well as images. Unlike UI-based tools focused primarily on note-taking or research synthesis (e.g. NotebookLM), Podcastfy focuses on the programmatic and bespoke generation of engaging, conversational transcripts and audio from a multitude of multi-modal sources enabling...
    Downloads: 3 This Week
  • 12
    Mermaid

    Diagram and flowchart generation from text similar to markdown

    Mermaid is a JavaScript-based diagram and flowchart generating tool that uses markdown-inspired text for fast and easy generation of diagrams and charts. Forget about using heavy tools to explain your code. Mermaid greatly simplifies documentation with its simple markdown-like script language, and offers a great range of diagram and chart options. The latest version of Mermaid comes with a number of bug fixes and enhancements, as well as a new diagram type, entity relationship diagrams....
    Downloads: 91 This Week
  • 13
    Anx Reader

    Featuring powerful AI capabilities and supporting e-book formats

    Anx Reader is a polished, feature-rich e-book reader built with Flutter, designed for seamless reading across mobile and desktop. It supports major formats (EPUB, MOBI, AZW3, FB2, TXT) and integrates powerful AI tools for summarizing and intelligent navigation via OpenAI, Claude, Gemini, and DeepSeek. Anx also syncs progress, notes, and highlights over WebDAV, and offers rich analytics—including heatmaps and exportable reading summaries. UI customization and text-to-speech enhance the...
    Downloads: 22 This Week
  • 14
    rich

    Rich is a Python library for rich text and beautiful formatting

    The Rich API makes it easy to add color and style to terminal output. Rich can also render pretty tables, progress bars, markdown, syntax highlighted source code, tracebacks, and more, out of the box. Rich is a Python library for rich text and beautiful formatting in the terminal. Rich works with Linux, OSX, and Windows. True color/emoji works with new Windows Terminal, classic terminal is limited to 16 colors. Rich requires Python 3.7 or later. Effortlessly add rich output to your...
    Downloads: 4 This Week
  • 15
    D2

    D2 is a modern diagram scripting language that turns text to diagrams

    D2 is a diagram scripting language that turns text to diagrams. It stands for Declarative Diagramming. Declarative, as in, you describe what you want diagrammed, it generates the image. As well, the functioning of the install script is described in detail to alleviate any concern of its use. We recommend using your OS's package manager directly instead for improved security but the install script is by no means insecure. D2 includes a variety of official themes to style your diagrams...
    Downloads: 12 This Week
  • 16
    CadQuery

    A python parametric CAD scripting framework based on OCCT

    CadQuery is an intuitive, easy-to-use Python library for building parametric 3D CAD models. It has several goals. Build models with scripts that are as close as possible to how you’d describe the object to a human, using a standard, already established programming language. Create parametric models that can be very easily customized by end users. Output high-quality CAD formats like STEP and AMF in addition to traditional STL. Provide a non-proprietary, plain text model format that can be...
    Downloads: 42 This Week
  • 17
    ESP8266Audio

    Arduino library to play MOD, WAV, FLAC, MIDI, RTTTL, MP3

    Arduino library for parsing and decoding MOD, WAV, MP3, FLAC, MIDI, AAC, and RTTTL files and playing them on an I2S DAC or even using a software-simulated delta-sigma DAC with dynamic 32x-128x oversampling. ESP8266 is fully supported and most mature, but ESP32 is also mostly there with built-in DAC as well as external ones. For real-time, autonomous speech synthesis, check out ESP8266SAM, a library that uses this one and a port of an ancient formant-based synthesis program to allow your...
    Downloads: 4 This Week
  • 18
    AudioCraft

    AudioCraft is a library for audio processing and generation

    AudioCraft is a PyTorch library for text-to-audio and text-to-music generation, packaging research models and tooling for training and inference. It includes MusicGen for music generation conditioned on text (and optionally melody) and AudioGen for text-conditioned sound effects and environmental audio. Both models operate over discrete audio tokens produced by a neural codec (EnCodec), which acts like a tokenizer for waveforms and enables efficient sequence modeling. The repo provides...
    Downloads: 1 This Week
  • 19
    Flameshot

    Powerful yet simple to use screenshot software 🖥️ 📸

    Flameshot is powerful yet simple-to-use screenshot software designed for efficiency and flexibility. It is a free and open-source, cross-platform tool that helps users capture screenshots with ease. Licensed under GPL v3, Flameshot provides a wide range of built-in features that save time during screen capturing and editing. The software offers a clean, straightforward interface that makes taking and annotating screenshots quick and intuitive. Users can customize the...
    Downloads: 30 This Week
  • 20
    PhotoEditor

    A Photo Editor library with simple, easy support for image editing

    A Photo Editor library with simple, easy support for image editing using Paints, Text, Filters, Emoji and Sticker like stories. Drawing on the image with the option to change its Brush's Color, Size, Opacity, Erasing and basic shapes. Apply Filter Effect on the image using MediaEffect. Adding/Editing Text with the option to change its Color with Custom Fonts. Adding Emoji with Custom Emoji Fonts. Adding Images/Stickers. Pinch to Scale and Rotate views. Undo and Redo for Brush and Views....
    Downloads: 1 This Week
  • 21
    Typed.js

    A JavaScript typing animation library

    Typed.js is a library that types. Enter in any string, and watch it type at the speed you've set, backspace what it's typed, and begin a new sentence for however many strings you've set. Rather than using the strings array to insert strings, you can place an HTML div on the page and read from it. This allows bots and search engines, as well as users with JavaScript disabled, to see your text on the page. You can pause in the middle of a string for a given amount of time by including an...
    Downloads: 3 This Week
  • 22
    Fabric.js

    JavaScript Canvas Library and SVG-to-Canvas Parser

    Fabric.js is a simple yet powerful JavaScript HTML5 canvas library that allows you to easily work with the HTML5 canvas element in various ways. It is also an SVG-to-canvas (and vice versa) parser. Fabric provides an interactive object model on top of the canvas element, so you can create and populate objects on canvas; manipulate the size, position and rotation of these objects; modify properties such as color, transparency and more. You could also group these objects together with just a simple...
    Downloads: 9 This Week
  • 23
    Emoji

    A library to add Emoji support to your Android / JVM Application

    A Kotlin Multiplatform library to add Emoji support to your Android App / JVM Backend. Check out the sample jvm module for text parsing/searching functionality. PopupWindow which overlays over the soft keyboard. Normal view which is used by EmojiPopup and can also be used as a standalone to select emojis via categories. The library has 4 different sprites providers to choose from (iOS, Google, Facebook & Twitter). The emojis are packaged as pictures and loaded at runtime. If you want to use...
    Downloads: 1 This Week
  • 24
    evol-colorpicker

    jQuery UI widget for color picking with web colors, theme colors, etc.

    evol-colorpicker is a web color picker that looks like the one in Microsoft Office 2010. It can be used inline or as a popup bound to a text box. It comes with several color palettes, can track selection history, and supports "transparent" colors. It is a full jQuery UI widget, supporting various configurations and themes.
    Downloads: 0 This Week
  • 25
    Radegast

    Lightweight client for connecting to Second Life and OpenSim

    Radegast is a virtual world client compatible with Second Life and OpenSimulator. Its main purpose is to provide an alternative client to Linden Lab-derived virtual world viewers. There is a strong focus on accessibility and non-3D interaction. Given the current nature of changes in Second Life, I felt it was prudent to take on another abandoned text-focused viewer. Introducing MEGAbolt, a fork of the METAbolt viewer which was abandoned by its author almost eight years ago. Keep in mind,...
    Downloads: 16 This Week