Showing 34 open source projects for "open source speech to text software"

View related business solutions
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 1
    Open-LLM-VTuber

    Open-LLM-VTuber

    Open source AI VTuber platform with voice chat and Live2D avatars

    Open-LLM-VTuber is an open source platform designed to create AI-powered VTuber characters that can interact with users through voice and animated avatars. It enables hands-free conversations with large language models by combining speech recognition, language processing, and text-to-speech synthesis into a single system. Users can speak directly to the AI character, and the system can respond with a generated voice while animating a Live2D avatar to simulate a talking virtual personality. ...
    Downloads: 18 This Week
    Last Update:
    See Project
  • 2
    Pot Desktop

    Pot Desktop

    A cross-platform software for text translation and recognition

    Pot-Desktop is a cross-platform productivity tool aimed at helping users quickly translate, perform OCR (optical character recognition), and synthesize speech for selected text or images — all with minimal friction. It supports picking text via mouse selection (“highlight-and-translate”), clipboard listening, or screenshot-based OCR; this makes it ideal for reading webpages, documents, images — or any on-screen text — and instantly getting translations or text extraction. The tool supports...
    Downloads: 22 This Week
    Last Update:
    See Project
  • 3
    eGuideDog free software for the blind
    eGuideDog project develops free software for the blind. Currently, we focus on WebSpeech, Ekho TTS and WebAnywhere.
    Leader badge
    Downloads: 219 This Week
    Last Update:
    See Project
  • 4
    FastRTC

    FastRTC

    The python library for real-time communication

    FastRTC is a Python library designed to simplify real-time communication (RTC), especially for audio and video streaming applications. It abstracts away much of the complexity that typically comes with implementing WebRTC by providing a simple interface — e.g. a Stream class — that can be mounted within a web backend (for example a FastAPI application). This makes it particularly well suited for building real-time voice (or video) interfaces for applications such as AI assistants, live chat,...
    Downloads: 3 This Week
    Last Update:
    See Project
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • 5
    comfyui-mixlab-nodes

    comfyui-mixlab-nodes

    Workflow and speech recognition app

    comfyui-mixlab-nodes is a large collection of custom nodes for ComfyUI that turns workflows into interactive apps and adds real-time multimedia, LLM, and TTS capabilities. It introduces a “Workflow-to-APP” concept, where a ComfyUI graph can be transformed into a Web App through an AppInfo node, complete with categories, batch prompts, and editable configurations. The project also brings Real-time Design features like screen capture and floating video nodes, enabling creative pipelines that...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    Supertonic

    Supertonic

    Lightning-fast, on-device TTS, running natively via ONNX

    Supertonic is a lightning-fast, on-device text-to-speech system built around ONNX Runtime for maximum speed and portability. It focuses on running entirely locally, eliminating the need for cloud APIs and providing low latency and strong privacy guarantees, even on constrained devices like Raspberry Pi boards and e-readers. The core model is highly compact at around 66 million parameters, yet benchmarks show it can generate speech up to 167× faster than real time on modern consumer hardware...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Node.js Client For NLP Cloud

    Node.js Client For NLP Cloud

    NLP Cloud serves high performance pre-trained or custom models

    This is the Node.js client (with Typescript types) for the NLP Cloud API. NLP Cloud serves high-performance pre-trained or custom models for NER, sentiment analysis, classification, summarization, dialogue summarization, paraphrasing, intent classification, product description and ad generation, chatbot, grammar and spelling correction, keywords and keyphrases extraction, text generation, image generation, blog post generation, text generation, question answering, automatic speech...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 8
    annyang!

    annyang!

    Speech recognition for your site

    annyang is a tiny javascript library that lets your visitors control your site with voice commands. annyang supports multiple languages, has no dependencies, weighs just 2kb and is free to use. annyang understands commands with named variables, splats, and optional words. Use named variables for one word arguments in your command. Use splats to capture multi-word text at the end of your command (greedy). Use optional words or phrases to define a part of the command as optional. annyang plays...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Buster

    Buster

    Captcha solver extension for humans

    Save time by asking Buster to solve captchas for you. Buster is a Firefox extension which helps you to solve difficult captchas by completing reCAPTCHA audio challenges using speech recognition. Challenges are solved by clicking on the extension button at the bottom of the reCAPTCHA widget. It is not guaranteed that challenges are always solved, the limitations of the technology need to be considered. The continued development of Buster is made possible thanks to the support of awesome...
    Downloads: 29 This Week
    Last Update:
    See Project
  • Custom VMs From 1 to 96 vCPUs With 99.95% Uptime Icon
    Custom VMs From 1 to 96 vCPUs With 99.95% Uptime

    General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

    Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.
    Try Free
  • 10
    FAY

    FAY

    Framework for building AI-powered interactive digital humans and agent

    ...Its architecture allows developers to combine different AI components such as speech recognition, text-to-speech, and large language models to create conversational digital agents. Fay provides multiple interfaces for text, voice, and digital human control, enabling developers to build interactive assistants, virtual presenters, or automated service agents. It also supports custom knowledge bases and configurable behaviors so developers can tailor the personality and responses of the digital human.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 11
    Deep Chat

    Deep Chat

    Customizable AI chat component for websites with API support

    Deep Chat is a highly customizable web component designed to simplify the integration of AI-powered chat interfaces into websites. It allows developers to embed a fully functional chatbot using minimal setup, while still offering extensive control over behavior, appearance, and integrations. Deep Chat supports connections to a wide range of AI services as well as custom backends, enabling flexible deployment for different use cases. It is built as a framework-agnostic solution, meaning it...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    Agili Hacker Podcast

    Agili Hacker Podcast

    AI tool that turns Hacker News posts into daily podcast updates

    ...As an open-source tool, it also encourages community contributions and customization for developers who want to adapt or extend its workflow for similar AI-driven content pipelines.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    PyGPT

    PyGPT

    Open source personal AI Assistant for Linux, Windows and Mac

    PyGPT is a desktop application that allows you to talk to OpenAI's LLM models such as GPT4 and GPT3 using your own computer and OpenAI API. It allows you to talk in chat mode and in completion mode, as well as generate images using DALL-E 2. PyGPT also adds access to the Internet for GPT via Google Custom Search API and Wikipedia API and includes voice synthesis using Microsoft Azure Text-to-Speech API. Moreover, the application has implemented context memory support, context storage,...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 14
    AUTOMATIC1111 Stable Diffusion web UI
    AUTOMATIC1111's stable-diffusion-webui is a powerful, user-friendly web interface built on the Gradio library that allows users to easily interact with Stable Diffusion models for AI-powered image generation. Supporting both text-to-image (txt2img) and image-to-image (img2img) generation, this open-source UI offers a rich feature set including inpainting, outpainting, attention control, and multiple advanced upscaling options. With a flexible installation process across Windows, Linux, and...
    Downloads: 300 This Week
    Last Update:
    See Project
  • 15
    Bot Framework Web Chat

    Bot Framework Web Chat

    A highly-customizable web-based client for Azure Bot Services

    This repository contains code for the Bot Framework Web Chat component. The Bot Framework Web Chat component is a highly-customizable web-based client for the Bot Framework V4 SDK. The Bot Framework SDK v4 enables developers to model conversation and build sophisticated bot applications. This repo is part of the Microsoft Bot Framework, a comprehensive framework for building enterprise-grade conversational AI experiences. Create a bot with the ability to speak, listen, understand, and learn...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    Venom

    Venom

    Venom is the most complete javascript library for Whatsapp

    Venom is a high-performance system developed with JavaScript to create a bot for WhatsApp, support for creating any interaction, such as customer service, media sending, sentence recognition based on artificial intelligence and all types of design architecture for WhatsApp. It's a high-performance alternative API to whatzapp, you can send, text messages, files, images, videos and more. Remember, the API was developed on a platform called RESTful Web services, providing interoperability...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 17
    KoboldCpp

    KoboldCpp

    Run GGUF models easily with a UI or API. One File. Zero Install.

    KoboldCpp is an easy-to-use AI text-generation software for GGML and GGUF models, inspired by the original KoboldAI. It's a single self-contained distributable that builds off llama.cpp and adds many additional powerful features.
    Leader badge
    Downloads: 1,210 This Week
    Last Update:
    See Project
  • 18
    Node.js Telegram Bot API

    Node.js Telegram Bot API

    Telegram Bot API for NodeJS

    TelegramBot is an EventEmitter that emits several events. Message, received a new incoming Message of any kind. Depending on the properties of the Message, one of these events may ALSO be emitted, text, audio, document, photo, sticker, video, voice, contact, location, new_chat_members, left_chat_member, new_chat_title, new_chat_photo, delete_chat_photo, group_chat_created, game, pinned_message, poll, dice, migrate_from_chat_id, migrate_to_chat_id, channel_chat_created,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    MCP UI

    MCP UI

    SDK for building interactive UI components over MCP for AI tools

    mcp-ui is a software development kit designed to bring interactive user interface capabilities to applications built on the Model Context Protocol (MCP). It enables developers to create rich, dynamic UI components that can be delivered from an MCP server and rendered seamlessly by a compatible client. Instead of returning only text responses, tools can provide structured UI resources such as HTML or remote-rendered components, allowing more engaging and functional interactions. mcp-ui...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    Voice Accounting For Blind & Mute People

    Voice Accounting For Blind & Mute People

    Free & Easy AI Voice Accounting Software For Blind & Speechless People

    Just download the above zip file, extract it and then open the index.html file on internet browsers like Firefox ( preferable ) or Google Chrome. Also, please view and download my full collection of softwares for people with disabilities, here : https://sourceforge.net/projects/softwares-for-disabled-people/ This full collection also includes the Voice Accounting Software as well.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Flow Teleprompter

    Flow Teleprompter

    A Windows first teleprompter with voice tracking and AI drafting.

    Flow is a high performance desktop teleprompter built with Tauri and Rust. It is designed for creators who need a clean reading surface without sacrificing advanced features like native speech tracking and app wide voice commands. It features five playback styles including scroll and voice sync, local first privacy, and built in script editing. Version 1.6.1 introduces a rebuilt Vosk engine for faster recognition and significantly lower RAM usage. Whether you are using the Groq powered AI...
    Downloads: 21 This Week
    Last Update:
    See Project
  • 22
    Softwares For Blind, Deaf, Handicap

    Softwares For Blind, Deaf, Handicap

    Easy AI Softwares for Blind, Deaf, Handicapped, Disabled People

    Just download the above zip file, extract it first and then open the index.html file on internet browsers like Firefox ( preferable ) or Google Chrome. Also, keep NumLock ON while using the Numeric Keypad of any Keyboard. Can also attach an external USB keyboard, with seperate Numeric Keypad, if required. I have added some general guidelines for students, using these softwares, on the Wiki Page of this website. Please refer them for more instructions.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Livrarium A.I PROJECT

    Livrarium A.I PROJECT

    A Text Editor for Lazy Writers.

    Hello, well to keep it simple, I wanted a simple text writer software to be able to write as much things as I want, and to load it whenever I wanted to edit it. Well, thanks to ChatGPT and the Web2Exe software I was able to just obtain this result. You can use the config.ini file in order to change the font of the text.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    General Knowledge Machine Project

    General Knowledge Machine Project

    Intellect Modeling Kit: assisting research, diagnostics, consulting

    We humans are bound by intellectual abilities. All knowledge is far beyond power of any person. The only way to apply knowledge is to build machines able to present it human way but not limited by volume. Intellect Modeling Kit (IMK) is intended to build knowledge machines (KM) assisting experts on the steps of activity: * Observation; * Producing propositions based on knowledge; * Elimination of impossible propositions; * Selection and verification of the most appropriate...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 25
    Read Aloud

    Read Aloud

    An awesome browser extension that reads aloud webpage content

    Read Aloud is a browser extension for Chrome, Firefox, and other Chromium-based browsers that converts webpage text to audio using text-to-speech technology. It is designed to work on a wide variety of sites, including news, blogs, online textbooks, course materials, fanfiction, and more. The extension targets users who prefer listening over reading, as well as people with dyslexia, other learning disabilities, or eye strain, and children learning to read. Read Aloud lets users choose from...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB