Showing 34 open source projects for "speech text"

  • 1
    Pot Desktop

    A cross-platform software for text translation and recognition

    Pot-Desktop is a cross-platform productivity tool aimed at helping users quickly translate, perform OCR (optical character recognition), and synthesize speech for selected text or images — all with minimal friction. It supports picking text via mouse selection (“highlight-and-translate”), clipboard listening, or screenshot-based OCR; this makes it ideal for reading webpages, documents, images — or any on-screen text — and instantly getting translations or text extraction. The tool supports external plugin extensions, which means its functionality can be expanded far beyond the built-in options: you can add translation engines, OCR backends, TTS engines, vocabulary export (e.g. for language learning), and more. ...
    Downloads: 25 This Week
  • 2
    Supertonic

    Lightning-fast, on-device TTS, running natively via ONNX

    Supertonic is a lightning-fast, on-device text-to-speech system built around ONNX Runtime for maximum speed and portability. It focuses on running entirely locally, eliminating the need for cloud APIs and providing low latency and strong privacy guarantees, even on constrained devices like Raspberry Pi boards and e-readers. The core model is highly compact at around 66 million parameters, yet benchmarks show it can generate speech up to 167× faster than real time on modern consumer hardware and significantly outpace popular cloud TTS APIs in throughput and real-time factor. ...
    Downloads: 1 This Week
  • 3
    Kokoro

    An inference library for Kokoro-82M

    Kokoro is an open-weight text-to-speech model and inference library built around the lightweight Kokoro-82M model. It is designed to generate high-quality speech from text while staying fast, compact, and cost-efficient compared with larger TTS systems. The project is useful for developers who want deployable speech synthesis without depending on a closed platform.
    Downloads: 0 This Week
  • 4
    comfyui-mixlab-nodes

    Workflow and speech recognition app

    comfyui-mixlab-nodes is a large collection of custom nodes for ComfyUI that turns workflows into interactive apps and adds real-time multimedia, LLM, and TTS capabilities. It introduces a “Workflow-to-APP” concept, where a ComfyUI graph can be transformed into a Web App through an AppInfo node, complete with categories, batch prompts, and editable configurations. The project also brings Real-time Design features like screen capture and floating video nodes, enabling creative pipelines that...
    Downloads: 6 This Week
  • 5
    Koodo Reader

    A modern ebook manager and reader with sync and backup

    ...Customize the source folder and synchronize among multiple devices using OneDrive, iCloud, Dropbox, etc. Single-column, two-column, or continuous scrolling layouts. Text-to-speech, translation, progress slider, touch screen support, batch import. Add bookmarks, notes, and highlights to your books. Adjust font size, font family, line-spacing, paragraph spacing, background color, text color, margins, and brightness. Night mode and theme color. Text highlight, underline, boldness, italics, and shadow.
    Downloads: 25 This Week
  • 6
    FastRTC

    The python library for real-time communication

    ...FastRTC also integrates nicely with UI frameworks (e.g. via a web demo using Gradio), so developers can rapidly prototype and deploy real-time streaming applications without deep knowledge of low-level WebRTC internals. Because voice-enabled AI agents often involve many moving parts (speech-to-text, text processing, text-to-speech, streaming, session/chat management), FastRTC helps by handling the streaming aspect, leaving the rest to be plugged in modularly.
    Downloads: 0 This Week
  • 7
    annyang!

    Speech recognition for your site

    ...You can easily add a GUI for the user to interact with Speech Recognition using Speech KITT. Speech KITT is fully customizable and comes with many different themes, and instructions on how to create your own designs.
    Downloads: 0 This Week
  • 8
    Node.js Client For NLP Cloud

    NLP Cloud serves high-performance pre-trained or custom models

    ...NLP Cloud serves high-performance pre-trained or custom models for NER, sentiment analysis, classification, summarization, dialogue summarization, paraphrasing, intent classification, product description and ad generation, chatbot, grammar and spelling correction, keyword and keyphrase extraction, text generation, image generation, blog post generation, question answering, automatic speech recognition, machine translation, language detection, semantic search, semantic similarity, tokenization, POS tagging, embeddings, and dependency parsing. It is production-ready and served through a REST API. You can use the NLP Cloud pre-trained models, fine-tune your own models, or deploy your own models.
    Downloads: 0 This Week
  • 9
    Open-LLM-VTuber

    Open source AI VTuber platform with voice chat and Live2D avatars

    Open-LLM-VTuber is an open source platform designed to create AI-powered VTuber characters that can interact with users through voice and animated avatars. It enables hands-free conversations with large language models by combining speech recognition, language processing, and text-to-speech synthesis into a single system. Users can speak directly to the AI character, and the system can respond with a generated voice while animating a Live2D avatar to simulate a talking virtual personality. Open-LLM-VTuber is modular, allowing developers to swap or configure different language models, speech recognition engines, and voice synthesis systems depending on their needs. ...
    Downloads: 17 This Week
  • 10
    Deep Chat

    Customizable AI chat component for websites with API support

    Deep Chat is a highly customizable web component designed to simplify the integration of AI-powered chat interfaces into websites. It allows developers to embed a fully functional chatbot using minimal setup, while still offering extensive control over behavior, appearance, and integrations. Deep Chat supports connections to a wide range of AI services as well as custom backends, enabling flexible deployment for different use cases. It is built as a framework-agnostic solution, meaning it...
    Downloads: 0 This Week
  • 11
    FAY

    Framework for building AI-powered interactive digital humans and agents

    ...Fay supports various types of digital humans, including 2.5D and 3D avatars, and can be integrated with applications running on mobile devices, PCs, web platforms, and embedded systems. Its architecture allows developers to combine different AI components such as speech recognition, text-to-speech, and large language models to create conversational digital agents. Fay provides multiple interfaces for text, voice, and digital human control, enabling developers to build interactive assistants, virtual presenters, or automated service agents. It also supports custom knowledge bases and configurable behaviors so developers can tailor the personality and responses of the digital human.
    Downloads: 2 This Week
  • 12
    Operit AI

    Powerful Android AI agent with tools, automation, and Linux shell

    Operit is a full-featured AI assistant and agent platform designed specifically for Android devices, aiming to go far beyond traditional chat-based interfaces. It integrates deep system-level capabilities with a wide range of tools, allowing the AI to perform real tasks such as file management, automation, and system control directly on the device. A standout aspect of the project is its built-in Ubuntu 24 environment, which enables users to run Linux commands, scripts, and development tools...
    Downloads: 29 This Week
  • 13
    Agili Hacker Podcast

    AI tool that turns Hacker News posts into daily podcast updates

    ...This creates a hands-free way to stay updated on tech, startups, and developer discussions without reading long threads. Hacker Podcast combines content aggregation, natural language processing, and text-to-speech to deliver clear and digestible updates. Users can listen through web interfaces or podcast platforms, while also accessing written summaries for deeper reading. Built with modern web technologies, the project focuses on automation, speed, and accessibility. It supports continuous updates, allowing listeners to receive fresh insights daily. ...
    Downloads: 0 This Week
  • 14
    PyGPT

    Open source personal AI Assistant for Linux, Windows and Mac

    ...It allows you to talk in chat mode and in completion mode, as well as generate images using DALL-E 2. PyGPT also gives GPT access to the Internet via the Google Custom Search API and Wikipedia API, and includes voice synthesis using the Microsoft Azure Text-to-Speech API. The application also supports context memory and context storage, with a history of contexts that can be restored at any time (for example, to continue a conversation from a given point in the history), and it has a convenient, intuitive preset system for quickly creating and managing your prompts. ...
    Downloads: 8 This Week
  • 15
    eGuideDog free software for the blind
    The eGuideDog project develops free software for the blind. Currently, we focus on WebSpeech, Ekho TTS, and WebAnywhere.
    Downloads: 151 This Week
  • 16
    KoboldCpp

    Run GGUF models easily with a UI or API. One File. Zero Install.

    KoboldCpp is easy-to-use AI text-generation software for GGML and GGUF models, inspired by the original KoboldAI. It is a single self-contained distributable that builds on llama.cpp and adds many powerful features.
    Downloads: 388 This Week
  • 17
    Transcrib
    Transcrib is a desktop Electron application for converting audio and video into text. The program is built on OpenAI's powerful Whisper model and Node.js. Out of the box, the distribution already bundles two lightweight models, `tiny` and `base`. The others (`small`, `medium`, `large`) can be downloaded and updated from within the application as needed. The logic is simple: the more complex the model, the higher the quality of the text, but the longer the processing takes.
    Downloads: 2 This Week
  • 18
    Ohod Quiz Game

    Quiz game with spin wheel

    A quiz game that works on any system and supports multiple languages (UTF-8).
    Downloads: 2 This Week
  • 19
    Softwares For Blind, Deaf, Handicap

    Easy AI Softwares for Blind, Deaf, Handicapped, Disabled People

    Download the zip file above, extract it, and then open the index.html file in a web browser such as Firefox (preferred) or Google Chrome. Also, keep NumLock on while using the numeric keypad of any keyboard. You can also attach an external USB keyboard with a separate numeric keypad if required. I have added some general guidelines for students using this software on the Wiki page of this website. Please refer to them for more instructions.
    Downloads: 0 This Week
  • 20
    anchorcastapp

    Free AI-powered church presentation & live sermon display app

    AnchorCast is a free, open-source, AI-powered church presentation desktop app for Windows and macOS, built with Electron. Features:
    - Live Sermon Transcription: real-time speech-to-text via Whisper AI
    - AI Bible Verse Detection: automatically detects and displays verses from live sermons
    - Song Manager: display song lyrics on projection
    - Media Playback: images and video on projection screen
    - NDI Output: stream projection over local network
    - Remote Control: control presentation from any phone via Wi-Fi
    - Timer Display: countdown and service timers
    - Presentation Editor: create and manage slide decks
    - Theme Manager: customize projection appearance
    - Sermon Intelligence & Analytics
    Free for all churches. ...
    Downloads: 3 This Week
  • 21
    Voice Accounting For Blind & Mute People

    Free & Easy AI Voice Accounting Software For Blind & Speechless People

    Download the zip file above, extract it, and then open the index.html file in a web browser such as Firefox (preferred) or Google Chrome. Also, please view and download my full collection of software for people with disabilities here: https://sourceforge.net/projects/softwares-for-disabled-people/ The full collection also includes the Voice Accounting Software.
    Downloads: 0 This Week
  • 22
    Read Aloud

    An awesome browser extension that reads aloud webpage content

    Read Aloud is a browser extension for Chrome, Firefox, and Chromium-based browsers that converts webpage text to audio using text-to-speech technology. It is designed to work on a wide variety of sites, including news, blogs, online textbooks, course materials, fanfiction, and more. The extension targets users who prefer listening over reading, as well as people with dyslexia, other learning disabilities, or eye strain, and children learning to read.
    Downloads: 0 This Week
  • 23
    XZVoice

    Free and open source text-to-speech software

    Text-to-speech software built with Electron, Vue, ElementUI, and JavaScript. High-fidelity, flexibly configurable speech synthesis closes the loop of human-computer interaction and lets applications sound realistic. A variety of timbres are available, along with controls for adjusting speech rate, intonation, and volume.
    Downloads: 0 This Week
  • 24
    Free Queue Manager

    Web-based Python Flask queue management system

    A web-based management system developed to ease the process of organizing queues and lines. Like many other Queue Management Systems (QMSs), FQM provides a basic dashboard that allows users of the system and customers alike to interact with it through a simple user interface. A brief user guide can be found at https://fqms.github.io/images/user_guide.pdf
    Downloads: 4 This Week
  • 25
    chatbot_chung
    chatbot chung is a simple entertainment chatbot based on a keyword-probability algorithm, with 3D talking OpenGL avatars, written in FreeBASIC. It can import AIML data (simple question/answer, question/random answers, or single-star/multi-srai) saved from the open source "AIML_chung" application. An online HTML5/JavaScript version with automatic detection of 44 languages is available on the website (source included in the zip file). A SORT gentext text-generation algorithm option has been added (desktop version).
    Downloads: 0 This Week