Search Results for "open source speech to text software" - Page 10

Showing 542 open source projects for "open source speech to text software"

View related business solutions
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    Access competitive interest rates on your digital assets.

    Generate interest, borrow against your crypto, and trade a range of cryptocurrencies — all in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • 1

    pyLogos

    Qualitative content analysis software.

    pyLogos is a program to support text content analysis. Documents (imported from txt and docx files) are stored in a database, and may have marked text segments associated with codes. It is possible to retrieve these segments in various ways, generate word clouds, tabulate frequency of codes and words, among other outputs. pyLogos é um programa de apoio à análise de conteúdo de textos. Documentos (importados de arquivos txt e docx) são armazenados numa base de dados, podendo ter segmentos...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Ailice

    Ailice

    AIlice is a fully autonomous, general-purpose AI agent

    AIlice is an open-source autonomous AI agent framework built to function as a general-purpose assistant that can plan, decompose, and execute complex tasks through a structured multi-agent architecture. The project presents itself as a standalone assistant powered by open-source language models, with an internal design that treats user requests almost like executable programs rather than simple chat prompts. Its core IACT architecture allows the system to break large goals into smaller...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3

    dockserver-talk

    Python module to interface with a Slocum glider dockserver

    Dockserver-talk is a python module that provides an easy way to build custom clients that communicate with Teledyne Webb Research's dockserver software, used to control "Slocum" underwater gliders. The package comes with two applications built on top of dockserverTalk: BUGS: a program to monitor and display the position of gliders (as well as buoys and ships) with respect to a given GPS location (from an external source). Surfalarm: a program that can send text messages or make...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    bitfarm-Archiv Document Management - DMS
    bitfarm-Archiv is a powerful Document Management (DMS), Enterprise Content Management (ECM) and Knowledge Management System (KMS) with Workflow Components. Help us! As we live in the internet age, the best thing, you can help, is to write a short statement about your scenario and your use of the DMS, along with your experiences and put it on your own website or in a blog or forum. It would help us best, if you can also add a hyperlink to our site http://www.bitfarm-archiv.com. By this...
    Downloads: 12 This Week
    Last Update:
    See Project
  • Add Two Lines of Code. Get Full APM. Icon
    Add Two Lines of Code. Get Full APM.

    AppSignal installs in minutes and auto-configures dashboards, alerts, and error tracking.

    Works out of the box for Rails, Django, Express, Phoenix, and more. Monitoring exceptions and performance in no time.
    Start Free
  • 5
    Coqui TTS

    Coqui TTS

    A deep learning toolkit for Text-to-Speech, battle-tested in research

    TTS is a library for advanced Text-to-Speech generation. It's built on the latest research, was designed to achieve the best trade-off among ease-of-training, speed and quality. TTS comes with pre-trained models, tools for measuring dataset quality and is already used in 20+ languages for products and research projects. High-performance Deep Learning models for Text2Speech tasks. Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech). Speaker Encoder to compute speaker embeddings...
    Downloads: 30 This Week
    Last Update:
    See Project
  • 6
    VALL-E X

    VALL-E X

    Open source implementation of Microsoft's VALL-E X zero-shot TTS model

    VALL-E-X is an open-source implementation of Microsoft’s VALL-E X zero-shot text-to-speech model, focused on multilingual, cross-lingual voice cloning. It is capable of synthesizing speech in English, Chinese, and Japanese from text while mimicking the voice characteristics of a speaker given only a short 3–10 second prompt. The model attempts to match not just timbre, but also tone, pitch, emotion, and prosody of the reference audio, resulting in highly personalized output. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    vits_chinese

    vits_chinese

    Best practice TTS based on BERT and VITS

    vits_chinese is an implementation of the VITS end-to-end text-to-speech (TTS) architecture tailored for Chinese (and possibly multilingual) speech synthesis. VITS is a model combining variational autoencoders (VAEs), normalizing flows, adversarial learning, and a stochastic duration predictor — a design that enables generation of natural, expressive speech, capturing variations in rhythm and prosody. By customizing or porting VITS for Chinese, this project aims to produce high-quality TTS...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Transformers4Rec

    Transformers4Rec

    Transformers4Rec is a flexible and efficient library

    Transformers4Rec is an advanced recommendation system library that leverages Transformer models for sequential and session-based recommendations. The library works as a bridge between natural language processing (NLP) and recommender systems (RecSys) by integrating with one of the most popular NLP frameworks, Hugging Face Transformers (HF). Transformers4Rec makes state-of-the-art transformer architectures available for RecSys researchers and industry practitioners. Traditional recommendation...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 9
    Maia
    MAIA (MyApp Intelligence Artificial) is designed to provide a foundation for building your own voice-controlled assistant with Python. It uses various libraries and modules for speech recognition, text-to-speech synthesis, and custom functionality.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Custom VMs From 1 to 96 vCPUs With 99.95% Uptime Icon
    Custom VMs From 1 to 96 vCPUs With 99.95% Uptime

    General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

    Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.
    Try Free
  • 10
    Parallel WaveGAN

    Parallel WaveGAN

    Unofficial Parallel WaveGAN

    Parallel WaveGAN is an unofficial PyTorch implementation of several state-of-the-art non-autoregressive neural vocoders, centered on Parallel WaveGAN but also including MelGAN, Multiband-MelGAN, HiFi-GAN, and StyleMelGAN. Its main goal is to provide a real-time neural vocoder that can turn mel spectrograms into high-quality speech audio efficiently. The repository is designed to work hand-in-hand with ESPnet-TTS and NVIDIA Tacotron2-style front ends, so you can build complete TTS or singing...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    towhee

    towhee

    Framework that is dedicated to making neural data processing

    Towhee is an open-source machine-learning pipeline that helps you encode your unstructured data into embeddings. You can use our Python API to build a prototype of your pipeline and use Towhee to automatically optimize it for production-ready environments. From images to text to 3D molecular structures, Towhee supports data transformation for nearly 20 different unstructured data modalities. We provide end-to-end pipeline optimizations, covering everything from data decoding/encoding, to...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    MahaKurawa.My.ID URL Extractor

    MahaKurawa.My.ID URL Extractor

    MahaKurawa.My.ID URL Extractor is Simple Tool to extract unique URL

    MahaKurawa.My.ID URL Extractor is Simple Tool to extract unique URL from any text content in instant. It's useful when you lazy enough to identify and copy-paste URL from your content one by one by yourself.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    SoftVC VITS Singing Voice Conversion

    SoftVC VITS Singing Voice Conversion

    SoftVC VITS Singing Voice Conversion

    SoftVC VITS Singing Voice Conversion is a deep learning project focused on singing voice conversion, allowing users to transform one voice into another while preserving melody and timing. Unlike traditional text-to-speech systems, it specializes specifically in singing scenarios and does not provide general TTS functionality. The project leverages neural network architectures derived from VITS and SoftVC research to achieve high-quality voice transformation. It is commonly used in creative...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 14
    eCxx

    eCxx

    A C++ library for AVR and NodeMCU

    NOTE: This project is marked with 'Status: Abandoned' on SourceForge because not enough time can be dedicated to this project. However it may still get sporadic commits to the repository. eCxx is a library for AVR and NodeMCU tailored for micro LED displays and lighting effects. eCxx is utilizing Makefile build system. Java and Python based applications/tools are also included to ease the development and debugging process using the host PC. On one side, eCxx supports the original...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    LibreOffice

    LibreOffice

    A free and powerful office suite

    LibreOffice is a free and powerful office suite, and a successor to OpenOffice. Its clean interface and feature-rich tools help you unleash your creativity and enhance your productivity. LibreOffice is Free and Open Source Software (FOSS) – development is open to new talent and new ideas, and our software is tested and used daily by a large and devoted user community. Your documents will look professional and clean, regardless of their purpose: a letter, a master thesis, a brochure,...
    Leader badge
    Downloads: 699 This Week
    Last Update:
    See Project
  • 16
    wukong-robot

    wukong-robot

    Chinese voice dialogue robot/smart speaker project

    wukong-robot is a Chinese voice assistant / smart speaker project built to let makers and hackers design highly customizable voice-controlled devices. It combines wake-word detection, automatic speech recognition, natural language understanding, and text-to-speech into a single framework aimed at the Chinese-speaking ecosystem. The project is positioned as a simple, flexible, and elegant platform that can run on devices like Raspberry Pi and other Linux-based boards, making it suitable for...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    Audio Webui

    Audio Webui

    A webui for different audio related Neural Networks

    Audio Webui is a Gradio-based web user interface that unifies a wide range of audio-related neural networks under a single, accessible front end. It is designed as an “all-in-one” environment where users can experiment with text-to-speech, voice cloning, generative music, and other neural audio models without writing boilerplate code. The project supports multiple back-end models and toolchains (such as Bark, RVC, AudioLDM, Audiocraft, and other text-to-audio or voice-cloning tools),...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Parveshdhull AutoTyper

    Parveshdhull AutoTyper

    A Data Entry Tool for Windows and Linux

    Sometimes we have to write content in programs where copy-paste is not allowed, like in data entry software Notepad RT. There are many tools available online but almost all of them only provide trial versions. And requires big payment for continued access. And even if they are free, it is not wise to give complete access to a keyboard to any third-party software. So I wrote this simple-short python script that reads content from a text file then simulates keyboard typing. This Script works...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 19
    StoryTeller

    StoryTeller

    Multimodal AI Story Teller, built with Stable Diffusion, GPT, etc.

    A multimodal AI story teller, built with Stable Diffusion, GPT, and neural text-to-speech (TTS). Given a prompt as an opening line of a story, GPT writes the rest of the plot; Stable Diffusion draws an image for each sentence; a TTS model narrates each line, resulting in a fully animated video of a short story, replete with audio and visuals. To develop locally, install dev dependencies and install pre-commit hooks. This will automatically trigger linting and code quality checks before each...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 20
    script-server

    script-server

    Web UI for your scripts with execution management

    Script-server is a Web UI for scripts. As an administrator, you add your existing scripts into Script server and other users would be able to execute them via a web interface. The UI is very straightforward and can be used by non-tech people. No script modifications are needed - you configure each script in Script server and it creates the corresponding UI with parameters and takes care of validation, execution, etc.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 21
    VATSG

    VATSG

    Video automatic transcribe and translated subtitle generator

    It generates srt format subtitle from videofile which can be any source language that whisper support , and then make translated subtitle file of your target language which deepl support. This is the subtitle generator(VATSG) which use [moviepy](https://github.com/Zulko/moviepy) to generate mp3 and then use [faster-whisper](https://github.com/guillaumekln/faster-whisper) to get text recognition and then use deepl-api to generate your target language subtitle file(srt format) If you...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 22
    auto-subtitle

    auto-subtitle

    Automatically generate and overlay subtitles for any video

    auto-subtitle is a Python-based command-line tool that automatically generates and overlays subtitles on video files using AI-driven speech recognition. It combines FFmpeg with OpenAI’s Whisper model to transcribe spoken audio into text and synchronize it with video playback. The tool processes video input, extracts audio, and produces subtitle files that can be either exported separately or burned directly into the final video output. It supports multiple transcription models with varying...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 23
    UartVide

    UartVide

    A flat UI RS232 serial port communication utility.

    Mainly designed for embedded & software engineers, UartVide is a flat-UI and straightforward and lightweight RS232 serial port communication utility that allows you to configure the connection parameters and communicate via the port. UartVide runs on all platforms supported by PySide2 including Windows, Linux. (MyTerm was renamed to UartVide from version 2.4)
    Downloads: 2 This Week
    Last Update:
    See Project
  • 24
    doccano client

    doccano client

    A simple client for doccano API

    doccano-client is a simple client wrapper for the doccano API. We're introducing a newly revamped Doccano API Client that features more Pythonic interaction as well as more testing and documentation. It also adds more regulated compatibility with specific Doccano release versions.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    VALL-E

    VALL-E

    PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)

    We introduce a language modeling approach for text to speech synthesis (TTS). Specifically, we train a neural codec language model (called VALL-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS as a conditional language modeling task rather than continuous signal regression as in previous work. During the pre-training stage, we scale up the TTS training data to 60K hours of English speech which is hundreds of times larger than existing systems....
    Downloads: 3 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB