Showing 142 open source projects for "text voice"

  • 1
    Parakeet

    PAddle PARAllel text-to-speech toolKIT

    PAddle PARAllel text-to-speech toolKIT (supporting Tacotron2, Transformer TTS, FastSpeech2/FastPitch, SpeedySpeech, WaveFlow, and Parallel WaveGAN). Parakeet aims to provide a flexible, efficient, and state-of-the-art text-to-speech toolkit for the open-source community. It is built on the PaddlePaddle dynamic graph and includes many influential TTS models. To make it easier to use the existing TTS models directly and to develop new ones, Parakeet selects typical models and provides...
    Downloads: 2 This Week
  • 2
    VoiceFixer

    General Speech Restoration

    VoiceFixer is a machine-learning framework for “speech restoration”: given a degraded or distorted audio recording — with noise, clipping, low sampling rate, reverberation, or other artifacts — it attempts to recover high-fidelity, clean speech. The architecture works in two stages: first an analysis stage that tries to extract “clean” intermediate features from the noisy audio (e.g. removing noise, denoising, dereverberation, upsampling), and then a neural vocoder-based synthesis stage that...
    Downloads: 1 This Week
  • 3
    TensorFlowTTS

    Real-Time State-of-the-art Speech Synthesis for Tensorflow 2

    ...With integrated vocoder + mel-spectrogram generation pipelines, pre-trained models, and fairly flexible architecture, TensorFlowTTS is a great off-the-shelf and extensible TTS engine for applications ranging from voice assistants to content generation or accessibility tools.
    Downloads: 0 This Week
  • 4
    PaddlePaddle models

    Pre-trained and Reproduced Deep Learning Models

    Pre-trained and reproduced deep learning models: the official "Flying Paddle" model library, which covers a variety of models from frontier academic research and models validated in industrial settings. Flying Paddle's industrial-grade model library includes a large number of mainstream models refined through long-term industrial practice, as well as models that have won international competitions; it provides many scenarios for semantic understanding, image classification,...
    Downloads: 0 This Week
  • 5
    Multilingual Speech Synthesis

    An implementation of Tacotron 2 that supports multilingual experiments

    This repository provides synthesized samples, training and evaluation data, source code, and parameters for the paper "One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech". It contains an implementation of Tacotron 2 that supports multilingual experiments and implements different approaches to encoder parameter sharing. It presents a model combining ideas from "Learning to speak fluently in a foreign language: Multilingual speech synthesis and cross-language voice cloning", "End-to-End Code-Switched TTS with Mix of Monolingual Recordings", and "Contextual Parameter Generation for Universal Neural Machine Translation". ...
    Downloads: 0 This Week
  • 6
    Snips NLU

    Snips Python library to extract meaning from text

    Snips NLU is a Natural Language Understanding Python library that parses sentences written in natural language and extracts structured information. It’s the library that powers the NLU engine used in the Snips Console, which you can use to create awesome and private-by-design voice assistants. The exact output is a bit richer; the point here is to give a glimpse of what kind of information can be extracted. Behind every chatbot and voice assistant lies a common piece of technology:...
    Downloads: 0 This Week
  • 7
    Dragonfire

    The open-source virtual assistant for Ubuntu-based Linux distributions

    ...Dragonfire uses Mozilla DeepSpeech to understand your voice commands and Festival Speech Synthesis System to handle text-to-speech tasks.
    Downloads: 0 This Week
  • 8
    Resemblyzer

    A Python package to analyze and compare voices with deep learning

    Resemblyzer is a Python package for analyzing and comparing voices with deep learning. It works by turning speech audio into a compact voice embedding that represents the speaker’s vocal characteristics. These embeddings can then be used for speaker similarity, clustering, diarization experiments, voice comparison, and audio dataset exploration. The project is useful for researchers and developers who need a practical way to reason about speaker identity without building a voice encoder from...
    Downloads: 1 This Week
  • 9
    Rasa Core

    Rasa Core is now part of the Rasa repo

    Rasa is an open source machine learning framework to automate text and voice-based conversations. Rasa helps you build contextual assistants capable of having layered conversations with lots of back-and-forth. For a human to have a meaningful exchange with a contextual assistant, the assistant needs to be able to use context to build on things that were previously discussed. Rasa enables you to build assistants that can do this in a scalable way.
    Downloads: 0 This Week
  • 10
    Deepvoice3_pytorch

    PyTorch implementation of convolutional neural networks

    An open source implementation of Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning.
    Downloads: 0 This Week
  • 11
    JAVT - Just Another Voice Transformer

    Just Another Speech Recognition and Text to Speech software.

    JAVT, or Just Another Voice Transformer (formerly called Just Another Video Transcriber), is speech recognition software that also supports text-to-speech and simple media conversion. JAVT lets you convert video files to WAV audio files using ffmpeg, and then transcribe the audio to text using either Microsoft SAPI or CMU Sphinx. You can also open a text file and have JAVT read it aloud through text-to-speech conversion.
    Downloads: 1 This Week
  • 12
    This program uses crontab to set the dates, times, and other schedule settings for an alarm clock. The GUI was built using easygui for Python, and the program is launched by a bash script. It was built for Mac.
    Downloads: 0 This Week
  • 13
    Lioness (Languages Interop Framework)
    Framework for making Windows applications that are a single .exe file in AutoHotKey_L, C++, C#, VB.NET, Java, Groovy, Common Lisp, Nemerle, Ruby, Python, PHP, Lua, Tcl, Perl, Jint, S#, WSH VBScript, HTML/JavaScript/CSS, COM, and PowerShell, without compiling. For .NET 4.
    Downloads: 0 This Week
  • 14
    A framework for creating IRC bots and clients using Python 2 or 3, focusing on ease of use and intuitiveness. It exposes IRC at a low level of raw events and event objects, or at a higher level with channel/user objects, as well as threading and commonly used IRC client features.
    Downloads: 0 This Week
  • 15
    Voice Conference Manager (VCM) uses VoiceXML and CCXML to control speech recognition, text-to-speech, and voice biometrics for a telephone conference service. Say the names or numbers of people and VCM places them into the call. It can be hosted on public servers.
    Downloads: 0 This Week
  • 16
    Sayz Me is a text-to-speech application for Windows. Text can be typed in or read from clipboard. Words are highlighted when spoken. Select voice, adjust reading speed, voice pitch, font and color. Simple and easy to use.
    Downloads: 3 This Week
  • 17
    Dia-1.6B

    Dia-1.6B generates lifelike English dialogue and vocal expressions

    Dia-1.6B is a 1.6 billion parameter text-to-speech model by Nari Labs that generates high-fidelity dialogue directly from transcripts. Designed for realistic vocal performance, Dia supports expressive features like emotion, tone control, and non-verbal cues such as laughter, coughing, or sighs. The model accepts speaker conditioning through audio prompts, allowing limited voice cloning and speaker consistency across generations.
    Downloads: 0 This Week