Showing 23 open source projects for "natural language processing"

View related business solutions
  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • Stop Storing Third-Party Tokens in Your Database Icon
    Stop Storing Third-Party Tokens in Your Database

    Auth0 Token Vault handles secure token storage, exchange, and refresh for external providers so you don't have to build it yourself.

    Rolling your own OAuth token storage can be a security liability. Token Vault securely stores access and refresh tokens from federated providers and handles exchange and renewal automatically. Connected accounts, refresh exchange, and privileged worker flows included.
    Try Auth0 for Free
  • 1
    NVIDIA NeMo

    NVIDIA NeMo

    Toolkit for conversational AI

    NVIDIA NeMo, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new conversational AI model architectures. Conversational AI architectures are typically large and require a lot of data and compute for training. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    VoxCPM2

    VoxCPM2

    Tokenizer-Free TTS for Multilingual Speech Generation

    ...It also introduces voice design capabilities, allowing users to generate entirely new voices from natural language descriptions without requiring reference audio.
    Downloads: 14 This Week
    Last Update:
    See Project
  • 3
    IMS Toucan

    IMS Toucan

    Controllable and fast Text-to-Speech for over 7000 languages

    IMS-Toucan is a toolkit for training, using, and teaching state-of-the-art text-to-speech systems, built at the Institute for Natural Language Processing (IMS), University of Stuttgart. It is the official home of ToucanTTS, a massively multilingual TTS system designed to support over 7,000 languages with a single unified framework. The toolkit focuses on being fast and controllable while not requiring huge amounts of compute, making it practical for research labs and smaller teams. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4
    Qwen3-TTS

    Qwen3-TTS

    Qwen3-TTS is an open-source series of TTS models

    Qwen3-TTS is an open-source text-to-speech (TTS) project built around the Qwen3 large language model family, focused on generating high-quality, natural-sounding speech from plain text input. It provides researchers and developers with tools to transform text into expressive, intelligible audio, supporting multiple languages and voice characteristics tuned for clarity and fluidity. The project includes pre-trained models and inference scripts that let users synthesize speech locally or integrate TTS into larger pipelines such as voice assistants, accessibility tools, or multimedia generation workflows. ...
    Downloads: 13 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    Speech Note

    Speech Note

    Speech Note Linux app. Note taking, reading and translating

    Speech Note is a Linux desktop and Sailfish OS application for taking, reading, and translating notes with integrated offline speech technology. It combines speech-to-text, text-to-speech, and machine translation in a single interface, allowing users to dictate notes, listen back to them, and translate them without ever sending data to the cloud. All processing is done locally, which means audio, text, and translations never leave the device, emphasizing strong privacy guarantees. The...
    Downloads: 25 This Week
    Last Update:
    See Project
  • 6
    Orpheus TTS

    Orpheus TTS

    Towards Human-Sounding Speech

    Orpheus TTS is a state-of-the-art open-source text-to-speech system built on a Llama-3B backbone, treating speech synthesis as a large language model problem instead of a traditional TTS pipeline. It is designed to produce human-like speech with natural intonation, emotion, and rhythm, targeting quality comparable to or better than many closed-source systems. The project ships both pretrained and finetuned English models, as well as a family of multilingual models released as a research preview, and includes data-processing scripts so users can train or finetune their own variants. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    abogen

    abogen

    Generate audiobooks from EPUBs, PDFs and text with captions

    abogen is a tool designed to generate audiobooks (or speech narrations) from textual sources such as EPUBs, PDFs, or plain text, with synchronized captions. In other words, it automates the pipeline of reading a digital book (or document), converting its text into speech via a TTS engine, and packaging the result into an audiobook format — likely along with timestamped captions or subtitles that align with the spoken audio. This can be very useful for accessibility, content consumption on...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 8
    ESPnet

    ESPnet

    End-to-end speech processing toolkit

    ESPnet is a comprehensive end-to-end speech processing toolkit covering a wide spectrum of tasks, including automatic speech recognition (ASR), text-to-speech (TTS), speech translation (ST), speech enhancement, speaker diarization, and spoken language understanding. It uses PyTorch as its deep learning engine and adopts a Kaldi-style data processing pipeline for features, data formats, and experimental recipes.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    OmniVoice

    OmniVoice

    High-Quality Voice Cloning TTS for 600+ Languages

    The OmniVoice project is a cutting-edge multilingual text-to-speech system designed to generate high-quality speech across more than 600 languages. Built on a diffusion language model-style architecture, it combines scalability with strong performance, enabling both natural-sounding voice synthesis and efficient inference speeds. One of its most notable capabilities is zero-shot voice cloning, allowing users to replicate a speaker’s voice using only a short reference audio clip. In addition, it supports voice design through configurable attributes such as gender, accent, pitch, and speaking style, giving users fine-grained control over generated speech. ...
    Downloads: 16 This Week
    Last Update:
    See Project
  • Build Agents and Models on One Platform Icon
    Build Agents and Models on One Platform

    Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.

    Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.
    Try It Free
  • 10
    Bailing

    Bailing

    Bailing is a voice dialogue robot similar to GPT-4o

    Bailing is an open-source voice-dialogue assistant designed to deliver natural voice-based conversations by combining automatic speech recognition (ASR), voice activity detection (VAD), a large language model (LLM), and text-to-speech (TTS) in a single pipeline. Its goal is to offer a “voice-first” chat experience similar to what one might expect from a system like GPT-4o, but fully open and deployable by users.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Readest

    Readest

    Readest is a modern, feature-rich ebook reader

    Readest is a project meant to facilitate reading, studying, or consuming content by integrating reading tools with AI-powered assistance. Although the repository is not as widely documented or popular as some, the idea is that Readest supports features to help with reading comprehension — likely combining OCR / text retrieval, translation, note-taking, or summarization for reading materials (eBooks, articles, PDFs). The goal appears to be to let users feed in arbitrary reading material and...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 12
    Spark TTS

    Spark TTS

    Spark-TTS Inference Code

    Spark TTS is an open-source, PyTorch-based text-to-speech inference system that leverages large language models to produce highly natural, intelligible speech from text input. It uses an efficient single-stream architecture where speech tokens are directly reconstructed from the predictions of an LLM, removing the need for external acoustic models or complex vocoders and making the generation pipeline cleaner and faster. The project supports zero-shot voice cloning, meaning it can imitate a new speaker’s voice without dedicated training for that specific voice, and works across languages, including English and Chinese, even in cross-lingual code-switching scenarios. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    FireRedTTS-2

    FireRedTTS-2

    Long-form streaming TTS system for multi-speaker dialogue generation

    ...FireRedTTS2 supports multilingual output and speaker flexibility, enabling scenarios that involve language switching, cross-lingual voice cloning, and expressive dialogue generation that maintains consistency over longer utterances.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    GLM-TTS

    GLM-TTS

    Controllable & emotion-expressive zero-shot TTS

    GLM-TTS is an advanced text-to-speech synthesis system built on large language model technologies that focuses on producing high-quality, expressive, and controllable spoken output, including features like emotion modulation and zero-shot voice cloning. It uses a two-stage architecture where a generative LLM first converts text into intermediate speech token sequences and then a Flow-based neural model converts those tokens into natural audio waveforms, enabling rich prosody and voice character even for unseen speakers. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Supertonic

    Supertonic

    Lightning-fast, on-device TTS, running natively via ONNX

    Supertonic is a lightning-fast, on-device text-to-speech system built around ONNX Runtime for maximum speed and portability. It focuses on running entirely locally, eliminating the need for cloud APIs and providing low latency and strong privacy guarantees, even on constrained devices like Raspberry Pi boards and e-readers. The core model is highly compact at around 66 million parameters, yet benchmarks show it can generate speech up to 167× faster than real time on modern consumer hardware...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    Open Vision Agents by Stream

    Open Vision Agents by Stream

    Build Vision Agents quickly with any model or video provider

    Open Vision Agents by Stream is an open source framework from Stream for building real time, multimodal AI agents that watch, listen, and respond to live video streams. It focuses on combining video understanding models, such as YOLO and Roboflow based detectors, with real time large language models like OpenAI Realtime and Gemini Live to create interactive experiences. The framework uses Stream’s ultra low latency edge network so agents can join sessions quickly and maintain very low audio and video latency while processing frames and generating responses. Developers work with an agent abstraction that connects video edge providers, LLMs, and processors into pipelines, making it easier to orchestrate tasks like object detection, pose estimation, and conversational guidance. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    AudioBC

    AudioBC

    Offline desktop app to convert EPUB to MP3 using Kokoro-82M neural TTS

    ...Unlike most Text-to-Speech (TTS) tools that require expensive cloud API subscriptions or an active internet connection, AudioBC runs entirely on your local machine. Powered by the state-of-the-art Kokoro-82M neural engine, AudioBC produces natural, human-like speech that rivals premium cloud services. It is built with a focus on privacy and simplicity, offering a streamlined three-step workflow: Extract, Edit, and Convert. Key Features: Neural Quality TTS: Uses the compact yet powerful Kokoro-82M model for high-fidelity, expressive voice synthesis. Privacy-First & Offline: After a one-time initial model download, all processing happens on your CPU. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    StyleTTS 2

    StyleTTS 2

    Towards Human-Level Text-to-Speech through Style Diffusion

    StyleTTS2 is a state-of-the-art text-to-speech system that aims for human-level naturalness by combining style diffusion, adversarial training, and large speech language models. It extends the original StyleTTS idea by introducing a style diffusion model that can sample rich, realistic speaking styles conditioned on reference speech, allowing highly expressive and diverse prosody. The architecture uses a two-stage training process and leverages an auxiliary speech language model to guide generation toward more natural and coherent utterances. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 19
    Amica

    Amica

    Amica is an open source interface for interactive communication

    Amica is an open source interface for interacting with fully animated 3D characters that combine voice chat, vision, and an emotion engine into a single experience. It lets you hold natural conversations with AI characters that can see, listen, and speak, while expressing emotional states through facial expressions and body language. Users can import VRM character models, adjust their appearance, tune the voice to match the character, and define behavior using different large language models and TTS backends. Under the hood, Amica leverages modern web and desktop technologies: three.js and three-vrm for 3D rendering, Transformers.js for running models in the browser, Whisper and Silero VAD for speech recognition and voice-activity detection, and a variety of LLM backends such as llama.cpp servers, ChatGPT-compatible APIs, Ollama, KoboldCpp, and others. ...
    Downloads: 15 This Week
    Last Update:
    See Project
  • 20
    wukong-robot

    wukong-robot

    Chinese voice dialogue robot/smart speaker project

    wukong-robot is a Chinese voice assistant / smart speaker project built to let makers and hackers design highly customizable voice-controlled devices. It combines wake-word detection, automatic speech recognition, natural language understanding, and text-to-speech into a single framework aimed at the Chinese-speaking ecosystem. The project is positioned as a simple, flexible, and elegant platform that can run on devices like Raspberry Pi and other Linux-based boards, making it suitable for DIY smart speakers and home-automation hubs. It supports multi-turn conversational capabilities powered by ChatGPT or other large language models, letting users have continuous dialogues rather than one-shot commands. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    PaddlePaddle models

    PaddlePaddle models

    Pre-trained and Reproduced Deep Learning Models

    Pre-trained and Reproduced Deep Learning Models ("Flying Paddle" official model library, including a variety of academic frontier and industrial scene verification of deep learning models) Flying Paddle's industrial-level model library includes a large number of mainstream models that have been polished by industrial practice for a long time and models that have won championships in international competitions; it provides many scenarios for semantic understanding, image classification,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Dragonfire

    Dragonfire

    The open-source virtual assistant for Ubuntu based Linux distributions

    Dragonfire is the open-source virtual assistant project for Ubuntu-based Linux distributions. Her main objective is to serve as a command and control interface to the helmet user. So that you will be able to give orders just by using your voice commands and your eye movements. That makes the helmet handsfree. We are planning to ship Dragonfire as a preinstalled software package on DragonOS Linux Distribution. DragonOS will be a Linux distribution specially designed for the helmet. It will...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23

    Bermuda Text-to-Speech

    This project includes basic NLP and DSP techniques for Text-to-Speech

    See TTS demo at: http://rslp.racai.ro/index.php?page=tts This is an entirely written in JAVA project which includes a set of tools and methods designed to enable Multilingual Text-to-Speech (TTS) synthesis. We currently support English and Romanian but we will soon train more models and make them available for download. If you want to read more about our other NLP and TTS tools check out http://nlptools.racai.ro.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
Auth0 Logo