Showing 87 open source projects for "void-linux"

View related business solutions
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • Outgrown Windows Task Scheduler? Icon
    Outgrown Windows Task Scheduler?

    Free diagnostic identifies where your workflow is breaking down—with instant analysis of your scheduling environment.

    Windows Task Scheduler wasn't built for complex, cross-platform automation. Get a free diagnostic that shows exactly where things are failing and provides remediation recommendations. Interactive HTML report delivered in minutes.
    Download Free Tool
  • 1
    ChatTTS_colab

    ChatTTS_colab

    One-click deployment (including offline integration package)

    ChatTTS_colab is a wrapper project around the ChatTTS model that focuses on “one-click” deployment, especially in Google Colab. It provides an integrated offline bundle and scripts for Windows and macOS so users can run ChatTTS locally without wrestling with complex environment setup. The repository includes Colab notebooks that launch a Gradio-based web UI and expose streaming TTS, making it possible to listen to generated audio as it is produced. A distinctive feature is the “voice gacha”...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Dia

    Dia

    A TTS model capable of generating ultra-realistic dialogue

    Dia is a neural text-to-speech model designed specifically for generating ultra-realistic dialogue in a single pass. Instead of focusing on isolated sentences or flat narration, it is optimized for conversational audio, complete with natural turn-taking, prosody, and pacing. The model can be conditioned on a reference audio sample, allowing you to control emotion, tone, and other stylistic aspects of the speech. It can also produce nonverbal vocalizations like laughter, coughs, clearing the...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Bailing

    Bailing

    Bailing is a voice dialogue robot similar to GPT-4o

    Bailing is an open-source voice-dialogue assistant designed to deliver natural voice-based conversations by combining automatic speech recognition (ASR), voice activity detection (VAD), a large language model (LLM), and text-to-speech (TTS) in a single pipeline. Its goal is to offer a “voice-first” chat experience similar to what one might expect from a system like GPT-4o, but fully open and deployable by users. The project is modular: each core function — ASR, VAD, LLM, TTS — exists as a...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    NVIDIA NeMo Framework

    NVIDIA NeMo Framework

    Scalable generative AI framework built for researchers and developers

    NVIDIA NeMo is a scalable, cloud-native generative AI framework aimed at researchers and PyTorch developers working on large language models, multimodal models, and speech AI (ASR and TTS), with growing support for computer vision. It provides collections of domain-specific modules and reference implementations that make it easier to pre-train, fine-tune, and deploy very large models on multi-GPU and multi-node infrastructure. NeMo 2.0 introduces a Python-based configuration system,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 5
    MARS5

    MARS5

    MARS5 speech model (TTS) from CAMB.AI

    MARS5-TTS is CAMB.AI’s open-source English speech model designed for high-quality text-to-speech and voice emulation. It uses a two-stage architecture that combines an autoregressive (AR) model with a non-autoregressive (NAR) model, giving it both expressiveness and speed. The model is built to handle prosodically challenging content such as sports commentary, anime dialogue, and other high-energy or highly varied speech patterns with realistic rhythm and intonation. To control speaker...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    RealtimeTTS

    RealtimeTTS

    Converts text to speech in realtime

    RealtimeTTS is a low-latency text-to-speech library built for real-time applications such as voice chat with LLMs, assistants, and interactive tools. It is designed around a streaming model: you can feed it text incrementally (for example, as an LLM responds) and get audio output almost immediately, which keeps end-to-end latency very low. The library is engine-agnostic and plugs into a wide range of cloud and local TTS systems, including OpenAI, ElevenLabs, Azure, Coqui, Piper, StyleTTS2,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    VideoChat

    VideoChat

    Real-time voice interactive digital human

    VideoChat is a real-time voice-interactive “digital human” system that combines automatic speech recognition, large language models, text-to-speech, and talking-head generation into a single conversational pipeline. It supports both pure end-to-end voice solutions based on multimodal large language models (GLM-4-Voice feeding directly into talking-head generation) and a more traditional cascaded pipeline using ASR → LLM → TTS → talking head. It is built as a Gradio Python demo, exposing a...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Style-Bert-VITS2

    Style-Bert-VITS2

    Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles

    Style-Bert-VITS2 is a text-to-speech system based on Bert-VITS2 that focuses on highly controllable voice styles and emotional expression. It takes the original Bert-VITS2 v2.1 and its Japanese-Extra variant and extends them so you can control emotion and speaking style with fine-grained intensity, not just choose a generic tone. The project targets both power users and beginners: Windows users without Git or Python can install and run it using bundled .bat scripts, while advanced users can...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Auto Synced & Translated Dubs

    Auto Synced & Translated Dubs

    Automatically translates the text of a video based on a subtitle file

    Auto-Synced-Translated-Dubs is a toolchain that automatically translates and re-dubs videos using AI voices while keeping the new speech aligned to the original timing via subtitle files. It assumes you have a human-made SRT (or similar) subtitle file; the script then uses translation services such as Google Cloud or DeepL to generate translated subtitle tracks in one or more target languages. Using the timestamps of each subtitle line, it computes the required duration of each spoken...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Atera all-in-one platform IT management software with AI agents Icon
    Atera all-in-one platform IT management software with AI agents

    Ideal for internal IT departments or managed service providers (MSPs)

    Atera’s AI agents don’t just assist, they act. From detection to resolution, they handle incidents and requests instantly, taking your IT management from automated to autonomous.
    Learn More
  • 10
    EmotiVoice

    EmotiVoice

    Multi-Voice and Prompt-Controlled TTS Engine

    EmotiVoice is a multi-voice, prompt-controlled text-to-speech engine designed to generate highly expressive speech across thousands of voices. It supports both English and Chinese and ships with over 2,000 preset voices, making it suitable for everything from characters and virtual anchors to narration and dialogue. The core idea is prompt-based emotional and style control: you can ask the engine to speak “happy,” “sad,” “excited,” or with other high-level style prompts that shape prosody,...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 11
    CSM (Conversational Speech Model)

    CSM (Conversational Speech Model)

    A Conversational Speech Generation Model

    The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that creates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing. CSM offers a flexible setup and is compatible with CUDA-enabled GPUs for efficient execution.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    Mice MX OS speech to text Voice Control

    Mice MX OS speech to text Voice Control

    Mice speech to text with MX Cinnamon OS ISO

    Note about this image This image contains a system based on Linux MX, which was created to improve accessibility within the Linux environment. The distribution uses the Cinnamon desktop interface, which is configured to be operated using voice commands and outputs. The user interface and the control of your own devices and home automation systems can be customized and extended. The voice control program MiceStTM.py was developed to enable easy adaptation to other languages. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Parallel WaveGAN

    Parallel WaveGAN

    Unofficial Parallel WaveGAN

    Parallel WaveGAN is an unofficial PyTorch implementation of several state-of-the-art non-autoregressive neural vocoders, centered on Parallel WaveGAN but also including MelGAN, Multiband-MelGAN, HiFi-GAN, and StyleMelGAN. Its main goal is to provide a real-time neural vocoder that can turn mel spectrograms into high-quality speech audio efficiently. The repository is designed to work hand-in-hand with ESPnet-TTS and NVIDIA Tacotron2-style front ends, so you can build complete TTS or singing...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    SpeakLogPSU
    SpeakLogPSU can speak chat messages with an individual voice if the NPC or player was configured or with a default one. You will never miss if someone talks to you. Voice cloning can be accomplished with Coqui in less than five minutes without GPU. The result is archived and can be used the next time in game. Some TTS projects already started to add tag support to speak text with emotions or sing it. If a game designer has that in mind with a good chat log she can voiced her...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    VALL-E X

    VALL-E X

    Open source implementation of Microsoft's VALL-E X zero-shot TTS model

    VALL-E-X is an open-source implementation of Microsoft’s VALL-E X zero-shot text-to-speech model, focused on multilingual, cross-lingual voice cloning. It is capable of synthesizing speech in English, Chinese, and Japanese from text while mimicking the voice characteristics of a speaker given only a short 3–10 second prompt. The model attempts to match not just timbre, but also tone, pitch, emotion, and prosody of the reference audio, resulting in highly personalized output. VALL-E-X...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 16
    wukong-robot

    wukong-robot

    Chinese voice dialogue robot/smart speaker project

    ...It combines wake-word detection, automatic speech recognition, natural language understanding, and text-to-speech into a single framework aimed at the Chinese-speaking ecosystem. The project is positioned as a simple, flexible, and elegant platform that can run on devices like Raspberry Pi and other Linux-based boards, making it suitable for DIY smart speakers and home-automation hubs. It supports multi-turn conversational capabilities powered by ChatGPT or other large language models, letting users have continuous dialogues rather than one-shot commands. The project emphasizes extensibility: there is a plugin ecosystem (wukong-contrib) where developers can add new skills such as controlling smart-home devices, querying services, or performing custom actions.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    Audio Webui

    Audio Webui

    A webui for different audio related Neural Networks

    Audio Webui is a Gradio-based web user interface that unifies a wide range of audio-related neural networks under a single, accessible front end. It is designed as an “all-in-one” environment where users can experiment with text-to-speech, voice cloning, generative music, and other neural audio models without writing boilerplate code. The project supports multiple back-end models and toolchains (such as Bark, RVC, AudioLDM, Audiocraft, and other text-to-audio or voice-cloning tools),...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    DiffSinger

    DiffSinger

    Singing Voice Synthesis via Shallow Diffusion Mechanism

    DiffSinger is an open-source PyTorch implementation of a diffusion-based acoustic model for singing-voice synthesis (SVS) and also text-to-speech (TTS) in a related variant. The core idea is to view generation of a sung voice (mel-spectrogram) as a diffusion process: starting from noise, the model iteratively “denoises” while being conditioned on a music score (lyrics, pitch, musical timing). This avoids some of the typical problems of prior SVS models — like over-smoothing or unstable GAN...
    Downloads: 42 This Week
    Last Update:
    See Project
  • 19
    WaveRNN

    WaveRNN

    WaveRNN Vocoder + TTS

    WaveRNN is a PyTorch implementation of DeepMind’s WaveRNN vocoder, bundled with a Tacotron-style TTS front end to form a complete text-to-speech stack. As a vocoder, WaveRNN models raw audio with a compact recurrent neural network that can generate high-quality waveforms more efficiently than many traditional autoregressive models. The repository includes scripts and code for preprocessing datasets such as LJSpeech, training Tacotron to produce mel spectrograms, training WaveRNN on those...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Mocking Bird

    Mocking Bird

    Clone a voice in 5 seconds to generate arbitrary speech in real-time

    MockingBird is an open-source voice cloning and real-time speech generation toolkit that lets you clone a speaker’s voice from a short audio sample (reportedly as little as 5 seconds) and then synthesize arbitrary speech in that voice. It builds on deep-learning based TTS / voice-cloning technology (in the lineage of projects such as Real-Time-Voice-Cloning), but extends it with support for Mandarin Chinese and multiple Chinese speech datasets — broadening its applicability beyond English....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    VoiceFixer

    VoiceFixer

    General Speech Restoration

    VoiceFixer is a machine-learning framework for “speech restoration”: given a degraded or distorted audio recording — with noise, clipping, low sampling rate, reverberation, or other artifacts — it attempts to recover high-fidelity, clean speech. The architecture works in two stages: first an analysis stage that tries to extract “clean” intermediate features from the noisy audio (e.g. removing noise, denoising, dereverberation, upsampling), and then a neural vocoder-based synthesis stage that...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    TensorFlowTTS

    TensorFlowTTS

    Real-Time State-of-the-art Speech Synthesis for Tensorflow 2

    TensorFlowTTS is a state-of-the-art, open-source speech synthesis library built on TensorFlow 2. It offers a variety of architectures for text-to-speech, including classic and modern models such as Tacotron‑2, FastSpeech / FastSpeech2, and neural vocoders like MelGAN and Multiband‑MelGAN. Because it’s based on TensorFlow 2, it can leverage optimizations such as fake-quantization aware training and pruning — which allow models to run faster than real time and to be deployable on mobile or...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    VITS

    VITS

    Conditional Variational Autoencoder with Adversarial Learning

    VITS is a foundational research implementation of “VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech,” a well-known neural TTS architecture. Unlike traditional two-stage systems that separately train an acoustic model and a vocoder, VITS trains an end-to-end model that maps text directly to waveform using a conditional variational autoencoder combined with normalizing flows and adversarial training. This architecture enables parallel generation...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Transformer TTS

    Transformer TTS

    Implementation of a Transformer based neural network

    TransformerTTS is an implementation of a non-autoregressive Transformer-based neural network for text-to-speech, built with TensorFlow 2. It takes inspiration from architectures like FastSpeech, FastSpeech 2, FastPitch, and Transformer TTS, and extends them with its own aligner and forward models. The system separates alignment learning and acoustic modeling: an autoregressive Transformer is used as an aligner to extract phoneme-to-frame durations, while a non-autoregressive...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    PaddlePaddle models

    PaddlePaddle models

    Pre-trained and Reproduced Deep Learning Models

    Pre-trained and Reproduced Deep Learning Models ("Flying Paddle" official model library, including a variety of academic frontier and industrial scene verification of deep learning models) Flying Paddle's industrial-level model library includes a large number of mainstream models that have been polished by industrial practice for a long time and models that have won championships in international competitions; it provides many scenarios for semantic understanding, image classification,...
    Downloads: 0 This Week
    Last Update:
    See Project