qr-code-generator free download

WhisperSpeech

An Open Source text-to-speech system built by inverting Whisper

WhisperSpeech is an open-source text-to-speech system created by “inverting” OpenAI’s Whisper, reusing its strengths as a semantic audio model to generate speech instead of only transcribing it. The project aims to be for speech what Stable Diffusion is for images: powerful, hackable, and safe for commercial use, with code under Apache-2.0/MIT and models trained only on properly licensed data. Its architecture follows a token-based, multi-stage pipeline inspired by AudioLM and SPEAR-TTS: Whisper is used to produce semantic tokens, EnCodec compresses the waveform into acoustic tokens, and Vocos reconstructs high-fidelity audio from those tokens. The repository includes notebooks and scripts for inference, long-form synthesis, and finetuning, as well as pre-trained models and converted datasets hosted on Hugging Face. ...

Downloads: 2 This Week

Last Update: 2025-11-28

CosyVoice

Multi-lingual large voice generation model, providing inference

...It is designed for zero-shot voice cloning and cross-lingual or mix-lingual scenarios, so a single reference voice can be used to synthesize speech across languages and in code-switching contexts. CosyVoice 2.0 significantly improves on version 1.0 by boosting accuracy, stability, speed, and overall speech quality, making it more suitable for production environments. The repository contains training recipes, inference pipelines, deployment scripts, and integration examples, positioning it as a comprehensive toolkit rather than just a set of model weights.

Downloads: 9 This Week

Last Update: 2025-11-30

OpenVoice

Instant voice cloning by MIT and MyShell. Audio foundation model

...Architecturally, OpenVoice separates “tone color” cloning from style control, which makes it easier to keep a consistent identity while flexibly changing prosody or language. The project provides open-weight models, inference code, and examples, making it suitable both for research and for building production voice experiences. It is actively developed by MyShell, which also integrates OpenVoice into broader agent and entertainment workflows.

Downloads: 53 This Week

Last Update: 2025-11-28

Fish Speech

SOTA Open Source TTS

Fish Speech is a state-of-the-art open-source text-to-speech project that has evolved into the OpenAudio series of advanced TTS models. The repository hosts the code and tooling for training, fine-tuning, and serving high-quality TTS, while the current flagship models (OpenAudio-S1 and S1-mini) are distributed via Fish Audio’s playground and Hugging Face. The models are evaluated with Seed TTS metrics and achieve exceptionally low word and character error rates, indicating strong intelligibility and alignment between text and audio. ...

Downloads: 30 This Week

Last Update: 2026-03-10

Dia2

TTS model capable of streaming conversational audio in realtime

...The model supports audio conditioning, allowing generated speech to follow a reference voice or conversational style more naturally. Dia2 provides 1B and 2B model checkpoints along with inference code for research and experimentation. It currently focuses on English generation and supports up to two minutes of generated audio. Its main value is enabling low-latency, dialogue-oriented TTS workflows where timing, turn-taking, and natural conversation matter.

Downloads: 0 This Week

Last Update: 2026-06-08

MOSS-TTS Family

MOSS‑TTS Family open‑source speech and sound generation model

...The project is designed for complex real-world use cases where a single speech model may not be enough. Its flagship model focuses on stable long speech generation, multilingual and code-switched synthesis, pronunciation control, and zero-shot voice cloning. The broader family also includes dialogue generation, prompt-based voice creation, streaming voice-agent support, and a unified audio tokenizer. It is especially useful for developers building dubbing, podcasts, audiobooks, voice assistants, character voices, and creative audio tools.

Downloads: 3 This Week

Last Update: 5 days ago

FastKoko

Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model

...It is designed to be easy to deploy via Docker, with separate CPU and GPU images so that users can choose between pure CPU inference and NVIDIA GPU acceleration. The project exposes an OpenAI-compatible speech endpoint, which means existing code that talks to the OpenAI audio API can often be pointed at a Kokoro-FastAPI instance with minimal changes. It supports multiple languages and voicepacks and allows phoneme-based generation for more accurate pronunciation and prosody. The server also offers per-word timestamped captions, which makes it useful for creating subtitles or aligning audio with text. ...
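As a rough illustration of what "OpenAI-compatible" means in practice, the sketch below builds the kind of JSON POST that an OpenAI-style speech endpoint accepts. The host, port, model name, and voice name are assumptions for illustration only and are not taken from the listing; check the project's own documentation for the real values.

```python
import json
import urllib.request

# Assumed values: adjust to your own deployment.
BASE_URL = "http://localhost:8880/v1"  # hypothetical host and port
MODEL = "kokoro"                       # hypothetical model name
VOICE = "af_bella"                     # hypothetical voicepack name


def build_speech_request(text, base_url=BASE_URL, model=MODEL, voice=VOICE):
    """Build (but do not send) a POST to an OpenAI-style speech endpoint."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": "mp3",
    }
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="speech.mp3"):
    """Send the request and write the returned audio bytes to disk."""
    with urllib.request.urlopen(build_speech_request(text)) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
    return out_path
```

Calling `synthesize("Hello from Kokoro")` would need a running server at the assumed address; `build_speech_request` alone can be inspected offline to see the request shape.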

Downloads: 1 This Week

Last Update: 2026-06-06

FireRedTTS-2

Long-form streaming TTS system for multi-speaker dialogue generation

FireRedTTS2 is a next-generation open-source text-to-speech (TTS) system focused on long-form, streaming speech synthesis for multi-speaker dialogue, delivering stable natural speech with context-aware prosody and reliable speaker transitions that support real-time and conversational applications. It features a specialized streaming speech tokenizer and a dual-transformer architecture that enables low latency and high-quality synthesis, making it suitable for interactive systems like...

Downloads: 0 This Week

Last Update: 2026-02-16

Dia

A TTS model capable of generating ultra-realistic dialogue

...It can also produce nonverbal vocalizations like laughter, coughs, clearing the throat, and similar sounds, which are crucial for making synthetic conversations feel human. Dia is released with pretrained checkpoints and inference code, with weights hosted on Hugging Face, so researchers and developers can quickly try it or integrate it into pipelines. The base model currently targets English and has around 1.6 billion parameters, offering a strong balance between realism and computational cost, while the ecosystem also includes Dia2.

Downloads: 0 This Week

Last Update: 2025-11-28

Step-Audio-EditX

LLM-based Reinforcement Learning audio edit model

Step-Audio-EditX is an open-source, 3-billion-parameter audio model from StepFun AI designed to make expressive and precise editing of speech and audio as easy as text editing. Rather than treating audio editing as low-level waveform manipulation, this model converts speech into a sequence of discrete “audio tokens” via a dual-codebook tokenizer that combines a linguistic token stream with a semantic (prosody/emotion/style) token stream, thereby abstracting audio editing into high-level...

Downloads: 0 This Week

Last Update: 2026-04-09

TITTSE

Two Integrated Text To Speech Engines uses MMS & Silero

TITTSE is a Python application that quickly converts text to speech in 15 languages (more can be added easily) using two TTS engines. The input is a text file with the tittse extension whose first four lines form a header: the TITTSE language code (see the documentation for your language), the 'base' file name for the audio files TITTSE creates, the voice gender (girl or boy), and the offset (the number at which the file numbers appended to the base file name start). After those four lines, every paragraph is rendered as a single audio file. Install_TITTSE.sh is a Bash script that installs Python 3.1 and all needed dependencies in a virtual environment on your Linux system. ...
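The input layout described above can be sketched as a small parser. This is not TITTSE's own code: it assumes the four header lines appear in the order listed, that paragraphs are separated by blank lines, and that output files are named by appending the running number to the base name. The `TittseJob` class, the `wav` extension, and the `eng` language code in the usage note are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class TittseJob:
    language: str     # TITTSE language code
    base_name: str    # base file name for the generated audio files
    gender: str       # "girl" or "boy"
    offset: int       # first file number
    paragraphs: list  # one entry per audio file to create


def parse_tittse(text):
    """Split a .tittse document into its 4-line header and its paragraphs."""
    lines = text.splitlines()
    if len(lines) < 4:
        raise ValueError("expected 4 header lines")
    language, base_name, gender = (ln.strip() for ln in lines[:3])
    offset = int(lines[3].strip())
    body = "\n".join(lines[4:])
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]
    return TittseJob(language, base_name, gender, offset, paragraphs)


def output_names(job, ext="wav"):
    """Hypothetical naming scheme: base name followed by the running number."""
    return [f"{job.base_name}{job.offset + i}.{ext}"
            for i in range(len(job.paragraphs))]
```

For a file with header lines `eng`, `story`, `girl`, `5` followed by two paragraphs, `output_names` under these assumptions yields `story5.wav` and `story6.wav`.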

Downloads: 8 This Week

Last Update: 2026-06-14

StoryTeller

Multimodal AI Story Teller, built with Stable Diffusion, GPT, etc.

...Given a prompt as an opening line of a story, GPT writes the rest of the plot; Stable Diffusion draws an image for each sentence; a TTS model narrates each line, resulting in a fully animated video of a short story, replete with audio and visuals. To develop locally, install dev dependencies and install pre-commit hooks. This will automatically trigger linting and code quality checks before each commit. The final video will be saved as /out/out.mp4, alongside other intermediate images, audio files, and subtitles. For more advanced use cases, you can also directly interface with Story Teller in Python code.

Downloads: 1 This Week

Last Update: 2023-08-22

VALL-E

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)

We introduce a language modeling approach for text to speech synthesis (TTS). Specifically, we train a neural codec language model (called VALL-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS as a conditional language modeling task rather than continuous signal regression as in previous work. During the pre-training stage, we scale up the TTS training data to 60K hours of English speech, which is hundreds of times larger than existing systems...

Downloads: 0 This Week

Last Update: 2023-04-14

VITS

Conditional Variational Autoencoder with Adversarial Learning

...The repository provides training and inference pipelines for common datasets such as LJ Speech (single-speaker) and VCTK (multi-speaker), including filelists, configs, and preprocessing scripts. It also includes monotonic alignment search code and g2p preprocessing, which are crucial components for aligning text and speech in an end-to-end setup.

Downloads: 0 This Week

Last Update: 2025-11-28

Search Results for "qr-code-generator"

Showing 14 open source projects for "qr-code-generator"
