Code for running inference and finetuning with the SAM 3 model
Contexts Optical Compression
Code for openai.fm, a demo for the OpenAI Speech API
A Family of Open Sourced Music Foundation Models
A Powerful Native Multimodal Model for Image Generation
Qwen3-TTS is an open-source series of TTS models
Official inference repo for FLUX.2 models
Robust Speech Recognition via Large-Scale Weak Supervision
A lightweight text-to-speech model with zero-shot voice cloning
Python library and CLI tool to interface with Google Translate
Use Microsoft Edge's online text-to-speech service from Python
Image generation model with single-stream diffusion transformer
Open-source text-to-speech tool that supports extra-long text
A high-quality rapid TTS voice cloning model
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Towards Human-Level Text-to-Speech through Style Diffusion
Audiocraft is a library for audio processing and generation
A robust, efficient, low-latency speech-to-text library
Speech-AI-Forge is a project developed around TTS generation models
Industrial-level controllable zero-shot text-to-speech system
CLIP: predict the most relevant text snippet given an image
GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image
Controllable & emotion-expressive zero-shot TTS
TTS with kokoro and onnx runtime
Synchronized Translation for Videos