Code for running inference and finetuning with the SAM 3 model
Contexts Optical Compression
Code for openai.fm, a demo for the OpenAI Speech API
Qwen3-TTS is an open-source series of TTS models
A Family of Open-Source Music Foundation Models
A Powerful Native Multimodal Model for Image Generation
Official inference repo for FLUX.2 models
Robust Speech Recognition via Large-Scale Weak Supervision
A lightweight text-to-speech model with zero-shot voice cloning
Open-source text-to-speech tool that supports extra-long text
Python library and CLI tool to interface with Google Translate
Use Microsoft Edge's online text-to-speech service from Python
Image generation model with a single-stream diffusion transformer
A high-quality rapid TTS voice cloning model
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Towards Human-Level Text-to-Speech through Style Diffusion
A robust, efficient, low-latency speech-to-text library
Industrial-level controllable zero-shot text-to-speech system
GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image
TTS with Kokoro and ONNX Runtime
Offline inference engine for art, real-time voice conversations
CLIP: predict the most relevant text snippet given an image
Audiocraft is a library for audio processing and generation
Synchronized Translation for Videos
A nearly-live implementation of OpenAI's Whisper