Official inference repo for FLUX.1 models
Multimodal-Driven Architecture for Customized Video Generation
A Family of Open Sourced Music Foundation Models
Super expressive prompting model based on ltx2.3
Industrial-level controllable zero-shot text-to-speech system
GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image
Controllable & emotion-expressive zero-shot TTS
Collection of Gemma 3 variants that are trained for performance
Official Python inference and LoRA trainer package
Open-source multi-speaker long-form text-to-speech model
Large-language-model & vision-language-model based on Linear Attention
A Multi-Modal World Model for Reconstructing, Generating, Simulation
tiktoken is a fast BPE tokeniser for use with OpenAI's models
General-purpose image editing model that delivers high-fidelity
Ultra-Efficient LLMs on End Device
Generate Any 3D Scene in Seconds
Memory-efficient and performant finetuning of Mistral's models
Block Diffusion for Ultra-Fast Speculative Decoding
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Towards Real-World Vision-Language Understanding
High-Resolution Image Synthesis with Latent Diffusion Models
Encoder of greater-than-word-length text trained on a variety of data
Official PyTorch Implementation of "Scalable Diffusion Models"
Dual LSTM Encoder for Dialog Response Generation
React app for inspecting, building and debugging with the Realtime API