Open Source Speech Language Model
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Wan2.2: Open and Advanced Large-Scale Video Generative Model
LTX-Video Support for ComfyUI
Tiny vision language model
The most powerful local music generation model
Release for Improved Denoising Diffusion Probabilistic Models
High-Fidelity and Controllable Generation of Textured 3D Assets
A PyTorch library for implementing flow matching algorithms
Official inference repo for FLUX.1 models
Official inference repo for FLUX.2 models
Let's make video diffusion practical
Inference script for Oasis 500M
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Audio foundation model excelling in audio understanding
Open-source framework for intelligent speech interaction
Official implementation of DreamCraft3D
Sharp Monocular Metric Depth in Less Than a Second
4M: Massively Multimodal Masked Modeling
Controllable & emotion-expressive zero-shot TTS
Dataset of GPT-2 outputs for research in detection, biases, and more
Official repo for consistency models
Official PyTorch Implementation of "Scalable Diffusion Models"
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)
Implementation of model parallel autoregressive transformers on GPUs