Wan2.2: Open and Advanced Large-Scale Video Generative Model
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Awesome multilingual OCR toolkits based on PaddlePaddle
Industrial-level controllable zero-shot text-to-speech system
State-of-the-art TTS model under 25MB
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
Generating Immersive, Explorable, and Interactive 3D Worlds
Qwen3-TTS is an open-source series of TTS models
Hunyuan Translation Model Version 1.5
Official inference repo for FLUX.2 models
Multimodal Diffusion with Representation Alignment
Miso TTS is an 8-billion-parameter, highly emotive text-to-speech model
The most powerful local music generation model
HY-Motion model for 3D character animation generation
Official inference repo for FLUX.1 models
Agentic, Reasoning, and Coding (ARC) foundation models
Powerful AI language model (MoE) optimized for efficiency and performance
Official repository for LTX-Video
Repo for SeedVR2 & SeedVR
Tooling for the Common Objects In 3D dataset
Advanced language and coding AI model
High-Fidelity and Controllable Generation of Textured 3D Assets
Image generation model with single-stream diffusion transformer
Code for running inference and finetuning with SAM 3 model
Reference PyTorch implementation and models for DINOv3