Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Release for Improved Denoising Diffusion Probabilistic Models
A Systematic Framework for Interactive World Modeling
Repo for SeedVR2 & SeedVR
Tongyi Deep Research, the Leading Open-source Deep Research Agent
A Pragmatic VLA Foundation Model
OpenTinker is an RL-as-a-Service infrastructure for foundation models
High-resolution models for human tasks
Video understanding codebase from FAIR for reproducing video models
State-of-the-art (SoTA) text-to-video pre-trained model
Official DeiT repository
A Family of Open Sourced Music Foundation Models
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Hackable and optimized Transformers building blocks
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
Open-source multi-speaker long-form text-to-speech model
Tooling for the Common Objects In 3D dataset
DeepSeek Coder: Let the Code Write Itself
Diversity-driven optimization and large-model reasoning ability
Open-source framework for intelligent speech interaction
Chat & pretrained large audio language model proposed by Alibaba Cloud
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
VMZ: Model Zoo for Video Modeling
Open-weight, large-scale hybrid-attention reasoning model
Large language model & vision-language model based on Linear Attention