Powerful AI language model (MoE) optimized for efficiency/performance
A Multi-Modal World Model for Reconstruction, Generation, and Simulation
Models for object and human mesh reconstruction
Official repository for LTX-Video
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Official inference repo for FLUX.2 models
Open-source multi-speaker long-form text-to-speech model
Wan2.2: Open and Advanced Large-Scale Video Generative Model
FAIR Sequence Modeling Toolkit 2
MOSS‑TTS Family: open‑source speech and sound generation models
CodeGeeX2: A More Powerful Multilingual Code Generation Model
Qwen2.5-VL is a multimodal large language model series
Long-form streaming TTS system for multi-speaker dialogue generation
Diffusion Transformer with Fine-Grained Chinese Understanding
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
New family of code large language models (LLMs)
Advanced language and coding AI model
Multimodal-Driven Architecture for Customized Video Generation
A Customizable Image-to-Video Model based on HunyuanVideo
Reference PyTorch implementation and models for DINOv3
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Phi-3.5 for Mac: Locally-run Vision and Language Models
Industrial-level controllable zero-shot text-to-speech system
OpenTinker is an RL-as-a-Service infrastructure for foundation models