A Multi-Modal World Model for Reconstruction, Generation, and Simulation
Powerful AI language model (MoE) optimized for efficiency/performance
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
ChatGPT interface with better UI
Official repository for LTX-Video
Official inference repo for FLUX.2 models
Models for object and human mesh reconstruction
OpenTinker is an RL-as-a-Service infrastructure for foundation models
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Open-source multi-speaker long-form text-to-speech model
Long-form streaming TTS system for multi-speaker dialogue generation
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
MOSS‑TTS Family: open‑source speech and sound generation models
Reference PyTorch implementation and models for DINOv3
General-purpose image editing model that delivers high-fidelity
Diffusion Transformer with Fine-Grained Chinese Understanding
Multimodal-Driven Architecture for Customized Video Generation
A Customizable Image-to-Video Model based on HunyuanVideo
Wan2.2: Open and Advanced Large-Scale Video Generative Model
New family of code large language models (LLMs)
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
CodeGeeX2: A More Powerful Multilingual Code Generation Model
Qwen-Image is a powerful image generation foundation model
Phi-3.5 for Mac: Locally-run Vision and Language Models
A state-of-the-art open visual language model