Official implementation of DreamCraft3D
Towards Real-World Vision-Language Understanding
From Images to High-Fidelity 3D Assets
Text and image to video generation: CogVideoX and CogVideo
Z80-μLM is a 2-bit quantized language model
Official inference repo for FLUX.2 models
Qwen3-TTS is an open-source series of TTS models
Research code artifacts for Code World Model (CWM)
Open-source large language model family from Tencent Hunyuan
The official repo of Qwen chat & pretrained large language model
gpt-oss-120b and gpt-oss-20b are two open-weight language models
Industrial-level controllable zero-shot text-to-speech system
Models for object and human mesh reconstruction
Open-source deep-learning framework
Sharp Monocular Metric Depth in Less Than a Second
CLIP: predict the most relevant text snippet given an image
The most powerful local music generation model
Large language model and vision-language model based on linear attention
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Diffusion Transformer with Fine-Grained Chinese Understanding
OpenTinker is an RL-as-a-Service infrastructure for foundation models
Implementation of "MobileCLIP" (CVPR 2024)
Python inference and LoRA trainer package for the LTX-2 audio–video
Pretrained time-series foundation model developed by Google Research
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI