GLM-4-Voice | End-to-End Chinese-English Conversational Model
Repo of Qwen2-Audio chat & pretrained large audio language model
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
GLM-4 series: Open Multilingual Multimodal Chat LMs
Audio foundation model excelling in audio understanding
Chinese and English multimodal conversational language model
GLM-4.5: Open-source LLM for intelligent agents by Z.ai
Visual Causal Flow
DeepSeek Coder: Let the Code Write Itself
Implementation of the Surya Foundation Model for Heliophysics
Inference script for Oasis 500M
HY-Motion model for 3D character animation generation
Agentic, Reasoning, and Coding (ARC) foundation models
tiktoken is a fast BPE tokeniser for use with OpenAI's models
CLIP: Predict the most relevant text snippet given an image
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
LLM-based Reinforcement Learning audio edit model
Multimodal embedding and reranking models built on Qwen3-VL
Personalize Any Characters with a Scalable Diffusion Transformer
Miso TTS is an 8-billion-parameter, highly emotive text-to-speech model
Sharp Monocular Metric Depth in Less Than a Second
Bidirectional token-classification model for identifiable info
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
General-purpose image editing model that delivers high-fidelity