Open-source multi-speaker long-form text-to-speech model
MOSS‑TTS Family: open‑source speech and sound generation models
Clean and efficient FP8 GEMM kernels with fine-grained scaling
Agentic, Reasoning, and Coding (ARC) foundation models
Open-source, high-performance AI model with advanced reasoning
Long-form streaming TTS system for multi-speaker dialogue generation
Production-tested AI infrastructure tools
GLM-4-Voice | End-to-End Chinese-English Conversational Model
A Production-ready Reinforcement Learning AI Agent Library
OCR expert VLM powered by Hunyuan's native multimodal architecture
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Large Multimodal Models for Video Understanding and Editing
Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation
Open Multilingual Multimodal Chat LMs
Software that can generate photos from paintings
Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201
Powerful 14B LLM with strong instruction and long-text handling
Multimodal Transformer for document image understanding and layout
QwQ-32B is a reasoning-focused language model for complex tasks
Efficient MoE reasoning model for coding and math workloads
Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video
Multimodal 7B model for image, video, and text understanding tasks
Jan-v1-edge: efficient 1.7B reasoning model optimized for edge devices
Efficient 8B multimodal model tuned for advanced reasoning tasks
High-precision 14B multimodal model built for advanced reasoning tasks