Open-source multi-speaker long-form text-to-speech model
MOSS‑TTS Family: open‑source speech and sound generation models
Clean and efficient FP8 GEMM kernels with fine-grained scaling
Agentic, Reasoning, and Coding (ARC) foundation models
Open-source, high-performance AI model with advanced reasoning
Long-form streaming TTS system for multi-speaker dialogue generation
Production-tested AI infrastructure tools
GLM-4-Voice | End-to-End Chinese-English Conversational Model
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Large Multimodal Models for Video Understanding and Editing
Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation
Open Multilingual Multimodal Chat LMs
Code for "Image Generation from Scene Graphs", Johnson et al., CVPR 201
Powerful 14B LLM with strong instruction and long-text handling
Multimodal Transformer for document image understanding and layout
QwQ-32B is a reasoning-focused language model for complex tasks
Efficient MoE reasoning model for coding and math workloads
Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video
Multimodal 7B model for image, video, and text understanding tasks
Jan-v1-edge: efficient 1.7B reasoning model optimized for edge devices
Efficient 8B multimodal model tuned for advanced reasoning tasks
High-precision 14B multimodal model built for advanced reasoning tasks
Efficient 14B multimodal instruct model with edge deployment and FP8
Compact 3B-param multimodal model for efficient on-device reasoning
Versatile 8B-base multimodal LLM, flexible foundation for custom AI