Open-source industrial-grade ASR models
Qwen3-ASR is an open-source series of ASR models
OpenTinker is an RL-as-a-Service infrastructure for foundation models
High-resolution models for human tasks
Video understanding codebase from FAIR for reproducing video models
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI
Generate Any 3D Scene in Seconds
Memory-efficient and performant finetuning of Mistral's models
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
ChatGPT interface with better UI
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Towards Real-World Vision-Language Understanding
A Conversational Speech Generation Model
Qwen2.5-Coder is the code version of Qwen2.5, the large language model series
Stable Diffusion with Core ML on Apple Silicon
Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation
Open-source, high-performance Mixture-of-Experts large language model
Pushing the Limits of Mathematical Reasoning in Open Language Models
Chat & pretrained large vision language model
Powerful open source image generation model
High-Resolution Image Synthesis with Latent Diffusion Models
Chat & pretrained large audio language model proposed by Alibaba Cloud
Open Multilingual Multimodal Chat LMs
AI Suite for upscaling, interpolating & restoring images/videos