Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
Tiny vision language model
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Multimodal-Driven Architecture for Customized Video Generation
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI
4M: Massively Multimodal Masked Modeling
Repo of Qwen2-Audio chat & pretrained large audio language model
High-Fidelity and Controllable Generation of Textured 3D Assets
Multi-modal large language model designed for audio understanding
Capable of understanding text, audio, vision, video
Chat & pretrained large audio language model proposed by Alibaba Cloud
Chat & pretrained large vision language model
High-Resolution Image Synthesis with Latent Diffusion Models
Open-source, high-performance Mixture-of-Experts large language model
Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Let us control diffusion models
GLIDE: a diffusion-based text-conditional image synthesis model
Reproduces results of "Fixing the train-test resolution discrepancy"
Model that fuses instruct, reasoning and agentic skills