A suite of advanced multi-modal LLMs
A Multi-Modal World Model for Reconstructing, Generating, and Simulating
A tool to use the Ai2 Open Coding Agents Soft-Verified Agents
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA
A Family of Open-Source Music Foundation Models
Phi-3.5 for Mac: Locally-run Vision and Language Models
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Benchmarking Multimodal Agents for Open-Ended Tasks
Photorealistic Synthetic Dataset for Holistic Indoor Scene
Multimodal embedding and reranking models built on Qwen3-VL
High-resolution models for human tasks
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Free Tailwind CSS UI component library for modern web interfaces
Code and models for ICML 2024 paper, NExT-GPT
Collection of CVPR 2025 papers and open source projects
A blazing fast AI Gateway with integrated guardrails
Foundational Models for State-of-the-Art Speech and Text Translation
A Pioneering Open-Source Alternative to GPT-4o
Virtual AI anchor that combines state-of-the-art technology
Transformers4Rec is a flexible and efficient library
Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video
Omnimodal AI model for agents, coding, and long-context tasks