Chinese and English multimodal conversational language model
Chat & pretrained large vision language model
State-of-the-art (SoTA) text-to-video pre-trained model
Chat & pretrained large audio language model proposed by Alibaba Cloud
GLM-4 series: Open Multilingual Multimodal Chat LMs
Capable of understanding text, audio, images, and video
Qwen3-Omni is a natively end-to-end omni-modal LLM
Inference code for scalable emulation of protein equilibrium ensembles
The Clay Foundation Model - An open source AI model and interface
Audio foundation model excelling in audio understanding
Phi-3.5 for Mac: Locally-run Vision and Language Models
Tiny vision language model
The official PyTorch implementation of Google's Gemma models
Real-time behaviour synthesis with MuJoCo, using Predictive Control
Foundation model for image generation
Fast-stable-diffusion + DreamBooth
A Pragmatic VLA Foundation Model
Multimodal embedding and reranking models built on Qwen3-VL
Collection of Gemma 3 variants trained for performance
Implementation of "MobileCLIP" CVPR 2024
High-resolution models for human tasks
Python SDK for Claude Agent
Ling is a MoE LLM developed and open-sourced by InclusionAI
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation