Audio foundation model excelling in audio understanding
Collection of Gemma 3 variants that are trained for performance
ChatGLM-6B: An Open Bilingual Dialogue Language Model
Large Multimodal Models for Video Understanding and Editing
Phi-3.5 for Mac: Locally-run Vision and Language Models
An experimental version of the DeepSeek model
Multimodal embedding and reranking models built on Qwen3-VL
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Multimodal Transformer for document image understanding and layout
Flexible text-to-text transformer model for multilingual NLP tasks
T5-Small: Lightweight text-to-text transformer for NLP tasks
Multimodal 7B model for image, video, and text understanding tasks