Contexts Optical Compression
Repository of Qwen2-Audio, a chat and pretrained large audio language model
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Unified Multimodal Understanding and Generation Models
Large Multimodal Models for Video Understanding and Editing
Chat & pretrained large audio language model proposed by Alibaba Cloud
Pushing the Limits of Mathematical Reasoning in Open Language Models
Encoder of greater-than-word length text trained on a variety of data
Dataset of GPT-2 outputs for research in detection, biases, and more
T5-Small: Lightweight text-to-text transformer for NLP tasks
CTC-based forced aligner for audio-text in 158 languages
High-precision 14B multimodal model built for advanced reasoning tasks
Compact 3B-param multimodal model for efficient on-device reasoning
Powerful 14B-parameter base multimodal model, a flexible foundation for fine-tuning
Lightweight 24B agentic coding model with vision and long context
Quantized 675B multimodal instruct model optimized for NVFP4
Small 3B-parameter base multimodal model, ideal for custom AI on edge hardware