Layout-aware OCR model for multilingual document understanding
Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video
Lightweight multimodal translation model for 55 languages
NVFP4 DiffusionGemma model for fast multimodal text generation
Multimodal 7B model for image, video, and text understanding tasks
Google’s flagship dense multimodal model for coding and reasoning
Powerful 14B LLM with strong instruction following and long-text handling
Multimodal Transformer for document image understanding and layout analysis
Compact 3B-parameter multimodal model for efficient on-device reasoning