Multimodal 7B model for image, video, and text understanding tasks
AI-supported visual verification and tests you can actually trust
Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video
Multimodal agent model for coding, orchestration, and autonomy
Open, non-commercial SDXL model for quality image generation
Text-to-image model optimized for artistic quality and safe generation
Advanced bilingual image editing model with semantic control
Vision-language-action model for robot control via images and text
CLIP model fine-tuned for zero-shot fashion product classification
Small 3B-base multimodal model ideal for custom AI on edge hardware
Ultra-efficient 3B multimodal instruct model built for edge deployment