Inference script for Oasis 500M
Official Python inference and LoRA trainer package
A Customizable Image-to-Video Model based on HunyuanVideo
Multimodal Diffusion with Representation Alignment
Qwen2.5-VL is a multimodal large language model series
OCR expert VLM powered by Hunyuan's native multimodal architecture
Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video
Multimodal 7B model for image, video, and text understanding tasks
Google’s flagship dense multimodal model for coding and reasoning