Contexts Optical Compression
OCR expert VLM powered by Hunyuan's native multimodal architecture
Chat & pretrained large vision language model
Repo of Qwen2-Audio chat & pretrained large audio language model
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Capable of understanding text, audio, vision, video
Qwen3-Omni is a natively end-to-end, omni-modal LLM
High-Resolution Image Synthesis with Latent Diffusion Models
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
VMZ: Model Zoo for Video Modeling
Multi-modal large language model designed for audio understanding
Language modeling in a sentence representation space
FAIR Sequence Modeling Toolkit 2