OCR expert VLM powered by Hunyuan's native multimodal architecture
High-resolution models for human tasks
An Efficient Agentic Model for Computer Use
Ultra-Efficient LLMs on End Devices
Tongyi Deep Research, the Leading Open-source Deep Research Agent
GPT4V-level open-source multi-modal model based on Llama3-8B
HY-Motion model for 3D character animation generation
Chinese and English multimodal conversational language model
Tool for exploring and debugging transformer model behaviors
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Pretrained time-series foundation model developed by Google Research
Official implementation of DreamCraft3D
State-of-the-art (SoTA) text-to-video pre-trained model
Release for Improved Denoising Diffusion Probabilistic Models
Large-language-model & vision-language-model based on Linear Attention
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1 benchmark
General-purpose image editing model that delivers high-fidelity results
Memory-efficient and performant finetuning of Mistral's models
Qwen3-omni is a natively end-to-end, omni-modal LLM
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Multi-modal large language model designed for audio understanding
Qwen2.5-Coder is the code version of Qwen2.5, the large language model
Open-source, high-performance Mixture-of-Experts large language model
High-Resolution Image Synthesis with Latent Diffusion Models