Motion-controllable Video Generation via Latent Trajectory Guidance
Multimodal embedding and reranking models built on Qwen3-VL
"Big Model" trains a visual multimodal VLM with 26M parameters
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
PaddlePaddle End-to-End Development Toolkit
Modular quant framework
Open multimodal web agent built by Ai2
Learning agent trained in a diffusion world model
General-purpose image editing model that delivers high-fidelity
Fast, powerful, git-native ticket tracking in a single bash script
Inference script for Oasis 500M
ICLR 2024 Spotlight: curation/training code, metadata, distribution
[CVPR 2025 Best Paper Award] VGGT
Large language model and vision-language model based on linear attention
Unifying 3D Mesh Generation with Language Models
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
GitLab automatic code review tool based on large models
Flexible Photo Recrafting While Preserving Your Identity
OCR expert VLM powered by Hunyuan's native multimodal architecture
Chat & pretrained large vision language model
airda (Air Data Agent)
Virtual AI anchor that combines state-of-the-art technology
Visual Automation IDE — automate anything you see on screen
Plug-n-play module turning text-to-image models into animation