MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Claude Code for everything except coding
ComfyUI wrapper nodes for HunyuanVideo
Flexible Photo Recrafting While Preserving Your Identity
Multilingual Document Layout Parsing in a Single Vision-Language Model
Handwritten Text Recognition (HTR) system implemented with TensorFlow
Marrying Grounding DINO with Segment Anything & Stable Diffusion
Motion-controllable Video Generation via Latent Trajectory Guidance
Multimodal embedding and reranking models built on Qwen3-VL
"Big Model": train a 26M-parameter multimodal vision-language model (VLM)
Modular quant framework
OCR expert VLM powered by Hunyuan's native multimodal architecture
Agent Skill for generating 2D sprite sheets and maps as transparent PNGs
Open multimodal web agent built by Ai2
Learning agent trained in a diffusion world model
General-purpose image editing model that delivers high-fidelity
Fast, powerful, git-native ticket tracking in a single bash script
ICLR 2024 Spotlight: curation/training code, metadata, distribution
[CVPR 2025 Best Paper Award] VGGT
Unifying 3D Mesh Generation with Language Models
Automatic code review tool for GitLab based on large language models
A Pioneering Open-Source Alternative to GPT-4o
Towards Real-World Vision-Language Understanding
Chat & pretrained large vision language model
airda (Air Data Agent)