ICLR2024 Spotlight: curation/training code, metadata, distribution
[CVPR 2025 Best Paper Award] VGGT
OCR expert VLM powered by Hunyuan's native multimodal architecture
Gracefully face hCaptcha challenge with multimodal LLMs
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Flexible Photo Recrafting While Preserving Your Identity
Open-source framework for conversational voice AI agents
Lightning fast C++/CUDA neural network framework
Towards Real-World Vision-Language Understanding
Large-language-model & vision-language-model based on Linear Attention
Interactive Machine Learning experiments
airda (Air Data Agent)
Chat & pretrained large vision language model
Create software using visual programming
Virtual AI anchor that combines state-of-the-art technology
Visual Automation IDE — automate anything you see on screen
Plug-n-play module turning text-to-image models into animation
dashAI: an interactive platform for training, evaluating and deploying
Visual Instruction Tuning: Large Language-and-Vision Assistant
Computer vision projects | Fun AI projects related to computer vision
Guiding Instruction-based Image Editing via Multimodal Large Language
Open-source tool to visualise your RAG
CS2, Valorant, Fortnite, APEX, every game
Library of self-supervised methods for visual representation