Automatic code review tool for GitLab based on large language models
From Addition, Subtraction, Multiplication, and Division to ML
Flock is a workflow-based low-code platform for building chatbots
Open multimodal web agent built by Ai2
Zero-code platform for building AI agents from natural language input
Open-source platform for building enterprise-grade agents
Multilingual Document Layout Parsing in a Single Vision-Language Model
On-premises, OCR-free unstructured data extraction
Repository containing notebooks of my posts on Medium
Marrying Grounding DINO with Segment Anything & Stable Diffusion
Multimodal embedding and reranking models built on Qwen3-VL
"Large model" project that trains a 26M-parameter vision-language model (VLM)
Unifying 3D Mesh Generation with Language Models
Learning agent trained in a diffusion world model
Inference script for Oasis 500M
ICLR 2024 Spotlight: curation/training code, metadata, distribution
PyTorch3D is FAIR's library of reusable components for deep learning
[CVPR 2025 Best Paper Award] VGGT
Gracefully face hCaptcha challenges with multimodal LLMs
From Paper to Presentation in One Click
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Flexible Photo Recrafting While Preserving Your Identity
Towards Real-World Vision-Language Understanding
Large language models and vision-language models based on linear attention
Interactive Machine Learning experiments