PyTorch code and models for V-JEPA self-supervised learning from video
PyTorch code and models for VJEPA2 self-supervised learning from video
Ready-to-use OCR with 80+ supported languages
Agent Skill for generating 2D sprite sheets and maps as transparent PNGs
The beginning of scalable pixel-native search
RL research on Android devices
Marrying Grounding DINO with Segment Anything & Stable Diffusion
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
LISA: Reasoning Segmentation via Large Language Model
StarVector is a foundation model for SVG generation
Official repo for Sa2VA: Marrying SAM2 with LLaVA
Powerful open source image generation model
Implementation of Recurrent Interface Network (RIN)
CoTracker is a model for tracking any point (pixel) on a video
High-Resolution 3D Human Digitization from A Single Image
min(DALL·E) is a fast, minimal port of DALL·E Mini to PyTorch
Code release for Masked-attention Mask Transformer
An MNIST-like fashion product database
3D-aware GANs based on NeRF (arXiv)
Per-Pixel Classification is Not All You Need for Semantic Segmentation
A real-time approach for mapping all human pixels of 2D RGB images
Large-scale autoregressive pixel model for image generation by OpenAI
A starter agent that can solve a number of universe environments