GUI/CLI tool for downloading Xiaohongshu content
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
The repository provides code for running inference with SAM 2
Clean and efficient FP8 GEMM kernels with fine-grained scaling
The official PyTorch implementation of Google's Gemma models
Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding
MemU is an open-source memory framework for AI companions
SwarmZero's SDK for building AI agents, swarms of agents, and much more
A framework for open autonomous economic agent (AEA) development
A simple, secure MCP-to-OpenAPI proxy server
The most powerful Android RPA agent framework
Implementation of "MobileCLIP" (CVPR 2024)
A fast, powerful, and simple hierarchical vision transformer
Code release for Cut and Learn for Unsupervised Object Detection
High-resolution models for human tasks
Video understanding codebase from FAIR for reproducing video models
Towards Real-World Vision-Language Understanding
CLIP: predict the most relevant text snippet given an image
Research code artifacts for Code World Model (CWM)
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal Diffusion with Representation Alignment
Implementation of Make-A-Video, a new SOTA text-to-video generator
Official code for Style Aligned Image Generation via Shared Attention
A Model Context Protocol server for searching and analyzing arXiv papers
4M: Massively Multimodal Masked Modeling