Document Image Parsing via Heterogeneous Anchor Prompting
Framework for building neural networks
StreamSpeech is a seamless model for offline speech recognition
Advanced techniques for RAG systems
Fast and Universal 3D reconstruction model for versatile tasks
Implementation of Vision Transformer, a simple way to achieve SOTA
The best ChatGPT that $100 can buy
A secure sandbox environment for malware developers and red teamers
A Model Context Protocol server for searching and analyzing arXiv
4M: Massively Multimodal Masked Modeling
Guiding Instruction-based Image Editing via Multimodal Large Language
This repository contains the official implementation of FastVLM
Refer and Ground Anything Anywhere at Any Granularity
Supercharge Your LLM with the Fastest KV Cache Layer
Agent toolkit providing semantic retrieval and editing capabilities
Open-source platform for building enterprise-grade agents
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Official DeiT repository
Self-supervised visual learning using momentum contrast in PyTorch
ImageBind: One Embedding Space to Bind Them All
PyTorch code and models for the DINOv2 self-supervised learning
Anthropic's Interactive Prompt Engineering Tutorial
Anthropic's educational courses
GLM-4-Voice | End-to-End Chinese-English Conversational Model