Open-weight, large-scale hybrid-attention reasoning model
Ring is a reasoning MoE LLM developed and open-sourced by InclusionAI
VMZ: Model Zoo for Video Modeling
Official implementation of Watermark Anything with Localized Messages
High-resolution models for human tasks
Towards Real-World Vision-Language Understanding
CLIP: predict the most relevant text snippet given an image
GLM-4.6V/4.5V/4.1V-Thinking: towards versatile multimodal reasoning
4M: Massively Multimodal Masked Modeling
FAIR Sequence Modeling Toolkit 2
A PyTorch library for implementing flow matching algorithms
Official DeiT repository
Hackable and optimized Transformers building blocks
PyTorch code and models for the DINOv2 self-supervised learning method
Official implementation of DreamCraft3D
Repository of the Qwen2-Audio chat & pretrained large audio language models
Controllable & emotion-expressive zero-shot TTS
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion WebUI
DeepMind model for tracking arbitrary points across videos & robotics
Code for Mesh R-CNN, ICCV 2019
Language modeling in a sentence representation space
Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
The ChatGPT Retrieval Plugin lets you easily find personal documents
Designed for text embedding and ranking tasks
Inference framework for 1-bit LLMs