Language modeling in a sentence representation space
[CVPR 2025 Best Paper Award] VGGT
PyTorch code and models for the DINOv2 self-supervised learning method
FAIR's research platform for object detection research
An AI-powered security review GitHub Action using Claude
Text and image to video generation: CogVideoX (2024) and CogVideo
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
DeepSeek Coder: Let the Code Write Itself
kaldi-asr/kaldi is the official location of the Kaldi project
The Memory layer for AI Agents
Transformers4Rec is a flexible and efficient library
A system for quickly generating training data with weak supervision
Automate browser-based workflows with LLMs and Computer Vision
Standardized Serverless ML Inference Platform on Kubernetes
OpenDAN is an open source Personal AI OS
An AI personal assistant for your digital brain
MTEB: Massive Text Embedding Benchmark
The easiest way to use deep metric learning in your application
Implementation of Imagen, Google's Text-to-Image Neural Network
An Open Source package that allows video game creators
Capable of understanding text, audio, vision, video
Qwen2.5-VL is the multimodal large language model series
A neural network that transforms a design mock-up into static websites
Technical principles related to large models
Programmatic access to the AlphaGenome model