PyTorch extensions for fast R&D prototyping and Kaggle farming
Bringing BERT into modernity via both architecture changes and scaling
Usable implementation of "Bootstrap Your Own Latent" (BYOL) self-supervised learning
This repository contains the official implementation of FastVLM
Unified Multimodal Understanding and Generation Models
Industrial-level controllable zero-shot text-to-speech system
Collection of Gemma 3 variants that are trained for performance
Video-based AI memory library that stores millions of text chunks in MP4 files
End-to-end speech processing toolkit
PyTorch code and models for V-JEPA self-supervised learning from video
Self-supervised visual learning using momentum contrast in PyTorch
PyTorch code and models for V-JEPA 2 self-supervised learning from video
Accurate × Fast × Comprehensive
Towards Real-World Vision-Language Understanding
Visual Causal Flow
Moonshot's most powerful AI model
A simple but complete full-attention transformer
Official inference repo for FLUX.2 models
Fast inference engine for Transformer models
Fast multimodal LLM for real-time voice interaction and AI apps
Encoder for greater-than-word-length text, trained on a variety of data
C++ Implementation of PyTorch Tutorials for Everyone
Retrieval and retrieval-augmented LLMs
Qwen2.5-VL is the multimodal large language model series from the Qwen team
Provides code for running inference with the Segment Anything Model (SAM)