PyTorch extensions for fast R&D prototyping and Kaggle farming
Bringing BERT into modernity via both architecture changes and scaling
Unified Multimodal Understanding and Generation Models
This repository contains the official implementation of FastVLM
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Industrial-level controllable zero-shot text-to-speech system
Collection of Gemma 3 variants that are trained for performance
Open-source industrial-grade ASR models
Usable Implementation of "Bootstrap Your Own Latent" self-supervised learning
End-to-end speech processing toolkit
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Video-based AI memory library. Store millions of text chunks in MP4
PyTorch code and models for V-JEPA self-supervised learning from video
Self-supervised visual learning using momentum contrast in PyTorch
Visual Causal Flow
PyTorch code and models for V-JEPA 2 self-supervised learning from video
Moonshot's most powerful AI model
Fast inference engine for Transformer models
Official inference repo for FLUX.2 models
Accurate × Fast × Comprehensive
Towards Real-World Vision-Language Understanding
Provides code for running inference with the Segment Anything Model
Taming Stable Diffusion for Lip Sync
C++ Implementation of PyTorch Tutorials for Everyone
A simple but complete full-attention transformer