A trainable PyTorch reproduction of AlphaFold 3
State-of-the-art (SoTA) text-to-video pre-trained model
Qwen3-Omni is a natively end-to-end, omni-modal LLM
High-Resolution Image Synthesis with Latent Diffusion Models
Math-specific large language models in our Qwen2 series
Robust Speech Recognition Across Languages, Dialects
Programmatic access to the AlphaGenome model
Hunyuan Translation Model Version 1.5
VMZ: Model Zoo for Video Modeling
Video understanding codebase from FAIR for reproducing video models
Bidirectional token-classification model for identifiable info
Achieving 3x+ generation speedup on reasoning tasks
Ultra-Efficient LLMs on End Devices
PyTorch code and models for the DINOv2 self-supervised learning method
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Global weather forecasting model using graph neural networks and JAX
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
The Clay Foundation Model - An open-source AI model and interface
Open Source Speech Language Model
Open-source industrial-grade ASR models
NetEase Youdao's open-source embedding and reranker models
GPT-4V-level open-source multi-modal model based on Llama3-8B
A Pragmatic VLA Foundation Model
Collection of Gemma 3 variants that are trained for performance