Video understanding codebase from FAIR for reproducing video models
Official inference repo for FLUX.1 models
The official repo of the Qwen chat & pretrained large language models
Qwen-Image is a powerful image generation foundation model
Memory-efficient and performant finetuning of Mistral's models
A GPT-4o-Level MLLM for Vision, Speech and Multimodal Live Streaming
Generating Immersive, Explorable, and Interactive 3D Worlds
Diffusion Transformer with Fine-Grained Chinese Understanding
Renderer for the harmony response format to be used with gpt-oss
Phi-3.5 for Mac: Locally-run Vision and Language Models
Collection of Gemma 3 variants that are trained for performance
Official repository for LTX-Video
gpt-oss-120b and gpt-oss-20b are two open-weight language models
Unified Multimodal Understanding and Generation Models
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
OCR expert VLM powered by Hunyuan's native multimodal architecture
ChatGPT interface with better UI
State-of-the-art (SoTA) text-to-video pre-trained model
Official implementation of DreamCraft3D
Tool for exploring and debugging transformer model behaviors
Controllable & emotion-expressive zero-shot TTS
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Open-source framework for intelligent speech interaction
AlphaFold 3 inference pipeline
A Production-ready Reinforcement Learning AI Agent Library