This repository contains the official implementation of FastVLM
Unified Multimodal Understanding and Generation Models
Industrial-level controllable zero-shot text-to-speech system
Collection of Gemma 3 variants that are trained for performance
Visual Causal Flow
Towards Real-World Vision-Language Understanding
Moonshot's most powerful AI model
Accurate × Fast × Comprehensive
Official inference repo for FLUX.2 models
Qwen2.5-VL is the multimodal large language model series
Encoder of greater-than-word-length text trained on a variety of data
Multimodal model achieving SOTA performance
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
Text-to-3D & Image-to-3D & Mesh Export with NeRF + Diffusion
A latent text-to-image diffusion model
PyTorch implementation of MAE
Facebook AI Research Sequence-to-Sequence Toolkit
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)
Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201
Dual LSTM Encoder for Dialog Response Generation
Compact 8B multimodal instruct model optimized for edge deployment
An advanced bilingual image editing model with semantic control
Frontier-scale 675B multimodal base model for custom AI training
Speculative-decoding accelerator for the 675B Mistral Large 3
Quantized 675B multimodal instruct model optimized for NVFP4