Dataset of GPT-2 outputs for research in detection, biases, and more
High-Resolution Image Synthesis with Latent Diffusion Models
Fast and Universal 3D reconstruction model for versatile tasks
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
GLM-4 series: Open Multilingual Multimodal Chat LMs
Implementation of "MobileCLIP", CVPR 2024
Video understanding codebase from FAIR for reproducing video models
CLIP: predict the most relevant text snippet given an image
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Diffusion Transformer with Fine-Grained Chinese Understanding
ChatGPT interface with better UI
Global weather forecasting model using graph neural networks and JAX
Tooling for the Common Objects In 3D dataset
Code for Mesh R-CNN, ICCV 2019
A state-of-the-art open visual language model
Renderer for the harmony response format to be used with gpt-oss
Capable of understanding text, audio, vision, video
Chat & pretrained large audio language model proposed by Alibaba Cloud
Qwen2.5-VL is the multimodal large language model series
Official code for Style Aligned Image Generation via Shared Attention
4M: Massively Multimodal Masked Modeling
This repository contains the official implementation of FastVLM
FAIR Sequence Modeling Toolkit 2
A Production-ready Reinforcement Learning AI Agent Library