Learning to Act by Watching Unlabeled Online Videos
Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
GLIDE: a diffusion-based text-conditional image synthesis model
Large-scale autoregressive pixel model for image generation by OpenAI
A library for Multilingual Unsupervised or Supervised word Embeddings
Code for "Image Generation from Scene Graphs", Johnson et al., CVPR 2018
Open-source code agent designed for Lean 4
JetBrains’ 4B parameter code model for completions
Vision-language-action model for robot control via images and text
Portuguese ASR model fine-tuned on XLSR-53 for 16kHz audio input
Lightweight 24B agentic coding model with vision and long context
Multimodal agent model for coding, orchestration, and autonomy