Official repository for LTX-Video
Python inference and LoRA trainer package for the LTX-2 audio–video
RGBD video generation model conditioned on camera input
Implementation of "MobileCLIP" (CVPR 2024)
Python SDK for Claude Agent
ChatGPT interface with better UI
Unified Multimodal Understanding and Generation Models
Foundation Models for Time Series
PyTorch code and models for the DINOv2 self-supervised learning
Towards Real-World Vision-Language Understanding
Generate Any 3D Scene in Seconds
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Multimodal embedding and reranking models built on Qwen3-VL
Real-time behaviour synthesis with MuJoCo, using Predictive Control
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Large language model & vision-language model based on Linear Attention
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
A minimal PyTorch re-implementation of the OpenAI GPT
Reference implementation of the Transformer architecture optimized
Code release for "Masked-attention Mask Transformer
A mix of GAN implementations including progressive growing