Official repository for LTX-Video
Native and Compact Structured Latents for 3D Generation
Python inference and LoRA trainer package for the LTX-2 audio–video
C#/.NET binding of llama.cpp, including LLaMa/GPT model inference
Implementation of "MobileCLIP" CVPR 2024
Python SDK for Claude Agent
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Pure C inference for the Flux 2 image generation model
Foundation Models for Time Series
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Unified Multimodal Understanding and Generation Models
Multimodal embedding and reranking models built on Qwen3-VL
Instructions on how to use the Realtime API on Microcontrollers
RGBD video generation model conditioned on camera input
Generate Any 3D Scene in Seconds
PyTorch code and models for the DINOv2 self-supervised learning
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Large-language-model & vision-language-model based on Linear Attention
Towards Real-World Vision-Language Understanding
Real-time behaviour synthesis with MuJoCo, using Predictive Control
Software that can generate photos from paintings
A minimal PyTorch re-implementation of the OpenAI GPT
Code release for "Masked-attention Mask Transformer
A mix of GAN implementations including progressive growing
Dual LSTM Encoder for Dialog Response Generation