CLIP: Predict the most relevant text snippet given an image
New family of code large language models (LLMs)
Wan2.2: Open and Advanced Large-Scale Video Generative Model
4M: Massively Multimodal Masked Modeling
ChatGPT interface with better UI
Python inference and LoRA trainer package for the LTX-2 audio–video
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding
An experimental version of the DeepSeek model
Block Diffusion for Ultra-Fast Speculative Decoding
A Powerful Native Multimodal Model for Image Generation
PyTorch code and models for the DINOv2 self-supervised learning
The ChatGPT Retrieval Plugin lets you easily find personal documents
Pretrained time-series foundation model developed by Google Research
ICLR 2024 Spotlight: curation/training code, metadata, distribution
LLM-based reinforcement learning model for audio editing
Open-source, high-performance Mixture-of-Experts large language model
Official code for Style Aligned Image Generation via Shared Attention
Fine-tuning ChatGLM-6B with PEFT
A minimal PyTorch re-implementation of the OpenAI GPT
Reference implementation of the Transformer architecture optimized
Large-scale autoregressive pixel model for image generation by OpenAI
A library for Multilingual Unsupervised or Supervised word Embeddings