ChatGPT interface with better UI
PyTorch code and models for DINOv2 self-supervised learning
Visual Causal Flow
Recovering the Visual Space from Any Views
Inference code for scalable emulation of protein equilibrium ensembles
Let's make video diffusion practical
CLIP: predict the most relevant text snippet given an image
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding
ChatGLM-6B: An Open Bilingual Dialogue Language Model
High-Fidelity and Controllable Generation of Textured 3D Assets
Ling is a MoE LLM provided and open-sourced by InclusionAI
GLM-4 series: Open Multilingual Multimodal Chat LMs
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Repo for SeedVR2 & SeedVR
4M: Massively Multimodal Masked Modeling
Block Diffusion for Ultra-Fast Speculative Decoding
A Powerful Native Multimodal Model for Image Generation
Designed for text embedding and ranking tasks
Large Multimodal Models for Video Understanding and Editing
Collection of Gemma 3 variants that are trained for performance
Repo of Qwen2-Audio chat & pretrained large audio language model
Long-form streaming TTS system for multi-speaker dialogue generation
LTX-Video Support for ComfyUI
LLM-based Reinforcement Learning audio edit model
Inference script for Oasis 500M