Large Multimodal Models for Video Understanding and Editing
Diversity-driven optimization and large-model reasoning ability
Visual Causal Flow
Official inference repo for FLUX.2 models
PyTorch code and models for the DINOv2 self-supervised learning method
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding
Ling is a MoE LLM provided and open-sourced by InclusionAI
An experimental version of the DeepSeek model
ChatGLM-6B: An Open Bilingual Dialogue Language Model
Designed for text embedding and ranking tasks
Inference code for scalable emulation of protein equilibrium ensembles
MOSS-TTS Family: open-source speech and sound generation models
Recovering the Visual Space from Any Views
CLIP: predict the most relevant text snippet given an image
Repo for SeedVR2 & SeedVR
tiktoken is a fast BPE tokeniser for use with OpenAI's models
4M: Massively Multimodal Masked Modeling
High-Fidelity and Controllable Generation of Textured 3D Assets
The official PyTorch implementation of Google's Gemma models
A SOTA open-source image editing model
Repo of Qwen2-Audio chat & pretrained large audio language model
Collection of Gemma 3 variants that are trained for performance
Global weather forecasting model using graph neural networks and JAX
OCR expert VLM powered by Hunyuan's native multimodal architecture
A 0.1B Omni model trained from scratch