GLM-4 series: Open Multilingual Multimodal Chat LMs
Accurate × Fast × Comprehensive
Ling is a MoE LLM provided and open-sourced by InclusionAI
Let's make video diffusion practical
Repo for SeedVR2 & SeedVR
CLIP: predict the most relevant text snippet given an image
LTX-Video Support for ComfyUI
Visual Causal Flow
An experimental version of the DeepSeek model
Inference code for scalable emulation of protein equilibrium ensembles
Programmatic access to the AlphaGenome model
Large Multimodal Models for Video Understanding and Editing
A Powerful Native Multimodal Model for Image Generation
ChatGLM-6B: An Open Bilingual Dialogue Language Model
Collection of Gemma 3 variants that are trained for performance
4M: Massively Multimodal Masked Modeling
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Designed for text embedding and ranking tasks
Recovering the Visual Space from Any Views
High-Fidelity and Controllable Generation of Textured 3D Assets
Repo of Qwen2-Audio chat & pretrained large audio language model
Official implementation of DreamCraft3D
A SOTA open-source image editing model
OCR expert VLM powered by Hunyuan's native multimodal architecture
Pretrained time-series foundation model developed by Google Research