Tiny vision language model
Visual Causal Flow
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
LTX-Video Support for ComfyUI
Moonshot's most powerful AI model
Qwen3-VL, the multimodal large language model series by Alibaba Cloud
Recovering the Visual Space from Any Views
Let's make video diffusion practical
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Pure C inference for the Flux 2 image generation model
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Python inference and LoRA trainer package for the LTX-2 audio–video
Foundational Models for State-of-the-Art Speech and Text Translation
Inference script for Oasis 500M
Clean and efficient FP8 GEMM kernels with fine-grained scaling
Official code for Style Aligned Image Generation via Shared Attention
llama.go is like llama.cpp in pure Golang
GLIDE: a diffusion-based text-conditional image synthesis model
Vision-language-action model for robot control via images and text