Qwen3-VL, the multimodal large language model series by Alibaba Cloud
Pure C inference for the Flux 2 image generation model
Project Lyra: Open Generative 3D World Models
Large Multimodal Models for Video Understanding and Editing
An experimental version of the DeepSeek model
CLIP: predict the most relevant text snippet given an image
Access to Anthropic's safety-first language model APIs
Block Diffusion for Ultra-Fast Speculative Decoding
4M: Massively Multimodal Masked Modeling
PyTorch code and models for the DINOv2 self-supervised learning method
Recovering the Visual Space from Any Views
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
High-Fidelity and Controllable Generation of Textured 3D Assets
Global weather forecasting model using graph neural networks and JAX
Repo of Qwen2-Audio chat & pretrained large audio language model
Pretrained time-series foundation model developed by Google Research
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
MiniMax-M2, a model built for Max coding & agentic workflows
Accurate × Fast × Comprehensive
OCR expert VLM powered by Hunyuan's native multimodal architecture
Inference script for Oasis 500M
tiktoken is a fast BPE tokeniser for use with OpenAI's models
A Powerful Native Multimodal Model for Image Generation
Claude Code image, a one-stop open source transit service
Collection of Gemma 3 variants that are trained for performance