Bidirectional token-classification model for identifiable info
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Large-language-model & vision-language-model based on Linear Attention
Diffusion Transformer with Fine-Grained Chinese Understanding
Visual Causal Flow
Qwen3-ASR is an open-source series of ASR models
Generate Any 3D Scene in Seconds
Unified Multimodal Understanding and Generation Models
Implementation of "MobileCLIP" CVPR 2024
Block Diffusion for Ultra-Fast Speculative Decoding
Ultra-Efficient LLMs on End Device
Memory-efficient and performant finetuning of Mistral's models
Audio foundation model excelling in audio understanding
Phi-3.5 for Mac: Locally-run Vision and Language Models
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Fast-stable-diffusion + DreamBooth
Multimodal Diffusion with Representation Alignment
The official PyTorch implementation of Google's Gemma models
Large Multimodal Models for Video Understanding and Editing
Renderer for the harmony response format to be used with gpt-oss
Open-weight, large-scale hybrid-attention reasoning model
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Official implementation of DreamCraft3D
Towards Real-World Vision-Language Understanding