Large-language-model & vision-language-model based on Linear Attention
Diffusion Transformer with Fine-Grained Chinese Understanding
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
Qwen3-Coder is the code version of Qwen3
Repo of the Qwen2-Audio chat & pretrained large audio language models
Qwen2.5-VL is the multimodal large language model series
Implementation of "MobileCLIP" (CVPR 2024)
Unified Multimodal Understanding and Generation Models
Official code for Style Aligned Image Generation via Shared Attention
Memory-efficient and performant finetuning of Mistral's models
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
The official PyTorch implementation of Google's Gemma models
Multimodal Diffusion with Representation Alignment
Renderer for the harmony response format to be used with gpt-oss
Phi-3.5 for Mac: Locally-run Vision and Language Models
Large Multimodal Models for Video Understanding and Editing
FAIR Sequence Modeling Toolkit 2
Official implementation of DreamCraft3D
Open-weight, large-scale hybrid-attention reasoning model
Language modeling in a sentence representation space
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Pushing the Limits of Mathematical Reasoning in Open Language Models
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training
A Conversational Speech Generation Model