Large-language-model & vision-language-model based on Linear Attention
A SOTA open-source image editing model
Repo of Qwen2-Audio chat & pretrained large audio language model
CLIP: predict the most relevant text snippet given an image
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Sharp Monocular Metric Depth in Less Than a Second
Global weather forecasting model using graph neural networks and JAX
tiktoken is a fast BPE tokeniser for use with OpenAI's models
A state-of-the-art open visual language model
Chinese and English multimodal conversational language model
General-purpose image editing model that delivers high-fidelity
A Multi-Modal World Model for Reconstructing, Generating, and Simulating
Diffusion Transformer with Fine-Grained Chinese Understanding
Python SDK for Claude Agent
A Unified Framework for Text-to-3D and Image-to-3D Generation
Math-specific large language models in our Qwen2 series
ChatGPT interface with better UI
Large Multimodal Models for Video Understanding and Editing
Phi-3.5 for Mac: Locally-run Vision and Language Models
AlphaFold 3 inference pipeline
An AI-powered security review GitHub Action using Claude
Advancing Open-source World Models
High-resolution models for human tasks
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Genome modeling and design across all domains of life