Sharp Monocular Metric Depth in Less Than a Second
PyTorch code and models for the DINOv2 self-supervised learning method
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Official implementation of DreamCraft3D
A Unified Framework for Text-to-3D and Image-to-3D Generation
A state-of-the-art open visual language model
Pokee Deep Research Model Open Source Repo
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
A series of math-specific large language models in our Qwen2 series
GLM-4 series: Open Multilingual Multimodal Chat LMs
Inference code for scalable emulation of protein equilibrium ensembles
This repository contains the official implementation of FastVLM
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Diffusion Transformer with Fine-Grained Chinese Understanding
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Code for Mesh R-CNN, ICCV 2019
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Implementation of "MobileCLIP" (CVPR 2024)
Towards Real-World Vision-Language Understanding
Multimodal Diffusion with Representation Alignment
Example Discord bot written in Python that uses the completions API
Qwen3-TTS is an open-source series of TTS models
A PyTorch library for implementing flow matching algorithms
Pushing the Limits of Mathematical Reasoning in Open Language Models
Chat & pretrained large audio language model proposed by Alibaba Cloud