Qwen3-Omni is a natively end-to-end, omni-modal LLM
New family of code large language models (LLMs)
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
DeepMind model for tracking arbitrary points across videos & robotics
Code for Mesh R-CNN, ICCV 2019
Designed for text embedding and ranking tasks
A SOTA open-source image editing model
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Multi-modal large language model designed for audio understanding
Capable of understanding text, audio, vision, and video
A state-of-the-art open visual language model
AI Suite for upscaling, interpolating & restoring images/videos
Qwen2.5-Coder is the code version of Qwen2.5, the large language model
Chat & pretrained large audio language model proposed by Alibaba Cloud
Example Discord bot written in Python that uses the completions API
Code for the paper Hybrid Spectrogram and Waveform Source Separation
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)
Per-Pixel Classification is Not All You Need for Semantic Segmentation
Open language model developed by NVIDIA as part of Nemotron-3 family
Tencent’s 36-language state-of-the-art translation model