Chat & pretrained large audio language model proposed by Alibaba Cloud
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
Official inference repo for FLUX.2 models
GLM-4-Voice | End-to-End Chinese-English Conversational Model
AlphaFold 3 inference pipeline
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Open-source framework for intelligent speech interaction
An experimental version of the DeepSeek model
Multi-modal large language model designed for audio understanding
Controllable & emotion-expressive zero-shot TTS
Chat & pretrained large vision language model
Qwen3-Coder is the code version of Qwen3
Dataset of GPT-2 outputs for research in detection, biases, and more
Hunyuan Translation Model Version 1.5
Qwen3-TTS is an open-source series of TTS models
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Towards Real-World Vision-Language Understanding
State-of-the-art (SoTA) text-to-video pre-trained model
Inference script for Oasis 500M
Code for Mesh R-CNN, ICCV 2019
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Multimodal Diffusion with Representation Alignment
gpt-oss-120b and gpt-oss-20b are two open-weight language models
Tool for exploring and debugging transformer model behaviors
RGBD video generation model conditioned on camera input