Open image model at the forefront of design
Text and image to video generation: CogVideoX and CogVideo
Qwen3 is the large language model series developed by the Qwen team
Controllable & emotion-expressive zero-shot TTS
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
An experimental version of the DeepSeek model
GLM-4 series: Open Multilingual Multimodal Chat LMs
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Achieving 3x+ generation speedup on reasoning tasks
PyTorch code and models for the DINOv2 self-supervised learning method
Video Object and Interaction Deletion
Foundation model for image generation
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
State-of-the-art (SoTA) text-to-video pre-trained model
Block Diffusion for Ultra-Fast Speculative Decoding
Official implementation of Watermark Anything with Localized Messages
A Unified Framework for Text-to-3D and Image-to-3D Generation
Pretrained time-series foundation model developed by Google Research
Long-form streaming TTS system for multi-speaker dialogue generation
This repository contains the official implementation of FastVLM
Qwen3-Omni is a natively end-to-end, omni-modal LLM
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Chinese and English multimodal conversational language model
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training
Powerful open source image generation model