Qwen-Image is a powerful image generation foundation model
A Unified Framework for Text-to-3D and Image-to-3D Generation
Towards Real-World Vision-Language Understanding
ChatGLM-6B: An Open Bilingual Dialogue Language Model
HY-Motion model for 3D character animation generation
Tongyi Deep Research, the Leading Open-source Deep Research Agent
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Audio foundation model excelling in audio understanding
Inference code for scalable emulation of protein equilibrium ensembles
Tool for exploring and debugging transformer model behaviors
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
The ChatGPT Retrieval Plugin lets you easily find personal documents
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Open Source Speech Language Model
Video understanding codebase from FAIR for reproducing video models
Multimodal Diffusion with Representation Alignment
General-purpose image editing model that delivers high-fidelity
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Official implementation of DreamCraft3D
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Language modeling in a sentence representation space
A SOTA open-source image editing model
OCR expert VLM powered by Hunyuan's native multimodal architecture
Release for Improved Denoising Diffusion Probabilistic Models
High-Resolution Image Synthesis with Latent Diffusion Models