Large language model & vision-language model based on linear attention
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Official implementation of DreamCraft3D
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Open-source framework for intelligent speech interaction
Generate Any 3D Scene in Seconds
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Repo of Qwen2-Audio chat & pretrained large audio language model
Chat & pretrained large audio language model proposed by Alibaba Cloud
Multi-modal large language model designed for audio understanding
Code for Mesh R-CNN, ICCV 2019
Language modeling in a sentence representation space
AI-powered tool to quickly remove watermarks from images flawlessly
Official code for Style Aligned Image Generation via Shared Attention
Let us control diffusion models
Official PyTorch Implementation of "Scalable Diffusion Models"
Text-to-3D & Image-to-3D & Mesh Exportation with NeRF + Diffusion
A latent text-to-image diffusion model
GLIDE: a diffusion-based text-conditional image synthesis model
Reproduces results of "Fixing the train-test resolution discrepancy"
Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201