OCR expert VLM powered by Hunyuan's native multimodal architecture
Large language model & vision-language model based on Linear Attention
High-Resolution Image Synthesis with Latent Diffusion Models
AI Suite for upscaling, interpolating & restoring images/videos
Powerful open-source image generation model
Chat & pretrained large vision-language model
Official PyTorch Implementation of "Scalable Diffusion Models"
Text-to-3D & Image-to-3D & Mesh Exportation with NeRF + Diffusion
Code release for "Masked-attention Mask Transformer"
Reproduces results of "Fixing the train-test resolution discrepancy"