Chinese and English multimodal conversational language model
OCR expert VLM powered by Hunyuan's native multimodal architecture
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Phi-3.5 for Mac: Locally Run Vision and Language Models
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Inference script for Oasis 500M
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Large language model and vision-language model based on linear attention
Official code for Style Aligned Image Generation via Shared Attention
PyTorch implementation of MAE
GLIDE: a diffusion-based text-conditional image synthesis model