PyTorch code and models for the DINOv2 self-supervised learning method
A Family of Open-Source Music Foundation Models
Reference PyTorch implementation and models for DINOv3
Accurate × Fast × Comprehensive
OCR expert VLM powered by Hunyuan's native multimodal architecture
Open-Source Financial Large Language Models
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Pokee Deep Research Model Open Source Repo
Uncommon Objects in 3D dataset
Reproduces results of "Fixing the train-test resolution discrepancy"
A mix of GAN implementations including progressive growing
Lightweight multimodal translation model for 55 languages
Multimodal Transformer for document image understanding and layout
Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video
ClinicalBERT model trained on MIMIC notes for clinical NLP tasks
Small 3B-base multimodal model ideal for custom AI on edge hardware