PyTorch code and models for the DINOv2 self-supervised learning method
Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Repository of the Qwen2-Audio chat and pretrained large audio language models
CLIP: predict the most relevant text snippet given an image
Ling is a MoE LLM provided and open-sourced by InclusionAI
Let's make video diffusion practical
A Customizable Image-to-Video Model based on HunyuanVideo
tiktoken is a fast BPE tokeniser for use with OpenAI's models
4M: Massively Multimodal Masked Modeling
The official PyTorch implementation of Google's Gemma models
GLM-4 series: Open Multilingual Multimodal Chat LMs
A Powerful Native Multimodal Model for Image Generation
OCR expert VLM powered by Hunyuan's native multimodal architecture
Official implementation of DreamCraft3D
High-Fidelity and Controllable Generation of Textured 3D Assets
Large Multimodal Models for Video Understanding and Editing
Implementation of "MobileCLIP" (CVPR 2024)
Global weather forecasting model using graph neural networks and JAX
Code for Mesh R-CNN (ICCV 2019)
Implementation of the Surya Foundation Model for Heliophysics
Official code for Style Aligned Image Generation via Shared Attention
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Diffusion Transformer with Fine-Grained Chinese Understanding
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1 benchmark
The ChatGPT Retrieval Plugin lets you easily find personal documents