Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Official implementation of DreamCraft3D
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Multimodal Diffusion with Representation Alignment
Pokee Deep Research Model Open Source Repo
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Renderer for the harmony response format to be used with gpt-oss
PyTorch code and models for the DINOv2 self-supervised learning
The official PyTorch implementation of Google's Gemma models
An AI-powered security review GitHub Action using Claude
Implementation of "MobileCLIP" (CVPR 2024)
Video understanding codebase from FAIR for reproducing video models
Towards Real-World Vision-Language Understanding
CLIP: predict the most relevant text snippet given an image
A Unified Framework for Text-to-3D and Image-to-3D Generation
Official code for Style Aligned Image Generation via Shared Attention
4M: Massively Multimodal Masked Modeling
This repository contains the official implementation of FastVLM
ICLR 2024 Spotlight: curation/training code, metadata, distribution
A PyTorch library for implementing flow matching algorithms
Official DeiT repository
Memory-efficient and performant finetuning of Mistral's models
Towards Ultimate Expert Specialization in Mixture-of-Experts Language
Diffusion Transformer with Fine-Grained Chinese Understanding