Multimodal Diffusion with Representation Alignment
Personalize Any Characters with a Scalable Diffusion Transformer
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Qwen3-omni is a natively end-to-end, omni-modal LLM
Official code for Style Aligned Image Generation via Shared Attention
4M: Massively Multimodal Masked Modeling
This repository contains the official implementation of FastVLM
FAIR Sequence Modeling Toolkit 2
ICLR2024 Spotlight: curation/training code, metadata, distribution
A PyTorch library for implementing flow matching algorithms
Official DeiT repository
PyTorch code and models for the DINOv2 self-supervised learning
Memory-efficient and performant finetuning of Mistral's models
Towards Ultimate Expert Specialization in Mixture-of-Experts Language
Official implementation of DreamCraft3D
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Open-source large language model family from Tencent Hunyuan
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Unified Multimodal Understanding and Generation Models
Language modeling in a sentence representation space
An AI-powered security review GitHub Action using Claude
The ChatGPT Retrieval Plugin lets you easily find personal documents
Designed for text embedding and ranking tasks