High-Resolution Image Synthesis with Latent Diffusion Models
Achieving 3x+ generation speedup on reasoning tasks
ICLR2024 Spotlight: curation/training code, metadata, distribution
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Open-source large language model family from Tencent Hunyuan
Qwen2.5-VL is the multimodal large language model series
Foundation model for image generation
Block Diffusion for Ultra-Fast Speculative Decoding
Official implementation of Watermark Anything with Localized Messages
Open-source deep-learning framework
Models for object and human mesh reconstruction
gpt-oss-120b and gpt-oss-20b are two open-weight language models
Project Lyra: Open Generative 3D World Models
Stable Diffusion with Core ML on Apple Silicon
High-resolution models for human tasks
GLM-4 series: Open Multilingual Multimodal Chat LMs
FAIR Sequence Modeling Toolkit 2
PyTorch code and models for the DINOv2 self-supervised learning method
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Advancing Open-source World Models
Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
GPT4V-level open-source multi-modal model based on Llama3-8B
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
The ChatGPT Retrieval Plugin lets you easily find personal documents