Language modeling in a sentence representation space
Diffusion Bee is the easiest way to run Stable Diffusion locally
Renderer for the harmony response format to be used with gpt-oss
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Collection of Gemma 3 variants that are trained for performance
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Memory-efficient and performant finetuning of Mistral's models
HY-Motion model for 3D character animation generation
Qwen2.5-VL is a multimodal large language model series
Large Multimodal Models for Video Understanding and Editing
Unified Multimodal Understanding and Generation Models
Pokee Deep Research Model Open Source Repo
The ChatGPT Retrieval Plugin lets you easily find personal documents
Open Source Speech Language Model
Implementation of "MobileCLIP" CVPR 2024
CLIP: predict the most relevant text snippet given an image
Pretrained time-series foundation model developed by Google Research
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI
PyTorch code and models for DINOv2 self-supervised learning
Chat & pretrained large audio language model proposed by Alibaba Cloud
FlashMLA: Efficient Multi-head Latent Attention Kernels
Release for Improved Denoising Diffusion Probabilistic Models
Encoder of greater-than-word length text trained on a variety of data