Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
Pokee Deep Research Model Open Source Repo
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Unified Multimodal Understanding and Generation Models
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Provides convenient access to the Anthropic REST API from any Python 3
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
State-of-the-art (SoTA) text-to-video pre-trained model
Chat & pretrained large audio language model proposed by Alibaba Cloud
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training
The Clay Foundation Model - An open source AI model and interface
Phi-3.5 for Mac: Locally-run Vision and Language Models
The official PyTorch implementation of Google's Gemma models
Fast stable diffusion on CPU and AI PC
Fast-stable-diffusion + DreamBooth
A Pragmatic VLA Foundation Model
Collection of Gemma 3 variants that are trained for performance
Implementation of "MobileCLIP" (CVPR 2024)
VMZ: Model Zoo for Video Modeling
High-resolution models for human tasks
Video understanding codebase from FAIR for reproducing video models
Towards Real-World Vision-Language Understanding
CLIP: Predict the most relevant text snippet given an image
Ling is a MoE LLM provided and open-sourced by InclusionAI
Multimodal-Driven Architecture for Customized Video Generation