A simple, secure MCP-to-OpenAPI proxy server
Code release for Cut and Learn for Unsupervised Object Detection
CLIP: Predict the most relevant text snippet given an image
RL research on Android devices
tiktoken is a fast BPE tokeniser for use with OpenAI's models
LTX-Video Support for ComfyUI
A Powerful Native Multimodal Model for Image Generation
4M: Massively Multimodal Masked Modeling
Guiding Instruction-based Image Editing via Multimodal Large Language
Collection of reference environments, offline reinforcement learning
PPTAgent: Generating and Evaluating Presentations
Concatenate a directory full of files into a single prompt
Get a ChatGPT plugin up and running in under 5 minutes
Implementation of Vision Transformer, a simple way to achieve SOTA
LLM powered fuzzing via OSS-Fuzz
Block Diffusion for Ultra-Fast Speculative Decoding
Generate Any 3D Scene in Seconds
The repository provides code for running inference with SAM 2
The best ChatGPT that $100 can buy
Build Vision Agents quickly with any model or video provider
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
Visual Causal Flow
The official PyTorch implementation of Google's Gemma models
Large Multimodal Models for Video Understanding and Editing
Collection of Gemma 3 variants that are trained for performance