Automate browser-based workflows with LLMs and Computer Vision
State-of-the-art diffusion models for image and audio generation
The easiest way to use deep metric learning in your application
An MCP server for interacting with Google Colab
GLM-4 series: Open Multilingual Multimodal Chat LMs
AI bridge enabling assistants to control and automate Unity Editor
LLM-based agent for general-purpose software engineering tasks
Dealing with all unstructured data, such as reverse image search
Implementation of Vision Transformer, a simple way to achieve SOTA
PPTAgent: Generating and Evaluating Presentations
Superfast AI decision-making and processing of multi-modal data
Tool for visualizing and tracking your machine learning experiments
Fully Local Manus AI. No APIs, No $200 monthly bills
A lightweight framework for building LLM-based agents
Outcome-driven agent development framework that evolves
Open Source Document Management System for Digital Archives
A Python library for self-supervised learning on images
GEO-first SEO skill for Claude Code
A modular Agentic RAG built with LangGraph
The repository provides code for running inference with SAM 2
A specialized Claude Code workspace for creating long-form
Multilingual Document Layout Parsing in a Single Vision-Language Model
Open platform for building, deploying, and managing LLM agents
Shared repository for open-sourced projects from the Google AI Lang
Collect, organize, use, and share, all in OmniBox