OCR model for complex documents with layout-aware structured outputs
A general fine-tuning kit geared toward image/video/audio diffusion
Quick illustration of how one can easily read books together with LLMs
Fast-stable-diffusion + DreamBooth
GLM-4-Voice | End-to-End Chinese-English Conversational Model
An agentic Machine Learning Engineer
The first real AI developer
Scalable generative AI framework built for researchers and developers
The official PyTorch implementation of Google's Gemma models
Lightweight framework for evaluating large language model performance
One-click deployment (including offline integration package)
Unleash Next-Level AI
Specify a GitHub or local repo, GitHub pull request
Open-source evaluation toolkit for large multi-modality models (LMMs)
Ready-to-run cloud templates for RAG
AnyTool: Universal Tool-Use Layer for AI Agents
Anthropic's original performance take-home, now open for you to try
Free, high-quality text-to-speech API endpoint to replace OpenAI
Automatically translates the text of a video based on a subtitle file
An MCP server that autonomously evaluates web applications
An MCP server for interacting with Google Colab
Ship AI Agents to Google Cloud in minutes, not months
Set of tools to assess and improve LLM security
A PyTorch library for implementing flow matching algorithms
Research code artifacts for Code World Model (CWM)