Control Gmail, Google Calendar, Docs, Sheets, Slides, Chat, Forms
Gracefully face hCaptcha challenges with multimodal LLMs
A modular Agentic RAG built with LangGraph
Build and run agents you can see, understand and trust
Bash is all you need: write a Claude Code in only 16 lines of code
A Systematic Framework for Interactive World Modeling
Management of Yandex Station and other smart home devices
Controllable and fast Text-to-Speech for over 7000 languages
Pokee Deep Research Model Open Source Repo
Unified Multimodal Understanding and Generation Models
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Renderer for the harmony response format to be used with gpt-oss
A Python library for audio
Bring the notion of Model-as-a-Service to life
A refreshing functional take on deep learning
AI Toolkit for Healthcare Imaging
Send requests for recommended movies, TV shows and anime to Jellyseerr/Overseerr
A fast TTS architecture with conditional flow matching
Python library and CLI tool to interface with Google Translate
A text-to-speech, speech-to-text and speech-to-speech library
Helping you get the most out of AWS, wherever you use MCP
No-code multi-agent framework for building LLM agents and workflows
Toolkit for conversational AI
Build AI-powered semantic search applications
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning