A framework to enable multimodal models to operate a computer
YOLOv5 is the world's most loved vision AI
Open-source, high-performance AI model with advanced reasoning
Powerful AI language model (MoE) optimized for efficiency/performance
State-of-the-art TTS model under 25MB
Powerful Android AI agent with tools, automation, and Linux shell
An open phone agent model & framework
A natural language interface for computers
Speech recognition module for Python
Run your own AI cluster at home with everyday devices
Enable AI to control your desktop, mobile and HMI devices
Python client for Telegram's TDLib
Low-latency AI inference engine optimized for mobile devices
A GPT-4o-level MLLM for Vision, Speech and Multimodal Live Streaming
General proxy performance testing tool based on Clash using Telegram
A Telegram bot that integrates with OpenAI's official ChatGPT APIs
An open-source, end-to-end VLM-based GUI agent
Datasets, transforms and models specific to Computer Vision
Automate native Android apps with AI using accessibility APIs
Interact with your documents using the power of GPT
Operating LLMs in production
A lightweight audio-to-MIDI converter with pitch bend detection
RL research on Android devices
Reading book source
On-device Speech-to-Intent engine powered by deep learning