OCR software, free and offline
Image polygonal annotation with Python
The most powerful and modular diffusion model GUI, API and backend
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Automate native Android apps with AI using accessibility APIs
Inference framework for 1-bit LLMs
LightRAG: Simple and Fast Retrieval-Augmented Generation
Lightweight Python library for adding real-time multi-object tracking
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Implementation of Recurrent Interface Network (RIN)
A lightweight vision library for performing large-scale object detection
Superfast AI decision-making and processing of multimodal data
Code for running inference and fine-tuning with the SAM 3 model
PyTorch code and models for VJEPA2 self-supervised learning from video
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Letta (formerly MemGPT) is a framework for creating LLM services
Minimal Claude Code alternative. Single Python file, zero dependencies
SwarmZero's SDK for building AI agents, swarms of agents and much more
Build cross-modal and multimodal applications on the cloud
Open-source, high-performance AI model with advanced reasoning
Implementation of a U-Net complete with efficient attention
LLM-based autonomous agent that conducts comprehensive online research
Convert AI papers to GUI
Generate audiobooks from e-books, with voice cloning and support for 1107+ languages
High-resolution models for human tasks