The most powerful and modular diffusion model GUI, API, and backend
OCRmyPDF adds an OCR text layer to scanned PDF files
A gradio web UI for running Large Language Models like LLaMA
Port of Facebook's LLaMA model in C/C++
A GUI tool for extracting hard-coded subtitles (hardsubs) from videos
YOLOv5 is the world's most loved vision AI
Visualizer for neural network, deep learning, and machine learning models
Offline speech recognition API for Android, iOS, Raspberry Pi
Stable Diffusion web UI
3D reconstruction software
Chemcrow
Powerful AI language model (MoE) optimized for efficiency/performance
A deep learning toolkit for Text-to-Speech, battle-tested in research
Open-source, high-performance AI model with advanced reasoning
Image inpainting tool powered by SOTA AI Model
Open-Sora: Democratizing Efficient Video Production for All
NVR with realtime local object detection for IP cameras
Low-code app builder for RAG and multi-agent AI applications
Ready-to-use OCR with 80+ supported languages
Speech-to-text, text-to-speech, and speaker recognition
State-of-the-art TTS model under 25MB
Generate short videos with one click using an AI LLM
Comprehensive Gradio WebUI for audio processing
One-click face swap
A Lightweight Face Recognition and Facial Attribute Analysis