The data structure for multimodal data
A TTS model capable of generating ultra-realistic dialogue
Unofficial Python API and agentic skill for Google NotebookLM
The most powerful and modular diffusion model GUI, API, and backend
Build cross-modal and multimodal applications on the cloud
Easily pair images with audio file counterparts in bulk
Instant voice cloning by MIT and MyShell. Audio foundation model
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model
The comprehensive WSGI web application library
An adaptive Web Scraping framework
Data Lake for Deep Learning. Build, manage, and query datasets
Open-source MCP server that gives your coding agent
Stream your laptop or external webcam over LAN or Wi-Fi hotspot.
OCR expert VLM powered by Hunyuan's native multimodal architecture
Broadcast Automation Emergency Alerting
TensorBoard for PyTorch (and Chainer, MXNet, NumPy, etc.)
Chromecast videos from the KDE desktop
PyIDM remake for downloading stuff
Spyder IDE plugin providing a separate chat pane for AI assistance
Get easy bot lobbies in any game with our bot lobbies tool.
UFONet - Denial of Service Toolkit
Chatbot daemon that connects to your favorite chat services
Code for the paper "Hybrid Spectrogram and Waveform Source Separation"