A library for audio and music analysis and feature extraction
Automatic Speech Recognition with Word-level Timestamps
Framework for building real-time voice and multimodal AI agents
State-of-the-art open-source TTS
Self-hosted AI audio transcription
Sample code and notebooks for Generative AI on Google Cloud
Local-first AI Notepad for Private Meetings
Robust Speech Recognition via Large-Scale Weak Supervision
Oobabooga - The definitive Web UI for local AI, with powerful features
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Automatically translates the text of a video based on a subtitle file
Video translation and dubbing tool powered by LLMs
Open speech-to-speech models and pipelines by Hugging Face
AI tool that instantly converts video and audio into structured documents
A Systematic Framework for Interactive World Modeling
An open-source implementation of NotebookLM with more flexibility
Label Studio is a multi-type data labeling and annotation tool
Generate music based on natural language prompts using LLMs
Offline text-to-speech synthesis for Python
Convert files and web content into clean, usable Markdown easily
Generate blog articles from video or audio
Make videos programmatically with React
Unified web UI for training and running open models locally
Open-source AI app store powered by 24/7 desktop history
Multimodal-Driven Architecture for Customized Video Generation