Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Large Audio Language Model built for natural interactions
Streaming Real-time Audio-Driven Avatar Generation
Automated YouTube Shorts pipeline
BlackHole is a modern macOS audio loopback driver
Synchronized Translation for Videos
Fast multimodal LLM for real-time voice interaction and AI apps
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Video translation and dubbing tool powered by LLMs
Framework for building real-time voice and multimodal AI agents
A gallery that showcases on-device ML/GenAI use cases
Generate audiobooks from EPUBs, PDFs and text with captions
Virtual modular synthesizer plugin
Translate a video from one language to another and embed dubbing
Robust Speech Recognition via Large-Scale Weak Supervision
High-resolution models for human tasks
Stream VR games from your PC to your headset via Wi-Fi
Cross-platform, customizable ML solutions
Taming Stable Diffusion for Lip Sync
A Python tool that uses GPT-4, FFmpeg, and OpenCV
Generate audiobooks from e-books, voice cloning & 1107+ languages
Clean network diagrams, one-time setup, zero upkeep
Solidity Compiler for Solana, Polkadot and Stellar
A fast TTS architecture with conditional flow matching