Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Large Audio Language Model built for natural interactions
Streaming Real-time Audio-Driven Avatar Generation
BlackHole is a modern macOS audio loopback driver
Synchronized Translation for Videos
Clone a voice in 5 seconds to generate arbitrary speech in real time
Fast multimodal LLM for real-time voice interaction and AI apps
Automated YouTube Shorts pipeline
Video translation and dubbing tool powered by LLMs
Framework for building real-time voice and multimodal AI agents
A gallery that showcases on-device ML/GenAI use cases
Translate a video from one language to another and embed dubbing
Virtual modular synthesizer plugin
Generate audiobooks from EPUBs, PDFs and text with captions
Robust Speech Recognition via Large-Scale Weak Supervision
Cross-platform, customizable ML solutions
Generate audiobooks from e-books, voice cloning & 1107+ languages
Stream VR games from your PC to your headset via Wi-Fi
Taming Stable Diffusion for Lip Sync
High-resolution models for human tasks
A Python tool that uses GPT-4, FFmpeg, and OpenCV
Clean network diagrams, one-time setup, zero upkeep
Solidity Compiler for Solana, Polkadot and Stellar
A Systematic Framework for Interactive World Modeling