Generate audiobooks from EPUBs, PDFs and text with captions
Official repository for LTX-Video
Automated YouTube Shorts pipeline
A2M is a desktop app that converts audio to MIDI in one click.
Streaming Real-time Audio-Driven Avatar Generation
Pythonic bindings for FFmpeg's libraries
JamTools is a cross-platform collection of utility tools
AI-powered tool for generating, optimizing, and translating subtitles
Fast multimodal LLM for real-time voice interaction and AI apps
Video editing with Python
Hub of ready-to-use datasets for ML models
Towards Human-Sounding Speech
Edit videos with Claude Code
Use Microsoft Edge's online text-to-speech service from Python
Instill Core is a full-stack AI infrastructure tool for data
Voice Recognition to Text Tool
Open Source Speech Language Model
Build AI-powered semantic search applications
Framework for building real-time voice and multimodal AI agents
A sound cloning tool with a web interface, using your voice
Spring AI Alibaba examples for building and testing AI apps
The Triton Inference Server provides an optimized cloud
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Document Image Parsing via Heterogeneous Anchor Prompting
Build Vision Agents quickly with any model or video provider