Easily compute CLIP embeddings and build a CLIP retrieval system
Handwritten Text Recognition (HTR) system implemented with TensorFlow
Reading book source
Industrial-level controllable zero-shot text-to-speech system
Speech-AI-Forge is a project developed around TTS generation models
Controllable & emotion-expressive zero-shot TTS
ComfyUI wrapper nodes for HunyuanVideo
Unifying 3D Mesh Generation with Language Models
A nearly-live implementation of OpenAI's Whisper
Framework for building real-time voice and multimodal AI agents
AI-powered tool for generating, optimizing, and translating subtitles
Implementation of Video Diffusion Models
Free, high-quality text-to-speech API endpoint to replace OpenAI
Speech recognition module for Python
Python binding to the Apache Tika™ REST services
High-Quality Voice Cloning TTS for 600+ Languages
A high-quality rapid TTS voice cloning model
Open Source Document Management System for Digital Archives
Agent harness to make your slop code well-engineered and beautiful
Spark-TTS Inference Code
An Open Source text-to-speech system built by inverting Whisper
AI-powered code assistant for Vim. OpenAI and ChatGPT plugin for Vim
Generate blog articles from video or audio
State-of-the-art (SoTA) text-to-video pre-trained model
A community-supported supercharged version of paperless