Instant voice cloning by MIT and MyShell. Audio foundation model
Automatic subtitle synchronization tool
A Family of Open Sourced Music Foundation Models
Zero-copy PDF text extraction library written in Zig
A tool for semi-automatic cell type classification, harmonization
SOTA Open Source TTS
Taming Stable Diffusion for Lip Sync
Interface for OuteTTS models
Calculate quality metrics with FFmpeg (SSIM, PSNR, VMAF, VIF)
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Multi-lingual large voice generation model, providing inference
Run PyTorch LLMs locally on servers, desktop and mobile
A lightweight text-to-speech model with zero-shot voice cloning
TorchMultimodal is a PyTorch library
Open speech-to-speech models and pipelines by Hugging Face
A collection of tools, libraries, and tests for Vulkan shader
Learn all about Digital Forensics and Computer Forensics
Official code for Style Aligned Image Generation via Shared Attention
Open source implementation of Microsoft's VALL-E X zero-shot TTS model
Sample code for Google Cloud Vision
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Some useful apps based on PyQt5
Chinese translation of the official TensorFlow documentation