Towards Human-Level Text-to-Speech through Style Diffusion
Simple and flexible tool for managing secrets
Extract audio and video content and organize it into a Markdown note
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model
VITS2 backbone with multilingual-bert
A fast TTS architecture with conditional flow matching
A library to help you make the most out of your Pixoo 64
Math OCR model that outputs LaTeX and markdown
Transforming Multimodal Content into Captivating Multilingual Audio
A community-supported supercharged version of paperless
Scalable data preprocessing and curation toolkit for LLMs
Tool for searching all logs in the current directory for phrases
Stanford NLP Python library for many human languages
Dataset of GPT-2 outputs for research in detection, biases, and more
Algorithms for outlier, adversarial and drift detection
State-of-the-art (SoTA) text-to-video pre-trained model
MTEB: Massive Text Embedding Benchmark
Implementation of Imagen, Google's Text-to-Image Neural Network
StreamSpeech is a seamless model for offline speech recognition
GLM-4-Voice | End-to-End Chinese-English Conversational Model
A Julia code generator for regular expressions
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
An open-source text-to-speech system built by inverting Whisper
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Tools for manipulating datasets