CLIP: predict the most relevant text snippet given an image
Transforming Multimodal Content into Captivating Multilingual Audio
Statusline plugin for Vim, with prompts for several other applications
Framework for building real-time multimodal voice AI agent apps
A high-quality, fast TTS voice-cloning model
Label Studio is a multi-type data labeling and annotation tool
Free, high-quality text-to-speech API endpoint to replace OpenAI's TTS API
A Model Context Protocol (MCP) server
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model
Persian NLP Toolkit
A library for converting HTML into PDFs using ReportLab
Mozc: a Japanese Input Method Editor designed for multiple platforms
A Powerful Native Multimodal Model for Image Generation
Framework for building real-time voice and multimodal AI agents
Industrial-level controllable zero-shot text-to-speech system
Spark-TTS Inference Code
A Unified Framework for Text-to-3D and Image-to-3D Generation
TextWorld is a sandbox learning environment for training and evaluating reinforcement learning (RL) agents on text-based games
Compute distance between sequences
Speech-AI-Forge is a project developed around TTS generation models
Snippet solution for Vim
Implementation of Phenaki Video, which uses MaskGIT
Faster Whisper transcription with CTranslate2
Easily compute CLIP embeddings and build a CLIP retrieval system
Chat with it via text and voice