Speech-AI-Forge is a project developed around TTS generation models
Tokenizer-Free TTS for Multilingual Speech Generation
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
A nearly-live implementation of OpenAI's Whisper
CLIP: predict the most relevant text snippet given an image
A Powerful Native Multimodal Model for Image Generation
A high-quality rapid TTS voice cloning model
Framework for building real-time multimodal voice AI agent apps
Industrial-level controllable zero-shot text-to-speech system
TextWorld is a sandbox learning environment for the training
Easy-to-use Python library for creating 2D arcade games
Handwritten Text Recognition (HTR) system implemented with TensorFlow
Easily compute clip embeddings and build a clip retrieval system
Framework for building real-time voice and multimodal AI agents
Accurate × Fast × Comprehensive
Agent harness to make your slop code well-engineered and beautiful
Instagram OSINT tool for gathering profile data and public posts
Official Python inference and LoRA trainer package
Create prompt-friendly codebase digests from any Git repository URL
AI-powered tool for generating, optimizing, and translating subtitles
A fast TTS architecture with conditional flow matching
Speakr is a personal, self-hosted web application
Generate blog articles from video or audio
A Unified Framework for Text-to-3D and Image-to-3D Generation