OCR model for complex documents with layout-aware structured outputs
Document (PDF, Word, PPTX, ...) extraction and parsing API
Visual Causal Flow
Open speech-to-speech models and pipelines by Hugging Face
Contexts Optical Compression
Use Microsoft Edge's online text-to-speech service from Python
AI tool for automatic batch short video creation and editing
A TTS that fits in your CPU (and pocket)
Modular AI image and video generation web UI with extensible tools
Integrating large models into scenario-based applications
Qwen3-ASR is an open-source series of ASR models
Automated translation solution for visual novels
Audiocraft is a library for audio processing and generation
AI-assisted storyboard and video generation tool
95% token savings. 155x faster queries. 16 languages
End-to-end speech processing toolkit
Open-source NLP guide with models, methods, and real use cases
Semantic search and document parsing tools for the command line
General-purpose image editing model that delivers high-fidelity
Framework for building real-time multimodal voice AI agent apps
Sora AI Video Generator by Sora.FM
Running large language models on a single GPU
Framework for building real-time voice and multimodal AI agents
The Python library for real-time communication
Open Source Speech Language Model