OCR model for complex documents with layout-aware structured outputs
Document (PDF, Word, PPTX, ...) extraction and parsing API
Visual Causal Flow
Hugging Face toolkit of open speech-to-speech models and pipelines
Contexts Optical Compression
Use Microsoft Edge's online text-to-speech service from Python
AI tool for automatic batch short video creation and editing
A TTS that fits in your CPU (and pocket)
Modular AI image and video generation web UI with extensible tools
Implementing large models into scenario-based applications
Qwen3-ASR is an open-source series of ASR models
Automated translation solution for visual novels
Audiocraft is a library for audio processing and generation
AI-assisted storyboard and video generation tool
95% token savings. 155x faster queries. 16 languages
End-to-end speech processing toolkit
Open-source NLP guide with models, methods, and real use cases
Semantic search and document parsing tools for the command line
Framework for building real-time multimodal voice AI agent apps
General-purpose image editing model that delivers high-fidelity
Sora AI Video Generator by Sora.FM
Framework for building real-time voice and multimodal AI agents
Running large language models on a single GPU
The Python library for real-time communication
Open Source Speech Language Model