OCR model for complex documents with layout-aware structured outputs
Document (PDF, Word, PPTX ...) extraction and parsing API
Visual Causal Flow
Open speech-to-speech models and pipelines by Hugging Face
Contexts Optical Compression
PDFCraft is a free, privacy-focused PDF toolkit
Use Microsoft Edge's online text-to-speech service from Python
MCScanX: Multiple Collinearity Scan toolkit X version
AI tool for automatic batch short video creation and editing
Automated YouTube Shorts pipeline
Clean network diagrams. One-time setup, zero upkeep
A TTS that fits in your CPU (and pocket)
Self-hosted collection of powerful web-based tools for everyday tasks
Modular AI image and video generation web UI with extensible tools
Skills shared by Baoyu for improving daily work efficiency with Claude
Applying large models to scenario-based applications
Qwen3-ASR is an open-source series of ASR models
Automated translation solution for visual novels
Audiocraft is a library for audio processing and generation
95% token savings. 155x faster queries. 16 languages
AI-assisted storyboard and video generation tool
End-to-end speech processing toolkit
Open source NLP guide with models, methods, and real use cases
Semantic search and document parsing tools for the command line