Image polygonal annotation with Python
Voice Recognition to Text Tool
Underthesea - Vietnamese NLP Toolkit
Omnilingual ASR: Open-Source Multilingual Speech Recognition
Open source AI VTuber platform with voice chat and Live2D avatars
Han Language Processing
Accurate × Fast × Comprehensive
OCRmyPDF adds an OCR text layer to scanned PDF files
Toolkit for conversational AI
OCR expert VLM powered by Hunyuan's native multimodal architecture
Fast multimodal LLM for real-time voice interaction and AI apps
Repo of Qwen2-Audio chat & pretrained large audio language model
Large language model for a livestream sales anchor
NLP Cloud serves high-performance pre-trained or custom models for NER
Open speech-to-speech models and pipelines by Hugging Face
A high-quality tool for converting PDF to Markdown and JSON
An open and fair framework for everyone to build AI agents
Open source annotation tool for machine learning practitioners
Framework for building real-time voice and multimodal AI agents
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
2D and 3D face alignment library built using PyTorch
Video understanding codebase from FAIR for reproducing video models
Image processing in Python
Semantic search and workflows for medical/scientific papers
Conversational voice AI agents