Toolkit for conversational AI
Open source AI VTuber platform with voice chat and Live2D avatars
A full spaCy pipeline and models for scientific/biomedical documents
Omnilingual ASR: Open-Source Multilingual Speech Recognition
Voice Recognition to Text Tool
Formula recognition based on LaTeX-OCR and ONNXRuntime
OCRmyPDF adds an OCR text layer to scanned PDF files
Replace OpenAI GPT with another LLM in your app
Training data (data labeling, annotation, workflow) for all data types
Accurate × Fast × Comprehensive
Repo of Qwen2-Audio chat & pretrained large audio language model
Framework for building real-time voice and multimodal AI agents
An open and fair framework for everyone to build AI agents
Open speech-to-speech models and pipelines by Hugging Face
OCR expert VLM powered by Hunyuan's native multimodal architecture
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Han Language Processing
Large language model for a sales livestream host
Large Audio Language Model built for natural interactions
2D and 3D face alignment library built using PyTorch
A high-quality tool for converting PDF to Markdown and JSON
Translate a video from one language to another and embed dubbing
Video understanding codebase from FAIR for reproducing video models
NLP Cloud serves high-performance pre-trained or custom models for NER
Image processing in Python