Fast multimodal LLM for real-time voice interaction and AI apps
A high-quality tool for converting PDF to Markdown and JSON
Repo of Qwen2-Audio chat & pretrained large audio language model
2D and 3D face alignment library built using PyTorch
Semantic search and workflows for medical/scientific papers
An open and fair framework for everyone to build AI agents
A proof-of-concept Jupyter extension which converts English queries
Large language model for a sales livestream anchor
Get your documents ready for gen AI
Ready-to-use OCR with 80+ supported languages
A framework to enable multimodal models to operate a computer
Open-Source AI Camera. Empower any camera/CCTV
Open speech-to-speech models and pipelines by Hugging Face
Large Audio Language Model built for natural interactions
StreamSpeech is a seamless model for offline speech recognition
Video understanding codebase from FAIR for reproducing video models
Industrial-strength Natural Language Processing (NLP)
Stanford NLP Python library for many human languages
The behavior guidance framework for customer-facing LLM agents
Translate the video from one language to another and embed dubbing
Visual Causal Flow
Real-time voice interactive digital human
A simple tool for reading in poorly redacted documents
Advanced NLP with spaCy: A free online course
Integrating LLMs into structured NLP pipelines