Framework for building real-time voice and multimodal AI agents
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Fast multimodal LLM for real-time voice interaction and AI apps
StreamSpeech is a seamless model for offline speech recognition
NLP Cloud serves high-performance pre-trained or custom models for NER
Persian NLP Toolkit
Capable of understanding text, audio, vision, and video
OCR expert VLM powered by Hunyuan's native multimodal architecture
A simple tool for reading poorly redacted documents
Open speech-to-speech models and pipelines by Hugging Face
Ready-to-use OCR with 80+ supported languages
CLI tool to extract (meta)data from PDF and manipulate PDF files
Repo of Qwen2-Audio chat & pretrained large audio language model
Translates a video from one language to another and embeds dubbing
Real-time voice interactive digital human
Chat & pretrained large vision language model
AI-powered tool for generating, optimizing, and translating subtitles
Open source AI VTuber platform with voice chat and Live2D avatars
A Web UI for easy subtitle generation using the Whisper model
Advanced NLP with spaCy: A free online course
Qwen3-Omni is a natively end-to-end, omni-modal LLM
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Conversational voice AI agents
Large language model for a sales livestream host
A very simple framework for state-of-the-art NLP