Fast and accurate automatic speech recognition (ASR) for edge devices
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Speech to Text to Speech, sends text as OSC messages
Robust Speech Recognition via Large-Scale Weak Supervision
Speech-to-text, text-to-speech, and speaker recognition
Framework for building real-time voice and multimodal AI agents
Real-time voice interactive digital human
Open source AI VTuber platform with voice chat and Live2D avatars
Conversational voice AI agents
In-App assistant SDK to build a multimodal conversational UX for websites
The behavior guidance framework for customer-facing LLM agents
Build voice-based LLM agents. Modular + open source
TEN, a voice agent framework to create conversational AI.
Realtime AI Voice Agents with SoTA Multimodal AI models on Arduino ESP
Large Audio Language Model built for natural interactions
Automatic Speech Recognition with Word-level Timestamps
Multilingual speech recognition and audio understanding model
Map location picker component for Android
Fast multimodal LLM for real-time voice interaction and AI apps
Repo of Qwen2-Audio chat & pretrained large audio language model
A free, open source, and extensible speech-to-text application
Assistant SDK to build a multimodal conversational UX for Android
In-App assistant SDK to build a multimodal conversational UX for iOS
Deploy your private Gemini application for free with one click
Build your own AI friend