Contexts Optical Compression
Audio foundation model excelling in audio understanding
Synchronized Translation for Videos
StreamSpeech is a seamless model for offline speech recognition
Large Audio Language Model built for natural interactions
Visual Causal Flow
Real-time voice interactive digital human
Qwen3-ASR is an open-source series of ASR models
An open-source text-to-speech system built by inverting Whisper
Workflow and speech recognition app
Interface for OuteTTS models
Bailing is a voice dialogue robot similar to GPT-4o
Transformers4Rec is a flexible and efficient library
Cerberus Content Management System
FaceXlib aims at providing ready-to-use face-related functions
A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator
VGGFace2 Dataset for Face Recognition
FaceAccess is an Access Control System based on Facial Recognition
A pygame music library
An Incremental Spoken Dialogue Processing Toolkit
Timelapse creation using Face Recognition