Repository for the Qwen2-Audio chat and pretrained large audio language models
Detect faces in an image
Capable of understanding text, audio, vision, video
State-of-the-art image and video CLIP models and multimodal large language models
BlazeFace is a lightweight model that detects faces in images
Portuguese ASR model fine-tuned on XLSR-53 for 16kHz audio input
Russian ASR model fine-tuned on Common Voice and CSS10 datasets
Multimodal Transformer for document image understanding and layout
ClinicalBERT model trained on MIMIC notes for clinical NLP tasks