mms-300m-1130-forced-aligner is a multilingual forced-alignment model based on Meta’s MMS-300M wav2vec2 checkpoint and adapted for the Hugging Face Transformers library. It aligns audio with its corresponding text in 158 languages. Word- or phoneme-level timestamps are derived from the model's Connectionist Temporal Classification (CTC) emissions. It is also significantly more memory-efficient than the...
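To make the idea of CTC-based forced alignment concrete, here is a minimal pure-Python sketch of a Viterbi alignment over a toy emission matrix. It is not this model's implementation or API: the function names `ctc_forced_align` and `path_to_spans`, the three-symbol vocabulary, and the probabilities are all made up for illustration. A real pipeline would take the emissions from the model's per-frame output and handle much longer sequences.

```python
import math

def ctc_forced_align(log_probs, targets, blank=0):
    """Return the most likely frame-level label path that collapses to `targets`.

    log_probs: T x V list of per-frame log-probabilities (hypothetical emissions).
    targets:   list of token ids of the transcript (no blanks).
    """
    T = len(log_probs)
    # Interleave blanks: [blank, t1, blank, t2, ..., blank]
    ext = [blank]
    for tok in targets:
        ext += [tok, blank]
    S = len(ext)
    neg_inf = -math.inf
    score = [[neg_inf] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]

    # A path may start in the leading blank or in the first token.
    score[0][0] = log_probs[0][ext[0]]
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            # Allowed moves: stay, advance by one, or skip a blank
            # between two different tokens.
            candidates = [(score[t - 1][s], s)]
            if s >= 1:
                candidates.append((score[t - 1][s - 1], s - 1))
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append((score[t - 1][s - 2], s - 2))
            best, prev = max(candidates)
            if best > neg_inf:
                score[t][s] = best + log_probs[t][ext[s]]
                back[t][s] = prev

    # A valid path ends in the final blank or the final token.
    s = S - 1
    if S > 1 and score[T - 1][S - 2] > score[T - 1][S - 1]:
        s = S - 2
    if score[T - 1][s] == neg_inf:
        raise ValueError("transcript is too long for the number of frames")

    # Follow the back-pointers from the last frame to the first.
    path = []
    for t in range(T - 1, -1, -1):
        path.append(ext[s])
        s = back[t][s]
    path.reverse()
    return path

def path_to_spans(path, blank=0):
    """Group consecutive identical non-blank frames into (token, start, end) spans."""
    spans = []
    for t, label in enumerate(path):
        if label == blank:
            continue
        if t > 0 and path[t - 1] == label:
            spans[-1][2] = t + 1          # extend the current span
        else:
            spans.append([label, t, t + 1])  # end frame is exclusive
    return [tuple(span) for span in spans]

# Toy emissions: 5 frames over the vocabulary {0: blank, 1: "a", 2: "b"}.
probs = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.6, 0.2, 0.2],
    [0.1, 0.1, 0.8],
]
log_probs = [[math.log(p) for p in row] for row in probs]

path = ctc_forced_align(log_probs, targets=[1, 2])
spans = path_to_spans(path)
print(path)   # [0, 1, 1, 0, 2]
print(spans)  # [(1, 1, 3), (2, 4, 5)]
```

Frame indices become timestamps once multiplied by the duration of one emission frame. For wav2vec2-style models on 16 kHz audio this is commonly about 20 ms, but treat that figure as an assumption and check it against the model configuration.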