wav2vec2-large-xlsr-53-portuguese is an automatic speech recognition (ASR) model for Portuguese, fine-tuned on the Common Voice 6.1 dataset. It is based on Facebook’s wav2vec2-large-xlsr-53, a multilingual model pretrained with self-supervised learning, and expects speech input sampled at 16 kHz. The model can be used without a language model (LM), though adding one can improve word error rate (WER) and character error rate (CER): on the Common Voice test data it reaches a WER of 11.3% without an LM and 9.01% with one. Inference can be run with HuggingSound or with a custom PyTorch script built on Hugging Face Transformers and Librosa. The training scripts and evaluation methods are open source and available on GitHub. The model is released under the Apache 2.0 license and is intended for ASR tasks in Brazilian Portuguese.
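As a rough illustration of the Transformers and Librosa route mentioned above, the sketch below loads audio at 16 kHz and decodes greedily without a language model. The Hub identifier `jonatasgrosman/wav2vec2-large-xlsr-53-portuguese` and the `transcribe` helper are assumptions made for illustration and are not stated in this description. The heavy imports sit inside the function so the file can be loaded before the dependencies are installed.

```python
# Assumed Hub identifier; not given in the project description itself.
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-portuguese"
SAMPLE_RATE = 16_000  # the model expects 16 kHz input


def transcribe(paths, model_id=MODEL_ID):
    """Hypothetical helper: greedy CTC transcription without a language model."""
    # Imported lazily: these packages are large and optional for readers.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # librosa resamples each file to the requested rate while loading.
    waveforms = [librosa.load(path, sr=SAMPLE_RATE)[0] for path in paths]
    inputs = processor(
        waveforms,
        sampling_rate=SAMPLE_RATE,
        return_tensors="pt",
        padding=True,
    )

    with torch.no_grad():
        logits = model(
            inputs.input_values,
            attention_mask=inputs.attention_mask,
        ).logits

    # Pick the most likely token per frame; batch_decode collapses the CTC output.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)


if __name__ == "__main__":
    import sys

    audio_paths = sys.argv[1:]
    if audio_paths:
        for path, text in zip(audio_paths, transcribe(audio_paths)):
            print(f"{path}: {text}")
```

Running the file with one or more audio paths as arguments prints one transcription per file. The first call downloads the model weights, so it needs network access and some disk space.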

Features

  • Fine-tuned on the Common Voice 6.1 Portuguese dataset
  • Based on Facebook’s XLSR-53 wav2vec2 large architecture
  • Expects 16 kHz audio input for optimal accuracy
  • Works with or without a language model (LM)
  • Available via HuggingSound and Hugging Face Transformers
  • Provides example code for evaluation and inference
  • Achieves 9.01% WER and 3.21% CER with an LM
  • Apache-2.0 licensed and freely usable for commercial ASR systems
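The WER and CER figures quoted above are edit-distance ratios. The sketch below shows one common way to compute them in plain Python; it is not the project's own evaluation script, and the function names `edit_distance`, `wer`, and `cer` are chosen here for illustration. It assumes whitespace tokenization for words and counts spaces as characters for CER, which may differ from the normalization used in the published results.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (0 if ref_item == hyp_item else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    ref = "o gato dorme no sofá"
    hyp = "o gato dorme no sofa"
    print(f"WER: {wer(ref, hyp):.2%}")
    print(f"CER: {cer(ref, hyp):.2%}")
```

In the example pair, one of five reference words differs by a single accented character, so the word-level rate is much higher than the character-level rate. This is why ASR results usually report both metrics.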

Categories

AI Models

Additional Project Details

Registered

2025-07-01