Domain adaptation with text data only

Speech Recognition Toolkit




Forum: Help

Created: 2021-01-06

Updated: 2021-01-06

Seb - 2021-01-06

Hello,

I am trying to do domain adaptation with text data only (no speech data). Sadly, the only tutorials I could find cover adapting the acoustic model.
What I want to do is take a base model, for example the "en-70k-0.1.lm" language model, and use a large text corpus to adapt it to a specific language domain.
I have already found lextool for expanding my dictionary, but I have not yet found out how to adapt the language model with my text data.

Are there any tutorials out there that can show me how to train my model the way I described?

- Nickolay V. Shmyrev - 2021-01-06
  
  https://cmusphinx.github.io/wiki/tutoriallmadvanced/

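For readers wondering what text-only adaptation of a language model can look like in principle, one common technique is linear interpolation: train a second model on the domain text and mix its probabilities with those of the base model. The following is only a toy sketch of that idea on unigram models, not the method from the linked tutorial and not a way to process "en-70k-0.1.lm" directly. The function names, the sample sentences, and the mixing weight `lam` are all hypothetical, chosen for illustration; real toolkits apply the same idea to full n-gram models.

```python
from collections import Counter


def unigram_probs(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(base, domain, lam):
    """Linear interpolation: lam * domain + (1 - lam) * base, per word."""
    return {w: lam * domain[w] + (1 - lam) * base[w] for w in base}


# Hypothetical stand-ins for a general corpus and a domain corpus.
general_text = "the cat sat on the mat and the dog sat too".split()
domain_text = "the patient received the dose and the patient recovered".split()

# Shared vocabulary, so both models assign a probability to every word.
vocab = sorted(set(general_text) | set(domain_text))

base_lm = unigram_probs(general_text, vocab)
domain_lm = unigram_probs(domain_text, vocab)

# lam = 0.5 is an arbitrary choice; in practice it is tuned on held-out domain text.
adapted_lm = interpolate(base_lm, domain_lm, lam=0.5)

for word in ("patient", "cat"):
    print(word, base_lm[word], domain_lm[word], adapted_lm[word])
```

Domain words such as "patient" end up more likely in the adapted model than in the base model, while general words keep some of their original probability mass.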