Hello,
I am trying to do domain adaptation with text data only (no speech data). Sadly, I was only able to find tutorials for adapting the acoustic model.
What I want to do is take a base model, for example the "en-70k-0.1.lm" language model, and use a large text corpus to adapt it so that it works better for a specific language domain.
I have already found the lextool to expand my dictionary, but I have not yet found out how to adapt the language model with my text data.
Are there any tutorials out there that can show me how to train my model the way I described?
https://cmusphinx.github.io/wiki/tutoriallmadvanced/
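For intuition only: one common text-only approach is to train a small model on the domain corpus and linearly interpolate it with the base model. The toy sketch below shows that idea on unigram probabilities in plain Python. It is not the Sphinx tooling and does not read "en-70k-0.1.lm"; the function names, the example sentences, and the weight `lam=0.5` are all made up for illustration, and a real setup would work on n-gram models in ARPA format and tune the weight on held-out domain text.

```python
from collections import Counter


def unigram_probs(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(base, domain, lam):
    """Linear interpolation: lam * P_domain(w) + (1 - lam) * P_base(w)."""
    return {w: lam * domain[w] + (1 - lam) * base[w] for w in base}


# Hypothetical stand-ins for the general corpus and the domain corpus.
base_text = "the cat sat on the mat".split()
domain_text = "the patient has a fever the patient rests".split()

# Both models share one vocabulary so their probabilities are comparable.
vocab = sorted(set(base_text) | set(domain_text))

base = unigram_probs(base_text, vocab)
domain = unigram_probs(domain_text, vocab)

# lam is an assumed mixing weight; higher values favour the domain model.
adapted = interpolate(base, domain, lam=0.5)

for word in ("patient", "cat"):
    print(word, round(base[word], 4), round(adapted[word], 4))
```

In this sketch the adapted model still sums to one, domain words such as "patient" gain probability, and general words such as "cat" lose some.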