I would like to ask about the best way of dealing with filler models in the LM.
I have trained filler models for speech samples such as DTMF, various noises (cough, breath, etc.), and sentence-begin and sentence-end silences. Now I would like to know the best way of incorporating these models into a bigram language model.
I train the language model with the CMU SLM toolkit. This toolkit has an option to define "context cues" while preparing the LM. Should the filler dictionary entries be defined as context cues when training the LM? Or is there a better way of dealing with filler words, such as leaving them out of the LM vocabulary?
Thank you very much
Fillers are inserted automatically after each word by the decoder, so the language model shouldn't include them at all.
The only thing you need to care about in the language model is phrase boundaries (the sentence-start and sentence-end markers, such as <s> and </s>). Those are completely different from fillers.
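In practice this means stripping filler tokens from the training transcripts before the LM is built, while keeping the phrase boundary markers. A minimal sketch in Python, assuming one utterance per line; the filler names below and the helper functions are made up for illustration, so substitute the entries from your own filler dictionary:

```python
# Hypothetical filler tokens (assumption): replace with your filler dictionary entries.
FILLERS = {"++COUGH++", "++BREATH++", "++NOISE++", "++DTMF++"}

def strip_fillers(line, fillers=FILLERS):
    """Drop filler tokens from one transcript line, keeping all other tokens."""
    return " ".join(tok for tok in line.split() if tok not in fillers)

def prepare_lm_line(line, fillers=FILLERS):
    """Strip fillers, then wrap the utterance in <s> ... </s> boundary markers.

    Returns an empty string if nothing but fillers was on the line.
    """
    words = strip_fillers(line, fillers)
    return f"<s> {words} </s>" if words else ""

if __name__ == "__main__":
    print(prepare_lm_line("++BREATH++ call ++COUGH++ home now"))
```

The cleaned text can then be fed to the LM toolkit with only the boundary markers listed as context cues, so no filler ever enters the LM vocabulary.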