In my scenario I have audio files and transcript pairs for different speakers.
The two line up pretty well, and I primarily just need to align each word in
the text with the audio where possible.
At first, I was combining many similar transcripts together and building a
language model using the tools. At some point, as an experiment, I tried
creating a language model using just one transcript and using that on
recognition with the corresponding file. This gave me better results than the
combined LM, so I began doing that for more speakers. Today I ran into a
problem with a shorter transcript and was getting poor recognition results
using this very small LM. The combined LM still performed all right for this
speaker though. I tried backing the small LM down to using bigrams, and got
better results than either the individual LM (using trigrams) or the combined
LM.
So, my question is: does anyone know a rule of thumb for when to use bigrams
vs. trigrams when building an LM from a very limited set of data? Some of my
transcripts are longer than others, so switching between the two depending on
the length seems like a good idea.
Also, is there ever an amount of text that would make it worthwhile to use
4-grams?
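To put a rough number on how "limited" a given transcript is, one thing that could be measured is how many distinct n-grams of each order it contains and what fraction of them occur only once. This is just a sketch of that idea, not an established rule: the function name `ngram_sparsity` and the sample sentence are made up for illustration and are not part of any LM toolkit.

```python
from collections import Counter


def ngram_sparsity(words, max_order=4):
    """Return {order: (distinct_ngrams, fraction_seen_once)} for orders 1..max_order."""
    stats = {}
    for n in range(1, max_order + 1):
        # Count every contiguous run of n words in the transcript.
        counts = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
        if not counts:
            # Transcript is shorter than n words: no n-grams of this order.
            stats[n] = (0, 0.0)
            continue
        singletons = sum(1 for c in counts.values() if c == 1)
        stats[n] = (len(counts), singletons / len(counts))
    return stats


if __name__ == "__main__":
    # Hypothetical sample text; in practice this would be one transcript's words.
    transcript = "the cat sat on the mat and the cat sat on the rug".split()
    for order, (distinct, frac_once) in ngram_sparsity(transcript).items():
        print(f"{order}-grams: {distinct} distinct, {frac_once:.0%} seen once")
```

If nearly all trigrams in a transcript are seen only once, that would at least be consistent with the bigram LM doing better on the short transcript, though I don't know where a sensible cutoff would be.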
Thanks!