Hi,
I have a list of movie names, song names, and singers (up to 1.5 lakh entries, i.e. 150,000).
I have built a 3-gram language model (with improved Kneser-Ney smoothing) for speech recognition on movie/song/singer names. Currently I get around 70% recognition accuracy with this 3-gram language model. Some words in the list occur frequently and others rarely, for example:
Dhil deewana
Dhil dhadakne do
Dhil to pagal he
"Dhil" appears in multiple movie names in the list above.
Some movie names, like
"fanaa"
appear very few times in the list and consist of a single word.
Now when I have an utterance with "fanaa", it gets recognized as "ko naam".
"ko" and "naam" have a very high frequency in the corpus used to build the language model.
How can I artificially tune the corpus so that less frequent names (whether single-word or multiword) get recognized?
Other suggestions for tuning the language model to improve recognition accuracy are welcome.
Thanks
Srinidhi
Did you try playing with the language weight (the "-lw" option)? Try several values.
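To make the effect of the language weight concrete, here is a minimal sketch. It assumes the usual decoder scoring in which the acoustic log-probability is added to the language model log-probability scaled by the language weight. The function names and all scores are made up for illustration; they are not taken from a real decoding run.

```python
def combined_score(acoustic_logprob, lm_logprob, lw):
    """Decoder-style score: acoustic evidence plus LM evidence scaled by lw."""
    return acoustic_logprob + lw * lm_logprob


def best_hypothesis(hypotheses, lw):
    """Return the hypothesis with the highest combined score."""
    return max(hypotheses, key=lambda name: combined_score(*hypotheses[name], lw))


# Hypothetical (acoustic, LM) log scores: "fanaa" fits the audio better,
# but "ko naam" is more likely under the language model.
hypotheses = {
    "fanaa": (-100.0, -6.0),
    "ko naam": (-110.0, -4.0),
}

for lw in (2.0, 4.0, 6.0, 8.0):
    print(lw, best_hypothesis(hypotheses, lw))
```

With these illustrative numbers the rare name wins at low weights and loses at high ones, which is why sweeping "-lw" is worth trying before touching the corpus.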
I have tried different weights and found an optimal weight at which I get a recognition accuracy of 70%. But now I would like to tweak the language model so that even the less frequent words in the corpus get recognized (maybe by manually tweaking the corpus). Every name (single-word or multiword) is present only once in the corpus.
Language model tweaks are pretty pointless for getting good accuracy. They can give you a couple of percent if the vocabulary is small, but overall you need to get a good acoustic model first.
Since you are trying to recognize Hindi names, and I doubt you have a good Hindi acoustic model, you should probably focus on that rather than on the language model.
I have trained the acoustic model using 4 million transcriptions.
I have a test set of 3000 utterances. When I build a language model using only the transcriptions of those 3000 utterances, I get an accuracy of 97-98%. But when I build a language model from a list of around 1 lakh (100,000) movie names, the utterance containing "fanaa" is not recognized, even though that corpus also contains the text "fanaa".
There could be many reasons why a particular phrase is not recognized, from stupid mistakes in the phonetic dictionary to more complex interactions between the acoustic model, the language model, and the search beams. You have to double-check everything and consider all possible reasons, not just the language model.
To verify the language model, compute its perplexity on the test set; it should be reasonably small, below 200.
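For reference, here is a small sketch of how perplexity follows from log probabilities. It assumes base-10 log probabilities per sentence and the normalization SRILM documents for `ngram -ppl` (words minus OOVs, plus one end-of-sentence token per sentence). The function name and the sample numbers are illustrative only.

```python
def perplexity(sentence_logprobs, num_words, num_oovs=0):
    """Perplexity from per-sentence log10 probabilities.

    sentence_logprobs: total log10 probability of each test sentence
    num_words: number of words in the test set (without sentence markers)
    num_oovs: number of out-of-vocabulary words excluded from scoring
    """
    total_logprob = sum(sentence_logprobs)
    # Each sentence contributes one extra scored token, the end marker </s>.
    num_tokens = num_words - num_oovs + len(sentence_logprobs)
    return 10 ** (-total_logprob / num_tokens)


# Two made-up sentences with 4 words in total and log10 probabilities -2 and -4.
print(perplexity([-2.0, -4.0], num_words=4))
```

In practice the number would come from running the toolkit on the test transcriptions; the sketch only shows what the reported value means.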
And if you care about the probabilities in the language model, you can modify the counts by popularity when building it. "Fanaa" is way more popular than "Jab we met" or something. So you can create counts that prefer the former over the latter. Such a language model might be more reasonable.
How do I create counts for "fanaa" or any similar less frequent name so that it is recognized more consistently? What do I have to do while building the language model to achieve this?
You can find a description of the counts file here:
http://www.speech.sri.com/projects/srilm/manpages/ngram-count.1.html
You can use ngram-count to estimate the model from a counts file. The counts do not have to be real counts; they could be based on movie popularity. You can obtain movie popularity ratings from the web or something.
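A rough sketch of how such a popularity-weighted counts file could be generated is below. It assumes the SRILM counts format of one N-gram per line, words separated by spaces, followed by a tab and the count. The popularity values, the function names, and the dictionary are hypothetical placeholders, not real ratings.

```python
from collections import Counter


def weighted_ngram_counts(popularity, order=3):
    """Accumulate N-gram counts where each name counts `weight` times."""
    counts = Counter()
    for name, weight in popularity.items():
        # Add sentence boundary markers, as in ordinary text counting.
        tokens = ["<s>"] + name.split() + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += weight
    return counts


def format_counts(counts):
    """Render counts as 'w1 w2 ... wN<TAB>count' lines."""
    lines = [" ".join(ngram) + "\t" + str(count)
             for ngram, count in sorted(counts.items())]
    return "\n".join(lines) + "\n"


# Hypothetical popularity weights, not real ratings.
popularity = {"fanaa": 50, "dhil deewana": 5, "dhil to pagal he": 20}
counts = weighted_ngram_counts(popularity)
print(format_counts(counts), end="")
```

The printed text can be saved to a file and passed to something like `ngram-count -read weighted.counts -order 3 -lm weighted.lm`. One caveat: Kneser-Ney discounting relies on count-of-counts statistics, so with artificial counts it may fail to estimate its parameters, and a different discounting method may be needed.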