Hi,
I have downloaded cmudict-en-us.dict and am using it in my transcriber program. In parallel, I have created a small corpus of words and generated .lm and .dic files from it using http://www.speech.cs.cmu.edu/tools/lmtool-new.html.
The difference I observe between the two dictionaries is the case of the individual words: cmudict-en-us.dict contains all words in lowercase, whereas my prog.dic contains all words in uppercase. My questions are:
1. Does case matter during recognition?
2. If I ignore the case difference, then during the adaptation procedure the program bw gives a warning like "Three not found in the dictionary", even though the word THREE exists in the dictionary. If I add a duplicate entry with the same ARPAbet phones but with "three" in lowercase, the warning disappears and the recognition accuracy is also better.
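Instead of adding duplicate entries by hand, the workaround in point 2 could be sketched as a small script that lowercases the word field of every entry in prog.dic. This is only a sketch under assumptions: it assumes each dictionary line is a word followed by whitespace and its phones, and that the lookup is case-sensitive, as the bw warning suggests. The function name lowercase_words is made up for illustration and is not part of any CMUSphinx tool.

```python
def lowercase_words(lines):
    """Lowercase only the word field of each dictionary line.

    Assumes the layout "WORD<whitespace>PHONE PHONE ..." (an assumption
    about the .dic format); the phones are left untouched.
    """
    out = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        parts = stripped.split(None, 1)  # split off the word from the phones
        word = parts[0].lower()
        phones = parts[1] if len(parts) > 1 else ""
        out.append(f"{word} {phones}".rstrip())
    return out


if __name__ == "__main__":
    # Hypothetical sample entries in the uppercase style produced by lmtool
    sample = ["THREE\tTH R IY", "READ(2)\tR EH D"]
    for entry in lowercase_words(sample):
        print(entry)
```

The transcription file used for adaptation would presumably need the same case as the dictionary for the words to match.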
Kindly clarify.
Balaji.