Contents of dictionary

Speech Recognition Toolkit

Brought to you by: air, arthchan2003, awb, bhiksha, and 5 others

Forum: Help

Creator: Balaji

Created: 2018-02-05

Updated: 2018-02-10

Balaji - 2018-02-05

Hi,
I have downloaded cmudict-en-us.dict and am using it in my transcriber program. In parallel, I have created a small corpus of words and generated .lm and .dic files using http://www.speech.cs.cmu.edu/tools/lmtool-new.html.
I observe that the difference between the two dictionaries is the case of the individual words: cmudict-en-us.dict contains all words in lowercase, whereas my prog.dic contains all words in uppercase. My questions are:
1. Does case matter during recognition?
2. If I ignore the case differences, then during the adaptation procedure the program bw gives a warning like "Three not found in the dictionary", even though the word THREE exists in the dictionary. If I add a duplicate entry with the same ARPAbet phones but with "three" in lowercase, the warning disappears and the recognition accuracy is also better.

Kindly clarify.

Balaji.
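
For illustration, one way to avoid maintaining duplicate entries would be to bring the generated dictionary to the same case convention as cmudict-en-us.dict. The sketch below is only an assumption-laden example, not part of any Sphinx tool: it assumes each dictionary line has the form "WORD PH1 PH2 ...", that alternate pronunciations are marked with a suffix such as "(2)", and that only the word field should be lowercased. The function names lowercase_dict_entry and lowercase_dict are hypothetical. The transcripts and language model would presumably need to follow the same case convention as well.

```python
import re

# Splits a word field into its base form and an optional
# alternate-pronunciation suffix such as "(2)" (assumed format).
_ALT_SUFFIX = re.compile(r"^(.*?)(\(\d+\))?$")


def lowercase_dict_entry(line):
    """Lowercase the word field of one dictionary line; phones are left as-is."""
    stripped = line.strip()
    if not stripped:
        return stripped
    # The first whitespace-separated field is the word, the rest are phones.
    parts = stripped.split(None, 1)
    word = parts[0]
    phones = parts[1] if len(parts) > 1 else ""
    match = _ALT_SUFFIX.match(word)
    base = match.group(1)
    suffix = match.group(2) or ""
    return f"{base.lower()}{suffix} {phones}".rstrip()


def lowercase_dict(lines):
    """Apply lowercase_dict_entry to every non-blank line."""
    return [lowercase_dict_entry(line) for line in lines if line.strip()]


if __name__ == "__main__":
    sample = ["THREE\tTH R IY", "READ(2)\tR IY D"]
    for entry in lowercase_dict(sample):
        print(entry)
```

Lowercasing only the word field keeps the ARPAbet phones unchanged, which matches the layout described above for cmudict-en-us.dict.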
