There are many entries for punctuation marks and symbols at the beginning of the CMU dictionary. I am using pocketsphinx for recognition. It recognizes, for example, "semi-colon", but how can I make the decoder output the ";" character rather than the word "semi-colon"? How can the punctuation and symbol entries in the CMU dictionary be used? Any examples would be much appreciated.
I am just curious why such punctuation and symbol entries are in the pronouncing dictionary in the first place if they are not really used (at least as I understand it) during decoding. If there is a reason for their presence that is not readily apparent to me, I would very much like to know it. My thinking is that their presence in the dictionary implies a purpose. Thanks again.
You have to replace them manually, by post-processing the decoder output.
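A minimal sketch of such a manual replacement, in Python, might look like the following. It works on the hypothesis string only and does not call pocketsphinx. The function name and the mapping are hypothetical, and the spoken forms used as keys are assumptions: the exact tokens the decoder emits depend on the dictionary and language model in use.

```python
# Hypothetical mapping from spoken forms to characters. The exact tokens the
# decoder emits depend on your dictionary and language model, so adjust these.
SPOKEN_TO_SYMBOL = {
    "semi-colon": ";",
    "comma": ",",
    "period": ".",
    "question-mark": "?",
}


def replace_verbalized_punctuation(hypothesis, mapping=SPOKEN_TO_SYMBOL):
    """Replace spoken punctuation tokens in a hypothesis with their characters."""
    out = []
    for token in hypothesis.split():
        symbol = mapping.get(token.lower())
        if symbol is None:
            # Ordinary word: keep it unchanged.
            out.append(token)
        elif out:
            # Attach the symbol to the previous word so no space precedes it.
            out[-1] += symbol
        else:
            # Symbol at the very start of the hypothesis: emit it on its own.
            out.append(symbol)
    return " ".join(out)


if __name__ == "__main__":
    print(replace_verbalized_punctuation("hello comma world semi-colon"))
```

Note that a plain lookup like this cannot tell a dictated "period" from the ordinary word "period", so it is only safe when the spoken forms are unambiguous in your vocabulary.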
Last edit: Nickolay V. Shmyrev 2014-12-19
Special words like ,comma are used only in one very rare case: training on the part of the WSJ database that includes punctuation (so-called verbalized punctuation). The purpose is to build a dictation model in which the speaker can pronounce punctuation.
They are not normally used. You can read about WSJ with punctuation here:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.329.8130
You can use them with a language model (LM) trained with verbalized punctuation, for example this one:
http://www.keithv.com/software/giga/lm_giga_5k_vp_3gram.zip
Nickolay, thanks for shedding light on this.