How to utilize the entries for punctuation/symbols in the CMU dictionary?

Help
Vickie
2014-12-18
2014-12-29
  • Vickie

    Vickie - 2014-12-18

    There are many entries for punctuation marks and symbols at the beginning of the CMU dictionary. I am using pocketsphinx for recognition. It recognizes, for example, "semi-colon", but how can I make the decoder output the ";" character rather than the word "semi-colon"? How can the punctuation/symbol entries in the CMU dictionary be utilized? Any examples would be much appreciated.

     
    • Alexander Solovets

      You have to do the replacement manually, by post-processing the decoder output.
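A minimal sketch of such a manual replacement, assuming the recognition result is available as a plain string of space-separated words. The mapping table and the function name `replace_spoken_punctuation` are hypothetical and not part of pocketsphinx; extend the table with whatever tokens your dictionary and language model actually produce.

```python
# Hypothetical mapping from recognized tokens to characters; extend as needed.
# Keys are lower-case so lookups work regardless of the dictionary's casing.
# Both the plain spoken form and the ",comma" style form are listed.
SPOKEN_TO_SYMBOL = {
    "semi-colon": ";",
    ";semi-colon": ";",
    "comma": ",",
    ",comma": ",",
    "period": ".",
    ".period": ".",
    "question-mark": "?",
    "?question-mark": "?",
}


def replace_spoken_punctuation(hypothesis: str) -> str:
    """Replace spoken punctuation tokens with the characters they name."""
    out = []
    for token in hypothesis.split():
        symbol = SPOKEN_TO_SYMBOL.get(token.lower())
        if symbol is None:
            out.append(token)      # ordinary word, keep as is
        elif out:
            out[-1] += symbol      # attach the symbol to the previous word
        else:
            out.append(symbol)     # symbol at the very start of the text
    return " ".join(out)


print(replace_spoken_punctuation("first item semi-colon second item period"))
```

Note that a plain table lookup cannot tell the punctuation mark "period" from the ordinary word "period", so some care is needed when choosing which spoken forms to map.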

       

      Last edit: Nickolay V. Shmyrev 2014-12-19
  • Vickie

    Vickie - 2014-12-21

    I am just curious why such punctuation/symbol entries are in the pronouncing dictionary in the first place if, as far as I understand, they are not really used while decoding. If there is a reason for their presence that is not readily apparent to me, I would very much like to know it. My thinking is that their presence in the dictionary implies a purpose. Thanks again.

     
    • Nickolay V. Shmyrev

      Those special words, like ,comma, are used in one very rare special case: training on the WSJ database, specifically the part with punctuation (so-called verbalized punctuation). The purpose is to build a dictation model in which the speaker can pronounce punctuation.

      They are not used normally. You can read about WSJ with punctuation here:

      http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.329.8130

      You can use them with a language model (LM) that includes verbalized punctuation, for example this one:

      http://www.keithv.com/software/giga/lm_giga_5k_vp_3gram.zip
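With a verbalized-punctuation model, the decoder output would contain tokens such as ,comma, which still have to be turned into characters. The sketch below shows one possible way to derive that token-to-character table from dictionary text instead of typing it by hand. The sample lines are illustrative only (their pronunciations are not checked against the real dictionary), and the helper name `verbalized_punctuation` is hypothetical. It assumes each entry of interest is a single symbol followed by its spelled-out name.

```python
import re

# Illustrative lines in the style of the CMU dictionary.
# The pronunciations here are placeholders, not verified entries.
SAMPLE_DICT = """\
;;; comment line
,comma K AA M AH
;semi-colon S EH M IY K OW L AH N
?question-mark K W EH S CH AH N M AA R K
hello HH AH L OW
'bout B AW T
"""

# One leading symbol followed by a spelled-out name.
# The apostrophe is excluded so that contractions like 'bout are not matched.
VP_ENTRY = re.compile(r"^([^\w\s'])([a-z][a-z-]*)$", re.IGNORECASE)


def verbalized_punctuation(dict_text: str) -> dict:
    """Map entries such as ',comma' to the symbol they start with."""
    mapping = {}
    for line in dict_text.splitlines():
        # Skip blank lines and ';;;' comment lines.
        if not line.strip() or line.startswith(";;;"):
            continue
        word = line.split()[0]
        match = VP_ENTRY.match(word)
        if match:
            mapping[word] = match.group(1)
    return mapping


print(verbalized_punctuation(SAMPLE_DICT))
```

Entries that start with an apostrophe or with more than one symbol character are deliberately skipped by this pattern and would need to be added to the table by hand.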

       
  • Vickie

    Vickie - 2014-12-29

    Nickolay, thanks for shedding light on this.

     
