CMU Sphinx / Forums / Help: Origin of pronunciation dictionaries

Larry Houdini - 2019-05-06

In browsing through the Language and Acoustic Models folders, I've found pronunciation dictionaries for different languages. Where did these come from? I'm pretty sure that the English dictionary is human-edited. The Dutch dictionary, however, has 1.4 million words from multiple languages. Are the other dictionaries human-edited, or are they autogenerated somehow? Can anyone comment on the accuracy of the pronunciation data?

- Nickolay V. Shmyrev - 2019-05-06
  
  Most dictionaries, even hand-reviewed ones, are not accurate at all. cmudict, for example, is half-phonetic, half-phonemic and does not really reflect the way people speak. The Dutch dictionary is probably from http://www.fon.hum.uva.nl/rob/Publications/IFAcorpusEurospeech2001.pdf or from CELEX ( https://catalog.ldc.upenn.edu/LDC96L14 ), but it is hard to tell these days, now that Google actively forgets things. There is also the Utwente-Kaldi project; they have another Dutch dictionary, probably a more consistent one.
  
  - Larry Houdini - 2019-05-06
    
    Thanks for the info.
    
    For the record, though, I would attest that about 92% to 98% of the words in the English CMU dictionary are correct. That is, they are correct given the constraints imposed by the phoneme set.
    
    - Nickolay V. Shmyrev - 2019-05-26
      
      No, even the most common words are not really correct. See:
      
      Speaking in shorthand - A syllable-centric perspective for understanding pronunciation variation
      Steven Greenberg
      
      http://www1.icsi.berkeley.edu/~steveng/PDF/SpeakingInShorthandMIME.pdf
      