Origin of pronunciation dictionaries

2019-05-06
2019-05-26
  • Larry Houdini

    Larry Houdini - 2019-05-06

    In browsing through the Language and Acoustic Models folders, I've found pronunciation dictionaries for different languages. Where did these come from? I'm pretty sure that the English-language dictionary is human edited. The Dutch dictionary, however, has 1.4 million words from multiple languages. Are the other dictionaries human edited, or are they autogenerated somehow? Can anyone comment on the accuracy of the pronunciation data?

     
    • Nickolay V. Shmyrev

      Most dictionaries, even hand-reviewed ones, are not accurate at all. cmudict, for example, is half-phonetic, half-phonemic and does not really reflect the way people speak. The Dutch dictionary is probably from http://www.fon.hum.uva.nl/rob/Publications/IFAcorpusEurospeech2001.pdf or from CELEX ( https://catalog.ldc.upenn.edu/LDC96L14 ), but now that Google actively forgets things it is very hard to tell. There is also the Utwente-Kaldi project; they have another Dutch dictionary, probably more consistent.

       
      • Larry Houdini

        Larry Houdini - 2019-05-06

        Thanks for the info.

        For the record though, I would attest that, for the English CMU dictionary, about 92%-98% of the words are correct. That is, they are correct given the constraints imposed by the phoneme set.

         
        • Nickolay V. Shmyrev

          No, even the most common words are not really correct. See

          Speaking in shorthand - A syllable-centric perspective for understanding pronunciation variation
          Steven Greenberg

          http://www1.icsi.berkeley.edu/~steveng/PDF/SpeakingInShorthandMIME.pdf
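For anyone who wants to inspect these dictionaries directly, here is a minimal Python sketch that loads a dictionary and reports which phones fall outside a given phoneme set. It assumes the common CMUdict/Sphinx text layout (one entry per line: the word, then whitespace-separated phones; alternate pronunciations written as `word(2)`; comment lines starting with `;;;`). The function names and the sample entries are hypothetical and made up for illustration. A check like this only tests consistency with the phoneme set, not whether a pronunciation matches how people actually speak.

```python
import re
from collections import defaultdict

# Assumption: alternate pronunciations are marked as word(2), word(3), ...
_VARIANT = re.compile(r"\(\d+\)$")


def parse_dictionary(lines):
    """Map each word to a list of pronunciations (each a list of phones)."""
    entries = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and ';;;' comment lines (assumed comment marker).
        if not line or line.startswith(";;;"):
            continue
        word, *phones = line.split()
        # Fold word(2), word(3), ... into the base word.
        word = _VARIANT.sub("", word)
        entries[word].append(phones)
    return dict(entries)


def unknown_phones(entries, phone_set):
    """Return phones used in the dictionary that are not in phone_set."""
    used = {p for prons in entries.values() for pron in prons for p in pron}
    return used - set(phone_set)


# Hypothetical sample data, not taken from any real dictionary file.
sample = [
    ";;; made-up sample entries",
    "hello HH AH L OW",
    "hello(2) HH EH L OW",
    "world W ER L D",
]

entries = parse_dictionary(sample)
phones = {"HH", "AH", "L", "OW", "W", "ER", "D"}  # deliberately omits EH
print(len(entries), "words")
print("phones outside the set:", unknown_phones(entries, phones))
```

To run it on a real file, pass an open file object instead of `sample`, for example `parse_dictionary(open(path, encoding="utf-8"))`, where `path` points at whichever dictionary you downloaded.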

           
