
Unable to lookup word, unable to produce phonetic transcription | why no smoothing?

Joy Jen
2019-03-04
  • Joy Jen

    Joy Jen - 2019-03-04

    In my speech recognition project, I am getting "Unable to lookup word" warnings:

    WARN: "mk_phone_list.c", line 178: Unable to lookup word 'skitters' in the dictionary
    WARN: "next_utt_states.c", line 83: Unable to produce phonetic transcription for the utterance '<s> in the same way the line skitters along at a low level for millennia then rockets up exponentially in the 19th and 20th century </s>'
    WARN: "main.c", line 826: Skipped utterance '<s> in the same way the line skitters along at a low level for millennia then rockets up exponentially in the 19th and 20th century </s>'
    WARN: "mk_phone_list.c", line 178: Unable to lookup word 'crown-of-thorns' in the dictionary
    WARN: "next_utt_states.c", line 83: Unable to produce phonetic transcription for the utterance '<s> crown-of-thorns starfish in the indian ocean zebra mussels in the great lakes spruce budworm here in canada </s>'
    

    First, is there a way to fix the errors besides finding the out of vocabulary word and adding it back to the dictionary?
    If there is smoothing in language models, can we do a similar thing when we are adapting the acoustic model? If not, why not?
    Why is the recognizer unable to produce a phonetic transcription for the whole sentence when only one word is missing?
    Second, how to deal with hyphens? They might be essential for compound words.

     
    • Nickolay V. Shmyrev

      First, is there a way to fix the errors besides finding the out of vocabulary word and adding it back to the dictionary?

      No.
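Since the only fix is to add the missing words, it helps to list all of them in one pass rather than reading the warnings one by one. Below is a minimal sketch, assuming a CMUdict-style dictionary (`word PH1 PH2 ...`, alternates written as `word(2)`) and transcript lines of the form `<s> ... </s> (utt_id)`. The function names are hypothetical, and lowercasing both sides is an assumption that may not match a case-sensitive setup.

```python
def load_dictionary_words(dict_lines):
    """Collect head words from CMUdict-style lines (assumed format)."""
    words = set()
    for line in dict_lines:
        fields = line.split()
        if not fields or line.startswith(";;;"):
            continue  # skip blank lines and ';;;' comment lines
        # Alternate pronunciations look like 'word(2)'; keep only 'word'.
        words.add(fields[0].split("(")[0].lower())
    return words


def find_oov(transcript_lines, dictionary_words):
    """Map each out-of-vocabulary word to the transcript lines that use it."""
    oov = {}
    for lineno, line in enumerate(transcript_lines, start=1):
        for token in line.split():
            if token in ("<s>", "</s>"):
                continue  # sentence boundary markers
            if token.startswith("(") and token.endswith(")"):
                continue  # trailing utterance id, e.g. '(utt_001)'
            word = token.lower()
            if word not in dictionary_words:
                oov.setdefault(word, []).append(lineno)
    return oov


if __name__ == "__main__":
    dictionary = load_dictionary_words(["in IH N", "the DH AH", "line L AY N"])
    transcript = ["<s> in the line skitters </s> (utt_001)"]
    for word, lines in sorted(find_oov(transcript, dictionary).items()):
        print(word, lines)
```

Each reported word then needs a pronunciation added to the dictionary before training or adaptation is rerun.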

      If there is smoothing in language models, can we do a similar thing when we are adapting the acoustic model? If not, why not?

      MAP adaptation is similar to language model interpolation.
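To make the analogy concrete, here is a minimal sketch of the textbook MAP update for a single one-dimensional Gaussian mean, set next to linear language model interpolation. It is a simplification: real systems weight frames by state occupancy rather than counting them, and this is not claimed to be what any particular Sphinx tool does. The names `map_mean`, `interpolate`, and `tau` (the prior weight) are chosen here for illustration.

```python
def interpolate(p_a, p_b, lam):
    """Linear interpolation of two language model probabilities."""
    return lam * p_a + (1.0 - lam) * p_b


def map_mean(prior_mean, frames, tau):
    """MAP estimate of a Gaussian mean from adaptation frames.

    prior_mean: mean of the existing (speaker-independent) model
    frames:     adaptation observations assigned to this Gaussian
    tau:        prior weight; larger values trust the prior more
    """
    n = len(frames)
    if n == 0:
        return prior_mean  # no adaptation data: keep the prior unchanged
    sample_mean = sum(frames) / n
    # The data weight grows with the amount of adaptation data.
    weight = n / (n + tau)
    return weight * sample_mean + (1.0 - weight) * prior_mean


if __name__ == "__main__":
    print(map_mean(0.0, [2.0] * 10, tau=10.0))    # halfway between prior and data
    print(map_mean(0.0, [2.0] * 1000, tau=10.0))  # dominated by the data
```

Both functions are convex combinations of two estimates. The difference is that in MAP the mixing weight comes from how much adaptation data each Gaussian has seen, so parameters with little or no data fall back to the prior model, which is the role smoothing plays in a language model.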

      Why is the recognizer unable to produce a phonetic transcription for the whole sentence when only one word is missing?

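      That's reasonable if you think about it. Training aligns the audio against the phone sequence of the entire utterance, so a single word without a pronunciation leaves that sequence incomplete and the utterance is skipped.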

      Second, how to deal with hyphens? They might be essential for compound words.

      Leave them as is.

       
      • Joy Jen

        Joy Jen - 2019-03-04

        I noticed that in the CMU US English Dictionary, many compound words do not have hyphens. This causes problems when I compare the recognition file with the transcription file, and it raises the word error rate.
        Should I modify the dictionary (so compound words have hyphens), ignore the error, or modify the transcription file?

         
        • Nickolay V. Shmyrev

          Ideally one wants to create a more intelligent algorithm for comparison. There could be many other cases, for example when a compound word is recognized as two separate words, which should not count as an error in the word error rate calculation.

          On a large scale, this is not a very critical issue.
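One way such a comparison could look is sketched below. It is only an illustration, not an existing Sphinx tool: hyphens are treated as word breaks on both sides, and the usual edit-distance alignment gets two extra zero-cost moves so that one word matching two adjacent words joined together (for example `starfish` against `star fish`) is not counted as an error. The function names are hypothetical, and only two-word compounds are handled.

```python
def tokenize(text):
    """Lowercase and treat hyphens as word breaks."""
    return text.lower().replace("-", " ").split()


def compound_tolerant_errors(ref, hyp):
    """Word-level edit distance that forgives split or joined compounds."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i  # deletions only
    for j in range(1, m + 1):
        d[0][j] = j  # insertions only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            best = min(d[i - 1][j - 1] + sub,  # match or substitution
                       d[i - 1][j] + 1,        # deletion
                       d[i][j - 1] + 1)        # insertion
            # One reference word recognized as two words: no error.
            if j >= 2 and ref[i - 1] == hyp[j - 2] + hyp[j - 1]:
                best = min(best, d[i - 1][j - 2])
            # Two reference words recognized as one word: no error.
            if i >= 2 and ref[i - 2] + ref[i - 1] == hyp[j - 1]:
                best = min(best, d[i - 2][j - 1])
            d[i][j] = best
    return d[n][m]


def wer(ref_text, hyp_text):
    """Word error rate relative to the (hyphen-split) reference length."""
    ref, hyp = tokenize(ref_text), tokenize(hyp_text)
    if not ref:
        raise ValueError("reference is empty")
    return compound_tolerant_errors(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(wer("crown-of-thorns starfish in the indian ocean",
              "crown of thorns star fish in the indian ocean"))
```

Splitting hyphens changes the number of reference words, so scores from this sketch are not directly comparable with those from a standard scoring tool.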

           
