CMU Sphinx / Forums / Help: Unable to lookup word , unable to produce phonetic transcription

Joy Jen - 2019-03-04

In my speech recognition project, I am getting "Unable to lookup word" warnings:

WARN: "mk_phone_list.c", line 178: Unable to lookup word 'skitters' in the dictionary
WARN: "next_utt_states.c", line 83: Unable to produce phonetic transcription for the utterance '<s> in the same way the line skitters along at a low level for millennia then rockets up exponentially in the 19th and 20th century </s>'
WARN: "main.c", line 826: Skipped utterance '<s> in the same way the line skitters along at a low level for millennia then rockets up exponentially in the 19th and 20th century </s>'
WARN: "mk_phone_list.c", line 178: Unable to lookup word 'crown-of-thorns' in the dictionary
WARN: "next_utt_states.c", line 83: Unable to produce phonetic transcription for the utterance '<s> crown-of-thorns starfish in the indian ocean zebra mussels in the great lakes spruce budworm here in canada </s>'

First, is there a way to fix these warnings other than finding each out-of-vocabulary word and adding it to the dictionary?
Language models use smoothing; can we do something similar when adapting the acoustic model? If not, why not?
Why, when a single word is missing, is the recognizer unable to produce a phonetic transcription for the whole sentence?
Second, how should hyphens be handled? They may be essential for compound words.
- Nickolay V. Shmyrev - 2019-03-04
  
  First, is there a way to fix these warnings other than finding each out-of-vocabulary word and adding it to the dictionary?
  
  No.
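Since every missing word has to be added by hand, it can help to list all of them in one pass before training. The following is a minimal sketch, not part of SphinxTrain: it assumes a CMUdict-style dictionary (`word PH1 PH2 ...`, with alternates written as `word(2)`) and transcript lines of the form `<s> ... </s> (utterance_id)`. The function names and the toy pronunciations are hypothetical.

```python
import re

def load_dictionary_words(lines):
    """Collect the word column of a CMUdict-style dictionary.

    Each line is assumed to look like 'word PH1 PH2 ...'; alternate
    pronunciations such as 'word(2)' map back to 'word'.
    """
    words = set()
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        words.add(re.sub(r"\(\d+\)$", "", parts[0]).lower())
    return words

def find_oov(transcript_lines, dictionary_words):
    """Return a sorted list of transcript words missing from the dictionary."""
    missing = set()
    for line in transcript_lines:
        for token in line.split():
            # Skip sentence markers and the trailing '(utterance_id)' field.
            if token in ("<s>", "</s>") or re.fullmatch(r"\(.+\)", token):
                continue
            if token.lower() not in dictionary_words:
                missing.add(token.lower())
    return sorted(missing)

# Toy data; the pronunciations are illustrative only.
dictionary = load_dictionary_words([
    "the DH AH",
    "the(2) DH IY",
    "line L AY N",
    "along AH L AO NG",
])
transcript = ["<s> the line skitters along </s> (utt_001)"]
oov_words = find_oov(transcript, dictionary)
print(oov_words)
```

In practice the two lists would be read from the dictionary and transcription files, and the resulting words would then be given pronunciations and appended to the dictionary.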
  
  Language models use smoothing; can we do something similar when adapting the acoustic model? If not, why not?
  
  MAP adaptation is similar to language model interpolation.
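The analogy can be made concrete with the textbook MAP update for a single Gaussian mean, which interpolates between the prior (original model) mean and the mean of the adaptation data. This is a sketch of the general formula only, for a scalar feature; it is not claimed to match what the Sphinx adaptation tools do internally, and the name `map_adapt_mean` and the value of `tau` are illustrative assumptions.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """Textbook MAP update of one Gaussian mean (scalar case).

    prior_mean : mean from the original acoustic model
    frames     : adaptation observations aligned to this Gaussian
    posteriors : occupation probability of the Gaussian for each frame
    tau        : prior weight; larger values trust the original model more
    """
    occupancy = sum(posteriors)
    weighted_sum = sum(g * x for g, x in zip(posteriors, frames))
    # Equivalent to lam * data_mean + (1 - lam) * prior_mean,
    # with lam = occupancy / (occupancy + tau).
    return (tau * prior_mean + weighted_sum) / (tau + occupancy)

# No adaptation data: the mean stays at the prior.
no_data = map_adapt_mean(0.0, [], [], tau=2.0)

# Two frames at 2.0 with tau = 2.0: halfway between prior and data.
some_data = map_adapt_mean(0.0, [2.0, 2.0], [1.0, 1.0], tau=2.0)

# Many frames: the estimate approaches the data mean.
much_data = map_adapt_mean(0.0, [2.0] * 1000, [1.0] * 1000, tau=2.0)
print(no_data, some_data, much_data)
```

As with interpolated language models, parameters that see little or no adaptation data fall back to the original model rather than being estimated from too few samples.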
  
  Why, when a single word is missing, is the recognizer unable to produce a phonetic transcription for the whole sentence?
  
  That's reasonable if you think about it.
  
  Second, how should hyphens be handled? They may be essential for compound words.
  
  Leave them as is.
  
  - Joy Jen - 2019-03-04
    
    I noticed that in the CMU US English Dictionary, many compound words are listed without hyphens. This causes mismatches when I compare the recognition output with the transcription file, and it inflates the word error rate.
    Should I modify the dictionary (so that compound words have hyphens), ignore the errors, or modify the transcription file?
    
    - Nickolay V. Shmyrev - 2019-03-04
      
      Ideally, one would create a more intelligent comparison algorithm. There are many other cases, for example a compound word recognized as two separate words, that should not count as errors in the word error rate calculation.
      
      On a large scale, it is not a very critical issue.
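One way such a more lenient comparison might look is sketched below. It is only an illustration under simple assumptions: hyphens are treated as word separators on both sides, and two adjacent hypothesis words are joined when their concatenation appears in the reference. All function names are hypothetical, and the greedy merge can mis-handle a reference that contains both the split and the joined form.

```python
def normalize(words):
    """Lowercase and split hyphenated compounds into separate tokens."""
    out = []
    for w in words:
        out.extend(p for p in w.lower().split("-") if p)
    return out

def merge_split_compounds(hyp, ref_vocab):
    """Join adjacent hypothesis words whose concatenation is a reference word."""
    merged, i = [], 0
    while i < len(hyp):
        if i + 1 < len(hyp) and hyp[i] + hyp[i + 1] in ref_vocab:
            merged.append(hyp[i] + hyp[i + 1])
            i += 2
        else:
            merged.append(hyp[i])
            i += 1
    return merged

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]

def lenient_wer(ref_text, hyp_text):
    """WER after hyphen splitting and compound merging."""
    ref = normalize(ref_text.split())
    hyp = merge_split_compounds(normalize(hyp_text.split()), set(ref))
    return edit_distance(ref, hyp) / len(ref) if ref else 0.0

reference = "crown-of-thorns starfish in the indian ocean"
hypothesis = "crown of thorns star fish in the indian ocean"

strict_errors = edit_distance(reference.split(), hypothesis.split())
lenient = lenient_wer(reference, hypothesis)
print(strict_errors, lenient)
```

With a plain token-by-token comparison, the hyphenated and split compounds in this example are all counted as errors, while the lenient version treats the two strings as equivalent.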
      