CMU Sphinx / Forums / Help: Unable to produce phonetic transcription for all utterances.

Speech Recognition Toolkit

Unable to produce phonetic transcription for all utterances.

Forum: Help

Creator: Panagiotis Antoniadis

Created: 2019-05-26

Updated: 2019-05-26

Panagiotis Antoniadis - 2019-05-26

I am trying to adapt the default acoustic model to my own data. Some words in my transcriptions are not in the dictionary, and this causes a lot of problems.
When I run ./bw, this happens for most sentences:

WARN: "mk_phone_list.c", line 178: Unable to lookup word 'Βασιλιάς.' in the dictionary
WARN: "next_utt_states.c", line 83: Unable to produce phonetic transcription for the utterance ' πρόσταξε ο Βασιλιάς. Τα γόνατα του γερο-φρούραρχου κόπηκαν κι έπεσε χάμω καθιστός. Δεύτερη φορά, από πάνω από τον πύργο, ο υπασπιστής εσήμανε το προσκλητήριο. '
WARN: "main.c", line 826: Skipped utterance ' πρόσταξε ο Βασιλιάς. Τα γόνατα του γερο-φρούραρχου κόπηκαν κι έπεσε χάμω καθιστός. Δεύτερη φορά, από πάνω από τον πύργο, ο υπασπιστής εσήμανε το προσκλητήριο. '
utt> 654 Paramythi_horis_onoma_0654 985 0 0 utt 0.000x 1.011e upd 0.000x 0.979e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

So, does adaptation work only if all words in the transcription are present in the dictionary? If so, how can I resolve my problem?

Thanks in advance!

- Nickolay V. Shmyrev - 2019-05-26
  
  You need to lowercase everything and remove punctuation first.
  
  To extend the dictionary to cover new words you can use any g2p tool, see https://cmusphinx.github.io/wiki/tutorialdict.
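A minimal sketch of the normalization step suggested above, assuming the transcripts are plain UTF-8 text. The function name `normalize_text` is hypothetical and not part of SphinxTrain. It treats every character in a Unicode punctuation category as removable, which also splits hyphenated words and drops apostrophes, so adjust it if the dictionary keeps such forms. Markers such as `<s>`, `</s>` and a trailing utterance ID in a transcription file would need to be split off before calling it.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Lowercase the text and replace punctuation with spaces (hypothetical helper)."""
    lowered = text.lower()
    # Unicode general categories starting with "P" cover dots, commas, hyphens, quotes, etc.
    cleaned = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch
        for ch in lowered
    )
    # Collapse the runs of whitespace left behind by the removed punctuation.
    return " ".join(cleaned.split())


if __name__ == "__main__":
    print(normalize_text(" πρόσταξε ο Βασιλιάς. Τα γόνατα του γερο-φρούραρχου κόπηκαν "))
```

With this, the word reported in the warning ('Βασιλιάς.') would be looked up as a lowercase word without the trailing dot.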
  
  - Panagiotis Antoniadis - 2019-05-26
    
    Thank you! Should I remove all punctuation, including dots and commas?
    
    - Nickolay V. Shmyrev - 2019-05-26
      
      Yes
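Once the transcripts are lowercased and free of punctuation, it may help to list the words that are still missing from the dictionary before running a g2p tool. The sketch below assumes the usual dictionary layout of one entry per line, `word PHONE PHONE ...`, with alternate pronunciations written as `word(2)`. The helper names are hypothetical and the sample entries are for illustration only.

```python
import re


def load_dictionary_words(lines):
    """Collect headwords from dictionary lines of the form 'word PH1 PH2 ...'."""
    words = set()
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        # Strip the "(2)", "(3)", ... suffix used for alternate pronunciations.
        words.add(re.sub(r"\(\d+\)$", "", parts[0]))
    return words


def missing_words(transcripts, dictionary_words):
    """Return the sorted set of transcript words that have no dictionary entry."""
    missing = set()
    for text in transcripts:
        for word in text.split():
            if word not in dictionary_words:
                missing.add(word)
    return sorted(missing)


if __name__ == "__main__":
    dictionary = load_dictionary_words(
        ["hello HH AH L OW", "hello(2) HH EH L OW", "world W ER L D"]
    )
    print(missing_words(["hello brave world"], dictionary))
```

The resulting word list can then be passed to the g2p tool, and the generated pronunciations appended to the dictionary used for adaptation.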
      