Unable to produce phonetic transcription for all utterances.

2019-05-26
  • Panagiotis Antoniadis

    I am trying to adapt a default acoustic model using my own data. Some words in my data are not in the dictionary, and this causes a lot of problems.
    When I run ./bw, this happens for most sentences:

    WARN: "mk_phone_list.c", line 178: Unable to lookup word 'Βασιλιάς.' in the dictionary
    WARN: "next_utt_states.c", line 83: Unable to produce phonetic transcription for the utterance ' πρόσταξε ο Βασιλιάς. Τα γόνατα του γερο-φρούραρχου κόπηκαν κι έπεσε χάμω καθιστός. Δεύτερη φορά, από πάνω από τον πύργο, ο υπασπιστής εσήμανε το προσκλητήριο. '
    WARN: "main.c", line 826: Skipped utterance ' πρόσταξε ο Βασιλιάς. Τα γόνατα του γερο-φρούραρχου κόπηκαν κι έπεσε χάμω καθιστός. Δεύτερη φορά, από πάνω από τον πύργο, ο υπασπιστής εσήμανε το προσκλητήριο. '
    utt> 654 Paramythi_horis_onoma_0654 985 0 0 utt 0.000x 1.011e upd 0.000x 0.979e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

    So does adaptation work only if all words in the transcription are present in the dictionary? If so, how can I resolve my problem?

    Thanks in advance!

     
    • Nickolay V. Shmyrev

      You need to lowercase everything and remove punctuation first.

      To extend the dictionary to cover new words, you can use any grapheme-to-phoneme (g2p) tool; see https://cmusphinx.github.io/wiki/tutorialdict.
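
The normalization step suggested above could be sketched roughly as follows. This is a minimal illustration, not part of SphinxTrain: the function names `normalize_transcript` and `missing_words` are hypothetical, and the sketch assumes that every Unicode punctuation character (including hyphens and apostrophes) should become a space, which may not suit every language or dictionary. It also assumes the dictionary words have already been loaded into a Python set.

```python
import unicodedata


def normalize_transcript(text):
    """Lowercase text and replace every Unicode punctuation character with a space."""
    lowered = text.lower()
    cleaned = "".join(
        # Unicode categories starting with "P" cover dots, commas, hyphens, quotes, etc.
        " " if unicodedata.category(ch).startswith("P") else ch
        for ch in lowered
    )
    # Collapse runs of whitespace and trim the ends.
    return " ".join(cleaned.split())


def missing_words(text, dictionary_words):
    """Return normalized words absent from dictionary_words, in first-seen order."""
    seen = set()
    missing = []
    for word in normalize_transcript(text).split():
        if word not in dictionary_words and word not in seen:
            seen.add(word)
            missing.append(word)
    return missing


if __name__ == "__main__":
    sentence = " πρόσταξε ο Βασιλιάς. Τα γόνατα του γερο-φρούραρχου κόπηκαν "
    print(normalize_transcript(sentence))
    # Hypothetical, tiny dictionary used only for demonstration.
    print(missing_words(sentence, {"ο", "τα", "του"}))
```

Words reported by `missing_words` are the ones that would still need pronunciations, for example from a g2p tool, before they can be added to the dictionary.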

       
      • Panagiotis Antoniadis

        Thank you! Should I remove all punctuation, including dots and commas?

         
        • Nickolay V. Shmyrev

          Yes

           
