CMU Sphinx / Forums / Help: bw problem in cd untied training

viviane - 2007-11-22

I am using the scripts provided with SphinxTrain to do the training. I have already done the CI training and it went well. Now I'm trying to do the CD untied training, but I'm getting some errors in the Baum-Welch iterations. I get messages like this:
WARNING: "cvt2triphone.c", line 267: utt does not end with filler phone
220 31 ERROR: "backward.c", line 401: final state not reached
ERROR: "baum_welch.c", line 331: M010802 ignored
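
If more than a few utterances are affected, it helps to list all of them before checking transcriptions by hand. Below is a minimal Python sketch that pulls utterance IDs out of a bw log by matching the "... ignored" lines shown above. It assumes the message keeps exactly this wording. The source line number may differ between SphinxTrain versions, so the pattern accepts any number. The function name `ignored_utterances` is hypothetical.

```python
import re
from typing import Iterable, List

# Matches a bw message reporting a skipped utterance, for example:
#   ERROR: "baum_welch.c", line 331: M010802 ignored
# The line number is matched loosely because it depends on the source version.
IGNORED_RE = re.compile(r'ERROR: "baum_welch\.c", line \d+: (\S+) ignored')


def ignored_utterances(log_lines: Iterable[str]) -> List[str]:
    """Return IDs of utterances that bw reported as ignored, in log order, without duplicates."""
    seen = set()
    result: List[str] = []
    for line in log_lines:
        match = IGNORED_RE.search(line)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            result.append(match.group(1))
    return result


if __name__ == "__main__":
    sample_log = [
        'WARNING: "cvt2triphone.c", line 267: utt does not end with filler phone',
        '220 31 ERROR: "backward.c", line 401: final state not reached',
        'ERROR: "baum_welch.c", line 331: M010802 ignored',
    ]
    print(ignored_utterances(sample_log))  # ['M010802']
```

To use it on a real run, open the bw log file (its location depends on your setup) and pass the file object to `ignored_utterances`.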

I don't really understand these messages, and I don't know why some of the utterances in the training corpus are being ignored, since this error didn't happen in the CI training.

Could someone help me? I would really appreciate it...

- viviane - 2007-12-06
  
  Ok... I think there may be a problem with my transcriptions. How do I make sure they are correct?
  Do I have to do forced alignment before the training?
  
  Thanks for the help...
  
  - Nickolay V. Shmyrev - 2007-12-06
    
    Exactly. Run forced alignment and add the -phlabdir argument to sphinx3_align in force_align.pl:
    
    -phlabdir => "lab", -phsegdir => "lab",
    
    That will generate a state alignment dump for you. Then look at the acoustic scores: a bad (very negative) score can give you a hint about which transcription is wrong.
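    
    Longer utterances naturally accumulate more negative totals, so it can help to compare scores per frame rather than raw. Below is a minimal Python sketch. It assumes you have already extracted each utterance's total acoustic score and frame count from the alignment output into a dictionary. That extraction depends on the output format of your sphinx3_align version and is not shown. The function name and the median/MAD outlier rule are illustrative choices, not part of SphinxTrain.

```python
from statistics import median
from typing import Dict, List, Tuple


def suspicious_utterances(
    scores: Dict[str, Tuple[float, int]], k: float = 3.0
) -> List[Tuple[str, float]]:
    """Flag utterances whose per-frame acoustic score is unusually low.

    scores maps an utterance ID to (total acoustic score, number of frames).
    An utterance is flagged when its per-frame score lies more than k median
    absolute deviations (MAD) below the median. Results are sorted worst first.
    """
    # Normalize by length so long utterances are not penalized just for being long.
    per_frame = {
        utt: total / frames for utt, (total, frames) in scores.items() if frames > 0
    }
    if not per_frame:
        return []
    values = list(per_frame.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    threshold = med - k * mad
    flagged = [(utt, s) for utt, s in per_frame.items() if s < threshold]
    return sorted(flagged, key=lambda item: item[1])


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    example = {
        "utt_a": (-1000.0, 100),
        "utt_b": (-1100.0, 100),
        "utt_c": (-1050.0, 100),
        "utt_d": (-950.0, 100),
        "utt_e": (-5000.0, 100),
    }
    print(suspicious_utterances(example))  # [('utt_e', -50.0)]
```

    The flagged utterances are the first ones worth listening to and re-checking against their transcriptions. The median and MAD are used instead of mean and standard deviation so that a few very bad utterances do not hide each other.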
    
- Nickolay V. Shmyrev - 2007-11-23
  
  Hm, if you are training, I suspect you already know what Baum-Welch is and what a final state is. Please refer to any book on ASR for details.
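  
  Roughly, "final state not reached" means the backward pass found no path through the utterance's chain of HMM states that accounts for every frame and ends in the last state. The toy Python sketch below illustrates only the structural part of that. It assumes a strict left-to-right chain in which each frame is emitted by one state, and a path may stay in a state, advance one state, or (optionally) skip one. Real Baum-Welch training also prunes paths with a beam, so a badly mismatched transcription can fail even when there are enough frames. The function name is hypothetical.

```python
from typing import List


def final_state_reachable(num_states: int, num_frames: int, allow_skip: bool = False) -> bool:
    """Boolean forward pass over a left-to-right HMM chain.

    The first frame is emitted by state 0. From state s the next frame can be
    emitted by s (self-loop), s + 1, or s + 2 when skips are allowed.
    Returns True if some path emits all frames and ends in the final state.
    """
    if num_states <= 0 or num_frames <= 0:
        return False
    reachable: List[bool] = [False] * num_states
    reachable[0] = True  # frame 0 is emitted by the first state
    for _ in range(1, num_frames):
        nxt = [False] * num_states
        for s, ok in enumerate(reachable):
            if not ok:
                continue
            nxt[s] = True  # self-loop
            if s + 1 < num_states:
                nxt[s + 1] = True  # advance one state
            if allow_skip and s + 2 < num_states:
                nxt[s + 2] = True  # skip one state
        reachable = nxt
    return reachable[-1]


if __name__ == "__main__":
    # Two phones with three states each give a six-state chain.
    print(final_state_reachable(6, 5))                   # False: too few frames
    print(final_state_reachable(6, 6))                   # True
    print(final_state_reachable(6, 5, allow_skip=True))  # True
```

  For example, a transcription with too many words for the audio makes the chain longer than the number of frames can cover, and without skip transitions the final state cannot be reached at all.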
  
  This error means that your transcription has mistakes and the aligner can't properly align it with the senone models. The error is either in the .transcription file, in the recording, or in the dictionary. Please check this utterance once again. CD decoding is sometimes more precise than CI decoding, so it is not strange that it is more restrictive.
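  
  One common source of such mistakes is a word in the transcription that is missing from the dictionary or the filler dictionary. Below is a minimal Python sketch of that check. It assumes the usual Sphinx formats: transcription lines like `<s> HELLO WORLD </s> (M010802)` and dictionary lines like `WORLD W ER L D`, with alternate pronunciations written as `WORLD(2)`. Words are compared case-sensitively, and the function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Set

# "<words> (utterance_id)" at the end of a transcription line.
TRANS_RE = re.compile(r"^(.*?)\s*\(([^()]+)\)\s*$")
# Alternate-pronunciation suffix such as "(2)" in "WORLD(2)".
ALT_PRON_RE = re.compile(r"\(\d+\)$")


def load_words(dict_lines: Iterable[str]) -> Set[str]:
    """Collect headwords from dictionary lines, folding WORD(2) into WORD."""
    words: Set[str] = set()
    for line in dict_lines:
        parts = line.split()
        if parts:
            words.add(ALT_PRON_RE.sub("", parts[0]))
    return words


def missing_words(trans_lines: Iterable[str], vocabulary: Set[str]) -> Dict[str, List[str]]:
    """Map utterance ID -> words not found in the vocabulary.

    Lines without a trailing "(id)" are reported under the line text itself.
    """
    problems: Dict[str, List[str]] = {}
    for line in trans_lines:
        line = line.strip()
        if not line:
            continue
        match = TRANS_RE.match(line)
        if match is None:
            problems[line] = ["<no utterance id>"]
            continue
        text, utt = match.groups()
        unknown = [w for w in text.split() if w not in vocabulary]
        if unknown:
            problems[utt] = unknown
    return problems


if __name__ == "__main__":
    vocab = load_words(["HELLO HH AH L OW", "WORLD W ER L D", "WORLD(2) W AO R L D"])
    vocab |= load_words(["<s> SIL", "</s> SIL", "<sil> SIL"])  # filler dictionary
    trans = ["<s> HELLO WORLD </s> (utt1)", "<s> HELLO THERE </s> (M010802)"]
    print(missing_words(trans, vocab))  # {'M010802': ['THERE']}
```

  This only catches vocabulary problems. A transcription that uses known words but does not match what was actually said still has to be found by listening or through the alignment scores.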
  
