
question about sphinxtrain

2012-01-12 - 2012-09-22
  • vijayabharadwaj gsr

    Dear Sir,

    I am working on speech recognition from videos. I have trained on 75
    speakers and the training completed successfully: there were only 93
    errors in the final Baum-Welch iteration, i.e. 93 of the 1400 wav files
    were ignored.

    I then tried to add about one hour of data from one more speaker to the
    training. This speaker's voice is very feeble. There are now 1540 files,
    but in the Baum-Welch iterations many of them (about 1400 files) are
    ignored, and the final iteration has 1330 errors. What could be the
    reason?

    The transcription for this speaker was created carefully and is about 99%
    accurate. Could the problem be the voice characteristics, since the audio
    is very feeble? I have provided sample audio files:

    http://www.4shared.com/archive/rOMzMnMj/exampletar.html

     
  • Pranav Jawale - 2012-01-12

    When you use CMN, though, amplitude shouldn't matter much, I guess (a
    rough sketch of what per-utterance CMN does is below).

    What do you mean by ignored wav files? A file may be ignored in one
    iteration but still be used in later iterations. You can try training with
    forced alignment to see whether there is a problem with the audio.
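
    A minimal sketch of per-utterance cepstral mean normalization, for
    illustration only; the array shape and the synthetic offset are
    assumptions, not SphinxTrain's actual feature code:

        import numpy as np

        def cmn(features):
            """Per-utterance cepstral mean normalization.

            features: 2-D array of shape (n_frames, n_cepstra), e.g. 13 MFCCs
            per frame. Subtracting each coefficient's per-utterance mean
            removes a constant offset, which is where a global gain difference
            (loud vs. feeble recording) mostly ends up after the log step of
            MFCC extraction.
            """
            return features - features.mean(axis=0, keepdims=True)

        # Example: a fake 100-frame, 13-dimensional MFCC matrix with an offset.
        mfcc = np.random.randn(100, 13) + 5.0
        normalized = cmn(mfcc)
        print(normalized.mean(axis=0))  # close to zero for every coefficient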

     
  • Nickolay V. Shmyrev

    It might be that the silences inside your files are too long for
    Baum-Welch to converge properly. The initial estimation goes wrong, and
    the whole training process is wrong after that.

    You need to use short utterances for model bootstrapping, or you can
    simply resplit all of your data into utterances. No utterance should
    contain a significant amount of silence, and the silences at the
    boundaries shouldn't be longer than 0.5 seconds (a rough trimming sketch
    is included after the link below). You can use the long audio aligner
    branch for that.

    See a recent, very similar discussion on this subject:

    https://sourceforge.net/projects/cmusphinx/forums/forum/5471/topic/4970030
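
    A rough sketch of trimming long leading and trailing silence from a
    16-bit mono PCM wav with a simple energy threshold, keeping at most 0.5
    seconds of silence at each boundary. The frame size and threshold are
    assumptions that would need tuning per corpus; this is not the long audio
    aligner itself:

        import wave
        import numpy as np

        def trim_silence(in_path, out_path, keep_sec=0.5, frame_ms=10,
                         rel_threshold=0.05):
            """Trim boundary silence, keeping at most keep_sec at each end."""
            with wave.open(in_path, "rb") as w:
                rate = w.getframerate()
                params = w.getparams()
                # Assumes 16-bit mono PCM input.
                samples = np.frombuffer(w.readframes(w.getnframes()),
                                        dtype=np.int16)

            frame_len = int(rate * frame_ms / 1000)
            n_frames = len(samples) // frame_len
            frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)

            # Frame-level RMS energy; "speech" frames exceed a fraction of the peak.
            rms = np.sqrt((frames.astype(np.float64) ** 2).mean(axis=1))
            speech = np.where(rms > rel_threshold * rms.max())[0]
            if len(speech) == 0:
                return  # nothing above the threshold; leave the file alone

            keep = int(keep_sec * 1000 / frame_ms)
            start = max(speech[0] - keep, 0) * frame_len
            end = min(speech[-1] + 1 + keep, n_frames) * frame_len

            with wave.open(out_path, "wb") as w:
                w.setparams(params)
                w.writeframes(samples[start:end].tobytes())

    You would run something like this over every wav before feature
    extraction, and then still check the failed utterances in the Baum-Welch
    logs afterwards.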

     
