speaker adaptation

aleks
2011-08-25
2012-09-22
  • aleks

    aleks - 2011-08-25

    hello,

    we used sphinxtrain to adapt a language model to a single speaker (in
    German) and got pretty good results (up to 75% correct sentences). But
    obviously one third of all mistakes happen in the very first or last
    word of a sentence. Is there any known reason for this effect, and is
    there a possibility to reduce these mistakes?

     
  • aleks

    aleks - 2011-08-25

    ...mixed up the words: we trained the acoustic model.

     
  • Nickolay V. Shmyrev

    Hello

    Please use the help forum to ask for help.

    As for boundaries, it's mandatory to have silence around each utterance
    during training and during adaptation, and to use forced alignment to
    detect silences inside the utterances. Missing silence can cause bad
    effects.
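
    For illustration (the file ID and German text are hypothetical), a
    sphinxtrain transcript line with explicit silence markers might look
    as follows; <s>, </s> and <sil> map to silence in the filler
    dictionary, and forced alignment is what places the inner <sil>
    markers:

        <s> GUTEN <sil> MORGEN </s> (utt001)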

    The nature of speech recognition errors can be very different, and the
    debugging process is actually complex and undocumented. First you need
    to separate various aspects to support your hypothesis that the
    utterance boundary is problematic. You need to decode with very wide
    beams in order to understand if pruning is the reason for the failures.
    Then you need to try a unigram language model in order to find out if
    the language model has any effect on the recognition errors. If the
    issue is the acoustic model, it might be worth checking phonetic
    recognizer accuracy to find out which senones were not correctly
    trained.
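
    As a minimal sketch of the first debugging step (not from the original
    post; it assumes the pocketsphinx Python bindings and hypothetical
    model paths), decoding a single utterance with widened beams could
    look like this:

        # Sketch: decode one utterance with very wide beams so that
        # pruning is unlikely to discard the correct hypothesis.
        # All model paths below are placeholders.
        from pocketsphinx import Decoder

        config = Decoder.default_config()
        config.set_string('-hmm', 'model/de-adapted')  # adapted acoustic model
        config.set_string('-lm', 'model/de.lm')        # language model
        config.set_string('-dict', 'model/de.dict')    # pronunciation dictionary
        # Much smaller thresholds than the defaults mean wider beams.
        config.set_float('-beam', 1e-80)
        config.set_float('-wbeam', 1e-60)
        config.set_float('-pbeam', 1e-80)

        decoder = Decoder(config)
        with open('utterance.raw', 'rb') as f:  # 16 kHz, 16-bit mono PCM
            decoder.start_utt()
            decoder.process_raw(f.read(), False, True)
            decoder.end_utt()

        hyp = decoder.hyp()
        print(hyp.hypstr if hyp is not None else '(no hypothesis)')

    If the boundary errors disappear with the wide beams, pruning is the
    likely culprit; otherwise repeat the experiment with a unigram language
    model passed via '-lm' to isolate the language model's contribution.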

     
