Hello,

we used SphinxTrain to adapt a language model to a single speaker (in German) and got pretty good results (up to 75% correct sentences). But noticeably, one third of all mistakes happen in the very first or the very last word of a sentence. Is there any known reason for this effect, and is there a way to reduce these mistakes?

Correction: I mixed up the words above; we trained the acoustic model.
Hello,

Please use the help forum to ask for help.
As for the boundaries: it is mandatory to have silence around each utterance during training and during adaptation, and to use forced alignment to detect silences inside the utterances. Missing silence can cause bad effects.
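For illustration, the usual CMUSphinx transcription format marks the boundary silences with <s> and </s>; the sentences and utterance IDs below are just placeholders:

    <s> guten morgen liebe sorgen </s> (utt_0001)
    <s> heute scheint die sonne </s> (utt_0002)

The recordings themselves should also contain a short stretch of actual silence before and after the speech, not start right at the first word.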
The nature of speech recognition errors can be very different, and the debugging process is complex and largely undocumented. First you need to separate the various aspects to support your hypothesis that the utterance boundary is the problem. Start by decoding with very wide beams in order to understand whether pruning is the reason for the failures, as sketched below.
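A minimal sketch with the pocketsphinx Python bindings (5prealpha-style API); the model paths and the raw audio file are placeholders for your adapted German setup:

    from pocketsphinx import Decoder

    config = Decoder.default_config()
    config.set_string('-hmm', 'model/de-adapted')  # adapted acoustic model (placeholder)
    config.set_string('-lm', 'model/de.lm')        # language model (placeholder)
    config.set_string('-dict', 'model/de.dict')    # pronunciation dictionary (placeholder)
    # Much smaller thresholds than the defaults, i.e. much wider beams,
    # so that pruning is effectively ruled out (decoding gets slow).
    config.set_float('-beam', 1e-80)
    config.set_float('-wbeam', 1e-60)
    config.set_float('-pbeam', 1e-80)

    decoder = Decoder(config)
    decoder.start_utt()
    with open('test_utt.raw', 'rb') as f:  # 16 kHz, 16-bit mono PCM
        decoder.process_raw(f.read(), False, True)
    decoder.end_utt()
    print(decoder.hyp().hypstr)

If the first and last words come out correctly with wide beams but not with the defaults, pruning is the culprit.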
Then try a unigram language model in order to find out whether the language model has any effect on the speech recognition errors.
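A unigram model is easy to produce: either build one from your training transcripts with the cmuclmtk tools (text2idngram/idngram2lm), or write a small ARPA file by hand. A minimal sketch with placeholder words and log10 probabilities; a real model would list your entire vocabulary:

    \data\
    ngram 1=4

    \1-grams:
    -99.0000 <s>
    -0.4771 </s>
    -0.4771 GUTEN
    -0.4771 MORGEN

    \end\

With no word context left, errors that persist under this model point away from the language model.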
If the issue is in the acoustic model, it might be worth checking phonetic recognizer accuracy to find out which senones were not trained correctly.
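To look at the phonetic level, pocketsphinx has an allphone mode that decodes a phone sequence against a phonetic language model. A sketch, again with placeholder paths (you would need a phone LM matching your German model, which can be trained from phonetic transcripts):

    from pocketsphinx import Decoder

    config = Decoder.default_config()
    config.set_string('-hmm', 'model/de-adapted')        # placeholder
    config.set_string('-allphone', 'model/de-phone.lm')  # phonetic LM (placeholder)

    decoder = Decoder(config)
    decoder.start_utt()
    with open('test_utt.raw', 'rb') as f:
        decoder.process_raw(f.read(), False, True)
    decoder.end_utt()

    # Phone sequence with frame boundaries; compare it against a forced
    # alignment of the reference transcript to spot badly trained senones.
    for seg in decoder.seg():
        print(seg.word, seg.start_frame, seg.end_frame)

Phones that are consistently misrecognized, especially at the utterance edges, indicate which senones need more adaptation data.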