About adapting data based on original model

Forum: Help

Creator: stevenyslin

Created: 2016-06-15

Updated: 2016-07-25

stevenyslin - 2016-06-15

Hello,

Following the tutorial on adapting the default acoustic model (http://cmusphinx.sourceforge.net/wiki/tutorialadapt),
I have some questions:

Can a small amount of data (about 20 sentences) be used to adapt the original model (en-us)?
For example, could the adaptation data be given a higher weight (e.g. 70%) and the en-us model a lower weight (e.g. 30%)?

Thanks for your help

- Nickolay V. Shmyrev - 2016-06-15
  
  Can a small amount of data (about 20 sentences) be used to adapt the original model (en-us)?
  
  MLLR adaptation is possible with 20-30 seconds of data; MAP adaptation usually requires more data.
  
  For example, could the adaptation data be given a higher weight (e.g. 70%) and the en-us model a lower weight (e.g. 30%)?
  
  MLLR adaptation does not weight the old model; it uses the new data exclusively. MAP adaptation has a tau parameter that controls the weight of the adaptation data. You can try -bayesmean no -tau 100, for example.
  

stevenyslin - 2016-06-17

Hi Nickolay,
Thank you for your response.

I followed your suggestion, but it seems one more parameter is needed before the accuracy changes:

-fixedtau yes

So my new command is as follows:

$ ./map_adapt -moddeffn tdt_sc_8k/mdef.txt -ts2cbfn .semi. -meanfn tdt_sc_8k/means -varfn tdt_sc_8k/variances -mixwfn tdt_sc_8k/mixture_weights -tmatfn tdt_sc_8k/transition_matrices -accumdir . -mapmeanfn tdt_sc_8k_tau_100/means -mapvarfn tdt_sc_8k_tau_100/variances -mapmixwfn tdt_sc_8k_tau_100/mixture_weights -maptmatfn tdt_sc_8k_tau_100/transition_matrices -bayesmean no -tau 100 -fixedtau yes

In my experiments, accuracy was best when tau was set to 75.
Does that mean the adaptation data has 75% weight and the tdt_sc_8k model has 25% weight?

Thanks for your help again.
- Nickolay V. Shmyrev - 2016-06-17
  
  Does that mean the adaptation data has 75% weight and the tdt_sc_8k model has 25% weight?
  
  75 is not a percentage but a weight; if you increase it, the impact of the adaptation data decreases. You can find the exact formulae here:
  
  http://www1.icsi.berkeley.edu/Speech/docs/HTKBook3.2/node134_mn.html
  
  The percentage depends on the observed counts as well as tau.
  
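To make the relationship between tau and the effective weight concrete, here is a minimal sketch of the MAP mean update in the form given in the HTK Book page linked above: the adapted mean interpolates between the adaptation-data mean and the original model mean, with data weight N / (N + tau), where N is the occupation count of a Gaussian. The function name and the numbers are hypothetical, and this assumes that map_adapt with -fixedtau yes behaves like this formula with a single tau for all Gaussians; the actual SphinxTrain implementation may differ in detail.

```python
def map_mean(prior_mean, data_mean, count, tau):
    """MAP mean update in the HTK form (illustrative, not SphinxTrain code).

    prior_mean: mean from the original (unadapted) model
    data_mean:  mean of the adaptation frames assigned to this Gaussian
    count:      occupation count N (soft number of adaptation frames)
    tau:        prior weight; larger tau keeps the result closer to the prior
    Returns (adapted_mean, data_weight).
    """
    data_weight = count / (count + tau)
    adapted = data_weight * data_mean + (1.0 - data_weight) * prior_mean
    return adapted, data_weight


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show how the weight moves.
    for count in (25.0, 75.0, 300.0):
        mean, weight = map_mean(prior_mean=0.0, data_mean=1.0, count=count, tau=75.0)
        print(f"N={count:5.0f} tau=75 data weight={weight:.2f} adapted mean={mean:.2f}")
```

With tau = 75, a Gaussian that saw 75 frames of adaptation data gives the data and the prior equal weight, one that saw 300 frames gives the data 80%, and one that saw 25 frames gives it only 25%. This is why a single tau does not translate into one global percentage.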

stevenyslin - 2016-07-25

Dear sir,
My question is related to this topic, so I am asking it here.

My question:
Must the adaptation data also have no more than 0.2 seconds of silence at the beginning and end of each utterance?

Thanks for your answer.

- Nickolay V. Shmyrev - 2016-07-25
  
  Must the adaptation data also have no more than 0.2 seconds of silence at the beginning and end of each utterance?
  
  Yes
  
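One way to check adaptation recordings against the 0.2 second limit is to measure the silence at each end of a file. The sketch below is an assumption-laden illustration, not part of the CMUSphinx tools: the helper names, the amplitude threshold of 500, and the 10 ms frame size are all hypothetical choices, and it expects 16-bit mono PCM WAV input.

```python
import array
import sys
import wave


def read_wav_mono16(path):
    """Load a 16-bit mono PCM WAV file; returns (samples, sample_rate)."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        rate = w.getframerate()
        samples = array.array("h")
        samples.frombytes(w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples, rate


def edge_silence_seconds(samples, sample_rate, threshold=500, frame_ms=10):
    """Return approximate (leading, trailing) silence in seconds.

    A frame is treated as silent when its peak absolute amplitude is
    below `threshold` (a hypothetical value; tune it for your recordings).
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    silent = [max(abs(s) for s in f) < threshold for f in frames]
    if all(silent):
        total = len(samples) / sample_rate
        return total, total
    leading = silent.index(False)
    trailing = silent[::-1].index(False)
    return leading * frame_len / sample_rate, trailing * frame_len / sample_rate


def within_limit(samples, sample_rate, limit=0.2):
    """True when both leading and trailing silence are at most `limit` seconds."""
    lead, trail = edge_silence_seconds(samples, sample_rate)
    return lead <= limit and trail <= limit
```

A file that fails the check would need its edges trimmed before it is used for adaptation. A simple peak threshold like this can misjudge noisy recordings, so treat the result as a rough screen only.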

Emilio Rueda - 2016-07-25

Hi Nickolay,

But is it not possible for the decoder to interpret a final stretch of silence lasting 0.5 milliseconds (or even more) as </s>?

thanks!

- Nickolay V. Shmyrev - 2016-07-25
  
  I do not understand your question
  
