Hi,
I tried to adapt the en-us 8 kHz model using the arctic all data set.
The adapted model gives really good results on all the arctic streams, but
results for some other streams (with a different accent), which were good with the original model, are now much worse.
What could be the reason?
Do I need to do something differently?
I followed this tutorial: http://cmusphinx.sourceforge.net/wiki/tutorialadapt
I'm using Sphinx3 in allphone decoding mode.
Last edit: Avee 2014-06-05
To get help on this issue please provide the data you are using.
I used the following commands:
2.bw.exe -hmmdir en-us-8khz -moddeffn en-us-8khz/mdef -ts2cbfn .cont. -feat 1s_c_d_dd -lda feature_transform -cmn current -agc none -dictfn arcticAll.dic -ctlfn arcticAll.fileids -lsnfn arcticAll.transcription -accumdir .
3.map_adapt.exe -meanfn en-us-8khz\means -varfn en-us-8khz\variances -mixwfn en-us-8khz\mixture_weights-tmatfn en-us-8khz\transition_matrices -accumdir . -mapmeanfn en-us-8khzadapt\means -mapvarfn en-us-8khzadapt\variances -mapmixwfn en-us-8khzadapt\mixture_weights -maptmatfn en-us-8khzadapt\transition_matrices
Results for the arctic files are really good, but for the cc-01 and cc-02 files, which have a British accent, the results are now very bad, although they were good initially.
Wav files and bw output..
Last edit: Avee 2014-06-05
Adapted model.
Adapted model continued...
I originally used the en-us-8khz model listed at http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/
For cc-01.wav
Correct transcription :
well here's a story for you sarah perry was a veterinary nurse
Using en-us-8khz
SIL M AY L SIL HH IH Z AH S T AY F ER Y UW SIL DH EH R AH P EH R IY AH M AH S IH Z IH T IH N IH T N ER S SIL
Using en-us-8khz adapted model
L AY L DH IY Y UW DH EH R OW P EY M EY S T
So, degraded badly with adapted model...
Last edit: Avee 2014-06-05
Any help please...
Sorry, your data is not complete. There is no transcript file for cc1, and the command contains errors such as "en-us-8khz\mixture_weights-tmatfn" with a missing space. You also didn't provide the adaptation logs.
It's better to share the data in a single archive. You can use Dropbox or Google Drive to share large files.
Overall, MAP adaptation of a continuous model requires a significant amount of data, such as 30 minutes or 1 hour. For a small amount of data, such as 2 utterances, it's better to use MLLR.
How would you use MLLR with Sphinx4?
I see in the AM adaption tutorial, it says to copy
to the pocketsphinx commandline. If I wanted to use MLLR with sphinx 4, how do I add this mllr_matrix?
Ideally, sphinx4 would have to be modified to load the MLLR matrix. Otherwise, there is mllr_transform, which can transform the model before use.
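So, say I wanted to adapt the en-us AM like the tutorial does. Would this be the command to do that?
mllr_transform \
-cdonly yes
-ingaucntfn gauden_counts // this is one of the output files from bw
-inmeanfn en-us/means
-mllrmat mllr_matrix
-moddeffn en-us/mdef
-outgaucntfn // Do I need this one? Should I make a copy of the input gaussian counts so it can be overridden?
-outmeanfn // Is this a copy of the en-us/means file?
-varfn en-us/variances
Can you also explain when you would need the -inverse option?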
Last edit: Jeff Acquaviva 2014-06-12
Just
~~~~~~~~~~
mllr_transform \
    -inmeanfn en-us/means \
    -invarfn en-us/variances \
    -mllrmat mllr_matrix \
    -outmeanfn en-us-adapt/means \
    -outvarfn en-us-adapt/variances
~~~~~~~~~~
mllr_transform just applies the matrix to the Gaussians. -inverse applies the inverse matrix; you don't need it.
See also
http://nshmyrev.blogspot.de/2009/09/adaptation-methods.html
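To make "applies the matrix to the Gaussians" concrete, here is a minimal sketch of the textbook MLLR mean update, mean' = A * mean + b, and of its inverse. It is an illustration only: the 2x2 matrix, the bias, and the function names are made up, it does not read the SphinxTrain mllr_matrix file format, and it assumes that -inverse simply undoes the same affine transform.

```python
def apply_mllr(mean, A, b):
    """Return A * mean + b for one Gaussian mean (plain lists, no NumPy)."""
    return [sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
            for row, b_i in zip(A, b)]


def apply_inverse_mllr(mean, A, b):
    """Undo apply_mllr for a 2x2 matrix A: return A^-1 * (mean - b)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    inv = [[a22 / det, -a12 / det],
           [-a21 / det, a11 / det]]
    shifted = [m - b_i for m, b_i in zip(mean, b)]
    return apply_mllr(shifted, inv, [0.0, 0.0])


# Hypothetical 2-dimensional example (real models use e.g. 39 dimensions).
A = [[2.0, 0.0],
     [0.0, 0.5]]
b = [1.0, -1.0]
original_mean = [3.0, 4.0]

adapted_mean = apply_mllr(original_mean, A, b)
restored_mean = apply_inverse_mllr(adapted_mean, A, b)
print(adapted_mean)    # [7.0, 1.0]
print(restored_mean)   # [3.0, 4.0]
```

Only the means move in this sketch; the variances, mixture weights, and transition matrices are left untouched.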
Wow, Thanks.
For the article to which you linked:
has map_adapt been fixed?
Also,
How do you determine what the best value is? Is this a trial-and-error sort of thing?
No
Yes
Alright, I'm trying to follow the method you posted on your blog, but I'm getting a little confused. Here are the steps I have so far,
using en-us as an example of the original AM to be adapted:
1. bw with en-us
2. mllr_solve with en-us
3. mllr_transform with -in en-us -out en-us-adapt
4. bw again, but with en-us-adapt as the model
Now here is where I'm confused:
5. map_adapt: should the input and output both be en-us-adapt? Or should the input be the original en-us, and the output be en-us-adapt?
Input en-us-adapt, output en-us-adapt-2.
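The five steps discussed here can be sketched as a dry run that only builds and prints the command lines. The bw and map_adapt options are the ones used earlier in this thread, the mllr_solve options are the ones from the adaptation tutorial, and mllr_transform transforms the means only. Everything else is an assumption: the "adapt" file prefix, the two accumulator directory names, and the idea that en-us-adapt is a full copy of en-us whose means file gets overwritten. Feature options such as -lda have to match your own model and are left out.

```python
import shlex


def adaptation_commands(base="en-us", mllr_dir="en-us-adapt",
                        map_dir="en-us-adapt-2", prefix="adapt",
                        accum_mllr="accum-mllr", accum_map="accum-map"):
    """Build (not run) the bw -> mllr_solve -> mllr_transform -> bw -> map_adapt chain."""

    def bw(model, accumdir):
        # Feature options must match the model's feat.params (assumed values here).
        return ["bw", "-hmmdir", model, "-moddeffn", f"{model}/mdef",
                "-ts2cbfn", ".cont.", "-feat", "1s_c_d_dd",
                "-cmn", "current", "-agc", "none",
                "-dictfn", f"{prefix}.dic", "-ctlfn", f"{prefix}.fileids",
                "-lsnfn", f"{prefix}.transcription", "-accumdir", accumdir]

    return [
        bw(base, accum_mllr),                                    # step 1
        ["mllr_solve", "-meanfn", f"{base}/means",               # step 2
         "-varfn", f"{base}/variances",
         "-outmllrfn", "mllr_matrix", "-accumdir", accum_mllr],
        ["mllr_transform", "-inmeanfn", f"{base}/means",         # step 3, means only
         "-mllrmat", "mllr_matrix", "-outmeanfn", f"{mllr_dir}/means"],
        bw(mllr_dir, accum_map),                                 # step 4
        ["map_adapt",                                            # step 5
         "-meanfn", f"{mllr_dir}/means", "-varfn", f"{mllr_dir}/variances",
         "-mixwfn", f"{mllr_dir}/mixture_weights",
         "-tmatfn", f"{mllr_dir}/transition_matrices",
         "-accumdir", accum_map,
         "-mapmeanfn", f"{map_dir}/means", "-mapvarfn", f"{map_dir}/variances",
         "-mapmixwfn", f"{map_dir}/mixture_weights",
         "-maptmatfn", f"{map_dir}/transition_matrices"],
    ]


commands = adaptation_commands()
for argv in commands:
    print(shlex.join(argv))
```

Printing instead of executing makes it easy to review each command line for a missing space or a wrong directory before running anything.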
I'm also having trouble running the mllr_transform command. You said to use the options -invarfn and -outvarfn, but I'm getting the error "Unknown argument name -invarfn". When I list the help text for the command, neither -invarfn nor -outvarfn is listed.
In the blog post, you said that we probably shouldn't update the variances. Can I leave these options out, then?
Last edit: Jeff Acquaviva 2014-06-13
Yes, you can leave the variance out and transform only means.
So, I'm still having issues correctly adapting the acoustic model.
When I test on the training set to see whether adaptation worked correctly, I get near 0% WER (100% accuracy). However, when testing on new data, the results are often worse than with the unadapted model.
I know my test data set is poorly recorded (bush2007 has too much reverb). My thought was to create an adapted model from data that mirrored the recording conditions of bush2007, so I used bush2003 for the adaptation. I used the MLLR+MAP process you described earlier (see the attached create_adapt.sh for the exact commands); however, the results were worse than with the unadapted model.
Would you mind taking a look to see if I missed a command option, or failed to change one from its default to a value better suited to my domain?
For reference, I have attached both the 2003 adaptation set here and the 2007 test set here. My WER on the 2007 test set is 63%, 59%, 50%, and 43% for 5, 10, 20, and 30 minutes of 2003 adaptation data. I would also like to note that the 43% at 30 minutes is an improvement over the 46% baseline.
Again, thanks for your help.
Last edit: Jeff Acquaviva 2014-06-16
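For readers comparing the WER figures in this post: word error rate is usually computed as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The sketch below is a minimal illustration of that definition, not the scoring tool used in this thread, and the hypothesis string is made up.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    prev = list(range(len(hyp) + 1))          # distances for an empty reference
    for i, ref_word in enumerate(ref, 1):
        cur = [i]                             # deleting i reference words
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                           # deletion
                           cur[j - 1] + 1,                        # insertion
                           prev[j - 1] + (ref_word != hyp_word))) # match or substitution
        prev = cur
    return prev[-1] / len(ref)


# Made-up hypothesis: one substitution ("a" -> "the") and one deletion ("for").
example = wer("well here's a story for you", "well here's the story you")
print(f"{example:.1%}")   # 33.3%
```

Scoring a held-out part of the adaptation recordings with the same function, next to the adaptation set itself, is a simple way to see how much of a near 0% WER comes from the model having memorized the adaptation data.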
Hello Jeff
I didn't look at your results in detail, but the mllr_transform step is certainly necessary.
Again, you need to work more on the initial accuracy. A 63% WER clearly demonstrates that there are serious problems with decoding, and it is unlikely that you can fix them with adaptation.
I'm not sure how important the 2007 set is for you, but adaptation is unlikely to help with such heavily reverberated data. The model this data needs is simply too far from the original one, and the speech is corrupted in such a way that the cross-frame dependencies are far more significant than what adaptation can handle. At the very least, you would need to rebuild the whole model dependency tree to account for reverberation, because the dependency on the left phone is much more significant than in clean speech.
There is a lot of interesting research on reverberation. I'm not sure whether you have seen http://reverb2014.dereverberation.com/proceedings.html, but it contains a lot of information on future directions.
Thanks for your help.
I know the 2007 set is poorly recorded, but my goal was to see whether I could get any improvement in WER from AM adaptation. I'm not looking for a gain of more than 2-3% absolute WER. I chose to adapt with the 2003 data because it has reverberation similar to the 2007 data. My thought was that part of AM adaptation is learning the environment, so if I used adaptation data with some reverberation, it should help with the 2007 test data. Is this correct?
Is it possible for MAP adaptation to overfit the test data? When I tested the adapted model on the adaptation set, I noticed near 0% WER. This surprised me because I didn't expect MAP to favor the adaptation set means as much as it did. Even when I test on portions of the 2003 data that are not included in the adaptation set, I get a worse word error rate. This is why I think I may be overfitting to the adaptation set.
Another point of interest is my results using MLLR adaptation. When I adapt en-us with bush2003 using MLLR, I see an improvement from 46.5% to 41.5% WER on the 2007 data. This is closer to what I was expecting from MAP adaptation.
Do you have any information that might explain what I'm seeing here?
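"My thought was that part of AM adaptation is learning the environment, so if I used adaptation data with some reverberation, it should help with the 2007 test data. Is this correct?"
No, this is not fully correct. There are many environment changes that adaptation cannot deal with easily.
"Is it possible for MAP adaptation to overfit the test data?"
Yes.
"When I tested the adapted model on the adaptation set, I noticed near 0% WER."
This is controlled by the tau parameter of map_adapt, as described in the blog post above. You can play with it.
"When I adapt en-us with bush2003 using MLLR, I see an improvement from 46.5% to 41.5% WER on the 2007 data."
MLLR is more robust.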
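As a rough illustration of why tau matters for overfitting, here is the textbook MAP estimate of a single Gaussian mean, where tau acts as a count of "virtual" frames that vote for the original mean. This is a sketch of the general formula with made-up numbers; it is not taken from the map_adapt source, which may estimate or apply tau differently.

```python
def map_mean(prior_mean, data_sum, occupancy, tau):
    """Textbook MAP mean: (tau * prior + sum of weighted frames) / (tau + occupancy)."""
    return (tau * prior_mean + data_sum) / (tau + occupancy)


# Hypothetical Gaussian: original mean 0.0, and 50 adaptation frames
# whose average is 1.0 (so the weighted sum of frames is 50.0).
prior_mean, occupancy = 0.0, 50.0
data_sum = 1.0 * occupancy

for tau in (1.0, 100.0, 10000.0):
    adapted = map_mean(prior_mean, data_sum, occupancy, tau)
    print(f"tau={tau:>7.0f}  adapted mean={adapted:.3f}")
```

With a small tau the adapted mean lands almost on the adaptation data, which fits the near 0% WER on the adaptation set; a larger tau keeps the mean closer to the original model.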
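"No, this is not fully correct. There are many environment changes that adaptation cannot deal with easily."
And I take it reverberation is one of them?
"This is controlled by the tau parameter of map_adapt, as described in the blog post above. You can play with it."
In which direction would you recommend adjusting tau? I know you say greater than 100, but in your experience, which values of tau might be good to try? Also, is it even worth it with this much reverberation?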