Poor results while Adapting the acoustic model

Avee
2014-06-04
2014-06-19
  • Avee

    Avee - 2014-06-04

    Hi,

    I tried to adapt the en-us 8 kHz model using the full arctic data set (arcticAll).
    Results from the adapted model are really good for all the arctic streams, but
    for some other streams (with a different accent) that were recognized well before, the results are now much worse than with the original model.

    What could be the reason?
    Do I need to do something differently?

    I followed this tutorial http://cmusphinx.sourceforge.net/wiki/tutorialadapt

    I'm using Sphinx3 in allphone decoding mode.

     

    Last edit: Avee 2014-06-05
    • Nickolay V. Shmyrev

      To get help on this issue please provide the data you are using.

       
  • Avee

    Avee - 2014-06-05

    I used the following commands:

    1. sphinx_fe.exe -argfile "en-us-8khz\feat.params" -samprate 8000 -c arcticAll.fileids -di . -do . -ei wav -eo mfc -mswav yes

    2. bw.exe -hmmdir en-us-8khz -moddeffn en-us-8khz/mdef -ts2cbfn .cont. -feat 1s_c_d_dd -lda feature_transform -cmn current -agc none -dictfn arcticAll.dic -ctlfn arcticAll.fileids -lsnfn arcticAll.transcription -accumdir .

    3. map_adapt.exe -meanfn en-us-8khz\means -varfn en-us-8khz\variances -mixwfn en-us-8khz\mixture_weights-tmatfn en-us-8khz\transition_matrices -accumdir . -mapmeanfn en-us-8khzadapt\means -mapvarfn en-us-8khzadapt\variances -mapmixwfn en-us-8khzadapt\mixture_weights -maptmatfn en-us-8khzadapt\transition_matrices

    Results for the arctic files are really good, but for the cc-01 and cc-02 files, which have a British accent, results are now very bad even though they were good initially.

     
  • Avee

    Avee - 2014-06-05

    Wav files and bw output..

     

    Last edit: Avee 2014-06-05
  • Avee

    Avee - 2014-06-05

    Adapted model.

     
  • Avee

    Avee - 2014-06-05

    Adapted model continued...

     
  • Avee

    Avee - 2014-06-05

    I originally used the en-us-8khz model listed at http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/

    For cc-01.wav

    Correct transcription :
    well here's a story for you sarah perry was a veterinary nurse

    Using en-us-8khz

    SIL M AY L SIL HH IH Z AH S T AY F ER Y UW SIL DH EH R AH P EH R IY AH M AH S IH Z IH T IH N IH T N ER S SIL

    Using en-us-8khz adapted model
    L AY L DH IY Y UW DH EH R OW P EY M EY S T

    So the output degraded badly with the adapted model...

     

    Last edit: Avee 2014-06-05
  • Avee

    Avee - 2014-06-12

    Any help please...

     
    • Nickolay V. Shmyrev

      Sorry, your data is not complete. There is no transcript file for cc1, and the command contains errors such as "en-us-8khz\mixture_weights-tmatfn" with the space missing. You also didn't provide the adaptation logs.

      It's better to share the data in a single archive. You can use Dropbox or Google Drive to share large files.

      Overall, MAP adaptation of a continuous model requires a significant amount of data, such as 30 minutes or 1 hour. For a small amount of data, such as 2 utterances, it's better to use MLLR.
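
A missing space such as the one in "en-us-8khz\mixture_weights-tmatfn" is easy to overlook in a long command line. The following is a minimal sketch of a check for it, assuming only the map_adapt option names that appear in the command posted in this thread. The helper name find_fused_flags is hypothetical and not part of SphinxTrain.

```python
# Option names taken from the map_adapt command posted in this thread.
MAP_ADAPT_FLAGS = [
    "-meanfn", "-varfn", "-mixwfn", "-tmatfn", "-accumdir",
    "-mapmeanfn", "-mapvarfn", "-mapmixwfn", "-maptmatfn",
]


def find_fused_flags(command, known_flags):
    """Return (token, flag) pairs where a flag is glued to the end of a value."""
    problems = []
    for token in command.split():
        for flag in known_flags:
            # A token that ends with a flag but is longer than the flag
            # means the separating space is missing.
            if token != flag and token.endswith(flag):
                problems.append((token, flag))
    return problems


# Shortened copy of the command from the thread, including its error.
command = (
    r"map_adapt.exe -meanfn en-us-8khz\means -varfn en-us-8khz\variances "
    r"-mixwfn en-us-8khz\mixture_weights-tmatfn en-us-8khz\transition_matrices "
    r"-accumdir . -mapmeanfn en-us-8khzadapt\means"
)
fused = find_fused_flags(command, MAP_ADAPT_FLAGS)
for token, flag in fused:
    print(f"missing space before {flag} in: {token}")
```

The check only looks at the end of each whitespace-separated token, so it would not catch other kinds of mistakes, such as a misspelled option name.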

       
      • Jeff Acquaviva

        Jeff Acquaviva - 2014-06-12

        How would you use MLLR with Sphinx4?
        I see in the AM adaption tutorial, it says to copy

        -mllr mllr_matrix

        to the pocketsphinx commandline. If I wanted to use MLLR with sphinx 4, how do I add this mllr_matrix?

         
        • Nickolay V. Shmyrev

          If I wanted to use MLLR with sphinx 4, how do I add this mllr_matrix?

          Ideally, sphinx4 would have to be modified to load the MLLR matrix. Otherwise, there is mllr_transform, which can transform the model before use.

           
          • Jeff Acquaviva

            Jeff Acquaviva - 2014-06-12

            So, say I wanted to adapt the en-us AM like the tutorial does. Would this be the command to do that?

            mllr_transform \
            -cdonly yes
            -ingaucntfn gauden_counts // this is one of the output files from bw
            -inmeanfn en-us/means
            -mllrmat mllr_matrix
            -moddeffn en-us/mdef
            -outgaucntfn // Do I need this one? Should I make a copy of the input gaussian counts so it can be overridden?
            -outmeanfn // Is this a copy of the en-us/means file?
            -varfn en-us/variances

            Can you also explain when you would need the -inverse option?

             

            Last edit: Jeff Acquaviva 2014-06-12
            • Nickolay V. Shmyrev

              Just

              ~~~~~~~~~~
              mllr_transform \
                  -inmeanfn en-us/means \
                  -invarfn en-us/variances \
                  -mllrmat mllr_matrix \
                  -outmeanfn en-us-adapt/means \
                  -outvarfn en-us-adapt/variances
              ~~~~~~~~~~~~

              mllr_transform just applies the matrix to the Gaussians. -inverse applies the inverse matrix; you don't need it.

              See also

              http://nshmyrev.blogspot.de/2009/09/adaptation-methods.html
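
To illustrate what "applies the matrix to the Gaussians" means for the means, here is a minimal sketch of an affine mean transform, mu' = A * mu + b. It assumes a single transform shared by all Gaussians and made-up numbers; it does not read the actual mllr_matrix or means file formats, and the function name apply_mllr is hypothetical.

```python
def apply_mllr(means, A, b):
    """Return A @ mu + b for every mean vector mu (plain Python lists)."""
    adapted = []
    for mu in means:
        new_mu = [
            sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)
        ]
        adapted.append(new_mu)
    return adapted


# Toy 2-dimensional example; all values are invented for illustration.
means = [[1.0, 2.0], [0.0, -1.0]]
A = [[1.0, 0.5],
     [0.0, 2.0]]
b = [0.5, -0.25]

adapted_means = apply_mllr(means, A, b)
for before, after in zip(means, adapted_means):
    print(before, "->", after)
```

Because every mean is moved by the same transform, only the entries of A and b have to be estimated from the adaptation data, rather than one new mean per Gaussian.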

               
              • Jeff Acquaviva

                Jeff Acquaviva - 2014-06-12

                Wow, Thanks.

                For the article to which you linked:

                automatic tau selection is broken in map_adapt

                has map_adapt been fixed?

                Also,

                greater than 100 (try to select the best value).

                How do you determine what the best value is? Is this a trial and error sort of thing?

                 
                • Nickolay V. Shmyrev

                  has map_adapt been fixed?

                  No

                  How do you determine what the best value is? Is this a trial and error sort of thing?

                  Yes

                   
              • Jeff Acquaviva

                Jeff Acquaviva - 2014-06-13

                Alright, I'm trying to follow the method you posted on your blog, but I'm getting a little confused. Here are the steps I have so far,
                using en-us as an example of the original AM to be adapted:
                1. bw with en-us
                2. mllr_solve with en-us
                3. mllr_transform with -in en-us -out en-us-adapt
                4. bw again, but with en-us-adapt as the model
                Now here is where I'm confused:
                5. map_adapt: should the input and output both be en-us-adapt? Or should the input be the original en-us and the output be en-us-adapt?

                 
  • Nickolay V. Shmyrev

    map_adapt: should the input and output both be en-us-adapt?

    Input en-us-adapt output en-us-adapt-2
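
The five steps discussed here can be written out as a sketch that only builds the command lines and runs nothing. The bw and map_adapt options are copied from the commands posted earlier in this thread (without -lda), the mllr_solve options are the ones used in the adaptation tutorial, and mllr_transform uses only options named in this thread. The file names adapt.dic, adapt.fileids, adapt.transcription and the accumulator directories accum-1 and accum-2 are hypothetical placeholders.

```python
def build_adaptation_pipeline(base="en-us", mllr_model="en-us-adapt",
                              map_model="en-us-adapt-2"):
    """Return the five steps as (description, argv) pairs; nothing is executed."""

    def bw(model, accumdir):
        # Options mirror the bw command posted earlier in this thread;
        # they have to match the feature settings of the model in use.
        return [
            "bw", "-hmmdir", model, "-moddeffn", f"{model}/mdef",
            "-ts2cbfn", ".cont.", "-feat", "1s_c_d_dd",
            "-cmn", "current", "-agc", "none",
            "-dictfn", "adapt.dic", "-ctlfn", "adapt.fileids",
            "-lsnfn", "adapt.transcription", "-accumdir", accumdir,
        ]

    return [
        ("1. collect statistics with the original model", bw(base, "accum-1")),
        ("2. estimate the MLLR matrix", [
            "mllr_solve", "-meanfn", f"{base}/means",
            "-varfn", f"{base}/variances",
            "-outmllrfn", "mllr_matrix", "-accumdir", "accum-1",
        ]),
        ("3. apply the matrix to the means", [
            "mllr_transform", "-inmeanfn", f"{base}/means",
            "-mllrmat", "mllr_matrix",
            "-outmeanfn", f"{mllr_model}/means",
        ]),
        ("4. collect statistics with the MLLR-adapted model",
         bw(mllr_model, "accum-2")),
        ("5. MAP-adapt the MLLR-adapted model", [
            "map_adapt", "-meanfn", f"{mllr_model}/means",
            "-varfn", f"{mllr_model}/variances",
            "-mixwfn", f"{mllr_model}/mixture_weights",
            "-tmatfn", f"{mllr_model}/transition_matrices",
            "-accumdir", "accum-2",
            "-mapmeanfn", f"{map_model}/means",
            "-mapvarfn", f"{map_model}/variances",
            "-mapmixwfn", f"{map_model}/mixture_weights",
            "-maptmatfn", f"{map_model}/transition_matrices",
        ]),
    ]


steps = build_adaptation_pipeline()
for description, argv in steps:
    print(description)
    print("   " + " ".join(argv))
```

Step 3 writes only the means, so the remaining model files (mdef, variances, mixture_weights, transition_matrices) would presumably have to be copied into en-us-adapt before step 4 can use it as a model directory.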

     
  • Jeff Acquaviva

    Jeff Acquaviva - 2014-06-13

    I'm also having trouble running the mllr_transform command. You said to use the options -invarfn and -outvarfn, but I'm getting an error "Unknown argument name -invarfn". When I list the help text for the command, neither the -invarfn nor the -outvarfn option is listed.

    In the blog post, you said that we probably shouldn't update the variances. Can I leave these options out then?

     

    Last edit: Jeff Acquaviva 2014-06-13
    • Nickolay V. Shmyrev

      Yes, you can leave the variance out and transform only means.

       
  • Jeff Acquaviva

    Jeff Acquaviva - 2014-06-16

    So, I'm still having issues correctly adapting the acoustic model.
    When I test on the training set to see if adaption worked correctly, I get near 0% WER (100% accuracy). However, when testing on new data, the results are often worse than with the unadapted model.
    I know my test data set is poorly recorded (bush2007 - has too much reverb). My thought was to create an adapted model from data that mirrored the recording conditions of bush2007. I used bush2003 for this adaption. I used the mllr+map process you described earlier (see attached: create_adapt.sh for exact commands), however, these results were worse than the unadapted model.

    Would you mind taking a look to see if I missed a command option, or failed to change one from its default to a value better suited to my domain?

    For reference, I have attached both the 2003 adaption set here and the 2007 test set here. My WER for the 2007 test set is 63%, 59%, 50%, and 43% for 5, 10, 20, and 30 minutes of 2003 adaption data, respectively. I would also like to note that the 43% at 30 minutes is an improvement over the 46% baseline.

    again, thanks for your help.
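
For readers comparing the WER figures quoted here, this is a minimal sketch of how word error rate is commonly computed: the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. The reference is the start of the cc-01 transcription from earlier in the thread; the hypothesis string is invented for illustration, and scoring tools may normalize text differently.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "well here's a story for you"
hypothesis = "well here is a story you"  # made-up decoder output
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.0%}")
```

Since insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.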

     

    Last edit: Jeff Acquaviva 2014-06-16
    • Nickolay V. Shmyrev

      Hello Jeff

      I haven't looked at your results in full detail, but the mllr_transform step is certainly necessary.

      Again, you need to work more on the initial accuracy. A WER of 63% clearly shows that there are serious problems with decoding, and it is unlikely that you can fix them with adaptation.

       
      • Nickolay V. Shmyrev

        I'm not sure how important the 2007 set is for you, but adaptation is unlikely to help with such heavily reverberated data. The mismatch between the original model and this data is simply too large, and the speech is corrupted in such a way that the cross-frame dependencies are far more significant than what adaptation can handle. You would need to rebuild the whole model dependency tree to account for reverberation at least, because the dependency on the left phone is far more significant than in clean speech.

        There is a large and interesting body of research on reverberation. I'm not sure if you have seen http://reverb2014.dereverberation.com/proceedings.html, but it contains a lot of information on future directions.

         
  • Jeff Acquaviva

    Jeff Acquaviva - 2014-06-19

    Thanks for your help.
    I know the 2007 set is poorly recorded, but my goal was to see if I could get any improvement in WER through AM adaption. I'm not looking to improve it by more than 2-3% absolute WER. I chose to adapt with the 2003 data because it had similar reverberation to 2007. My thought was that part of AM adaption learns the environment, so if I used adaption data with some reverberation, it should help with the 2007 test data. Is this correct?

    Is it possible for MAP adaption to overfit the test data? When I tested the adapted model on the adaption set, I noticed near 0% WER. This surprised me because I didn't expect MAP to favor the adaption set means as much as it did. Even when I test on portions of the 2003 data that are not included in the adaption set, I get a worse word error rate. This is why I think I may be overfitting to the adaption set.

    Another point of interest is my results using MLLR adaption. When I adapted en-us with bush2003 using MLLR, I saw an improvement in WER from 46.5% to 41.5% on the 2007 data. This is closer to what I was expecting from MAP adaption.

    Do you have any information that might explain what I'm seeing here?

     
  • Nickolay V. Shmyrev

    My thought was that part of AM adaption learns the environment, so if I used adaption data with some reverberation, it should help with the 2007 test data. Is this correct?

    No, this is not fully correct. There are many environment changes which adaptation can not deal with easily.

    Is it possible for MAP adaption to overfit the test data?

    Yes

    When I tested the adapted model on the adaption set, I noticed near 0% WER.

    This is controlled by the tau parameter of map_adapt as described in the blog post above. You can play with it.

    MLLR I saw an improvement in WER from 46.5% to 41.5% on the 2007 data.

    MLLR is more robust.
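
As a rough illustration of why tau matters, here is a minimal sketch of the textbook MAP update for a single one-dimensional Gaussian mean: the prior mean is weighted by tau and the adaptation data by its occupancy count. Whether map_adapt uses exactly this form internally is an assumption, and all numbers are made up.

```python
def map_mean(prior_mean, frames, posteriors, tau):
    """MAP estimate of a mean: (tau * prior + sum(g * x)) / (tau + sum(g)).

    tau must be positive; posteriors are the per-frame occupancy weights.
    """
    occupancy = sum(posteriors)
    weighted_sum = sum(g * x for g, x in zip(posteriors, frames))
    return (tau * prior_mean + weighted_sum) / (tau + occupancy)


# The original model says the mean is 0.0; 50 adaptation frames all say 4.0.
prior_mean = 0.0
frames = [4.0] * 50
posteriors = [1.0] * 50

taus = [1.0, 10.0, 50.0, 100.0, 1000.0]
estimates = [map_mean(prior_mean, frames, posteriors, tau) for tau in taus]
for tau, estimate in zip(taus, estimates):
    print(f"tau={tau:7.1f}  adapted mean={estimate:.3f}")
```

With a small tau the adapted mean lands almost on top of the adaptation data, which would be consistent with near 0% WER on the adaption set and worse results elsewhere. A larger tau keeps the mean closer to the original model.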

     
  • Jeff Acquaviva

    Jeff Acquaviva - 2014-06-19

    No, this is not fully correct. There are many environment changes which adaptation can not deal with easily.

    And I take it reverberation is one of them?

    This is controlled by the tau parameter of map_adapt as described in the blog post above. You can play with it.

    In what direction would you recommend I go in terms of adjusting Tau? I know you say greater than 100, but from your experience, what values of Tau might be good to try? Also, is it even worth it if I have this much reverberation?

     
