
pocketsphinx_continuous results don't match the error rate after training

Toine db
2014-11-21
2015-10-26
  • Toine db

    Toine db - 2014-11-21

    I ran a training that resulted in error rates of 15% and 25%.

    Now I'm testing pocketsphinx_continuous with some live sounds/samples, and almost none of the results match.

    Any suggestions on where to look?

    PS: I'm using these two commands:

    pocketsphinx_continuous -dict model_parameters/neh.dic -jsgf model_parameters/neh.jsgf -hmm model_parameters/neh.ci_cont -inmic yes

    pocketsphinx_continuous -infile testwavs/neh1.wav -dict model_parameters/neh.dic -jsgf model_parameters/neh.jsgf -hmm model_parameters/neh.ci_cont

     
    • Nickolay V. Shmyrev

      You can share your model training folder, the test sample and other required data files to get help on this issue.

       
      • Toine db

        Toine db - 2014-11-24

        Thanks for the support.

        It would be really great if you can find something, because this is really a go/no-go moment for my project.

        It seems to work in the 'decoding' test after training, but not in the continuous test when reading the same .wav files that decoding succeeded with.

        Here are my Training folder and Testing folder (with pocketsphinx compiled for Win32):
        https://onedrive.live.com/redir?resid=53DF68CA92747BA6%2173460
        (the 'Testing' folder contains a few .wav files for each word that decoded successfully after the training)

        I used the following command to test:

        pocketsphinx_continuous -infile testwavs/nehX.wav -dict model_parameters/neh.dic -jsgf model_parameters/neh.jsgf -hmm model_parameters/neh.ci_cont

        My goal is to detect only these words, so without any grammar or other language modelling.

        I really hope you can help me.

        PS: (The training is not optimized at all yet; there are a few wav files that are too small, and I still need to tweak the training for a better result.)

         

        Last edit: Toine db 2014-11-24
      • Toine db

        Toine db - 2014-12-01

        Sorry to ask, I don't want to put any pressure on you, but have you found time to look at the model I'm trying to train?
        (the good results decode gives and the bad results pocketsphinx_continuous gives)

         
        • Nickolay V. Shmyrev

          Sorry for the delay. Add the following option to the model's feat.params:

               -cmninit 65,-1,-35,-10,-5,-24,8,-8,-21,-12,-32,-21,-29
          

          It should work fine after that. The default cmninit value is not very accurate.
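The `-cmninit` value is a comma-separated initial cepstral mean, one number per cepstral coefficient (13 in the lines above). A minimal sketch of one plausible way to derive such a vector yourself, assuming you can already load your training features as lists of 13 MFCC values per frame (reading SphinxTrain's feature files is not shown, and `cmninit_from_frames` is a hypothetical helper, not part of PocketSphinx):

```python
"""Sketch: derive a -cmninit string from cepstral frames (assumes 13 MFCCs per frame)."""
from typing import Sequence


def cmninit_from_frames(frames: Sequence[Sequence[float]], ndigits: int = 2) -> str:
    """Average each cepstral coefficient over all frames and format it for feat.params."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frames must have the same dimension")
    means = [sum(f[i] for f in frames) / len(frames) for i in range(dim)]
    # feat.params takes one comma-separated list, e.g. "-cmninit 65,-1,-35,..."
    return ",".join(_fmt(round(m, ndigits)) for m in means)


def _fmt(x: float) -> str:
    # Print whole numbers without ".0", matching the style of the values in the thread.
    return str(int(x)) if float(x).is_integer() else str(x)


if __name__ == "__main__":
    # Toy data: 3 frames of 4 coefficients (real MFCC frames have 13).
    demo = [[64.0, -2.0, -34.0, 1.0],
            [66.0, 0.0, -36.0, 2.0],
            [65.0, -1.0, -35.0, 3.0]]
    print("-cmninit " + cmninit_from_frames(demo))  # -cmninit 65,-1,-35,2
```

Averaging over everything, silence included, is the simplest choice; the resulting vector is only a starting point that live CMN will keep adjusting.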

           
          • Nickolay V. Shmyrev

            Since your sounds are very short, it might be helpful to train with -cmn none (in sphinx_train.cfg).

             
          • Toine db

            Toine db - 2014-12-03

            No problem, I'm glad you would take a look at it.
            And very, very glad if it really works :-)

            I will try it as soon as I can.

            PS: is there a way I can understand what the values after -cmninit mean, and what cmninit does? (Maybe I can learn to do it myself.)

             
            • Nickolay V. Shmyrev

              PS: is there a way I can understand what the values mean after -cmninit?

              CMN (cepstral mean normalization) is an estimate of the channel properties, basically the volume of the sound in each frequency band.

              and what cmninit does? (maybe I can learn to do it myself)

              The cmninit parameter sets the initial channel estimate so that it is accurate from the start; otherwise it takes the decoder several seconds to update the channel estimate.

              You can get more information about CMN and feature extraction in a textbook on speech recognition.
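To see why a good initial estimate matters, here is a simplified running CMN sketch: each frame has the current mean subtracted, and the mean then moves slowly toward the incoming frames. The exponential moving average and `alpha` are assumptions for illustration, not PocketSphinx's actual update rule.

```python
"""Simplified live CMN: subtract a running cepstral mean that starts from an initial estimate."""
from typing import Iterable, Iterator, List, Sequence


def live_cmn(frames: Iterable[Sequence[float]], init_mean: Sequence[float],
             alpha: float = 0.99) -> Iterator[List[float]]:
    """Yield mean-normalized frames.

    The mean starts at init_mean (the role -cmninit plays) and is updated as an
    exponential moving average; alpha close to 1.0 means slow adaptation.
    Illustration only, not PocketSphinx's implementation.
    """
    mean = list(init_mean)
    for frame in frames:
        # Normalize with the current estimate, then update the estimate with this frame.
        yield [c - m for c, m in zip(frame, mean)]
        mean = [alpha * m + (1.0 - alpha) * c for m, c in zip(mean, frame)]


if __name__ == "__main__":
    channel = [[60.0, -30.0]] * 5  # a constant "channel" for illustration
    poor = list(live_cmn(channel, init_mean=[0.0, 0.0]))
    good = list(live_cmn(channel, init_mean=[60.0, -30.0]))
    print("poor init, first frame:", poor[0])  # large offset remains for many frames
    print("good init, first frame:", good[0])  # already ~0
```

With a poor initial mean the offset shrinks only slowly, which matches the "several seconds" of adaptation described above; short isolated words may be over before the estimate settles.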

               
              • Toine db

                Toine db - 2014-12-05

                Wow, I couldn't wait to tell you, but I'm still halfway through testing...

                But it works!!!

                After adding -cmninit the recognition started to work, but not very well.
                After changing $CFG_CMN to 'none', I get almost exactly the same result as in the decode after the training! (with pocketsphinx_contin...)

                And as an additional improvement, the error rate went from 25% to 10%...

                You made me very happy!

                Now I'm hoping that the pocketsphinx in my code (Windows Phone) gives the same result!
                to be continued....

                 
              • Toine db

                Toine db - 2014-12-07

                OK, finished testing (for now).

                And pocketsphinx_continuous gave great results: with recorded sounds, almost the same results as the decode after training (let's say 95% the same).

                Still struggling to get the same results from live microphone data.

                Any tips on which settings to tweak/test?

                 
                • Nickolay V. Shmyrev

                  Hello Toine

                  Your questions would be more productive if you provide the files (models, audio file you are trying to decode, command line, reference). To emulate live recognition you can recognize continuous recording in audio file with pocketsphinx_continuous.
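One way to build such a continuous recording from the short test files is to join them with silence in between. A minimal sketch, assuming all inputs are 16-bit PCM WAVs in the same format (e.g. 16 kHz mono); `concat_with_silence` and the one-second gap are hypothetical choices:

```python
"""Join short test WAVs into one continuous recording, with silence between them."""
import wave
from typing import Sequence


def concat_with_silence(inputs: Sequence[str], output: str, gap_seconds: float = 1.0) -> None:
    """Write silence, file, silence, file, ..., silence into `output`.

    Assumes all inputs share one 16-bit PCM format, where zero bytes are silence.
    """
    if not inputs:
        raise ValueError("no input files")
    with wave.open(inputs[0], "rb") as first:
        fmt = (first.getnchannels(), first.getsampwidth(), first.getframerate())
    nchannels, sampwidth, rate = fmt
    if sampwidth != 2:
        raise ValueError("expected 16-bit PCM")
    silence = b"\x00" * int(gap_seconds * rate) * sampwidth * nchannels
    with wave.open(output, "wb") as out:
        out.setnchannels(nchannels)
        out.setsampwidth(sampwidth)
        out.setframerate(rate)
        for path in inputs:
            with wave.open(path, "rb") as w:
                if (w.getnchannels(), w.getsampwidth(), w.getframerate()) != fmt:
                    raise ValueError(f"{path} does not match the format of {inputs[0]}")
                out.writeframes(silence)
                out.writeframes(w.readframes(w.getnframes()))
        out.writeframes(silence)  # trailing silence so the last word is closed off
```

The joined file can then be decoded with the same command used earlier in the thread, only with `-infile` pointing at the new file, and the hypotheses compared against the known word order.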

                   
                  • Toine db

                    Toine db - 2014-12-08

                    Thanks for the reply and your offer to help.
                    I first want to try it myself; I'm already asking a lot of questions...

                    The thing is, I'm at a stage where the concept works, but the error rate of a real implementation needs to go down (mainly because it's much higher than the error rate with the recorded sounds).

                    At this moment pocketsphinx_continuous gets a great error rate of about 15% with the recorded sounds. But when I run pocketsphinx_continuous on microphone data, playing the same recorded sounds, the error rate is much higher (let's say 50%).

                    I was wondering if other users often see this kind of behaviour.

                     
                  • Toine db

                    Toine db - 2014-12-12

                    Hi Nickolay,

                    I'm still struggling to get good microphone results.
                    (recorded .wav results are already great: 85% good | 15% error)

                    Since you asked: I think the training is OK now.
                    Here are my Training folder and Testing folder (with pocketsphinx compiled for Win32):
                    https://onedrive.live.com/redir?resid=53DF68CA92747BA6%2179892
                    (the 'Testing' folder contains a few .wav files for each word that decoded successfully after the training)

                    The config I load is:
                    -lowerf 130
                    -upperf 6800
                    -nfilt 25
                    -transform dct
                    -lifter 22
                    -feat 1s_c_d_dd
                    -agc none
                    -cmn none
                    -varnorm no
                    -cmninit 65,-1,-35,-10,-5,-24,8,-8,-21,-12,-32,-21,-29
                    Extra for the phone:
                    -kws_threshold "1e-40"

                    The above config works great with .wav files, but not so well with live microphone data.

                    Hope you can help me.

                    PS: In real life the key is to recognize "the first vowel in the word"; maybe pocketsphinx/training can be tweaked to recognize that?

                     
                    • Nickolay V. Shmyrev

                      Hi Toine

                      Can you collect raw data from the microphone with -rawlogdir? Maybe it's a bit different.

                       
                      • Toine db

                        Toine db - 2014-12-20

                        I recorded the 4 situations that happen most often.
                        https://onedrive.live.com/redir?resid=53DF68CA92747BA6%2182258

                        Maybe you can see/hear what is going wrong, and hopefully you can give me some advice on how to correct it.

                        Hope to hear from you.

                        Oh, and maybe you noticed (or not), but I made the PocketSphinx demo on GitHub work in Windows 8 apps as well (besides Windows Phone 8 apps).

                        PS: the key in the words I'm trying to distinguish is the first letter/vowel (N | E | O | EA | H ). Maybe that can help tweaking something?

                         
                      • Toine db

                        Toine db - 2014-12-25

                        Hi Nickolay,

                        First of all, happy holidays.

                        I was wondering if you have found some time to check the raw recordings.
                        (And is there a (doable) way to convert them into .wav files so I can listen to and use them?)

                        Again, Merry Christmas and Happy New Year.

                        Regards, Toine

                         
                        • Nickolay V. Shmyrev

                          Hi Toine

                          Sorry, it's holiday time, so not much time for work.

                          You can open raw files in WaveSurfer or Audacity (just select a sample rate of 16 kHz). You can also convert them to wav with sox:

                           sox -r 16000 -s -2 file.raw file.wav
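If sox is not at hand, the same conversion can be sketched with Python's standard `wave` module. This assumes the raw files are headerless 16-bit signed little-endian mono PCM at 16 kHz, which is what the sox options above describe (signed, 2-byte samples, 16000 Hz); the little-endian byte order is an assumption about the recording machine, and `raw_to_wav` is a hypothetical helper:

```python
"""Wrap headerless 16 kHz, 16-bit mono raw recordings in a WAV header."""
import sys
import wave


def raw_to_wav(raw_path: str, wav_path: str, rate: int = 16000) -> int:
    """Copy raw PCM bytes into a WAV file and return the number of samples written."""
    with open(raw_path, "rb") as f:
        data = f.read()
    if len(data) % 2:
        data = data[:-1]  # drop a dangling half-sample
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples, stored little-endian in WAV
        w.setframerate(rate)
        w.writeframes(data)
    return len(data) // 2


if __name__ == "__main__":
    # Usage: python raw_to_wav.py file1.raw file2.raw ...
    for raw in sys.argv[1:]:
        n = raw_to_wav(raw, raw.rsplit(".", 1)[0] + ".wav")
        print(f"{raw}: {n} samples")
```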
                          

                          Merry Christmas and Happy New Year for you as well!

                           
                          • Toine db

                            Toine db - 2014-12-25

                            Thanks for the info.

                            Of course, no work but holiday time :-)

                            I'm still hoping you could take a look at my raw audio files (to see what I could be doing wrong) after the holidays, of course.

                            Kind Regards,
                            Toine

                             

                            Last edit: Toine db 2014-12-25
                          • Toine db

                            Toine db - 2015-01-12

                            Hi Nickolay,

                            Hope you had a nice holiday.

                            I have listened to the raw recordings, but they sound the same to my ears.

                            Is it possible for you to take a look? Maybe you can see what the main issue is.

                            PS: the key in the words I'm trying to distinguish is the first letter/vowel (N | E | O | EA | H ). Maybe that can help tweaking something?

                             
                            • Nickolay V. Shmyrev

                              Hi Toine

                              I checked the data you provided, thanks for that. Here are my thoughts:

                              1) You should separate the test set from the training set. Currently your training set includes your test set, which gives you the wrong idea about accuracy. The actual error rate is about 30-40%, not 15%. More work is required here; for example, you might adjust the features to shift them toward the higher frequencies of a child's voice.

                              2) More data would help too

                              3) You can enable MGAU training

                              $CFG_CI_MGAU = 'yes';
                              $CFG_FINAL_NUM_DENSITIES = 4;

                              4) You can optimize the lw parameter. With -lw 1.0 I get the best results.

                              5) You need to provide an initial CMN estimate in the feat.params file. That way you will get more reliable recognition in continuous mode. In the trained model, add the following line to feat.params:

                                -cmninit 62,-44,-26,18,-0.98,22,-20,-15,16,-10,-11,13,-2
                              

                              See how our en-us model is configured. Then your online samples will be recognized correctly.

                              6) As for your idea about the first vowel, you do not need any special treatment; the HMM framework should get it right. However, note that it's not just a single sound but the whole recording that makes the difference. The most distinctive factor in such sounds is the movement of the formants: they define the distinction, and the formants change across the whole sound, not just at the beginning.
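For point 1, a minimal sketch of a held-out split that keeps at least one recording of each word out of training, so the test set never overlaps the training set. The `(fileid, word)` layout such as `"neh/neh1"` is hypothetical; the transcription line format `<s> word </s> (fileid)` follows the usual SphinxTrain convention for the `etc/*_train` and `etc/*_test` transcription files, but check it against your own `etc/` folder:

```python
"""Hold out part of the recordings per word so the test set never overlaps training."""
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Utt = Tuple[str, str]  # (fileid, word), e.g. ("neh/neh1", "neh"); hypothetical layout


def split_by_word(utts: Sequence[Utt], test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[Utt], List[Utt]]:
    """Return (train, test), holding out at least one utterance per word when possible."""
    by_word: Dict[str, List[Utt]] = defaultdict(list)
    for utt in utts:
        by_word[utt[1]].append(utt)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train: List[Utt] = []
    test: List[Utt] = []
    for word in sorted(by_word):
        group = sorted(by_word[word])
        rng.shuffle(group)
        # A word with a single recording stays in training only.
        n_test = max(1, round(len(group) * test_fraction)) if len(group) > 1 else 0
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def transcription_line(utt: Utt) -> str:
    """SphinxTrain-style transcription line: "<s> word </s> (basename)"."""
    fileid, word = utt
    return f"<s> {word} </s> ({fileid.rsplit('/', 1)[-1]})"
```

Holding out per word, rather than a random slice of the whole list, makes sure every word is measured; with few recordings per word the resulting error rate will still be noisy.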

                               

                              Last edit: Nickolay V. Shmyrev 2015-01-29
                              • Nickolay V. Shmyrev

                                Also, I see I suggested using none for CMN. That is indeed a good idea, but in your training setup you are not using none; you are using current.

                                So the proposed changes are:

                                1) Use CMN none
                                2) Use 4 gaussians for HMM instead of 1
                                3) Use -lw 1

                                With these changes my WER drops to only 20%.
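WER (word error rate) is the number of substitutions, deletions and insertions needed to turn the reference into the hypothesis, divided by the number of reference words. A minimal sketch using the standard edit-distance recurrence; the example pairs are made up for illustration:

```python
"""Word error rate via Levenshtein distance over words."""
from typing import Iterable, Sequence, Tuple


def word_errors(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions to turn ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distance from an empty ref prefix
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,              # deletion
                         cur[j - 1] + 1,           # insertion
                         prev[j - 1] + (r != h))   # substitution (or match)
        prev = cur
    return prev[-1]


def wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """pairs: (reference, hypothesis) strings, one per utterance."""
    pairs = list(pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    if words == 0:
        raise ValueError("empty reference")
    return sum(word_errors(r.split(), h.split()) for r, h in pairs) / words


if __name__ == "__main__":
    # 1 substitution + 1 deletion + 1 insertion over 4 reference words -> 0.75
    print(wer([("neh", "neh"), ("eh", "heh"), ("owh", ""), ("eairh", "eairh eh")]))
```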

                                Another idea is to remove the final HH phone from the dictionary; I think it is not really physically present. You need to consider how many distinct regions are present in your data and design the HMM based on that.

                                ~~~~~~~~
                                neh N_neh E_neh
                                eh E_eh
                                heh H_heh E_heh
                                owh O_owh W_owh
                                eairh E_eairh A_eairh I_eairh R_eairh
                                ~~~~~~~~~~

                                That gives a bit more accuracy.
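The dictionary above uses word-specific phones: each unit is suffixed with its word (`N_neh`, `E_neh`, ...), so no phone is shared between words. A small sketch that generates those lines from a table and also collects the phone list. It assumes the training setup expects every dictionary phone plus `SIL` in its phone file, as the usual SphinxTrain setup does, but check against your own training folder; `WORDS` and `build_dict` are hypothetical names:

```python
"""Generate a word-specific-phone dictionary and the matching phone list."""
from typing import Dict, List, Tuple

# Units per word, taken from the dictionary above (final HH already removed).
WORDS: Dict[str, List[str]] = {
    "neh": ["N", "E"],
    "eh": ["E"],
    "heh": ["H", "E"],
    "owh": ["O", "W"],
    "eairh": ["E", "A", "I", "R"],
}


def build_dict(words: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Return (dictionary lines, sorted phone list) with phones named UNIT_word."""
    lines: List[str] = []
    phones = {"SIL"}  # assumed: silence must also appear in the phone list
    for word, units in words.items():
        word_phones = [f"{unit}_{word}" for unit in units]
        phones.update(word_phones)
        lines.append(f"{word} {' '.join(word_phones)}")
    return lines, sorted(phones)


if __name__ == "__main__":
    dict_lines, phone_list = build_dict(WORDS)
    print("\n".join(dict_lines))
    print("\n".join(phone_list))
```

Editing the table, for example to try a different number of units per word, then keeps the dictionary and the phone list consistent with each other.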

                                 
                                • Nickolay V. Shmyrev

                                  Another thing related to CMN: I noticed that the amplitude in your training database is about 7000-8000, while the raw file recordings are much quieter (1000-2000). That means you will get a significant mismatch without CMN, and even CMN will not help much. I suggest normalizing the recordings to match the audio level of the training set.

                                  Ideally you need a good recording level normalizer; probably we need to improve AGC or implement a very quick CMN.
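A minimal sketch of the suggested level normalization for 16-bit PCM WAV files. It interprets "amplitude" as the peak sample value and uses a hypothetical target of 7500, the middle of the 7000-8000 range mentioned above; an RMS-based gain may be more robust when a recording contains clicks:

```python
"""Scale 16-bit WAV recordings to a target peak level (sketch)."""
import array
import sys
import wave


def normalize_peak(samples: "array.array", target_peak: int = 7500) -> "array.array":
    """Scale 16-bit samples so the largest absolute value becomes target_peak."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return array.array("h", samples)  # pure silence: nothing to scale
    gain = target_peak / peak
    # Clip to the 16-bit range in case rounding pushes a sample over the limit.
    return array.array("h", (max(-32768, min(32767, round(s * gain))) for s in samples))


def normalize_wav(src: str, dst: str, target_peak: int = 7500) -> None:
    with wave.open(src, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        params = w.getparams()
        samples = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    out = normalize_peak(samples, target_peak)
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(dst, "wb") as w:
        w.setparams(params)
        w.writeframes(out.tobytes())
```

This is an offline, whole-file normalization; it matches levels for testing, but it cannot be applied as-is to a live stream, where the peak is not known in advance.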

                                   
                                  • Toine db

                                    Toine db - 2015-09-16

                                    Nickolay,

                                    I totally missed these replies; somehow I don't get notifications from SourceForge (if there are any).

                                    I will look at your suggestions as soon as possible; they look promising.

                                     
                                  • Toine db

                                    Toine db - 2015-10-24

                                    Hi Again Nickolay,

                                    This thread is really helpful for me; many thanks for that and for your support.

                                    I have been away for a little while, so I need to re-read the content, and this thread is getting long and too big, so I'm going to start a new thread without that overhead.

                                    But before doing this I want to get some assumptions and definitions clarified, if possible.

                                    I hope you can help me with some little questions:
                                    * Is the CMN option for training, final recognition, or both?
                                    * When CMN='none', is -cmninit useless?
                                    * What do you mean by AGC? (Volume control by change?)
                                      (Can the option '-agc none' help with this, or do I need to make my own volume equalizer?)

                                    Thanks for your help. I think I'm getting close, so I hope I get things right now.

                                     
  • Nickolay V. Shmyrev

    is the CMN option for training, final recognition or both?

    Both

    when CMN='none', will -cmninit be useless?

    Yes

    What do you mean with AGC? (Volume control by change?)

    AGC is automatic gain control. The current AGC implementation is not functional, though.
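To illustrate what automatic gain control means in general (this is not PocketSphinx's `-agc` code), a minimal block-based sketch: each 10 ms block's RMS level is measured and a smoothed gain pulls the signal toward a target level. `target_rms`, `block`, `max_gain` and `smoothing` are hypothetical values chosen for illustration:

```python
"""Minimal block-based automatic gain control for 16-bit samples (illustration only)."""
from typing import List, Sequence


def simple_agc(samples: Sequence[int], target_rms: float = 2000.0, block: int = 160,
               max_gain: float = 10.0, smoothing: float = 0.9) -> List[int]:
    """Scale each block toward target_rms with a smoothed, capped gain.

    block=160 is 10 ms at 16 kHz. Not PocketSphinx's AGC implementation.
    """
    out: List[int] = []
    gain = 1.0
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        rms = (sum(s * s for s in chunk) / len(chunk)) ** 0.5
        if rms > 0:
            wanted = min(max_gain, target_rms / rms)
            # Move only part of the way so the gain does not jump between blocks.
            gain = smoothing * gain + (1.0 - smoothing) * wanted
        out.extend(max(-32768, min(32767, round(s * gain))) for s in chunk)
    return out
```

Because the gain adapts over several blocks, the first part of a short word is still processed at the old level, which is the same start-up problem that `-cmninit` addresses for CMN.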

     
