
VERY HIGH WER even with enough amount of audio

Help
Leimiaoren
2016-02-02
2016-03-02
  • Leimiaoren

    Leimiaoren - 2016-02-02

    I have 5 hours of training audio

    I used some of the training data as test data, but the WER is 93%. What could be the problem?

    I have attached my whole training folder, and I have the following questions:

    Why can't I achieve 0% WER on test data that came from the training audio itself?
    Is it possible to achieve 10% WER with 5 hours of data, or is that not enough?
    What other ways are there to improve the WER of my system?
    I have 4000 audio utterances and sometimes I get 3000 errors in one Baum-Welch iteration. What can I do to fix this? I have read that this is a serious issue.
    Do I need to use forced alignment? If so, where can I download sphinx3?

     
    • Nickolay V. Shmyrev

      Why can't I achieve 0% WER on test data that came from the training audio itself?

      In your training folder, the features are not extracted properly. Some feature files are shorter than they should be: the total duration is reported as 2.1 hours, while it is actually 4.8 hours. It seems the features were still extracted from 44 kHz files, so you need to rerun training from the start and re-extract the features.

      For example, the file size of BT_204.mfc should be 31 kB; in your archive it is 8 kB.

      Once you re-extract the features, you will get a WER of 5%. That is the WER I get when I train with your folder without any modifications.
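
A size check like the one above can be automated. The following is a minimal sketch, not part of SphinxTrain. It assumes the usual Sphinx feature layout (a 4-byte header followed by 4-byte floats), 13 coefficients per frame, and 100 frames per second. None of these values are stated in the thread, so adjust them if your feature extraction settings differ. The helper names are hypothetical.

```python
import os
import wave

# Assumptions (not stated in the thread): 4-byte header, 4-byte floats,
# 13 coefficients per frame, 100 frames per second.
HEADER_BYTES = 4
BYTES_PER_FLOAT = 4
N_CEPS = 13
FRAME_RATE = 100


def wav_duration_seconds(wav_path):
    """Return the audio duration computed from the WAV header."""
    with wave.open(wav_path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def mfc_duration_seconds(mfc_path):
    """Estimate the time span covered by a feature file from its size."""
    payload = os.path.getsize(mfc_path) - HEADER_BYTES
    frames = payload // (N_CEPS * BYTES_PER_FLOAT)
    return frames / FRAME_RATE


def is_truncated(wav_path, mfc_path, tolerance=0.1):
    """True if the features cover noticeably less time than the audio."""
    audio = wav_duration_seconds(wav_path)
    feats = mfc_duration_seconds(mfc_path)
    return feats < audio * (1.0 - tolerance)
```

Running `is_truncated` over every WAV and feature file pair gives a quick list of utterances whose features need to be extracted again.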

      I have 4000 audio utterances and sometimes I get 3000 errors in one Baum-Welch iteration. What can I do to fix this? I have read that this is a serious issue.

      Once you fix feature extraction, the errors will go away.

       
      • Nickolay V. Shmyrev

        Some of your files are still 44.1 kHz:

        D_104.wav:     RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
        D_12.wav:      RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
        D_20.wav:      RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
        D_28.wav:      RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
        D_37.wav:      RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
        D_45.wav:      RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
        D_57.wav:      RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
        D_64.wav:      RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
        D_71.wav:      RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
        D_79.wav:      RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
        D_86.wav:      RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
        D_97.wav:      RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
        E_12.wav:      RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
        E_18.wav:      RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
        E_25.wav:      RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
        E_30.wav:      RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
        E_49.wav:      RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
        E_56.wav:      RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
        E_5.wav:       RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
        E_63.wav:      RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
        E_71.wav:      RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
        E_78.wav:      RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
        E_83.wav:      RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
        

        To batch resample files, use sox.
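
As a rough sketch of that batch step, the script below finds WAV files whose header reports the wrong sample rate and builds one sox command per file. The 16000 Hz target is an assumption (the thread does not state the training rate), and the helper names are hypothetical.

```python
import os
import wave

TARGET_RATE = 16000  # assumption: the sample rate the model is trained at


def find_wrong_rate(wav_dir, target_rate=TARGET_RATE):
    """Return sorted names of .wav files whose rate differs from target_rate."""
    wrong = []
    for name in sorted(os.listdir(wav_dir)):
        if not name.lower().endswith(".wav"):
            continue
        with wave.open(os.path.join(wav_dir, name), "rb") as wav:
            if wav.getframerate() != target_rate:
                wrong.append(name)
    return wrong


def sox_command(src, dst, target_rate=TARGET_RATE):
    """Build a sox invocation that writes a resampled copy of src to dst."""
    return ["sox", src, "-r", str(target_rate), dst]
```

Each command can be passed to `subprocess.run(cmd, check=True)`. Write the output to a separate directory, because sox cannot use the same file as input and output, and re-extract the features afterwards.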

         
        • Leimiaoren

          Leimiaoren - 2016-02-03

          Thank you very much for patiently answering all my questions, Sir.

           
      • Leimiaoren

        Leimiaoren - 2016-02-03

        Thank you for the fast reply, Sir. The word error rate has indeed dropped to 6%.

        But we still have a problem.

        When I configured my program to use my acoustic model, it does not transcribe the data files accurately, while an4.align shows really good results.

        For example, I used the A_1 audio file as input and the text transcribed by my program is "abaja abajo abajada abajada abajada", but an4.align shows "aba abaaba abaja abajo abajada".

        What could be my problem this time? I have already checked everything. Why can't the configured model transcribe my training audio accurately?
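
To put a number on how far apart the two transcripts quoted above are, here is a minimal word error rate sketch based on word-level edit distance. It is an illustration only, not the alignment tool used by the training scripts. Treating the an4.align text as the reference is an assumption made for the example.

```python
def word_errors(reference, hypothesis):
    """Return (edit distance in words, number of reference words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1], len(ref)


def wer(reference, hypothesis):
    """Word error rate: word edits divided by reference length."""
    errors, n_ref = word_errors(reference, hypothesis)
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    return errors / n_ref
```

With "aba abaaba abaja abajo abajada" as the reference and "abaja abajo abajada abajada abajada" as the hypothesis, this counts 4 word edits over 5 reference words, a WER of 80% for that single utterance.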

         
        • Nickolay V. Shmyrev

          Sphinx4 is not going to reproduce the training results exactly. First of all, you need an accurate model. For example, your LM does not seem correct; you had better train it properly.

           
          • Leimiaoren

            Leimiaoren - 2016-02-23

            If Sphinx4 cannot accurately reproduce the results of training, is there any study about this, such as a research paper or article that I can cite in my paper? I can't find any.

             
            • Nickolay V. Shmyrev

              There are no studies; it is common sense.

               
              • Leimiaoren

                Leimiaoren - 2016-03-01

                Then what is in the training data that prevents the sphinx4 decoder from transcribing it accurately? Or what does the sphinx4 decoder lack, so that it cannot transcribe the training data accurately? Please do reply. I REALLY NEED THIS.

                 
  • Leimiaoren

    Leimiaoren - 2016-03-02

    Please help.

     
    • Nickolay V. Shmyrev

      You need to provide all the information needed to describe your problem and reproduce it. The faster you provide the information, the faster you will get advice.

       
      • Leimiaoren

        Leimiaoren - 2016-03-02

        Then what is in the training data that prevents the sphinx4 decoder from transcribing it accurately? Or what does the sphinx4 decoder lack, so that it cannot transcribe the training data accurately? Please do reply. I REALLY NEED THIS.

         
        • Nickolay V. Shmyrev

          No idea. I haven't seen your training data, and I haven't seen why sphinx4 cannot transcribe it accurately.

           
          • Leimiaoren

            Leimiaoren - 2016-03-02

            When I configured my program to use my acoustic model, it does not transcribe the data files accurately, while an4.align shows really good results.

            For example, I used the A_1 audio file as input and the text transcribed by my program is "abaja abajo abajada abajada abajada", but an4.align shows "aba abaaba abaja abajo abajada".

            There must be a reason why the sphinx4 decoder cannot accurately reproduce the training results. Do you by any chance know the reason, Sir?

             
            • Nickolay V. Shmyrev

              It seems to be a bug in sphinx4 related to the .lm.bin language model. If you use an ARPA-format language model, the result will be accurate. I need to investigate why .lm.bin does not work in sphinx4; that will take some time. You can report a bug about it in our issue tracker.
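
Before pointing the decoder at a language model file, it can help to confirm that the file really is the text ARPA format rather than a binary model. The sketch below is a heuristic only: it relies on ARPA files being plain text with a `\data\` header line near the top. The function name and the 50-line limit are hypothetical choices for this example.

```python
def looks_like_arpa(path, max_lines=50):
    """Heuristic check for a text ARPA model: look for a \\data\\ header line."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            # Only inspect the first max_lines lines of the file.
            for _, line in zip(range(max_lines), handle):
                if line.strip() == "\\data\\":
                    return True
    except UnicodeDecodeError:
        # Binary models generally fail to decode as text.
        return False
    return False
```

A file that fails this check is probably not the one to use while the .lm.bin issue described above is unresolved.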

               
  • Leimiaoren

    Leimiaoren - 2016-03-02

    Thank you very much for answering. I can use this as a reference.

     
