High error rate

Help
2020-01-20
  • Mariano Baci

    Mariano Baci - 2020-01-20

    Hello again. I used to have a very good acoustic model trained on 43 words and 124 wav files, with just a 12% error rate. I decided to extend the model with 22 more words and 216 more wav files, so now I have 65 words and 340 wav files to train with. The problem is that the error rate is now very high, about 38%. Why does this happen?
    These are the last lines:
    MODULE: DECODE Decoding using models previously trained
    Decoding 340 segments starting at 0 (part 1 of 1)
    0%
    Aligning results to find error rate
    SENTENCE ERROR: 37.4% (127/340) WORD ERROR RATE: 22.1% (205/930)
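The log above reports two different figures: a sentence error (127 of 340 utterances contain at least one mistake) and a word error rate (205 errors over 930 reference words). The "about 38%" quoted in the post corresponds to the sentence figure. As a minimal sketch of how the two numbers differ, assuming word errors are counted as a plain edit distance over words, the following computes both from reference/hypothesis pairs. The function names and the toy transcripts are hypothetical and are not taken from the training tool.

```python
from typing import List, Sequence, Tuple


def word_errors(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + insertions + deletions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference words
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rates(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (sentence error rate, word error rate) as fractions."""
    bad_sentences = 0
    total_errors = 0
    total_ref_words = 0
    for ref_text, hyp_text in pairs:
        ref, hyp = ref_text.split(), hyp_text.split()
        errors = word_errors(ref, hyp)
        total_errors += errors
        total_ref_words += len(ref)
        if errors > 0:
            bad_sentences += 1  # one wrong word makes the whole sentence wrong
    return bad_sentences / len(pairs), total_errors / total_ref_words


# Hypothetical toy transcripts, for illustration only.
pairs = [
    ("one two three", "one two three"),  # correct
    ("four five", "four nine"),          # one substitution
    ("six seven eight", "six eight"),    # one deletion
]
sentence_error, word_error = error_rates(pairs)
print(f"sentence error: {sentence_error:.1%}, word error rate: {word_error:.1%}")
```

Because a single wrong word marks the whole utterance as wrong, the sentence error is usually the larger of the two, so the two figures should not be compared with each other across runs.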

     

    Last edit: Mariano Baci 2020-01-20
    • Nickolay V. Shmyrev

      You are seeing unstable results because of the small amount of training data.

       
  • Mariano Baci

    Mariano Baci - 2020-01-20

    But isn't accuracy supposed to increase as the amount of data increases?

     
    • Nickolay V. Shmyrev

      No, not until you have at least 50 hours of data.

       
  • Mariano Baci

    Mariano Baci - 2020-01-20

    But I need just 65 words. Should I record these words for 50 hours?

     
    • Nickolay V. Shmyrev

      Not necessarily. You can use arbitrary annotated recordings.

       
      • Mariano Baci

        Mariano Baci - 2020-01-20

        They don't exist in my language.

         
        • Nickolay V. Shmyrev

          Care to say what your super secret language is?

           
  • Mariano Baci

    Mariano Baci - 2020-01-20

    Albanian

     
  • Mariano Baci

    Mariano Baci - 2020-01-20

    I think this is a little bit weird, because my model used to work very well with just 124 wav files, so I thought that adding 216 more would make it work at least a little bit better. But it doesn't; it has the same results.

     
    • Mariano Baci

      Mariano Baci - 2020-01-20

      Is this normal?

       
      • Nickolay V. Shmyrev

        Yes, without a dataset of the size specified in the tutorial it will not work. You can probably record more data and use fewer parameters in the model.

        There is a large body of research on low-resource languages; I am just not sure you'll be able to apply it. For example, you can also add English data to the training as a helper dataset and use a common phoneset.
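To illustrate the advice about using fewer parameters, here is a rough sketch that compares the number of trainable Gaussian parameters with the number of training frames. It assumes a diagonal-covariance Gaussian-mixture acoustic model, which the thread does not state, and every number in it (39-dimensional features, 100 frames per second, 2 seconds per recording, the tied-state and density counts) is a hypothetical value chosen for illustration, not a recommendation from the tutorial.

```python
def gmm_parameter_count(tied_states: int, densities: int, feature_dim: int) -> int:
    """Parameters in a set of diagonal-covariance Gaussian mixtures."""
    # Each Gaussian has feature_dim means, feature_dim variances and 1 weight.
    per_gaussian = 2 * feature_dim + 1
    return tied_states * densities * per_gaussian


def frames_per_parameter(audio_seconds: float, frames_per_second: int,
                         n_params: int) -> float:
    """How many training frames are available per trainable parameter."""
    return audio_seconds * frames_per_second / n_params


# Hypothetical corpus: 340 recordings of about 2 seconds each.
audio_seconds = 340 * 2.0
frame_rate = 100    # assumed frames per second
feature_dim = 39    # assumed feature vector size

for tied_states, densities in [(1000, 8), (200, 4)]:
    n_params = gmm_parameter_count(tied_states, densities, feature_dim)
    ratio = frames_per_parameter(audio_seconds, frame_rate, n_params)
    print(f"{tied_states} tied states x {densities} densities: "
          f"{n_params} parameters, {ratio:.2f} frames per parameter")
```

Under these assumptions the larger configuration has far fewer than one frame per parameter, which is one way to see why a smaller model tends to give more stable results on a corpus of this size.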

         
        • Mariano Baci

          Mariano Baci - 2020-01-20

          Okay, thank you so much

           