What should be the maximum length of recorded audio files for training?

Help
rezaee
2017-06-23
2017-06-23
  • rezaee

    rezaee - 2017-06-23

    Hi,
    I would like to know what the maximum length of recorded audio files should be for training an acoustic model.
    And what is the best length?

     
    • Nickolay V. Shmyrev

      The tutorial at http://cmusphinx.github.io/wiki/tutorialam says:

      For continuous speech audio files shouldn’t be very long and shouldn’t be very short. Optimal length is not less than 5 seconds and not more than 30 seconds. Very long files make training much harder. If you are going to recognize short isolated commands, your training database should contain the files with short isolated commands. It is better to design database to recognize continuous speech from the beginning though and not spend your time on commands. In the end you speak continuously anyway. Amount of silence in the beginning of the utterance and in the end of the utterance should not exceed 0.2 second.
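
The limits quoted above (5 to 30 seconds per file, at most 0.2 seconds of silence at either end) can be checked before training with a short script. This is only a rough sketch, not part of the CMUSphinx tools: the function name `check_utterance`, the amplitude cutoff `SILENCE_THRESHOLD`, and the restriction to 16-bit mono PCM WAV files are all assumptions made for illustration, and a fixed amplitude cutoff is a crude stand-in for real voice activity detection.

```python
import array
import sys
import wave

MIN_SECONDS = 5.0        # lower bound quoted from the tutorial
MAX_SECONDS = 30.0       # upper bound quoted from the tutorial
MAX_EDGE_SILENCE = 0.2   # seconds of silence allowed at each end (tutorial)
SILENCE_THRESHOLD = 500  # assumed 16-bit amplitude cutoff; tune for your data


def check_utterance(path):
    """Return a list of problems found in a 16-bit mono PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = wav.getframerate()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    problems = []
    duration = len(samples) / rate
    if duration < MIN_SECONDS:
        problems.append(f"too short: {duration:.2f}s")
    elif duration > MAX_SECONDS:
        problems.append(f"too long: {duration:.2f}s")

    # Indices of samples louder than the assumed silence cutoff.
    loud = [i for i, s in enumerate(samples) if abs(s) > SILENCE_THRESHOLD]
    if not loud:
        problems.append("no samples above the silence threshold")
        return problems

    leading = loud[0] / rate
    trailing = (len(samples) - 1 - loud[-1]) / rate
    if leading > MAX_EDGE_SILENCE:
        problems.append(f"leading silence: {leading:.2f}s")
    if trailing > MAX_EDGE_SILENCE:
        problems.append(f"trailing silence: {trailing:.2f}s")
    return problems
```

Calling `check_utterance("utt_001.wav")` (a hypothetical file name) returns an empty list when the file passes all three checks, so it can be run over a whole training directory to list the files that need trimming or splitting.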

       
  • rezaee

    rezaee - 2017-09-26

    Thank you, Nickolay!
    But if we want to use a movie's audio to train an acoustic model, there is a lot of silence or environmental sound between the lines of speech. Should we take these gaps between the speech into account?

    For example:

    Hello (2s sil)
    Hi (1s sil)
    (walking sound for 5s)
    What did you do?
    Nothing (10s silence and road sound)
    OK, let's talk about that night ...

    I hope I have conveyed my meaning clearly!

     
