What should be the maximum length of recorded audio files for training?

Speech Recognition Toolkit

Brought to you by: air, arthchan2003, awb, bhiksha, and 5 others



Forum: Help

Creator: rezaee

Created: 2017-06-23

Updated: 2017-06-23

rezaee - 2017-06-23

Hi,
I would like to know what the maximum length of recorded audio files for training an acoustic model should be.
And what is the best length?

Nickolay V. Shmyrev - 2017-06-29

The tutorial at http://cmusphinx.github.io/wiki/tutorialam says:

  For continuous speech audio files shouldn’t be very long and shouldn’t be very short. Optimal length is not less than 5 seconds and not more than 30 seconds. Very long files make training much harder. If you are going to recognize short isolated commands, your training database should contain the files with short isolated commands. It is better to design database to recognize continuous speech from the beginning though and not spend your time on commands. In the end you speak continuously anyway. Amount of silence in the beginning of the utterance and in the end of the utterance should not exceed 0.2 second.
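The two numeric rules in that quote (5 to 30 seconds per file, at most 0.2 seconds of silence at each end) can be checked mechanically before training. Below is a minimal Python sketch of such a check; it is not part of CMUSphinx. It assumes 16-bit mono PCM WAV files, and it treats any sample whose absolute amplitude is below a hypothetical threshold (`SILENCE_AMPLITUDE = 500`) as silence. That threshold is a crude stand-in for real voice activity detection and will likely need tuning for your recordings. The function names `check_samples` and `check_wav` are illustrative only.

```python
import array
import sys
import wave

# Limits taken from the CMUSphinx acoustic-model tutorial quoted above.
MIN_SECONDS = 5.0
MAX_SECONDS = 30.0
MAX_EDGE_SILENCE = 0.2

# Hypothetical threshold (an assumption, not from the tutorial): samples with
# absolute amplitude below this value are counted as silence.
SILENCE_AMPLITUDE = 500


def check_samples(samples, rate, silence_amplitude=SILENCE_AMPLITUDE):
    """Return a list of guideline violations for one utterance."""
    problems = []
    duration = len(samples) / rate
    if duration < MIN_SECONDS:
        problems.append(f"too short: {duration:.2f} s")
    elif duration > MAX_SECONDS:
        problems.append(f"too long: {duration:.2f} s")

    # Indices of all samples loud enough to count as non-silence.
    loud = [i for i, s in enumerate(samples) if abs(s) >= silence_amplitude]
    if not loud:
        problems.append("no sound above the silence threshold")
        return problems

    # Silence before the first loud sample and after the last one.
    leading = loud[0] / rate
    trailing = (len(samples) - 1 - loud[-1]) / rate
    if leading > MAX_EDGE_SILENCE:
        problems.append(f"leading silence {leading:.2f} s > {MAX_EDGE_SILENCE} s")
    if trailing > MAX_EDGE_SILENCE:
        problems.append(f"trailing silence {trailing:.2f} s > {MAX_EDGE_SILENCE} s")
    return problems


def check_wav(path):
    """Read a 16-bit mono PCM WAV file and check it against the guidelines."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            return ["expected 16-bit mono PCM"]
        rate = wav.getframerate()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is stored little-endian
    return check_samples(samples, rate)


if __name__ == "__main__":
    # Usage: python check_lengths.py file1.wav file2.wav ...
    for path in sys.argv[1:]:
        for problem in check_wav(path):
            print(f"{path}: {problem}")
```

Keeping the pure `check_samples` function separate from the file reading makes it easy to try other silence detectors or thresholds without touching the WAV handling.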
  

rezaee - 2017-09-26

Thank you, Nickolay!
But if we want to use audio from movies to train an acoustic model, there is a lot of silence or environmental sound between the lines of dialogue. Should we take these gaps between the speech into account?

For example:

Hello (2s sil)
Hi (1s sil)
(walking sound for 5s)
What did you do?
Nothing (10s silence and road sound)
OK, let's talk about that night ...

I hope I have made my meaning clear!

