audio file length in building AM

Forum: Help

Created: 2011-02-15

Updated: 2012-09-22

Jake - 2011-02-15

Hi,

I'm not sure whether this question has been asked before. I read the AM
training page on the CMUSphinx Wiki. It says "Audio files shouldn't be very
long and shouldn't be very short." I wonder what would happen (no problem,
errors, or a hang) if I used a bunch of audio files, each a few minutes long,
as training data. If this is a limitation of SphinxTrain, is audio
segmentation the only solution? If so, what segmentation tool(s) do you
recommend for cutting the long audio into the 5-30 second pieces that the
Wiki tutorial suggests? Thanks a lot.

Nickolay V. Shmyrev - 2011-02-15

I wonder what would happen (no problem, errors, or a hang) if I used a bunch
of audio files, each a few minutes long, as training data.

A few minutes will work without any issue, but if a file is longer than ten
minutes you'll get a buffer overflow. Other potential problems are that the
alignment may not reach the final state of the utterance, so the utterance is
ignored, and that the accuracy of the model is reduced because of
misalignment.
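
To see which training files fall into which range before running the trainer, something like the following could be used. It is only a rough sketch in Python using the standard `wave` module, not part of SphinxTrain. The 5-30 second range comes from the Wiki tutorial and the ten-minute limit from the reply above; the function names and the labels are made up for illustration, and the files are assumed to be plain PCM WAV.

```python
import wave

# Thresholds taken from this thread, not from the SphinxTrain source:
# the Wiki tutorial suggests 5-30 second utterances, and files longer
# than about ten minutes are reported to overflow a buffer.
MIN_SECONDS = 5.0
MAX_SECONDS = 30.0
OVERFLOW_SECONDS = 600.0


def wav_duration(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_duration(seconds):
    """Label a duration against the thresholds above (labels are hypothetical)."""
    if seconds > OVERFLOW_SECONDS:
        return "too long, likely to fail"
    if seconds > MAX_SECONDS:
        return "long, consider segmenting"
    if seconds < MIN_SECONDS:
        return "short"
    return "ok"
```

Calling `classify_duration(wav_duration(path))` for every file in the training list gives a quick picture of how much of the data needs to be cut down.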

If this would be the limit of SphinxTrain, is audio segmentation the only
solution?

Yes

If so, what segmentation tool(s) do you recommend for cutting the long audio
into the 5-30 second pieces that the Wiki tutorial suggests? Thanks a lot.

You can train an AM with the long files, then use this model to segment the
long files into shorter ones, and then retrain the model.
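
Once the first-pass model has given you segment boundaries, the cutting itself is simple. The sketch below assumes you already have start and end times in seconds for each segment (how you get them from the alignment output is not shown here) and that the audio is plain PCM WAV. `cut_wav` is a hypothetical helper written with Python's standard `wave` module, not a SphinxTrain tool.

```python
import wave


def cut_wav(src_path, dst_path, start_sec, end_sec):
    """Copy the audio between start_sec and end_sec into a new WAV file."""
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        total = src.getnframes()
        # Clamp both boundaries to the file so setpos() cannot fail.
        start_frame = min(max(int(start_sec * rate), 0), total)
        end_frame = min(max(int(end_sec * rate), start_frame), total)
        src.setpos(start_frame)
        frames = src.readframes(end_frame - start_frame)
        params = src.getparams()
    with wave.open(dst_path, "wb") as dst:
        # Reuse channels, sample width and rate; the frame count in the
        # header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
```

Each cut piece also needs its own entry in the transcription and file list before retraining, with the text that was spoken inside that time range.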
