I'm not sure whether this question has been asked before. I read the acoustic
model (AM) training tutorial on the CMUSphinx Wiki. It says "Audio files
shouldn't be very long and shouldn't be very short." What would happen (no
problem, errors, or a hang) if I used a set of audio files, each a few minutes
long, as training data? If this is a limitation of SphinxTrain, is audio
segmentation the only solution? If so, which segmentation tool(s) do you
recommend for cutting the long audio into 5-30 second pieces, as the Wiki
tutorial suggests? Thanks a lot.
> I wonder what would happen (no problem, or errors, or hang) if I would use a
> bunch of audio files with a few minutes long for each as training data?
A few minutes will work without an issue, but if a file is longer than ten
minutes you'll get a buffer overflow. Other potential problems are that the
aligner may fail to reach the final state of the utterance and skip it, and
that the model's accuracy may be reduced because of misalignment.
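
To see which recordings fall outside these ranges before training, something like the following could be used. This is only a minimal sketch: it assumes the training audio is plain PCM WAV, the 5-30 second range is the one from the Wiki tutorial, the ten-minute limit is the figure mentioned above, and the function names and labels are made up for illustration (they are not part of SphinxTrain).

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_duration(seconds, min_len=5.0, max_len=30.0, hard_limit=600.0):
    """Label a duration against the ranges discussed in this thread.

    The thresholds are assumptions taken from the discussion, not values
    read from SphinxTrain itself.
    """
    if seconds > hard_limit:
        return "overflow-risk"   # longer than ten minutes
    if seconds > max_len:
        return "long"            # may train, but alignment can suffer
    if seconds < min_len:
        return "short"
    return "ok"
```

Running `classify_duration(wav_duration_seconds(path))` over every file in the training list gives a quick picture of how much of the data would need to be cut.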
> If this would be the limit of SphinxTrain, is audio segmentation the only
> solution?
Yes
> If so, what segmentation tool(s) do you recommend me use to cut the long
> audio to 5-30 seconds long according to the tutorial on Wiki? Thanks a lot.
You can train an acoustic model on the long files, then use that model to
segment the long files into shorter ones, and then retrain the model on the
segments.
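
Once the first-pass model has produced word timings for a long file, choosing the cut points can be sketched roughly as below. This is an illustration under assumptions, not a SphinxTrain tool: the input format (a time-sorted list of `(word, start, end)` tuples in seconds), the function name `split_by_pauses`, and the `min_pause` threshold are all hypothetical, and obtaining the alignment and actually cutting the audio are not shown.

```python
def split_by_pauses(words, min_len=5.0, max_len=30.0, min_pause=0.3):
    """Group aligned words into segments of roughly min_len..max_len seconds.

    words: list of (word, start, end) tuples, sorted by time, in seconds.
    Returns a list of (segment_start, segment_end, transcript) tuples.
    """
    segments = []
    current = []
    for i, (word, start, end) in enumerate(words):
        # Force a cut if adding this word would push the segment past max_len.
        if current and end - current[0][1] > max_len:
            segments.append(current)
            current = []
        current.append((word, start, end))
        if i == len(words) - 1:
            break
        # Prefer to cut at a pause once the segment is long enough.
        pause = words[i + 1][1] - end
        if end - current[0][1] >= min_len and pause >= min_pause:
            segments.append(current)
            current = []
    if current:
        # The trailing segment may be shorter than min_len.
        segments.append(current)
    return [
        (seg[0][1], seg[-1][2], " ".join(w for w, _, _ in seg))
        for seg in segments
    ]
```

Each returned tuple gives the start time, end time and transcript for one new utterance, which is what the retraining pass needs in its file list and transcription. Cutting at pauses rather than at fixed intervals avoids splitting a word in the middle.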