I'm trying to better understand how complex a task building and testing an acoustic model is. I've seen several figures for the size of the data sets required for training and testing an acoustic model, so I was hoping to get more concrete answers all in one place.
Building an acoustic model from scratch:
The FAQ says you need 50 hours from 200 speakers.
However, this post suggests 200-300 hours.
I have also seen a suggested training size of 400+ hours elsewhere, in papers such as the one for Julius.
I have also seen different reports of the hours required for testing the AM:
anywhere from 1 hour to 10 hours.
I understand that the more data the better, but what would be a realistic lower bound on the hours of data required to 1) build an AM from scratch, 2) adapt an existing AM to our specific domain, and 3) reliably test the accuracy of an AM in our particular domain?
Thanks
- Jeff
> The FAQ says you need 50 hours from 200 speakers

This is a lower bound for dictation. For call center data it is certainly not enough, unless the data is of very high quality (very accurately transcribed, with all fillers, false starts and so on).
The problem with call center data is that the transcriptions are usually of low quality, so you need more of them.
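The FAQ figure works out to 50 × 3600 / 200 = 900 seconds, i.e. 15 minutes of speech per speaker on average. To see where a given corpus falls relative to that dictation lower bound, here is a minimal Python sketch that tallies total hours and distinct speakers. It assumes a hypothetical manifest of (speaker_id, duration_in_seconds) pairs; the function and constant names are illustrative and not part of any Sphinx tool. For call center data, per the reply above, passing this check is not enough on its own.

```python
from collections import defaultdict

# Lower bounds from the FAQ figure for dictation (50 hours, 200 speakers).
MIN_HOURS = 50.0
MIN_SPEAKERS = 200


def corpus_stats(manifest):
    """Return (total hours, distinct speakers, average hours per speaker).

    manifest: iterable of (speaker_id, duration_seconds) pairs, one per
    utterance. This format is a hypothetical assumption; adapt the
    loading step to however your corpus stores durations.
    """
    per_speaker = defaultdict(float)
    for speaker, seconds in manifest:
        per_speaker[speaker] += seconds
    total_hours = sum(per_speaker.values()) / 3600.0
    speakers = len(per_speaker)
    avg_hours = total_hours / speakers if speakers else 0.0
    return total_hours, speakers, avg_hours


def meets_dictation_lower_bound(manifest):
    """True if the corpus reaches both the hour and the speaker minimum."""
    hours, speakers, _ = corpus_stats(manifest)
    return hours >= MIN_HOURS and speakers >= MIN_SPEAKERS


if __name__ == "__main__":
    # 200 speakers with 15 minutes (900 s) each is exactly 50 hours.
    demo = [(f"spk{i}", 900.0) for i in range(200)]
    print(corpus_stats(demo), meets_dictation_lower_bound(demo))
```

Counting speakers as well as hours matters because the FAQ figure specifies both: 50 hours from a handful of speakers would not cover the speaker variability that 200 speakers do.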
> However, this post suggests 200-300 hours.
This is a good amount.
> I have also seen elsewhere where the suggested training size is 400+ hours in papers like that for Julius
400 hours is also good.
> Anywhere from 1 hour
1-3 hours is enough for test. The link you give is about training though.
> to 10 hours
10 is probably too much, but why not
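Testing accuracy on 1-3 hours of held-out, in-domain audio usually means computing the word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) between the reference transcription and the recognizer output, divided by the number of reference words. Below is a minimal, tool-independent Python sketch, assuming you already have (reference, hypothesis) text pairs; the function names are hypothetical. Errors are pooled over the whole test set rather than averaging per-utterance rates, so short utterances do not dominate the score.

```python
def word_edit_distance(ref, hyp):
    """Minimum substitutions + deletions + insertions turning word list ref into hyp."""
    # prev[j] holds the distance between the first i-1 ref words and the first j hyp words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion of ref[i-1]
                         cur[j - 1] + 1,      # insertion of hyp[j-1]
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[-1]


def corpus_wer(pairs):
    """WER over a test set: total word errors divided by total reference words.

    pairs: iterable of (reference_text, hypothesis_text) strings.
    """
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        errors += word_edit_distance(ref, hypothesis.split())
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("test set has no reference words")
    return errors / ref_words


if __name__ == "__main__":
    test_set = [("the cat sat", "the bat sat"), ("a b c d", "a c d")]
    print(corpus_wer(test_set))
```

More test hours mean more reference words in the denominator, so the WER estimate fluctuates less from one test set to another; that is the trade-off behind choosing 1-3 hours versus 10.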
> in our particular domain?
What is your domain?
The fact is that current technology is not very robust to transcription errors in the training database, so a lot depends on the quality of the transcriptions. That's why the estimate could differ by an order of magnitude depending on the quality of the transcriptions you have, and there is no single good answer to your question.
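Since so much depends on transcription quality, one crude heuristic for spotting badly transcribed training utterances is to flag those whose speaking rate, in words per second, is implausible (for example an empty transcript on a long recording, or many words on a very short one). This heuristic is not from the thread: the thresholds below are hypothetical placeholders to tune on your own data, and the (utt_id, duration_seconds, transcript) input format is an assumption.

```python
# Hypothetical bounds on words per second; tune these on your own data.
MIN_WORDS_PER_SEC = 0.5
MAX_WORDS_PER_SEC = 5.0


def suspicious_utterances(utterances, lo=MIN_WORDS_PER_SEC, hi=MAX_WORDS_PER_SEC):
    """Return ids of utterances whose speaking rate falls outside [lo, hi].

    utterances: iterable of (utt_id, duration_seconds, transcript).
    Non-positive durations are flagged too, since the rate is undefined.
    """
    flagged = []
    for utt_id, seconds, text in utterances:
        if seconds <= 0:
            flagged.append(utt_id)
            continue
        rate = len(text.split()) / seconds
        if rate < lo or rate > hi:
            flagged.append(utt_id)
    return flagged


if __name__ == "__main__":
    demo = [
        ("ok", 2.0, "hello there my friend"),                 # 2 words/s
        ("empty", 3.0, ""),                                    # 0 words/s
        ("fast", 1.0, "one two three four five six seven"),    # 7 words/s
    ]
    print(suspicious_utterances(demo))
```

Flagged utterances are candidates for manual review, not automatic removal; this check cannot catch wrong words at a plausible rate.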
Another problem is that the technology is not very robust to domain mismatch, so dictation data is almost useless for a call center, even if you adapt, unless you apply some advanced algorithms.
Last edit: Nickolay V. Shmyrev 2013-12-02
I appreciate the feedback.
Could you still create an accurate acoustic model with as little as 100 hours of speech?
How many hours of data were used to create the generic acoustic model that is available for Sphinx?
With 100 hours you can create an OK model, but not a good one. The generic model was created with 300 hours.