These are the requirements for being able to train a new Acoustic Model:
1 hour of recordings for single-speaker command and control
5 hours of recordings from 200 speakers for multi-speaker command and control
10 hours of recordings for single-speaker dictation
50 hours of recordings from 200 speakers for multi-speaker dictation
I think I am misunderstanding a concept here. What is the difference between the following concepts, and what should change in the recordings and transcriptions for each?
"command and control" (single/many speakers)
"dictation" (single/many speakers)
Thanks!
We have trouble understanding what it is you do not understand.