CMU Sphinx / Forums / Help: Questions about preparing the data

Speech Recognition Toolkit

Questions about preparing the data

Forum: Help

Creator: Burak Kaan Bilgehan

Created: 2018-07-20

Updated: 2018-07-20

Burak Kaan Bilgehan - 2018-07-20

Hello,
I'm preparing data for an acoustic model to recognize the speech of kids reading primary school stories. My question is about this sentence in the tutorial:

"the database should have recording of enough speakers, a variety of recording conditions, enough acoustic variations and all possible linguistic sentences"

the database should have recording of enough speakers:
How many speakers are enough? We thought that 10 hours of speech would be enough for us, but we can't decide how to distribute it across speakers. Is it okay to have many speakers read the same sentences, or is it better to find a few speakers and have them read many sentences?

a variety of recording conditions, enough acoustic variations:
What's the difference between them? I understand these are about recording inside a studio, outdoors, in a noisy environment, etc. Do I have that right?

all possible linguistic sentences:
Doesn't that mean infinitely many sentences? How can we cover all the possible sentences in a language?

Thank you,

Burak
- Nickolay V. Shmyrev - 2018-07-20
  
  how many speakers are enough?
  
  The tutorial says at least 200.
  
  we thought that 10 hours of speech would be enough for us.
  
  Modern databases contain 50+ hours of speech, ideally 500+ hours.
  
  what's the difference between them? I understand these are about recording inside a studio, outdoors, in a noisy environment etc. do i get it right?
  
  You understand correctly.
  
  doesn't it mean infinitely many sentences? how can we cover all the possible sentences in a language?
  
  No, it does not mean an infinite number of sentences. You can download any example database, such as LibriSpeech, and follow the example.
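
The figures quoted in this thread (10 hours planned, at least 200 speakers, 50+ or ideally 500+ hours) imply a rough recording budget per speaker. The following is a minimal sketch of that arithmetic, assuming the recording time is split evenly across speakers. The function name `minutes_per_speaker` is hypothetical and is not part of any CMU Sphinx tool.

```python
def minutes_per_speaker(total_hours: float, num_speakers: int) -> float:
    """Average recording time per speaker in minutes, assuming an even split."""
    if num_speakers <= 0:
        raise ValueError("num_speakers must be positive")
    return total_hours * 60.0 / num_speakers


# Corpus sizes mentioned in the thread, spread over the suggested 200 speakers.
for hours in (10, 50, 500):
    per_speaker = minutes_per_speaker(hours, 200)
    print(f"{hours} h over 200 speakers: {per_speaker:.0f} min each")
```

Under this even-split assumption, a 10-hour corpus leaves only a few minutes per speaker, which may help when deciding between many speakers with short sessions and fewer speakers with long ones.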
  
