
Questions about preparing the data

2018-07-20
  • Burak Kaan Bilgehan

    Hello,
    I'm preparing data for an acoustic model to recognize the speech of children reading primary school stories. My question is about this sentence in the tutorial:

    "the database should have recording of enough speakers, a variety of recording conditions, enough acoustic variations and all possible linguistic sentences"

    • the database should have recording of enough speakers:
      How many speakers are enough? We thought that 10 hours of speech would be enough for us, but we can't decide how to distribute it across speakers. Is it okay to have many speakers read the same sentences, or is it better to find a few speakers and have them read many sentences?

    • a variety of recording conditions, enough acoustic variations:
      What's the difference between them? I understand these are about recording inside a studio, outdoors, in a noisy environment, etc. Do I get it right?

    • all possible linguistic sentences:
      Doesn't it mean infinitely many sentences? How can we cover all the possible sentences in a language?

    Thank you,

    Burak

    • Nickolay V. Shmyrev

      How many speakers are enough?

      The tutorial says at least 200.

      We thought that 10 hours of speech would be enough for us.

      Modern databases have 50+ hours of speech, ideally 500+ hours.
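      For a sense of scale with the tutorial's minimum of 200 speakers, the sketch below divides a corpus's total hours evenly across speakers. The even split is an assumption for illustration only (real corpora are rarely balanced), and `minutes_per_speaker` is a hypothetical helper, not part of any toolkit.

```python
# Rough budgeting sketch: average minutes of speech per speaker when a corpus
# of a given size is split evenly across speakers. The even split is an
# illustrative assumption; real recordings are rarely perfectly balanced.

def minutes_per_speaker(total_hours: float, num_speakers: int) -> float:
    """Return the average number of recorded minutes per speaker."""
    if num_speakers <= 0:
        raise ValueError("num_speakers must be positive")
    return total_hours * 60 / num_speakers


if __name__ == "__main__":
    # Corpus sizes mentioned in this thread (10 h planned, 50 h and 500 h
    # suggested), each with the tutorial's minimum of 200 speakers.
    for hours in (10, 50, 500):
        mins = minutes_per_speaker(hours, 200)
        print(f"{hours:>3} h / 200 speakers -> {mins:.1f} min per speaker")
```

      With 10 hours, each of 200 speakers would contribute only about 3 minutes of speech; at 50 hours it is about 15 minutes each.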

      What's the difference between them? I understand these are about recording inside a studio, outdoors, in a noisy environment, etc. Do I get it right?

      You understand correctly.

      Doesn't it mean infinitely many sentences? How can we cover all the possible sentences in a language?

      No, it does not mean an infinite number of sentences. You can download any example database, such as LibriSpeech, and follow its example.
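      One heuristic way to make the requirement concrete (not the tutorial's own definition) is to check how many of the words in a sample of the target texts, here the stories the children will read, already appear in the recording prompts. The sketch below assumes both are available as plain text; the `words` tokenizer, `token_coverage`, and `missing_words` are hypothetical helpers, and real text would need language-specific normalization.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase text and split it into word tokens (simple, illustrative)."""
    return re.findall(r"\w+", text.lower())


def token_coverage(prompts: str, target: str) -> float:
    """Fraction of word tokens in `target` whose word also occurs in `prompts`."""
    vocab = set(words(prompts))
    tokens = words(target)
    if not tokens:
        return 0.0
    covered = sum(1 for t in tokens if t in vocab)
    return covered / len(tokens)


def missing_words(prompts: str, target: str) -> list[str]:
    """Sorted list of distinct words in `target` that never occur in `prompts`."""
    return sorted(set(words(target)) - set(words(prompts)))


if __name__ == "__main__":
    # Toy data standing in for recording prompts and a target story.
    prompts = "The cat sat on the mat. The dog ran."
    target = "The dog sat on the rug."
    print(f"coverage: {token_coverage(prompts, target):.2f}")
    print("missing:", missing_words(prompts, target))
```

      Here 5 of the 6 target tokens are covered and only "rug" is missing, which suggests which extra sentences might be worth recording.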


