
General Question about speech recording for Speaker Independent Speech recognition

2016-07-11
2016-07-12
  • Yeasin Ar Rahman

    My question is what kind of speech I should record. The general rule of thumb is that one needs to record the sounds one wants to recognize. However, most of the big speech recognition systems are built on large amounts of data (200+ hours); we are thinking of proceeding incrementally (i.e. record some, test, then record more, and so on) up to 10-20 hours of recording. Regional bias is also an issue, since our system is speaker-independent.
    How do we find an ideal recording script (the texts that we want to record) that will work well under the following two conditions?
    1. General command and control for PC and mobile
    2. Common web search sentences

    We want to cover roughly the 5000-15000 most common words in a foreign (less common) language.

    My concerns are the following:

    1. Where should the recording take place: a) in an isolated room with clean sound, or b) in a noisy environment?
    2. In speech synthesis there is a notion of a phonetically balanced corpus. Does that notion hold any value for speech recognition, i.e. if I record sentences covering a variety of phonemes rather than similar phonemes again and again, will it increase my accuracy? If so, is there an automated way or algorithm to find suitable sentences in a text corpus?
    3. How can I generalize over regional bias (accent) so that accuracy improves?

    thank you


    Last edit: Yeasin Ar Rahman 2016-07-11
    • Nickolay V. Shmyrev

      Where should the recording take place: a) in an isolated room with clean sound, or b) in a noisy environment?

      The general rule of thumb is that one needs to record the sounds one wants to recognize. I doubt your recognition system is going to be used in an isolated room, so it is better to record noisy speech.
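
      If you already have clean recordings, a common complementary trick (not a substitute for recording in realistic conditions) is to mix background noise into the clean speech at a chosen signal-to-noise ratio. A minimal sketch, assuming the `soundfile` package, mono files at the same sample rate, and placeholder file names:

      ```python
      import numpy as np
      import soundfile as sf

      def mix_at_snr(speech, noise, snr_db):
          # Tile or trim the noise so it matches the speech length.
          if len(noise) < len(speech):
              noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
          noise = noise[:len(speech)]
          # Scale the noise so the mixture has the requested SNR.
          speech_power = np.mean(speech ** 2)
          noise_power = np.mean(noise ** 2) + 1e-12
          scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10.0)))
          return speech + scale * noise

      # Placeholder file names; any mono recordings will do.
      speech, rate = sf.read("clean_utterance.wav")
      noise, _ = sf.read("cafe_noise.wav")
      sf.write("noisy_utterance.wav", mix_at_snr(speech, noise, snr_db=10), rate)
      ```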

      In speech synthesis there is a notion of a phonetically balanced corpus. Does that notion hold any value for speech recognition, i.e. if I record sentences covering a variety of phonemes rather than similar phonemes, will it increase my accuracy? If so, is there an automated way or algorithm to find suitable sentences in a text corpus?

      These days the preference is to collect more data rather than to spend time balancing a corpus. Such an approach is both more efficient and avoids the shortcomings of hand-prepared data. So you do not need any phonetic balance, you just need more data. You can get it from books, podcasts, TV shows and so on.
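
      That said, if you still want an automated way to pick phonetically diverse prompts from a text corpus, one simple approach is greedy coverage selection: repeatedly pick the sentence that adds the most phoneme pairs not yet covered. A minimal sketch, where `phonemes()` is only a placeholder for a real grapheme-to-phoneme converter or pronunciation dictionary for your language:

      ```python
      def phonemes(sentence):
          # Placeholder: treats letters as "phonemes" just to keep the sketch runnable;
          # swap in a real G2P tool or pronunciation dictionary for your language.
          return [ch for ch in sentence.lower() if ch.isalpha()]

      def diphones(sentence):
          # Phoneme pairs are a rough proxy for phonetic context coverage.
          p = phonemes(sentence)
          return set(zip(p, p[1:]))

      def select_prompts(sentences, max_prompts=500):
          covered, chosen = set(), []
          remaining = list(sentences)
          for _ in range(max_prompts):
              best = max(remaining, key=lambda s: len(diphones(s) - covered), default=None)
              if best is None or not (diphones(best) - covered):
                  break  # nothing left that adds new coverage
              chosen.append(best)
              covered |= diphones(best)
              remaining.remove(best)
          return chosen

      corpus = ["open the browser", "play some music", "search the web for news"]
      print(select_prompts(corpus, max_prompts=2))
      ```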

      How can I generalize over regional bias (accent) so that accuracy improves?

      It is still an open question how to support regional accents efficiently. There is no good solution; you can just use more data.

