
ASR on far-field and noise-ridden speech

audiophile
2014-02-26
2014-03-01
  • audiophile

    audiophile - 2014-02-26

    Hi All,

    I would like to experiment with the accuracy of pocketsphinx (with hub4wsj_sc_8k as the acoustic model) when the speech is:

    1. Noisy - Car noise (windows up and down), cafe noise.
    2. Far field - the speaker is 4-6 feet away from the microphone, and the environment is:

    a) regular rooms in home
    b) open spaces like park.

    I am looking to achieve a word accuracy of 90%-95% (that is, a WER of roughly 5%-10%). I would like to ask the experts:

    1. Which AM is best to start with for #1, #2a and #2b ?
    2. I am assuming that I may not get very high accuracy with some of the AMs that pocketsphinx comes bundled with, and that I may have to do training/adaptation of AMs. If so, what can be my first baby step?

    a) Should I go for outright training of an AM? If so, could you please point me to a suitable audio corpus, preferably free, though I am willing to pay for the right data.
    b) Should I start with adaptation? Again, it would be great to know a starting point in terms of a suitable audio corpus.

    Thanks in advance for the guidance.

     
  • Nickolay V. Shmyrev

    Which AM is best to start with for #1, #2a and #2b ?

    Our recommended acoustic model for US English is the US English Generic acoustic model, available in the downloads.

    I am assuming that I may not get very high accuracy with some of the AMs that pocketsphinx comes bundled with, and that I may have to do training/adaptation of AMs.

    Usually training/adaptation has nothing to do with robust speech recognition. For robust distant speech recognition in noise, the critical features of the recognizer are:

    1. Noise-robust features
    2. Noise cancellation with microphone array
    3. Dereverberation algorithm

    First of all, you need to try to get a microphone array for speech source separation.

    If so what can be my first baby step ?

    The first baby step would be to collect a set of transcribed test recordings that reproduce your problems, and to use them to estimate the current accuracy and identify the current problems. For a small vocabulary and moderate noise, speech recognition should work out of the box with the latest cmusphinx versions, which already include noise-robust processing.
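
Estimating accuracy on such a transcribed test set usually comes down to computing the word error rate (WER) between each reference transcript and the recognizer output. The following is only a minimal sketch, assuming the references and hypotheses are already available as plain strings; the function names `word_errors` and `corpus_wer` are hypothetical and are not part of cmusphinx.

```python
def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)], len(ref)


def corpus_wer(pairs):
    """WER over (reference, hypothesis) pairs, pooled across utterances."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        words += n
    if words == 0:
        raise ValueError("reference transcripts contain no words")
    return errors / words


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    test_set = [
        ("turn on the radio", "turn on radio"),
        ("call home", "call home now"),
    ]
    print("WER: %.3f" % corpus_wer(test_set))
```

Errors are pooled over the whole set rather than averaged per utterance, so short commands do not dominate the figure. Word accuracy is then approximately 1 minus WER.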

     
  • audiophile

    audiophile - 2014-02-27

    Can I cheat somewhat :-) Can I take an existing corpus (let's say WSJ0), play it out (on good quality speakers) and record it with a microphone at different distances? I believe I can get this done quickly, and I will have a 'far field' equivalent of WSJ0 for an apples-to-apples comparison.

    I could be naive here in assuming that the speaker-to-microphone channel will not significantly alter the sound attributes/features. Please give your opinion.

     
  • audiophile

    audiophile - 2014-02-28

    Alternatively, I would like to ask: is there any audio corpus available that has recordings from distant microphones? In my search I came across AMI. A document describing AMI seemed to mention that the audio recordings were done with distant mics.

    I would like to try these with PS on a small-vocabulary system to get an initial feel for the accuracy of PS, as you suggested, Nickolay.

     
  • Nickolay V. Shmyrev

    Can I take an existing corpus (let's say WSJ0), play it out (on good quality speakers) and record it with a microphone at different distances? I believe I can get this done quickly, and I will have a 'far field' equivalent of WSJ0 for an apples-to-apples comparison. I could be naive here in assuming that the speaker-to-microphone channel will not significantly alter the sound attributes/features. Please give your opinion.

    You can do that. Still, you need a real-case database for testing. It doesn't need to be as large as WSJ; a small set of 100 samples is enough. First, the acoustics of sound produced by a human mouth are quite different from those of sound produced by loudspeakers. Second, people speak commands differently from the read speech of the WSJ database.

    You can download the TEDLIUM corpus for your experiments.

    There are various databases for distant recording at CMU, for example:

    http://www.speech.cs.cmu.edu/databases/micarray/index.html

    http://www.speech.cs.cmu.edu/databases/pda/README.html
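
Alongside replayed or real recordings, noisy variants of a clean test set are often simulated by adding recorded noise (for example car or cafe noise) at a chosen signal-to-noise ratio. The following is a rough sketch only, assuming both signals are already decoded into lists of float samples at the same sampling rate; `rms` and `mix_at_snr` are hypothetical helper names. Additive mixing does not model reverberation or changes in how people speak in noise, so it does not replace the real-case test set recommended above.

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the speech-to-noise level ratio is snr_db."""
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Loop the noise if it is shorter than the speech, then trim it.
    repeats = len(speech) // len(noise) + 1
    noise = (list(noise) * repeats)[:len(speech)]
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise is silent; cannot scale it")
    # SNR in dB is 20 * log10(speech_rms / noise_rms); solve for the
    # noise level that gives the requested SNR.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20.0))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]


if __name__ == "__main__":
    # Synthetic signals, for illustration only.
    speech = [math.sin(2 * math.pi * 440 * t / 8000.0) for t in range(8000)]
    noise = [0.5, 0.5, -0.5, -0.5] * 100
    noisy = mix_at_snr(speech, noise, snr_db=10)
    print(len(noisy), "samples mixed at 10 dB SNR")
```

Running the recognizer on the same utterances at several SNR values gives a rough picture of how quickly accuracy degrades as the noise level rises.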

     
  • audiophile

    audiophile - 2014-03-01

    Would anyone from the sphinx community happen to have run PS on the aforementioned TEDLIUM and micarray databases? I would like to familiarize myself with the baseline performance of PS on these databases. The micarray database from CMU does seem to have recordings at distances of 1-3 meters, which is what I was looking for.

     
