Hello,
I'm trying to use pocketsphinx_continuous to do speech recognition. I have confirmed that the WAV file is in 16-bit, 16 kHz mono format (I downloaded the WAV from the network and used Audacity to convert it). I use the following command:
pocketsphinx_continuous -infile filename.wav
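For reference, the format can be double-checked with SoX's soxi utility, assuming SoX is installed; a file in the expected format should report a 16000 Hz sample rate, 16-bit precision, and 1 channel:

soxi filename.wav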
But the accuracy is poor; almost no words are recognized. However, if I use pocketsphinx_continuous to recognize a sample WAV provided by CMU Sphinx, such as arctic_a0001.wav, it recognizes the file perfectly.
I then downloaded the dictionary cmudict-en-us.dict, the language model en-70k-0.2.lm, and the en-us acoustic model provided by CMU Sphinx and used them to recognize my own WAV file, but it doesn't help.
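For completeness, pointing pocketsphinx_continuous at the downloaded models explicitly looks roughly like this; the /path/to/ prefixes are placeholders for wherever the files were unpacked:

pocketsphinx_continuous \
    -hmm /path/to/en-us \
    -lm /path/to/en-70k-0.2.lm \
    -dict /path/to/cmudict-en-us.dict \
    -infile filename.wav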
Is there anything wrong with how I'm using pocketsphinx? Thanks.
Regards,
Libin
To get an answer to this question, you need to share the file you are trying to process.
Thanks, please see the attachment.
Your sample is very heavily corrupted: reduced bandwidth, a Scottish accent, and, more importantly, some noise removal. Overall, noise removal corrupts speech too much; it is better to avoid it.
You have to train your own acoustic model to recognize this set properly. You could probably use the TED-LIUM dataset, but most likely you will have to collect additional Scottish data. You can pass the database audio through similar processing in order to get matching audio for training. You need to reconsider how noise removal is applied; something less intrusive is needed.
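A minimal sketch of that kind of matched processing with SoX, assuming the bandwidth reduction resembles a telephone channel (the 3400 Hz cutoff and the filenames are assumptions, not measured from the sample):

sox clean_training.wav processed_training.wav lowpass 3400

The idea is to apply the same degradation to the training audio that the test audio went through, so the acoustic model sees matching conditions.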
Hi Nickolay,
Thanks for your reply. It seems I couldn't comment for the last few days.
After studying "Adapting existing acoustic model" and "Building the acoustic model", I feel a little confused. Can I adapt the en-us acoustic model (although the accent is different) instead of building a new acoustic model?
Regards,
Libin
I wrote to you in the post above; let me highlight it: you have to train your own acoustic model to recognize this set properly. You could probably use the TED-LIUM dataset, but most likely you will have to collect additional Scottish data.
Can I adapt the en-us acoustic model (although the accent is different) instead of building a new acoustic model?

No.
Hi Nickolay,
Got it. I had that wrong idea because I saw in the wiki that "handwriting recognition or dictation support for another language" needs a new acoustic model to be trained, and I was thinking both cases here are English.
Thanks for your clarification.
Regards,
Libin