Speech recognition accuracy is poor with default setting

Forum: Help
Creator: Libin Yang
Created: 2016-11-07
Updated: 2016-11-16
  • Libin Yang

    Libin Yang - 2016-11-07

    Hello,

    I'm trying to use pocketsphinx_continuous to do speech recognition. I have confirmed that the wav file is in 16-bit, 16 kHz mono format (I downloaded the wav from the internet and used Audacity to convert the format). I use the following command:

    pocketsphinx_continuous -infile filename.wav

    But the accuracy is poor; almost no words are recognized. However, if I use pocketsphinx_continuous to recognize a sample wav provided by CMU Sphinx, such as arctic_a0001.wav, it recognizes the file perfectly.
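
    For reference, this is how I double-checked the format from the command line (assuming the sox package, which provides soxi, is installed; Audacity shows the same information):

    soxi filename.wav

    which reports 1 channel, a 16000 Hz sample rate, and 16-bit precision.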

    I then downloaded the dictionary cmudict-en-us.dict, the language model en-70k-0.2.lm, and the en-us acoustic model provided by CMU Sphinx to recognize my own wav file, but it didn't help.
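
    For reference, this is roughly how I pointed pocketsphinx_continuous at those files (the paths are just where I unpacked the models, so treat them as placeholders):

    pocketsphinx_continuous -hmm en-us -dict cmudict-en-us.dict -lm en-70k-0.2.lm -infile filename.wav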

    Is there anything wrong with the way I'm using pocketsphinx? Thanks.

    Regards,
    Libin

    • Nickolay V. Shmyrev

      To get an answer to this question, you need to share the file you are trying to process.

  • Libin Yang

    Libin Yang - 2016-11-08

    Thanks, please see the attachment.

    • Nickolay V. Shmyrev

      Your sample is heavily degraded: reduced bandwidth, a Scottish accent, and, more importantly, some noise removal. Noise removal of this kind corrupts the speech too much; it is better to avoid it.

      You have to train your own acoustic model to recognize this set properly. You could probably use the TED-LIUM dataset, but most likely you will have to collect additional Scottish data as well. You can pass the database audio through similar processing in order to get matching audio for training, for example as sketched below. You also need to reconsider how noise removal is applied; something less intrusive is needed.
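
      For example, something along these lines with sox could band-limit clean training audio so it resembles your recordings (just a sketch; the file names are placeholders and the 3400 Hz cutoff is only a guess, so adjust it to match your data):

      sox tedlium_utterance.wav matched_utterance.wav lowpass 3400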


      Last edit: Nickolay V. Shmyrev 2016-11-15
      • Libin Yang

        Libin Yang - 2016-11-15

        Hi Nickolay,

        Thanks for your reply. It seems I couldn't comment for the last few days.

        After studying "Adapting existing acoustic model" and "Building the acoustic model", I feel a little confused. Can I adapt the en-us acoustic model (although the accent is different) instead of building a new acoustic model?

        Regards,
        Libin

        • Nickolay V. Shmyrev

          I wrote this in the post above; let me highlight it: you have to train your own acoustic model to recognize this set properly. You could probably use the TED-LIUM dataset, but most likely you will have to collect additional Scottish data as well.

          "Can I adapt the en-us acoustic model (although the accent is different) instead of building a new acoustic model?"

          No.

          • Libin Yang

            Libin Yang - 2016-11-16

            Hi Nickolay,

            Got it. I had that wrong idea because I saw in the wiki that "handwriting recognition or dictation support for another language" needs a newly trained acoustic model, and I was thinking they are both English.

            Thanks for your clarification.

            Regards,
            Libin

