Hello,
I'm trying to use pocketsphinx_continuous to do speech recognition. I have confirmed that the WAV file is in 16-bit, 16 kHz mono format (I downloaded the WAV from the network and used Audacity to convert it). I use the following command:
pocketsphinx_continuous -infile filename.wav
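For reference, the format can be double-checked with SoX's soxi utility, assuming SoX is installed; a file in the expected format should report a 16000 Hz sample rate, 16-bit precision, and 1 channel:

soxi filename.wav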
But the accuracy is poor; almost no words are recognized. However, if I use pocketsphinx_continuous to recognize a sample WAV provided by CMU Sphinx, such as arctic_a0001.wav, it recognizes the file perfectly.
I then downloaded the dictionary cmudict-en-us.dict, the language model en-70k-0.2.lm, and the en-us acoustic model provided by CMU Sphinx and used them to recognize my own WAV file, but it doesn't help.
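For completeness, pointing pocketsphinx_continuous at the downloaded models explicitly looks roughly like this; the /path/to/ prefixes are placeholders for wherever the files were unpacked:

pocketsphinx_continuous \
    -hmm /path/to/en-us \
    -lm /path/to/en-70k-0.2.lm \
    -dict /path/to/cmudict-en-us.dict \
    -infile filename.wav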
Is there anything wrong with how I'm using pocketsphinx? Thanks.
Regards,
Libin
To get an answer to this question, you need to share the file you are trying to process.
Thanks, please see the attachment.
Your sample is very heavily corrupted: reduced bandwidth, a Scottish accent, and, more importantly, some noise removal. Overall, noise removal corrupts speech too much; it is better to avoid it.
You have to train your own acoustic model to recognize this set properly. You could probably use the TED-LIUM dataset, but most likely you will have to collect additional Scottish data. You can pass the database audio through similar processing in order to get matching audio for training. You need to reconsider how noise removal is applied; something less intrusive is needed.
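A minimal sketch of that kind of matched processing with SoX, assuming the bandwidth reduction resembles a telephone channel (the 3400 Hz cutoff and the filenames are assumptions, not measured from the sample):

sox clean_training.wav processed_training.wav lowpass 3400

The idea is to apply the same degradation to the training audio that the test audio went through, so the acoustic model sees matching conditions.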
Hi Nickolay,
Thanks for your reply. It seems I couldn't comment for the last few days.
After studying "Adapting existing acoustic model" and "Building the acoustic model", I feel a little confused. Can I adapt the en-us acoustic model (although the accent is different) instead of building a new acoustic model?
Regards,
Libin
I wrote to you in the post above; let me highlight it: you have to train your own acoustic model to recognize this set properly. You could probably use the TED-LIUM dataset, but most likely you will have to collect additional Scottish data.
Can I adapt the en-us acoustic model (although the accent is different) instead of building a new acoustic model?

No.
Hi Nickolay,
Got it. I had that wrong idea because I saw in the wiki that "handwriting recognition or dictation support for another language" needs a new acoustic model to be trained, and I was thinking both cases here are English.
Thanks for your clarification.
Regards,
Libin