help with pocketsphinx understanding simple audio

Speech Recognition Toolkit

Forum: Help

Creator: Brad Walker

Created: 2017-06-09

Updated: 2017-06-09

Brad Walker - 2017-06-09

I'm new to PocketSphinx, but I've looked over all the docs and I think what I'm trying to do is correct.

I have a file that contains audio. The only spoken word in the audio is: "remarks".

The word is in the pocketsphinx dictionary, so I would assume it should be recognized correctly. Apparently not. I'm testing with the simple application from the CMU PocketSphinx website: https://cmusphinx.github.io/wiki/tutorialpocketsphinx/

My file is attached. It is called foo.mp3. I convert it to 16-bit, single-channel audio at a 16 kHz sample rate using the following command.

ffmpeg -i file.mp3 -f s16le -ar 16K -ac 1 -acodec pcm_s16le file.pcm
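Because raw PCM has no header, nothing in the output file records its sample rate or channel count, so a quick sanity check on the converted data can help. The following is only a minimal sketch: it assumes the file really is 16-bit little-endian mono at 16 kHz, as the command above requests, and `describe_raw_pcm` is a hypothetical helper name, not part of PocketSphinx or ffmpeg.

```python
import struct


def describe_raw_pcm(data, sample_rate=16000):
    """Summarize headerless 16-bit little-endian mono PCM bytes.

    sample_rate is an assumption supplied by the caller; raw PCM
    does not store it anywhere in the file.
    """
    if len(data) % 2 != 0:
        raise ValueError("odd byte count: not whole 16-bit samples")
    count = len(data) // 2
    # "<" forces little-endian, "h" is a signed 16-bit integer.
    samples = struct.unpack("<%dh" % count, data)
    peak = max((abs(s) for s in samples), default=0)
    return {
        "samples": count,
        "seconds": count / sample_rate,
        "peak": peak,
    }
```

Reading the converted file with `open("file.pcm", "rb").read()` and passing the bytes in gives a duration and a peak amplitude. A duration far from what you hear, or a peak near zero, suggests the conversion did not produce what the decoder expects.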

Yet the recognized text is incorrect.

Here is what pocketsphinx says it sees:

Recognized: what are our act

Which is not correct.. I've attached the log output from pocketsphinx.

Any insight/help is most appreciated!

-brad w.

Last edit: Brad Walker 2017-06-09

Brad Walker - 2017-06-09

Here are my files to help with my question..

foo.mp3

pocketsphinx.log

- Nickolay V. Shmyrev - 2017-06-09
  
  https://cmusphinx.github.io/wiki/faq/#q-what-is-sample-rate-and-how-does-it-affect-accuracy
  

Brad Walker - 2017-06-09

Okay.. Makes sense..

So I downloaded the 8k acoustic model from the CMU Sphinx download directory..

Then I created an 8 kHz sample rate file from my mp3 using the following command:

ffmpeg -i foo.mp3 -ac 1 -ar 8k foo.wav

Then I ran pocketsphinx_continuous as follows:

pocketsphinx_continuous -samprate 8000 -hmm en-us-8khz -infile foo.wav
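One thing worth ruling out at this step is a mismatch between the file and the `-samprate` value passed to the decoder. The sketch below reads the WAV header with Python's standard `wave` module and compares it against the expected format. It assumes foo.wav is uncompressed PCM, and the helper names `wav_format` and `matches_decoder` are hypothetical.

```python
import wave


def wav_format(path):
    """Return (sample_rate, channels, bits_per_sample) from a WAV header."""
    with wave.open(path, "rb") as w:
        return w.getframerate(), w.getnchannels(), w.getsampwidth() * 8


def matches_decoder(path, expected_rate=8000):
    """True if the file is 16-bit mono at the rate the decoder is told to use."""
    rate, channels, bits = wav_format(path)
    return rate == expected_rate and channels == 1 and bits == 16
```

For the command above, `matches_decoder("foo.wav", 8000)` should hold. Note that this only confirms the header; it says nothing about how much of the spectrum the recording actually uses.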

I won't bore you with the logs.. But, I did see the following on the output..

INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 0 words

And there were no recognized words..

Any advice?

-brad w.

BTW. Thanks very much for the prompt response!!

- Nickolay V. Shmyrev - 2017-06-10
  
  Your audio bandwidth is less than 8 kHz; you have to train an acoustic model to recognize it.
  
  "I won't bore you with the logs.. But, I did see the following on the output.."
  
  This is not a good idea for a technical forum.
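The bandwidth mentioned in this reply is a property of the recording, not of the file's sample rate: resampling to a higher rate does not restore frequencies that were never captured. As a rough illustration only, the sketch below estimates the frequency under which most of the spectral energy of a short frame lies. The function name `rolloff_hz` and the 99% threshold are assumptions made for this example, not anything PocketSphinx defines, and the naive DFT is slow, so it is meant for a frame of a few hundred samples.

```python
import cmath
import math


def rolloff_hz(samples, sample_rate, fraction=0.99):
    """Frequency below which `fraction` of the frame's spectral energy lies."""
    n = len(samples)
    if n == 0:
        return 0.0
    energy = []
    # Naive DFT over the non-negative frequency bins 0 .. n/2.
    for k in range(n // 2 + 1):
        acc = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n)
            for t in range(n)
        )
        energy.append(abs(acc) ** 2)
    total = sum(energy)
    if total == 0:
        return 0.0
    running = 0.0
    for k, e in enumerate(energy):
        running += e
        if running >= fraction * total:
            return k * sample_rate / n
    return sample_rate / 2
```

If the value returned for frames of speech sits well below half the sample rate, the recording is band-limited relative to what the file format could hold, which is the situation the reply describes.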
  
