Hello, I am a college student doing a project with pocketsphinx. I am trying to have it recognize the numbers 1-70 and have had some success. I am running it on both a MacBook Pro and a Raspberry Pi, but it seems to be much more accurate on the MacBook. The recognizer confused many numbers at first, so I started doing acoustic model adaptation to differentiate how the numbers sound. Here is some test data I acquired for the number 16. The first number is how many times pocketsphinx got the number correct and the second is how many times it got it wrong; M marks the tests run on my Mac and R the Raspberry Pi. I ran one audio clip in a loop for these tests, and both devices were adapted with the clip used for the test.
16-1
adapted with one utterance of 16
loop tested using the one utterance looped
(52,3)M (40,15)R
16-2
adapted with 10 utterances
loop tested with the same utterance
(52,3)M (37,18)R
16-3
adapted with 20 utterances
(53,2)M (40,15)R
16-4
adapted with 30 utterances
(53,2)M (37,18)R
16-5
adapted with 40 utterances
(54,2)M (28,27)R
16-6
adapted with 50 utterances
(52,3)M (33,22)R
16-7
adapted with 50 utterances of 16 and 10 of 13
(55,0)M (39,16)R
16-8
adapted with 50 of 16, 10 of 13 and 10 of 15
(55,0)M (46,9)R
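For reference, here is a quick sketch of how I can turn those (correct, incorrect) counts into hit rates, using the 16-8 pair as an example:

```python
# (correct, incorrect) counts from test 16-8 above
results = {"Mac": (55, 0), "Pi": (46, 9)}

for device, (correct, wrong) in results.items():
    total = correct + wrong
    rate = 100 * correct / total
    print(f"{device}: {correct}/{total} correct = {rate:.1f}%")
# Mac: 55/55 correct = 100.0%
# Pi: 46/55 correct = 83.6%
```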
I first wondered why the Pi got more inaccurate as I added utterances, but I saw that it commonly heard 13 or 15 instead of 16, so I adapted the models to compensate. The accuracy on the Mac is great, but the Pi's is not as good. If I'm using the same acoustic model on each, why is the accuracy poorer on the Pi?
You can dump the audio on the Raspberry Pi with the -rawlogdir option and share it here. There could be many reasons, for example a bad microphone or bad audio drivers.
Also, you probably compiled the code with fixed point (--enable-fixed); that is not a great idea anymore.
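For reference, with the SWIG Python bindings the command-line style options are set on the decoder config before the Decoder is created. A rough sketch of the mapping (the model and directory paths below are placeholders, not your real ones):

```python
# Sketch: pocketsphinx command-line options map onto config.set_string() calls.
# All paths here are placeholders for illustration.
def decoder_option_pairs(hmm_dir, dict_file, rawlog_dir):
    """(option, value) pairs to apply to a pocketsphinx decoder config."""
    return [
        ("-hmm", hmm_dir),            # acoustic model directory
        ("-dict", dict_file),         # pronunciation dictionary
        ("-rawlogdir", rawlog_dir),   # decoder writes one .raw file per utterance here
    ]

# With the SWIG bindings, the pairs would then be applied like:
#   config = Decoder.default_config()
#   for opt, val in decoder_option_pairs("model/en-us", "numbers.dict", "rawlogs"):
#       config.set_string(opt, val)
#   decoder = Decoder(config)
print(decoder_option_pairs("model/en-us", "numbers.dict", "rawlogs"))
```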
Hello, and thanks for responding to my post. I actually haven't been compiling this code; I'm doing this in Python, so I've been running my script like this: "python ./loopDemo"
Should I be compiling it rather than letting Python compile it for me? Also, I know options such as -rawlogdir exist, but I do not know how to enable or change them.
How did you install the library then?
I downloaded pocketsphinx and sphinxbase and built them. Then I was able to import the modules in my Python code:
from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *
Where did you download sphinxbase and pocketsphinx? How did you build them?
I downloaded them here on SourceForge and followed the documentation here:
https://cmusphinx.github.io/wiki/tutorialpocketsphinx/
Hello, I figured out how to dump the audio using that option.
https://www.dropbox.com/sh/yrltphl38frchar/AADk_bzBWctY0lAzItQCf0ZGa?dl=0
This link has the raw audio files. I said the number six 20 times on both the Mac and the Pi.
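In case it helps anyone listening to the dumps: the .raw files that -rawlogdir writes are headerless 16-bit mono PCM (16 kHz unless -samprate was changed), so a short script can wrap them into playable WAV files. A sketch (the file names are placeholders):

```python
import wave

def raw_to_wav(raw_path, wav_path, rate=16000):
    """Wrap a headerless 16-bit mono PCM dump in a WAV container."""
    with open(raw_path, "rb") as f:
        pcm = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)    # mono
        w.setsampwidth(2)    # 16-bit samples
        w.setframerate(rate)
        w.writeframes(pcm)

# Example (utterance file name is a placeholder):
# raw_to_wav("000000000.raw", "000000000.wav")
```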