Hello,
I am using pocketsphinx for spoken command recognition. The grammar is simple and the vocabulary set is small; however, recognition accuracy and performance are lower than expected. To check whether I was doing anything wrong, I compared pocketsphinx's performance and accuracy with the sphinx3 decoder, using models provided by cmusphinx (the US English TIDIGITS Telephone Acoustic Model and Voxforge English) and models I trained myself. In all cases I used an 8 kHz sampling frequency and continuous models (tidigits_cd_phone_201103 and voxforge_en_sphinx.cd_cont_3000). I also adapted the voxforge model with the TIDIGITS database using MLLR and MAP.
I also trained a model with the TIDIGITS speech database; I called this model "tidigits.8k.cd_cont_250".
Here are some of the performance metrics I obtained, along with some of the parameters I used for sphinx3 and pocketsphinx:
With pocketsphinx:
Ins Dels Subs WER    SER    ACC     CORR    BEAM       WBEAM      LW  WIP        speech     xRT   Model
125 151  323  2.10%  4.89%  97.90%  98.34%  1.00E-060  1.00E-040  12  1.00E-001  14862.32   0.03  tidigits_cd_phone_201103
112 167  1208 5.20%  13.05% 94.80%  95.19%  1.00E-060  1.00E-040  12  5.00E-001  14862.32   0.07  voxforge_en (adapted)
103 126  181  1.43%  3.38%  98.57%  98.93%  1.00E-060  1.00E-040  12  1.00E-001  14862.32   0.02  tidigits.8k.cd_cont_250
With sphinx3 decode:
Ins Dels Subs WER    SER    ACC     CORR    BEAM       WBEAM      LW  WIP        speech     xRT   Model
165 29   139  1.17%  3.23%  98.83%  99.41%  1.00E-060  1.00E-040  12  1.00E-001  14862.32   0.01  tidigits_cd_phone_201103
195 67   105  1.28%  3.86%  98.72%  99.40%  1.00E-060  1.00E-040  12  5.00E-001  14862.32   0.02  voxforge_en (adapted)
68  28   64   0.56%  1.67%  99.44%  99.68%  1.00E-060  1.00E-040  12  1.00E-001  14862.32   0.01  tidigits.8k.cd_cont_250
As you can see, sphinx3 outperforms pocketsphinx in every case. What can I do to get better performance with pocketsphinx? Are these results OK, or should I expect better performance from pocketsphinx?
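(For reference, here is a minimal sketch of how the accuracy and speed columns above are usually defined, assuming the standard sclite-style formulas; the word count and timings below are invented only for illustration.)

use strict;
use warnings;

# Word error rate, percent correct and percent accuracy from the error counts.
sub score {
    my ( $ins, $del, $sub, $n_ref ) = @_;                   # $n_ref = reference word count
    my $wer  = 100 * ( $ins + $del + $sub ) / $n_ref;       # WER
    my $corr = 100 * ( $n_ref - $del - $sub ) / $n_ref;     # CORR (insertions not counted)
    my $acc  = 100 - $wer;                                   # ACC
    return ( $wer, $corr, $acc );
}

# xRT is decoding time divided by audio duration.
sub xrt { my ( $cpu_s, $speech_s ) = @_; return $cpu_s / $speech_s; }

# Invented counts, only to show the calls:
my ( $wer, $corr, $acc ) = score( 5, 3, 10, 1000 );
printf "WER %.2f%%  CORR %.2f%%  ACC %.2f%%  xRT %.3f\n", $wer, $corr, $acc, xrt( 20, 1000 );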
Here is some additional tuning I did (a sketch of how such options can be passed from a driver script follows the option lists below).
With the tidigits_cd_phone_201103 and tidigits.8k.cd_cont_250 models I used:
-sendump models/tidigits/<model>/hmm/sendump
for pocketsphinx:
-fwdflat yes
-fwdtree yes
-bestpath yes
-pl_window 1
-fillprob 0.01
With the voxforge English (adapted) model I used:
Tuning - Reducing GMM computation
-ci_pbeam 1e-5
Tuning - Reducing HMM computation and Search
-maxcdsenpf 60
-maxhmmpf 70
-maxwpf 4
-topn 2
for pocketsphinx:
-fwdflat yes
-fwdtree yes
-bestpath yes
-pl_window 1
-fillprob 0.01
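(This is not my actual script, just a minimal sketch of how these options can be collected in a driver script and handed to pocketsphinx_batch; the model, dictionary, control-file, and output paths are placeholders.)

#!/usr/bin/perl
use strict;
use warnings;

# Sketch only: collect the tuning options in one hash and build the
# pocketsphinx_batch command from it, so nothing is hard-coded per run.
my %opts = (
    '-hmm'       => 'models/tidigits/hmm',       # placeholder acoustic model directory
    '-dict'      => 'models/tidigits/dict.dic',  # placeholder dictionary
    '-ctl'       => 'test/tidigits.fileids',     # placeholder control file
    '-hyp'       => 'results/tidigits.hyp',      # placeholder hypothesis output
    # tuning values from the tables above
    '-beam'      => '1e-60',
    '-wbeam'     => '1e-40',
    '-lw'        => '12',
    '-wip'       => '0.1',
    '-fwdflat'   => 'yes',
    '-fwdtree'   => 'yes',
    '-bestpath'  => 'yes',
    '-pl_window' => '1',
    '-fillprob'  => '0.01',
);

my @cmd = ( 'pocketsphinx_batch', map { ( $_, $opts{$_} ) } sort keys %opts );
print "@cmd\n";
system(@cmd) == 0 or die "pocketsphinx_batch failed: $?";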
Thank you very much in advance for your help.
Luciano
Hello
I can't confirm your experiment results. Accuracy is OK for all experiments: it's reasonable for the telephone model, for the voxforge adaptation result, and for your trained model, but the speed is not correct.
Considering the last experiment, with the cd_cont_250 database, the numbers should be the following with the latest SphinxTrain-0.7, pocketsphinx-0.7, and the latest sphinx3 on the TIDIGITS test set.
Yes, pocketsphinx is a little bit less accurate, but that is compensated by 4 times faster decoding. It's possible to tune pocketsphinx for accuracy (disable top Gaussian tracking, score shifting, etc.). For semi-continuous models pocketsphinx should be as accurate as sphinx3. In real-life conditions (noise, etc.) the difference will be far less significant.
Take into account that the following options are only useful for a large vocabulary. They reduce accuracy and waste time with TIDIGITS:
-fwdflat yes
-bestpath yes
-pl_window 1
It's better to disable all three.
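(As a minimal sketch, assuming a driver script keeps its decoder options in a hash like the one above, the small-vocabulary overrides would be:)

# Small-vocabulary overrides suggested above (sketch only; the thread below
# confirms that -pl_window already defaults to 0 in pocketsphinx).
my %small_vocab = (
    '-fwdflat'   => 'no',
    '-bestpath'  => 'no',
    '-pl_window' => '0',
);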
Thank you very much for your reply, Nickolay. I really appreciate it.
Now I understand that accuracy is fairly good with all the models I am working with. Regarding performance, you said that the speed is not correct.
I calculate speed from the timing lines in the log files. In the case of pocketsphinx, my calculation using Perl is:
while (<FILE>) {
    print;
    # pocketsphinx log summary line: total speech, CPU and wall-clock seconds
    if (/TOTAL\s+(\S+)\sseconds speech,\s+(\S+)\sseconds CPU,\s+(\S+)/) {
        $speechTime = $1;
        $CPUTime    = $2;
        $wallTime   = $3;
    }
}
$xRT = $CPUTime / $speechTime;
In the case of sphinx3, it is slightly different; my calculation using Perl is:
while (<FILE>) {
    ...
    # sphinx3 log summary line: overall CPU and wall-clock real-time factors
    if (/tot:\s+(\S+)\s+xCPU,\s+(\S+)\s+xClk/) {
        $xCPU = $1;
        $xClk = $2;
    }
}
$xRT = $xCPU;
Is there anything wrong with the way I extract xRT info from the log files?
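(As a quick sanity check that the two definitions agree, with invented numbers:)

# Sketch with made-up timings: xRT computed from raw seconds vs. the ratio
# that sphinx3 already reports as xCPU.
my $speech_s = 1000.0;   # seconds of audio in the test set (invented)
my $cpu_s    = 20.0;     # seconds of CPU time spent decoding (invented)
printf "from raw times: xRT = %.3f\n", $cpu_s / $speech_s;   # 0.020
printf "from sphinx3:   xRT = %.3f\n", 0.02;                 # xCPU is already this ratio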
And here are the results after using:
-fwdflat no
-bestpath no
-pl_window 1
Ins Dels Subs WER    SER    ACC     CORR    BEAM       WBEAM      LW  WIP        speech     xRT
91  116  182  1.36%  3.46%  98.64%  98.96%  1.00E-060  1.00E-040  12  1.00E-001  14862.32   0.0239
Just a bit better.
I am running the experiments on a PC with an i7-870 CPU at 2.93 GHz (quad core, 8 logical processors) and 4 GB of physical memory.
I'm also running pocketsphinx 0.7 on a Motorola Atrix smartphone (dual-core processor) using the demo for Android devices. It takes a very long time for the pocketsphinx demo to load the voxforge adapted model. Recognition is also significantly slower (compared with the TIDIGITS models). In all cases I am using the same configuration as on the PC. Should I use semicontinuous models?
Thank you again, Nickolay.
I don't think it's a speed calculation issue; I think the issue is the parameters you are using to invoke pocketsphinx. Our sphinx3 performance is more or less equivalent, but the pocketsphinx one is very different. Maybe you can share the full log so we can compare.
Yes, you need a semi-continuous model and a few more things. The default model hub4wsj_sc_8k should be better than voxforge from this point of view.
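(As a sketch of what that change amounts to in a driver script; the path below is a placeholder for wherever the hub4wsj_sc_8k model is installed.)

# Point the decoder at the default semi-continuous model instead of the
# voxforge continuous one (placeholder path).
my %ps_opts = ( '-hmm' => '/path/to/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k' );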
Nickolay, I am sharing the full log at: http://www.mediafire.com/?5st205862da4768
Thanks
Hey, I see in the log you are still using pl_window. That makes the decoder very slow. Are you sure you are passing all the arguments properly? Check my list.
You are absolutely right, that was the problem. In my script I had hardcoded some default values, and pl_window was among them. (The pl_window default value is 1 in S3 but 0 in PS.)
After correcting it I am obtaining:
Ins Dels Subs WER    SER    ACC     CORR    BEAM       WBEAM      LW  WIP        speech     xRT
50  59   164  1.30%  3.53%  98.70%  99.22%  1.00E-060  1.00E-040  12  1.00E-001  14862.32   0.0058
A bit less accurate than yours but almost the same speed.
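(For what it's worth, here is a minimal sketch of how I could keep the two decoders' options separate so a sphinx3 default does not leak into the pocketsphinx run again; the values are only illustrative.)

# Sketch: per-decoder option sets, so defaults are never shared by accident.
my %common  = ( '-beam' => '1e-60', '-wbeam' => '1e-40', '-lw' => '12', '-wip' => '0.1' );
my %s3_only = ( '-pl_window' => '1' );   # matches the sphinx3 default, harmless there
my %ps_only = ();                        # leave -pl_window unset so pocketsphinx keeps its default (0)

my %s3_opts = ( %common, %s3_only );
my %ps_opts = ( %common, %ps_only );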
Thank you very much, Nickolay.
Oh, by the way, in a previous message you also said:
Yes, you need semicont model and few more things.
Please let me know if you have any other suggestions apart from the ones you have already given me.
Thanks
Luciano