CMU Sphinx / Forums / Speech Recognition Theory: Pocketsphinx n-best lists

Berker Batur - 2010-09-30

Hi,
How can I obtain an n-best list with pocketsphinx_batch?

I execute the following command:
pocketsphinx_batch -hmm /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k \
  -dict /usr/local/share/pocketsphinx/model/lm/en_US/cmu07a.dic \
  -lm /usr/local/share/pocketsphinx/model/lm/en_US/wsj0vp.5000.DMP \
  -cepdir ./cep -ctl files.dat -cepext .wav -adcin yes -samprate 8000 \
  -hyp out.dat -nbestdir ./nbest -nbest 10 -outlatdir ./lattices

It generated the lattice file but not the n-best list.
I could use 'sphinx3_astar' to generate an n-best list from the lattice file, but A* search
has an upper limit and I don't want to increase it.

My current configuration is:

INFO: cmd_ln.c(512): Parsing command line:
pocketsphinx_batch \
-hmm /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k \
-dict /usr/local/share/pocketsphinx/model/lm/en_US/cmu07a.dic \
-lm /usr/local/share/pocketsphinx/model/lm/en_US/wsj0vp.5000.DMP \
-cepdir ./cep \
-ctl files.dat \
-cepext .wav \
-adcin yes \
-samprate 8000 \
-hyp out.dat \
-nbestdir ./nbest \
-nbest 10 \
-outlatdir ./lattices

Current configuration:

-adchdr 0 0
-adcin no yes
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-argfile
-ascale 20.0 2.000000e+01
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-bghist no no
-build_outdirs yes yes
-cepdir ./cep
-cepext .mfc .wav
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-ctl files.dat
-ctlcount -1 -1
-ctlincr 1 1
-ctloffset 0 0
-ctm
-debug 0
-dict /usr/local/share/pocketsphinx/model/lm/en_US/cmu07a.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgctl
-fsgdir
-fsgext
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k
-hyp out.dat
-hypseg
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm /usr/local/share/pocketsphinx/model/lm/en_US/wsj0vp.5000.DMP
-lmctl
-lmname default default
-lmnamectl
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf -1 -1
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mllrctl
-mllrdir
-mllrext
-mmap yes yes
-nbest 0 2
-nbestdir ./nbest
-nbestext .hyp .hyp
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-outlatdir ./lattices
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-5 1.000000e-05
-pl_window 0 0
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 8.000000e+03
-seed -1 -1
-sendump
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-usewdphones no no
-uw 1.0 1.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02

Thanks.


Nickolay V. Shmyrev - 2010-09-30

Hello, this feature was just added to pocketsphinx in Subversion. Please check out
the latest trunk.

It's good that you took my advice to move to pocketsphinx ;) Now take my
advice to use an endpointer to process your file. If you want to operate in batch
mode, there is the sphinx_cont_fileseg binary, which can segment your long files
before processing.


Berker Batur - 2010-09-30

Hi,
I downloaded the latest versions of both pocketsphinx and sphinxbase.
When I executed the same command as in my first post, I got a
segmentation fault.
(It generated only the lattice file, not the n-best list.)

Here are the last few lines of the output:

INFO: ngram_search_fwdtree.c(1537): 403597 words recognized (33/fr)
INFO: ngram_search_fwdtree.c(1539): 39453036 senones evaluated (3198/fr)
INFO: ngram_search_fwdtree.c(1541): 58350028 channels searched (4730/fr), 5234702 1st, 13125941 last
INFO: ngram_search_fwdtree.c(1545): 780271 words for which last channels evaluated (63/fr)
INFO: ngram_search_fwdtree.c(1548): 4233321 candidate words for entering last phone (343/fr)
INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 3500 words
INFO: ngram_search_fwdflat.c(925): 243167 words recognized (20/fr)
INFO: ngram_search_fwdflat.c(927): 15739424 senones evaluated (1276/fr)
INFO: ngram_search_fwdflat.c(929): 27340278 channels searched (2216/fr)
INFO: ngram_search_fwdflat.c(931): 1528130 words searched (123/fr)
INFO: ngram_search_fwdflat.c(933): 1164817 word transitions (94/fr)
INFO: ngram_search.c(1081): </s> not found in last frame, using <sil> instead
INFO: ngram_search.c(1133): lattice start node <s>.0 end node <sil>.12188
INFO: ps_lattice.c(1351): Normalizer P(O) = alpha(<sil>:12188:12333) = -83016304
INFO: ps_lattice.c(1389): Joint P(O,S) = -86598072 P(S|O) = -3581768
INFO: ps_lattice.c(241): Writing lattice file: ./lattices/test.lat
Segmentation fault

When I removed the -nbestdir and -nbest options from the arguments, it worked with no
error. The output was:

INFO: ngram_search_fwdtree.c(1537): 403597 words recognized (33/fr)
INFO: ngram_search_fwdtree.c(1539): 39453036 senones evaluated (3198/fr)
INFO: ngram_search_fwdtree.c(1541): 58350028 channels searched (4730/fr), 5234702 1st, 13125941 last
INFO: ngram_search_fwdtree.c(1545): 780271 words for which last channels evaluated (63/fr)
INFO: ngram_search_fwdtree.c(1548): 4233321 candidate words for entering last phone (343/fr)
INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 3500 words
INFO: ngram_search_fwdflat.c(925): 243167 words recognized (20/fr)
INFO: ngram_search_fwdflat.c(927): 15739424 senones evaluated (1276/fr)
INFO: ngram_search_fwdflat.c(929): 27340278 channels searched (2216/fr)
INFO: ngram_search_fwdflat.c(931): 1528130 words searched (123/fr)
INFO: ngram_search_fwdflat.c(933): 1164817 word transitions (94/fr)
INFO: ngram_search.c(1081): </s> not found in last frame, using <sil> instead
INFO: ngram_search.c(1133): lattice start node <s>.0 end node <sil>.12188
INFO: ps_lattice.c(1351): Normalizer P(O) = alpha(<sil>:12188:12333) = -83016304
INFO: ps_lattice.c(1389): Joint P(O,S) = -86598072 P(S|O) = -3581768
INFO: ps_lattice.c(241): Writing lattice file: ./lattices/test.lat
INFO: batch.c(753): test: 123.34 seconds speech, 91.67 seconds CPU, 91.83 seconds wall
INFO: batch.c(755): test: 0.74 xRT (CPU), 0.74 xRT (elapsed)
INFO: batch.c(767): TOTAL 123.34 seconds speech, 91.67 seconds CPU, 91.83 seconds wall
INFO: batch.c(769): AVERAGE 0.74 xRT (CPU), 0.74 xRT (elapsed)

I will use the endpointer soon. Can this segfault be related to the long input
file?

Thanks for your help.

Berker


Berker Batur - 2010-09-30

Hi,
I found the reason for the segfault.
If the directory given in the '-nbestdir ./nbest' argument does not exist,
it crashes with a segmentation fault.
When I created the nbest directory before executing the command, it worked and
generated the n-best list.
It created the lattice directory by itself, so I thought n-best generation
would do the same.
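The workaround described above is to create the n-best directory before decoding. A minimal Python sketch of that, using a hypothetical helper `run_batch` (not part of pocketsphinx) that creates both output directories and then builds, and optionally runs, the pocketsphinx_batch command with the n-best flags from the first post:

```python
import os
import subprocess


def run_batch(ctl, nbestdir="./nbest", latdir="./lattices", nbest=10,
              extra_args=(), dry_run=True):
    """Create output directories, then build (and optionally run) the command.

    Hypothetical helper: creating -nbestdir up front avoids the crash
    seen when that directory does not exist yet.
    """
    for d in (nbestdir, latdir):
        os.makedirs(d, exist_ok=True)  # no-op if the directory already exists
    argv = ["pocketsphinx_batch", "-ctl", ctl,
            "-nbestdir", nbestdir, "-nbest", str(nbest),
            "-outlatdir", latdir, *extra_args]
    if not dry_run:
        subprocess.run(argv, check=True)  # raises if the decoder exits non-zero
    return argv
```

The remaining flags from the first post (-hmm, -dict, -lm, -cepdir, -cepext, -adcin, -samprate, -hyp) would go in `extra_args`; set `dry_run=False` to actually decode.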

Another question:
In Sphinx 3, the n-best list comes with acoustic and language model scores, but
the .hyp file generated by pocketsphinx does not contain these scores. Is there a
way to obtain the scores of these hypotheses and of the words in them?

Thanks.

Berker


Nickolay V. Shmyrev - 2010-10-01

> Hi, I found the reason for the segfault. If the directory given in the
> '-nbestdir ./nbest' argument does not exist, it crashes with a segmentation
> fault. When I created the nbest directory before executing the command, it
> worked and generated the n-best list. It created the lattice directory by
> itself, so I thought n-best generation would do the same.

Thanks, this issue has been fixed in trunk.

> Another question: in Sphinx 3, the n-best list comes with acoustic and
> language model scores, but the .hyp file generated by pocketsphinx does not
> contain these scores. Is there a way to obtain the scores of these hypotheses
> and of the words in them?

There is a total score (the last item on each line). Separate acoustic and language
model scores aren't tracked yet, but that can be implemented if needed.
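Since the total score is the last item on each line, an n-best .hyp file can be split into words and score with a few lines of Python. This is a sketch under assumptions: the helper names are hypothetical, and it assumes each line holds only the hypothesis words followed by an integer score (pocketsphinx reports log-domain scores as integers, as in the log lines above). If your build writes extra fields, adjust the split.

```python
def parse_nbest_line(line):
    """Split one n-best line into (words, total_score).

    Assumes the total score is the last whitespace-separated item and
    everything before it is the hypothesis text.
    """
    *words, score = line.split()
    return words, int(score)


def parse_nbest(lines):
    """Parse every non-blank line, keeping the order of the file."""
    return [parse_nbest_line(line) for line in lines if line.strip()]
```

For example, `parse_nbest(open(path))` on a file from the -nbestdir directory gives a list of `(words, score)` pairs that can be compared or re-ranked.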


Berker Batur - 2010-10-02

Hi,

I tried to use sphinx_cont_fileseg. First I executed the following command:
sphinx_cont_fileseg -sps 8000 -w -r -i test.wav

27 .raw files were generated.
1) Is there any way to generate .wav files instead of .raw?

Then I executed the following command:

pocketsphinx_batch -hmm /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/ \
  -dict /usr/local/share/pocketsphinx/model/lm/en_US/cmu07a.dic \
  -lm /usr/local/share/pocketsphinx/model/lm/en_US/wsj0vp.5000.DMP \
  -cepdir ./cep -ctl files.dat -cepext .raw -samprate 8000 -hyp out.dat \
  -nbestdir ./nbest -nbest 10 -outlatdir ./lattices

But an error occurred. Here is the output of pocketsphinx_batch:

ERROR: "batch.c", line 207: File length mismatch: 0x2000c00 != 0xbabf
*** glibc detected *** pocketsphinx_batch: double free or corruption (!prev): 0x0000000002f68340 ***
This is followed by a backtrace and a memory map, then 'Aborted'.

I couldn't find the 'File length mismatch' error in forum posts. What is the
cause of this error?

I also tried to decode the .raw files with Sphinx 3. I changed the -sps value
because my Sphinx 3 acoustic models were trained on 16 kHz audio:
sphinx_cont_fileseg -sps 16000 -w -r -i test.wav
It decoded with no error, but the recognized words have nothing to do with
the actual speech.

Am I doing something wrong here?

Thanks.


Nickolay V. Shmyrev - 2010-10-04

> 1) Is there any way to generate .wav files instead of .raw?

No, but you can convert the raw files to WAV with sox.
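With sox, a raw segment can be wrapped in a WAV header along the lines of `sox -t raw -r 8000 -e signed-integer -b 16 -c 1 seg.raw seg.wav`. If sox isn't available, Python's standard `wave` module does the same job. This sketch assumes the segments are headerless 16-bit signed little-endian mono PCM at the rate given to sphinx_cont_fileseg with -sps; `raw_to_wav` is a hypothetical helper name.

```python
import wave


def raw_to_wav(raw_path, wav_path, samprate=8000, channels=1, sampwidth=2):
    """Wrap headerless PCM samples in a WAV header; return the frame count.

    Assumes 16-bit signed little-endian samples (sampwidth=2); samprate
    must match the rate that was passed to sphinx_cont_fileseg via -sps.
    """
    with open(raw_path, "rb") as f:
        pcm = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)
        w.setframerate(samprate)
        w.writeframes(pcm)  # WAV stores 16-bit PCM little-endian, as assumed
    return len(pcm) // (sampwidth * channels)
```

If the segments were written big-endian on your platform, the bytes would need swapping before `writeframes`.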

> But an error occurred. Here is the output of pocketsphinx_batch:

If you decode raw files, you need to add -adcin yes. You forgot that.
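A plausible reading of the 'File length mismatch' error: without -adcin yes, each input file is read as a Sphinx cepstral file, whose header, as far as we know, is a single 4-byte integer giving the number of float32 values that follow. Raw audio has no such header, so its first two samples get read as that count and the length check fails. The hypothetical check below only illustrates that assumed layout; it is a sketch, not the decoder's actual code.

```python
import struct


def looks_like_sphinx_mfc(data):
    """Return True if `data` matches the assumed Sphinx cepstral layout.

    Assumed layout: a 4-byte int32 header holding the number of float32
    values, followed by exactly that many float32 values. Both byte
    orders are tried.
    """
    body = len(data) - 4
    if body < 0 or body % 4:
        return False
    n_floats = body // 4
    for fmt in ("<i", ">i"):
        (count,) = struct.unpack(fmt, data[:4])
        if count == n_floats:
            return True
    return False
```

Headerless 16-bit audio essentially never passes this check, which is why raw input needs -adcin yes rather than just -cepext .raw.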

> I also tried to decode the .raw files with Sphinx 3. I changed the -sps value
> because my Sphinx 3 acoustic models were trained on 16 kHz audio:
> sphinx_cont_fileseg -sps 16000 -w -r -i test.wav
> It decoded with no error, but the recognized words have nothing to do with
> the actual speech.

A model trained on 16 kHz audio can't decode 8 kHz audio because the relevant
frequency bands are missing. This is unrelated to -sps. The sample rate option (-sps)
configures the front end and should match the sampling rate of the audio.
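To put numbers on the missing bands: audio sampled at 8 kHz contains nothing above its Nyquist frequency of 4 kHz, while the configuration dump in the first post lists a default -upperf of 6855.4976 Hz. A small worked sketch (the helper name is hypothetical) computing how much of the filterbank range would then carry no signal:

```python
def missing_band_hz(samprate, upperf):
    """Width (Hz) of the filterbank range that lies above Nyquist.

    samprate: sampling rate of the audio in Hz.
    upperf:   upper edge of the front-end filterbank in Hz.
    """
    nyquist = samprate / 2.0
    return max(0.0, upperf - nyquist)


print(missing_band_hz(8000, 6855.4976))   # about 2855.5 Hz with no signal
print(missing_band_hz(16000, 6855.4976))  # 0.0: the whole range is covered
```

Resampling 8 kHz audio up to 16 kHz raises the Nyquist limit but cannot restore content that was never recorded above 4 kHz.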


Nickolay V. Shmyrev - 2010-10-04

The crash was fixed in trunk, thanks for the report!


Berker Batur - 2010-10-04

> A model trained on 16 kHz audio can't decode 8 kHz audio because the relevant
> frequency bands are missing. This is unrelated to -sps. The sample rate option (-sps)
> configures the front end and should match the sampling rate of the audio.

I forgot to mention that I also changed the sampling rate of the audio to
16000 Hz before using sphinx_cont_fileseg, so I expected Sphinx 3 to decode the
.raw files with good accuracy.
I will try this with some other examples soon.

Thanks.


Berker Batur - 2010-10-04

Hi,
When I try to decode a WAV file that is 6.10 min long with
pocketsphinx_batch, it gives an error:

INFO: ngram_search_fwdtree.c(99): 788 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 60 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 60 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 14016
INFO: ngram_search_fwdtree.c(338): after: 443 root, 13888 non-root channels, 22 single-phone words
INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: cmn.c(175): CMN: 50.93 2.02 -0.81 -1.49 -2.04 -0.38 -0.30 0.19 -0.38 -0.37 -0.33 -0.36 -0.26
ERROR: "acmod.c", line 856: Circular feature buffer cannot be rewound (output frame 0, alloc -28533)
ERROR: "ngram_search.c", line 1029: Couldn't find <s> in first frame
ERROR: "batch.c", line 449: Failed to obtain word lattice for utterance test
ERROR: "ngram_search.c", line 1029: Couldn't find <s> in first frame
Segmentation fault

I generated this WAV file by concatenating a 2.03 min WAV file three times,
and pocketsphinx decoded that 2.03 min file with no error. Is this
error related to the length of the file, or to something else?

I also couldn't manage to use sphinx_cont_fileseg. Sometimes it generates many
raw files that can be decoded correctly, but sometimes it generates only one
raw file, which pocketsphinx_batch couldn't decode.
What is the right process for using sphinx_cont_fileseg to decode a long
(more than 10 min) 8 kHz WAV file?


Nickolay V. Shmyrev - 2010-10-21

Hm, quite some time has passed, sorry for not replying.

> Is this error related to the length of the file?

Yes

> I also couldn't manage to use sphinx_cont_fileseg. Sometimes it generates
> many raw files that can be decoded correctly, but sometimes it generates only
> one raw file, which pocketsphinx_batch couldn't decode.

That seems to be a bug. It would be nice to see the file that doesn't work.

> What is the right process for using sphinx_cont_fileseg to decode a long
> (more than 10 min) 8 kHz WAV file?

Exactly as you described: it should split the long file into many segments, and you
can process each one in batch mode. Or you can use pocketsphinx_continuous,
which will do the same for you.
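After segmentation, each segment needs its own line in the control file passed with -ctl. A sketch, assuming the usual Sphinx control-file convention (one utterance per line, given relative to -cepdir and without the -cepext extension); `write_ctl` is a hypothetical helper:

```python
import os


def write_ctl(segment_dir, ctl_path, ext=".raw"):
    """Write a control file listing every segment in segment_dir.

    Assumed format: one utterance per line, relative to -cepdir and
    without the -cepext extension. Names are sorted lexicographically;
    if segment numbers are not zero-padded, sort with a numeric key.
    """
    names = sorted(
        os.path.splitext(name)[0]
        for name in os.listdir(segment_dir)
        if name.endswith(ext)
    )
    with open(ctl_path, "w") as out:
        for name in names:
            out.write(name + "\n")
    return names
```

The resulting file would then be decoded with flags already used in this thread, e.g. `-cepdir segs -cepext .raw -adcin yes -samprate 8000 -ctl files.dat`, where `segs` stands for whatever directory holds the segments.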

