Pocketsphinx n-best lists

2010-09-30
2012-09-22
  • Berker Batur

    Berker Batur - 2010-09-30

    Hi,
    How can I obtain an n-best list with pocketsphinx_batch?

    I execute the following command:
    pocketsphinx_batch -hmm
    /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k -dict
    /usr/local/share/pocketsphinx/model/lm/en_US/cmu07a.dic -lm
    /usr/local/share/pocketsphinx/model/lm/en_US/wsj0vp.5000.DMP -cepdir ./cep
    -ctl files.dat -cepext .wav -adcin yes -samprate 8000 -hyp out.dat -nbestdir
    ./nbest -nbest 10 -outlatdir ./lattices

    It generated a lattice file but did not generate an n-best list.
    I could use 'sphinx3_astar' to generate the n-best list from the lattice file, but A* search
    has an upper limit and I don't want to increase it.

    My current configuration is:

    INFO: cmd_ln.c(512): Parsing command line:
    pocketsphinx_batch \
    -hmm /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k \
    -dict /usr/local/share/pocketsphinx/model/lm/en_US/cmu07a.dic \
    -lm /usr/local/share/pocketsphinx/model/lm/en_US/wsj0vp.5000.DMP \
    -cepdir ./cep \
    -ctl files.dat \
    -cepext .wav \
    -adcin yes \
    -samprate 8000 \
    -hyp out.dat \
    -nbestdir ./nbest \
    -nbest 10 \
    -outlatdir ./lattices

    Current configuration:

    -adchdr 0 0
    -adcin no yes
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -alpha 0.97 9.700000e-01
    -argfile
    -ascale 20.0 2.000000e+01
    -backtrace no no
    -beam 1e-48 1.000000e-48
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+00
    -bghist no no
    -build_outdirs yes yes
    -cepdir ./cep
    -cepext .mfc .wav
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -compallsen no no
    -ctl files.dat
    -ctlcount -1 -1
    -ctlincr 1 1
    -ctloffset 0 0
    -ctm
    -debug 0
    -dict /usr/local/share/pocketsphinx/model/lm/en_US/cmu07a.dic
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams
    -fillprob 1e-8 1.000000e-08
    -frate 100 100
    -fsg
    -fsgctl
    -fsgdir
    -fsgext
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-64
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+00
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-29
    -fwdtree yes yes
    -hmm /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k
    -hyp out.dat
    -hypseg
    -input_endian little little
    -jsgf
    -kdmaxbbi -1 -1
    -kdmaxdepth 0 0
    -kdtree
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lextreedump 0 0
    -lifter 0 0
    -lm /usr/local/share/pocketsphinx/model/lm/en_US/wsj0vp.5000.DMP
    -lmctl
    -lmname default default
    -lmnamectl
    -logbase 1.0001 1.000100e+00
    -logfn
    -logspec no no
    -lowerf 133.33334 1.333333e+02
    -lpbeam 1e-40 1.000000e-40
    -lponlybeam 7e-29 7.000000e-29
    -lw 6.5 6.500000e+00
    -maxhmmpf -1 -1
    -maxnewoov 20 20
    -maxwpf -1 -1
    -mdef
    -mean
    -mfclogdir
    -mixw
    -mixwfloor 0.0000001 1.000000e-07
    -mllr
    -mllrctl
    -mllrdir
    -mllrext
    -mmap yes yes
    -nbest 0 2
    -nbestdir ./nbest
    -nbestext .hyp .hyp
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 40
    -nwpen 1.0 1.000000e+00
    -outlatdir ./lattices
    -pbeam 1e-48 1.000000e-48
    -pip 1.0 1.000000e+00
    -pl_beam 1e-10 1.000000e-10
    -pl_pbeam 1e-5 1.000000e-05
    -pl_window 0 0
    -rawlogdir
    -remove_dc no no
    -round_filters yes yes
    -samprate 16000 8.000000e+03
    -seed -1 -1
    -sendump
    -senmgau
    -silprob 0.005 5.000000e-03
    -smoothspec no no
    -svspec
    -tmat
    -tmatfloor 0.0001 1.000000e-04
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy legacy
    -unit_area yes yes
    -upperf 6855.4976 6.855498e+03
    -usewdphones no no
    -uw 1.0 1.000000e+00
    -var
    -varfloor 0.0001 1.000000e-04
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 7.000000e-29
    -wip 0.65 6.500000e-01
    -wlen 0.025625 2.562500e-02

    Thanks.

     
  • Nickolay V. Shmyrev

    Hello, this feature was just added to pocketsphinx subversion. Please check out
    the latest trunk.

    It's good that you took my advice to move to pocketsphinx ;) Now take my
    advice and use an endpointer to process your file. If you want to operate in batch
    mode, there is the sphinx_cont_fileseg binary, which can segment your long files
    before processing.

     
  • Berker Batur

    Berker Batur - 2010-09-30

    Hi,
    I downloaded the latest versions of both pocketsphinx and sphinxbase.
    When I executed the same command as in my first post, I got a
    segmentation fault.
    (It generated only the lattice file, not the n-best list.)

    Here are the last few lines of the output:

    INFO: ngram_search_fwdtree.c(1537): 403597 words recognized (33/fr)
    INFO: ngram_search_fwdtree.c(1539): 39453036 senones evaluated (3198/fr)
    INFO: ngram_search_fwdtree.c(1541): 58350028 channels searched (4730/fr),
    5234702 1st, 13125941 last
    INFO: ngram_search_fwdtree.c(1545): 780271 words for which last channels
    evaluated (63/fr)
    INFO: ngram_search_fwdtree.c(1548): 4233321 candidate words for entering last
    phone (343/fr)
    INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 3500 words
    INFO: ngram_search_fwdflat.c(925): 243167 words recognized (20/fr)
    INFO: ngram_search_fwdflat.c(927): 15739424 senones evaluated (1276/fr)
    INFO: ngram_search_fwdflat.c(929): 27340278 channels searched (2216/fr)
    INFO: ngram_search_fwdflat.c(931): 1528130 words searched (123/fr)
    INFO: ngram_search_fwdflat.c(933): 1164817 word transitions (94/fr)
    INFO: ngram_search.c(1081): </s> not found in last frame, using <sil> instead
    INFO: ngram_search.c(1133): lattice start node <s>.0 end node <sil>.12188
    INFO: ps_lattice.c(1351): Normalizer P(O) = alpha(<sil>:12188:12333) =
    -83016304
    INFO: ps_lattice.c(1389): Joint P(O,S) = -86598072 P(S|O) = -3581768
    INFO: ps_lattice.c(241): Writing lattice file: ./lattices/test.lat
    Segmentation fault

    When I removed the -nbestdir and -nbest options from the arguments, it ran without
    errors. The output was:

    INFO: ngram_search_fwdtree.c(1537): 403597 words recognized (33/fr)
    INFO: ngram_search_fwdtree.c(1539): 39453036 senones evaluated (3198/fr)
    INFO: ngram_search_fwdtree.c(1541): 58350028 channels searched (4730/fr),
    5234702 1st, 13125941 last
    INFO: ngram_search_fwdtree.c(1545): 780271 words for which last channels
    evaluated (63/fr)
    INFO: ngram_search_fwdtree.c(1548): 4233321 candidate words for entering last
    phone (343/fr)
    INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 3500 words
    INFO: ngram_search_fwdflat.c(925): 243167 words recognized (20/fr)
    INFO: ngram_search_fwdflat.c(927): 15739424 senones evaluated (1276/fr)
    INFO: ngram_search_fwdflat.c(929): 27340278 channels searched (2216/fr)
    INFO: ngram_search_fwdflat.c(931): 1528130 words searched (123/fr)
    INFO: ngram_search_fwdflat.c(933): 1164817 word transitions (94/fr)
    INFO: ngram_search.c(1081): </s> not found in last frame, using <sil> instead
    INFO: ngram_search.c(1133): lattice start node <s>.0 end node <sil>.12188
    INFO: ps_lattice.c(1351): Normalizer P(O) = alpha(<sil>:12188:12333) =
    -83016304
    INFO: ps_lattice.c(1389): Joint P(O,S) = -86598072 P(S|O) = -3581768
    INFO: ps_lattice.c(241): Writing lattice file: ./lattices/test.lat
    INFO: batch.c(753): test: 123.34 seconds speech, 91.67 seconds CPU, 91.83
    seconds wall
    INFO: batch.c(755): test: 0.74 xRT (CPU), 0.74 xRT (elapsed)
    INFO: batch.c(767): TOTAL 123.34 seconds speech, 91.67 seconds CPU, 91.83
    seconds wall
    INFO: batch.c(769): AVERAGE 0.74 xRT (CPU), 0.74 xRT (elapsed)

    I will use the endpointer soon. Could this segmentation fault be related to the long input
    file?

    Thanks for your help.

    Berker

     
  • Berker Batur

    Berker Batur - 2010-09-30

    Hi,
    I found the cause of the segmentation fault.
    If the directory given in the '-nbestdir ./nbest' argument does not exist,
    pocketsphinx_batch crashes with a segmentation fault.
    When I created the nbest directory before executing the command, it worked and
    generated the n-best list.
    It created the lattice directory by itself, so I assumed it would do the same for
    the n-best directory.
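
    A minimal sketch of that workaround for builds that still have the problem: create every output directory before invoking the decoder. The helper name ensure_output_dirs is made up for illustration; the directory names in the usage line are the ones from the command in this thread.

```shell
# Create each output directory if it does not exist yet.
# mkdir -p succeeds silently when the directory is already there.
ensure_output_dirs() {
    for dir in "$@"; do
        mkdir -p "$dir" || return 1
    done
}

# Usage (before running pocketsphinx_batch):
#   ensure_output_dirs ./nbest ./lattices
```

    Calling it unconditionally is harmless, so it can sit at the top of any wrapper script.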

    And another question:
    In Sphinx 3, the n-best list comes with acoustic and language model scores, but
    the .hyp file generated by pocketsphinx
    does not contain scores. Is there a feature that lets me obtain the scores of these
    hypotheses and of the words in them?

    Thanks.

    Berker

     
  • Nickolay V. Shmyrev

    Hi, I found the cause of the segmentation fault. If the directory given in the
    '-nbestdir ./nbest' argument does not exist, pocketsphinx_batch crashes. When I
    created the nbest directory before executing the command, it worked and generated
    the n-best list. It created the lattice directory by itself, so I assumed it would
    do the same for the n-best directory.

    Thanks, this issue was fixed in trunk.

    And another question: In Sphinx 3, the n-best list comes with acoustic and
    language model scores, but the .hyp file generated by pocketsphinx does not
    contain scores. Is there a feature that lets me obtain the scores of these
    hypotheses and of the words in them?

    There is a total score (the last item on each line). Separate acoustic and language
    scores aren't tracked yet, but that can be implemented if needed.
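
    A small sketch for pulling that total score out of an n-best file, assuming only what is stated here: the score is the last whitespace-separated item on each line. The function name nbest_scores is hypothetical, and the rest of each line is treated as opaque hypothesis text.

```shell
# Print "<score><TAB><rest of line>" for every non-empty line of an n-best file.
# Assumption: the total score is the last whitespace-separated field.
nbest_scores() {
    awk 'NF >= 1 {
        score = $NF
        text = ""
        for (i = 1; i < NF; i++) text = text (i > 1 ? " " : "") $i
        print score "\t" text
    }' "$1"
}

# Usage:
#   nbest_scores ./nbest/test.hyp
```

    Putting the score first makes the output easy to sort or filter with standard tools.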

     
  • Berker Batur

    Berker Batur - 2010-10-02

    Hi,

    I tried to use sphinx_cont_fileseg. I first executed the following command:
    sphinx_cont_fileseg -sps 8000 -w -r -i test.wav

    27 .raw files were generated.
    1-) Is there any way to generate .wav instead of .raw?

    Then I executed the following command:

    pocketsphinx_batch -hmm
    /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/ -dict
    /usr/local/share/pocketsphinx/model/lm/en_US/cmu07a.dic -lm
    /usr/local/share/pocketsphinx/model/lm/en_US/wsj0vp.5000.DMP -cepdir ./cep
    -ctl files.dat -cepext .raw -samprate 8000 -hyp out.dat -nbestdir ./nbest
    -nbest 10 -outlatdir ./lattices

    But an error occurred. Here is the output of pocketsphinx_batch:

    ERROR: "batch.c", line 207: File length mismatch: 0x2000c00 != 0xbabf
    *** glibc detected *** pocketsphinx_batch: double free or corruption (!prev): 0x0000000002f68340 ***
    There is a backtrace and memory map after this, then 'Aborted'.

    I couldn't find the 'File length mismatch' error in the forum posts. What causes
    this error?

    I also tried to decode the .raw files with Sphinx 3. I changed the -sps value
    because my Sphinx 3 acoustic models were trained at 16 kHz:
    sphinx_cont_fileseg -sps 16000 -w -r -i test.wav
    It decoded without errors, but the recognized words had nothing to do with the actual
    speech.

    Am I doing something wrong in this case?

    Thanks.

     
  • Nickolay V. Shmyrev

    1-) Is there any way to generate .wav instead of .raw ?

    No, but you can convert raw files to wav with sox.
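
    A hedged sketch of such a conversion. The sample format is an assumption that the thread does not confirm (16-bit signed, little-endian, mono), and the wrapper name raw_to_wav is made up for illustration; adjust the sox format options if your segments differ.

```shell
# Wrap headerless PCM audio in a WAV container with sox.
# Assumed input format: 16-bit signed, little-endian, mono.
raw_to_wav() {
    # $1: input .raw file, $2: output .wav file, $3: sample rate in Hz
    [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
    sox -t raw -r "$3" -e signed-integer -b 16 -c 1 -L "$1" "$2"
}

# Usage:
#   raw_to_wav segment.raw segment.wav 8000
```

    Raw files carry no header, so the rate and sample format have to be stated explicitly; a wrong rate produces audio that plays too fast or too slow.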

    But, an error occurred. Here is the output of pocketsphinx_batch:

    If you decode raw files, you need to add -adcin yes. You forgot that.
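
    For reference, a sketch of the same invocation with -adcin yes added. Every flag is taken from the commands already posted in this thread; only the wrapper function decode_raw_batch and its positional parameters are hypothetical.

```shell
# Decode headerless .raw segments in batch mode.
# -adcin yes tells the decoder the input is audio rather than feature files.
decode_raw_batch() {
    # $1: acoustic model dir, $2: dictionary, $3: language model, $4: control file
    pocketsphinx_batch \
        -hmm "$1" -dict "$2" -lm "$3" \
        -cepdir ./cep -ctl "$4" -cepext .raw \
        -adcin yes -samprate 8000 \
        -hyp out.dat -nbestdir ./nbest -nbest 10 -outlatdir ./lattices
}

# Usage:
#   decode_raw_batch /path/to/hmm /path/to/dict /path/to/lm files.dat
```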

    And also, I tried to decode .raw files with sphinx 3. I changed -sps value
    because my sphinx 3 acoustic models trained with 16 kHz. sphinx_cont_fileseg
    -sps 16000 -w -r -i test.wav It decoded with no error but recognized words are
    irrelevant with actual speech.

    A model trained at 16 kHz can't decode 8 kHz audio because the proper frequency
    bands are missing. That is unrelated to -sps. The sample rate option (-sps) is used
    to configure the frontend and should match the sampling rate of the audio.
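
    One way to catch such a mismatch before decoding is to read the rate from the WAV header and compare it with the rate the model expects. This is only a sketch: it assumes a canonical RIFF/WAVE header (the sample rate stored as a 32-bit little-endian integer at byte offset 24) and a little-endian host, and the function names are made up for illustration.

```shell
# Print the sample rate stored in a canonical WAV header.
# Assumes the fmt chunk comes first, so the rate sits at byte offset 24,
# and that od runs on a little-endian machine.
wav_sample_rate() {
    od -An -t u4 -j 24 -N 4 "$1" | tr -d '[:space:]'
}

# Fail with a message if the file's rate differs from the expected one.
check_sample_rate() {
    # $1: WAV file, $2: rate in Hz that the acoustic model expects
    actual=$(wav_sample_rate "$1")
    if [ "$actual" != "$2" ]; then
        echo "mismatch: $1 is $actual Hz, model expects $2 Hz" >&2
        return 1
    fi
}

# Usage:
#   check_sample_rate test.wav 8000 && echo "rates match"
```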

     
  • Nickolay V. Shmyrev

    The crash was fixed in trunk, thanks for the report!

     
  • Berker Batur

    Berker Batur - 2010-10-04

    Model trained with 16 kHz can't decode 8 kHz audio because proper frequency
    bands are missing. It's unrelated to -sps. Sample rate option (-sps) is used
    to configure frontend and should match the sampling rate of the audio.

    I forgot to mention that I also changed the sampling rate of the audio to
    16000 before using sphinx_cont_fileseg, so I expected Sphinx 3 to decode
    the .raw files with good accuracy.
    I will test this with some other examples soon.

    Thanks.

     
  • Berker Batur

    Berker Batur - 2010-10-04

    Hi,
    When I try to decode a WAV file 6.10 min. long with
    pocketsphinx_batch, it gives an error:

    INFO: ngram_search_fwdtree.c(99): 788 unique initial diphones
    INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 60 single-
    phone words
    INFO: ngram_search_fwdtree.c(186): Creating search tree
    INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 60
    single-phone words
    INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 14016
    INFO: ngram_search_fwdtree.c(338): after: 443 root, 13888 non-root channels,
    22 single-phone words
    INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25
    INFO: cmn.c(175): CMN: 50.93 2.02 -0.81 -1.49 -2.04 -0.38 -0.30 0.19 -0.38
    -0.37 -0.33 -0.36 -0.26
    ERROR: "acmod.c", line 856: Circular feature buffer cannot be rewound (output
    frame 0, alloc -28533)
    ERROR: "ngram_search.c", line 1029: Couldn't find <s> in first frame
    ERROR: "batch.c", line 449: Failed to obtain word lattice for utterance test
    ERROR: "ngram_search.c", line 1029: Couldn't find <s> in first frame
    Segmentation fault

    I generated the WAV file by concatenating a 2.03 min. WAV file three times,
    and pocketsphinx decoded the 2.03 min. file without errors. Is this
    error related to the length of the file, or to something else?

    I also couldn't get sphinx_cont_fileseg to work reliably. Sometimes it generates many
    raw files, and they are decoded correctly, but sometimes it generates only one
    raw file, and pocketsphinx_batch can't decode it.
    What is the right way to use sphinx_cont_fileseg when decoding a long
    (more than 10 min.) 8 kHz WAV file?

     
  • Nickolay V. Shmyrev

    Hm, quite some time has passed, sorry for not replying.

    Is this error related with the length of the file ?

    Yes

    I also couldn't manage using sphinx_cont_fileseg. It generates sometimes
    many raw files and they can be decoded correctly, but it generates sometimes
    only 1 raw file and pocketsphinx_batch couldn't decode it.

    That seems to be a bug. It would be nice to see the file that doesn't work.

    How should be the process of using sphinx_cont_fileseg in decoding a 8 kHz
    and long (greater than 10 min.) wav file ?

    Exactly as you described: it should split the long file into many segments, and you
    can process each one in batch mode. Or you can use pocketsphinx_continuous,
    which will do the same for you.
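
    For the batch route, the segments need to be listed in a control file for the -ctl option. A minimal sketch, assuming the control file holds one utterance ID per line without the extension (the directory and extension are supplied separately through -cepdir and -cepext); the helper name make_ctl is hypothetical.

```shell
# List every file with the given extension in a directory,
# one name per line, with the directory and extension stripped.
make_ctl() {
    # $1: directory holding the segments, $2: extension (for example .raw)
    for f in "$1"/*"$2"; do
        [ -e "$f" ] || continue   # skip the unexpanded pattern if nothing matches
        basename "$f" "$2"
    done
}

# Usage:
#   make_ctl ./cep .raw > files.dat
```

    Regenerating the control file after each segmentation run keeps it in step with whatever number of segments sphinx_cont_fileseg produced.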

     
