CMU Sphinx / Forums / Help: Strange Crash With PocketSphinx

Hello,

I'm using PocketSphinx in a training simulator I am writing, and I am getting a strange crash in roughly 1 out of every 6 to 10 runs of the simulator. The crash always occurs on EXACTLY the second utterance, always gives the same call stack, and is always a read access violation at 0x0.

In the simulation, I am trying to get as close to real-time results as possible. To do this, I capture the live audio into a collection of circular buffers. The end result is that every 3 seconds I send the previous 4 seconds to Sphinx for processing. This means the first 1 second of each segment is the same as the last 1 second of the previously processed segment.
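
To make the overlap concrete, here is a minimal sketch of pulling "the most recent N samples" out of a ring buffer as one contiguous block. The `SampleRing` class and its method names are hypothetical and are not the simulator's actual buffer code; it assumes 16-bit mono samples.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-capacity ring buffer of 16-bit mono samples.
class SampleRing {
public:
    explicit SampleRing(std::size_t capacity)
        : data_(capacity), head_(0), size_(0) {
        assert(capacity > 0);
    }

    // Append samples, overwriting the oldest ones once the buffer is full.
    void push(const std::int16_t *samples, std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) {
            data_[head_] = samples[i];
            head_ = (head_ + 1) % data_.size();
            if (size_ < data_.size()) ++size_;
        }
    }

    // Copy out the most recent `count` samples (fewer if not yet recorded),
    // oldest first, as one contiguous vector that is safe to hand to a decoder.
    std::vector<std::int16_t> latest(std::size_t count) const {
        if (count > size_) count = size_;
        std::vector<std::int16_t> out(count);
        const std::size_t start = (head_ + data_.size() - count) % data_.size();
        for (std::size_t i = 0; i < count; ++i)
            out[i] = data_[(start + i) % data_.size()];
        return out;
    }

private:
    std::vector<std::int16_t> data_;
    std::size_t head_;  // next write position
    std::size_t size_;  // number of valid samples stored
};
```

With a capacity of 4 seconds of audio, pushing 3 seconds and then asking for the latest 4 seconds returns only the 3 seconds that exist. After the next 3 seconds are pushed, the same call returns 4 seconds whose first second repeats the tail of the previous chunk.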

There are two reasons for this approach:
1) There is no guarantee that the user will have frequent enough breaks of silence. Waiting for silence to process each speech segment would likely make it no longer appear to be "real time", or may result in segments that are just too long for run-time processing.
2) Since I process speech every 3 seconds, it is quite likely that a segment of audio I send in for processing ends in the middle of a word. Thus I include the last 1 second of the previously processed audio segment (1 second was chosen arbitrarily). I use the beginning and end timestamps of each word (from PocketSphinx) to eliminate word duplicates.
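
As an illustration of the timestamp-based de-duplication in point 2, here is one possible sketch. It is not the simulator's actual rule: the `TimedWord` struct, the `mergeChunk` function, and the "drop a word whose midpoint falls at or before the last accepted end frame" test are all assumptions made for the example. Frames are chunk-relative on input and are shifted by the chunk's absolute start frame.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for one recognized word; frames are relative to the
// start of the chunk that produced it.
struct TimedWord {
    std::string text;
    int startFrame;
    int endFrame;
};

// Shift a chunk's words to absolute frames and keep only those not already
// covered by an earlier chunk. `lastEndFrame` holds the absolute end frame of
// the last accepted word (start it at -1) and is updated as words are accepted.
std::vector<TimedWord> mergeChunk(const std::vector<TimedWord> &chunkWords,
                                  int chunkStartFrame,
                                  int &lastEndFrame) {
    assert(chunkStartFrame >= 0);
    std::vector<TimedWord> accepted;
    for (const TimedWord &w : chunkWords) {
        TimedWord shifted{w.text,
                          w.startFrame + chunkStartFrame,
                          w.endFrame + chunkStartFrame};
        const int mid = shifted.startFrame +
                        (shifted.endFrame - shifted.startFrame) / 2;
        if (mid <= lastEndFrame)
            continue;  // lies inside audio that was already reported
        accepted.push_back(shifted);
        lastEndFrame = shifted.endFrame;
    }
    return accepted;
}
```

At 100 frames per second, the first 3 second chunk covers frames 0 to 299 and the second chunk starts at absolute frame 200, so a word recognized in the shared second shows up in both chunks and is dropped the second time.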

To facilitate this flow, every 3 seconds I make the following calls:

int procError = ps_start_utt(speechDecoder, NULL);
char const *utteranceID = ps_get_uttid(speechDecoder);
procError = ps_process_raw(speechDecoder, (qint16 *)ptrToProcessBuffer, (bytesRecorded), 0, 0);
procError = ps_end_utt(speechDecoder);
ps_seg_t *recWord = ps_seg_iter(speechDecoder, &pathScore);

// retrieve each word recognized by using this in a loop:
ps_seg_frames(recWord, &startFrame, &endFrame);
float confidenceScore = logmath_exp(ps_get_logmath(speechDecoder), postProb);
QString wordString = ps_seg_word(recWord);
recWord = ps_seg_next(recWord);
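
One detail worth double-checking in the calls above: the third parameter of ps_process_raw is declared as n_samples, a count of 16-bit samples rather than a count of bytes. If bytesRecorded really holds a byte count (an assumption based only on its name), the decoder would be told to read twice as much data as the buffer contains. A minimal sketch of the conversion, using a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: convert a byte count of 16-bit mono PCM audio into the
// number of samples it contains.
std::size_t bytesToSamples(std::size_t byteCount) {
    // A valid 16-bit PCM buffer always holds a whole number of samples.
    assert(byteCount % sizeof(std::int16_t) == 0);
    return byteCount / sizeof(std::int16_t);
}
```

Under that assumption, a 48000 byte chunk would be passed as 24000 samples and a 64000 byte chunk as 32000 samples.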

Since I do not have a previous recording buffer the first time audio is sent to PocketSphinx, it is sent as a 3 second chunk (48000 bytes) instead of a 4 second chunk (64000 bytes). The segments that are sent to PocketSphinx may at times contain no speech at all. One other note: this whole recognition process runs in a separate thread from the GUI to prevent stuttering (PocketSphinx and all the systems I use directly with it are on the same thread, and nothing outside this thread has access to PocketSphinx).

On the crash, the top of the call stack I am getting is always:
0 _VEC_memcpy MSVCR100D 0x5f64be8d
1 fe_shift_frame fe_sigproc.c 636 0x3a5393
2 fe_process_frames fe_interface.c 435 0x3a3499
3 acmod_process_raw acmod.c 674 0x1002a797
4 ps_process_raw pocketsphinx.c 769 0x1005bcb8
5 SpeechRecognizer::RecognizeSpeech SpeechRecognizer.cpp 85 0x2c32fb

The output I got from PocketSphinx on the most recent crash is (I scrubbed the path to make it more readable):

INFO: cmd_ln.c(691): Parsing command line:
\
-hmm PATH/sphinx_models/hub4wsj_sc_8k \
-lm PATH/sphinx_models/SM1.lm \
-dict PATH/sphinx_models/SM1.dic \
-samprate 8000 \
-fwdflat yes \
-bestpath yes

Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+000
-alpha 0.97 9.700000e-001
-ascale 20.0 2.000000e+001
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-048
-bestpath yes yes
-bestpathlw 9.5 9.500000e+000
-bghist no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict PATH/sphinx_models/SM1.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-008
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-064
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+000
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-029
-fwdtree yes yes
-hmm PATH/sphinx_models/hub4wsj_sc_8k
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm PATH/sphinx_models/SM1.lm
-lmctl
-lmname default default
-logbase 1.0001 1.000100e+000
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+002
-lpbeam 1e-40 1.000000e-040
-lponlybeam 7e-29 7.000000e-029
-lw 6.5 6.500000e+000
-maxhmmpf -1 -1
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-007
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+000
-pbeam 1e-48 1.000000e-048
-pip 1.0 1.000000e+000
-pl_beam 1e-10 1.000000e-010
-pl_pbeam 1e-5 1.000000e-005
-pl_window 0 0
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 8.000000e+003
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-003
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-004
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+003
-usewdphones no no
-uw 1.0 1.000000e+000
-var
-varfloor 0.0001 1.000000e-004
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-029
-wip 0.65 6.500000e-001
-wlen 0.025625 2.562500e-002

INFO: cmd_ln.c(691): Parsing command line:
\
-nfilt 20 \
-lowerf 1 \
-upperf 4000 \
-wlen 0.025 \
-transform dct \
-round_filters no \
-remove_dc yes \
-svspec 0-12/13-25/26-38 \
-feat 1s_c_d_dd \
-agc none \
-cmn current \
-cmninit 56,-3,1 \
-varnorm no

Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+000
-alpha 0.97 9.700000e-001
-ceplen 13 13
-cmn current current
-cmninit 8.0 56,-3,1
-dither no no
-doublebw no no
-feat 1s_c_d_dd 1s_c_d_dd
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 0
-logspec no no
-lowerf 133.33334 1.000000e+000
-ncep 13 13
-nfft 512 512
-nfilt 40 20
-remove_dc no yes
-round_filters yes no
-samprate 16000 8.000000e+003
-seed -1 -1
-smoothspec no no
-svspec 0-12/13-25/26-38
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 4.000000e+003
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.500000e-002

INFO: acmod.c(246): Parsed model-specific feature parameters from PATH/sphinx_models/hub4wsj_sc_8k/feat.params
INFO: feat.c(713): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(167): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(517): Reading model definition: PATH/sphinx_models/hub4wsj_sc_8k/mdef
INFO: mdef.c(528): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: PATH/sphinx_models/hub4wsj_sc_8k/mdef
INFO: bin_mdef.c(513): 50 CI-phone, 143047 CD-phone, 3 emitstate/phone, 150 CI-sen, 5150 Sen, 27135 Sen-Seq
INFO: tmat.c(205): Reading HMM transition probability matrices: PATH/sphinx_models/hub4wsj_sc_8k/transition_matrices
INFO: acmod.c(121): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: PATH/sphinx_models/hub4wsj_sc_8k/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: PATH/sphinx_models/hub4wsj_sc_8k/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(354): 0 variance values floored
INFO: s2_semi_mgau.c(903): Loading senones from dump file PATH/sphinx_models/hub4wsj_sc_8k/sendump
INFO: s2_semi_mgau.c(927): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(1022): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1296): Maximum top-N: 4 Top-N beams: 0 0 0
INFO: dict.c(317): Allocating 4144 * 20 bytes (80 KiB) for word entries
INFO: dict.c(332): Reading main dictionary: PATH/sphinx_models/SM1.dic
INFO: dict.c(211): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(335): 37 words read
INFO: dict.c(341): Reading filler dictionary: PATH/sphinx_models/hub4wsj_sc_8k/noisedict
INFO: dict.c(211): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(344): 11 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(404): Allocating 50^3 * 2 bytes (244 KiB) for word-initial triphones
INFO: dict2pid.c(131): Allocated 30200 bytes (29 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 30200 bytes (29 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(477): ngrams 1=33, 2=114, 3=190
INFO: ngram_model_arpa.c(135): Reading unigrams
INFO: ngram_model_arpa.c(516): 33 = #unigrams created
INFO: ngram_model_arpa.c(195): Reading bigrams
INFO: ngram_model_arpa.c(533): 114 = #bigrams created
INFO: ngram_model_arpa.c(534): 36 = #prob2 entries
INFO: ngram_model_arpa.c(542): 29 = #bo_wt2 entries
INFO: ngram_model_arpa.c(292): Reading trigrams
INFO: ngram_model_arpa.c(555): 190 = #trigrams created
INFO: ngram_model_arpa.c(556): 15 = #prob3 entries
INFO: ngram_search_fwdtree.c(99): 25 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 14 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 14 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 185
INFO: ngram_search_fwdtree.c(338): after: 25 root, 57 non-root channels, 13 single-phone words
INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25

Any ideas on what I could be doing to cause this crash?

Thanks in advance!

Thank you very much for taking the time to look over this crash information and for the recommendation. Unfortunately I do not think I will be able to use ps_get_hyp in my application. It is necessary to my application to get specific time stamps for each word, and ps_get_hyp does not appear to do this.

I knew before designing my application this way that my use of ps_start_utt and ps_end_utt was unorthodox. However, I had thought that in the worst case this would just lead to reduced accuracy. Is it possible that this usage is what is causing the crash?

My application is a training simulator designed to allow Stage Managers to practice. I mentioned some of the specifics about this in my other thread about tuning PocketSphinx for performance. I am using PocketSphinx to allow grading of what the user says during a simulation session. The grading criteria for each word are:
1) Must be spoken clearly
2) Must be the correct word (out of several variants)
3) Must be said at the right time
4) Each word's score is averaged into a phrase of 6-10 words. The user is not told whether each word is correct, instead they are told if a given phrase is correct (determined by having >= 50% of the words correct).
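
The phrase-level rule in point 4 can be sketched as follows. The function name and the use of a plain list of per-word pass/fail flags are assumptions for illustration; how each word's pass/fail is decided from criteria 1 to 3 is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a phrase counts as correct when at least 50% of its
// words were judged correct.
bool phraseIsCorrect(const std::vector<bool> &wordCorrect) {
    assert(!wordCorrect.empty());
    std::size_t correct = 0;
    for (bool ok : wordCorrect)
        if (ok) ++correct;
    // Integer form of correct / total >= 0.5, avoiding floating point.
    return 2 * correct >= wordCorrect.size();
}
```

For a 6 word phrase, 3 correct words is enough to pass; for a 7 word phrase, 3 correct words is not.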

Since the correct words are averaged over a phrase, the accuracy of PocketSphinx can acceptably be in the 60% range, although 70-90% would be ideal. By far, the most important thing for the speech recognition is consistent real-time update intervals. These updates are used to give the user a rough indication of how well they are doing in their practice.

To aid in tracking down the crash bug, I have made my application save the recorded audio to a WAV file. Since I know the crash always occurs when processing the second utterance, it is trivial to close the file just before this utterance. I then just ran the simulator enough times to get the crash to occur. I have attached a sample from one of the "crash runs". Unfortunately it doesn't appear to be anything special: just silence with some mechanical noise. This test did prove that the buffers themselves are not using invalid memory addresses.

While testing, I did come across a new crash stack. It happens a lot less frequently than the one above, but perhaps it can reveal a pattern in the crashes:

0 memcpy MSVCR100D 0x5f67c9c7
1 fe_process_frames fe_interface.c 462 0x4635fe
2 acmod_process_raw acmod.c 674 0x1002a797
3 ps_process_raw pocketsphinx.c 769 0x1005bcb8
4 SpeechRecognizer::RecognizeSpeech SpeechRecognizer.cpp 90 0x13e331b

Thank you very much for taking the time to assist me with this. It has been a very frustrating issue and I'm not the type to ask for help very often. Does this information give you any ideas on what might be causing the crash?

Last edit: Scorx Ion 2013-03-16

prevSimSample.wav
