The segfault occurs in tied_mgau_common.h, in the routine LOGMATH_INLINE int fast_logmath_add, at line 120, where

return r - (((uint8 *)t->table));

is executed.
In debugging, this is the stack trace that I see:

pocketsphinx.dll!fast_logmath_add(logmath_s * lmath=0x002b5d30, int mlx=-2578117, int mly=-3432857) Line 120 + 0x8 bytes C
pocketsphinx.dll!ptm_mgau_senone_eval(ptm_mgau_s * s=0x002b5cb8, short * senone_scores=0x00b58408, unsigned char * senone_active=0x00b58620, int n_senone_active=14, int compall=0) Line 385 + 0x21 bytes C
pocketsphinx.dll!ptm_mgau_frame_eval(ps_mgau_s * ps=0x002b5cb8, short * senone_scores=0x00b58408, unsigned char * senone_active=0x00b58620, int n_senone_active=14, float * * featbuf=0x00b5875c, int frame=3, int compallsen=0) Line 448 + 0x19 bytes C
pocketsphinx.dll!acmod_score(acmod_s * acmod=0x00ad61d8, int * inout_frame_idx=0x0013d968) Line 823 + 0x44 bytes C
pocketsphinx.dll!ngram_fwdtree_search(ngram_search_s * ngs=0x00b79028, int frame_idx=3) Line 1420 + 0x10 bytes C
pocketsphinx.dll!ngram_search_step(ps_search_s * search=0x00b79028, int frame_idx=3) Line 687 + 0xd bytes C
pocketsphinx.dll!ps_search_forward(ps_decoder_s * ps=0x00ad6098) Line 697 + 0x27 bytes C
pocketsphinx.dll!ps_process_raw(ps_decoder_s * ps=0x00ad6098, const short * data=0x0013ea64, unsigned int n_samples=2566, int no_search=0, int full_utt=0) Line 729 + 0x9 bytes C
pocketsphinx_continuous.exe!utterance_loop() Line 156 + 0x21 bytes C
pocketsphinx_continuous.exe!main(int argc=2, char * * argv=0x00ad3250) Line 267 C
pocketsphinx_continuous.exe!__tmainCRTStartup() Line 586 + 0x19 bytes C
pocketsphinx_continuous.exe!mainCRTStartup() Line 403 C
Hello Nickolay,
I am using the released version 0.6 of pocketsphinx and the sphinxbase nightly build from 3/7/10.
I get the following message when running pocketsphinx_continuous:
"An unhandled win32 exception occurred in pocketsphinx_continuous.exe"
I was using sphinx3, but saw a posting recommending pocketsphinx instead. I was getting "Failed to retrieve viterbi history" from sphinx3 when using sphinx3_livedecode. Anyway, below is the log from pocketsphinx_continuous.
This is what I got on the shell; it doesn't mention any errors:
C:\Working\sphinx\tutorial\doheny>bin\pocketsphinx_continuous.exe batch\args.doheny.pocketsphinx
INFO: cmd_ln.c(512): Parsing command line:
\
-mdef ./model_parameters/doheny.cd_cont_13/mdef \
-fdict ./etc/doheny.filler \
-dict ./etc/doheny.tidigit.dic \
-mean ./model_parameters/doheny.cd_cont_13/means \
-var ./model_parameters/doheny.cd_cont_13/variances \
-mixw ./model_parameters/doheny.cd_cont_13/mixture_weights \
-tmat ./model_parameters/doheny.cd_cont_13/transition_matrices \
-upperf 6855.49756 \
-lowerf 133.33334 \
-nfilt 40 \
-feat 1s_c_d_dd \
-nfft 512 \
-wlen 0.025625 \
-samprate 16000 \
-agc none \
-varnorm no \
-cmn current \
-fillprob 0.02 \
-lw 9.5 \
-maxwpf 1 \
-beam 1e-40 \
-pbeam 1e-30 \
-wbeam 1e-20 \
-maxhmmpf 1500 \
-ds 2 \
-lm ./etc/doheny.ug.lm.DMP
Current configuration:
-adcdev
-agc none none
-agcthresh 2.0 2.000000e+000
-alpha 0.97 9.700000e-001
-argfile
-ascale 20.0 2.000000e+001
-backtrace no no
-beam 1e-48 1.000000e-040
-bestpath yes yes
-bestpathlw 9.5 9.500000e+000
-bghist no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict ./etc/doheny.tidigit.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 2
-fdict ./etc/doheny.filler
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 2.000000e-002
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-064
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+000
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-029
-fwdtree yes yes
-hmm
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm ./etc/doheny.ug.lm.DMP
-lmctl
-lmname default default
-logbase 1.0001 1.000100e+000
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+002
-lpbeam 1e-40 1.000000e-040
-lponlybeam 7e-29 7.000000e-029
-lw 6.5 9.500000e+000
-maxhmmpf -1 1500
-maxnewoov 20 20
-maxwpf -1 1
-mdef ./model_parameters/doheny.cd_cont_13/mdef
-mean ./model_parameters/doheny.cd_cont_13/means
-mfclogdir
-mixw ./model_parameters/doheny.cd_cont_13/mixture_weights
-mixwfloor 0.0000001 1.000000e-007
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+000
-pbeam 1e-48 1.000000e-030
-pip 1.0 1.000000e+000
-pl_beam 1e-10 1.000000e-010
-pl_pbeam 1e-5 1.000000e-005
-pl_window 0 0
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+004
-seed -1 -1
-sendump
-senmgau
-silprob 0.005 5.000000e-003
-smoothspec no no
-svspec
-tmat ./model_parameters/doheny.cd_cont_13/transition_matrices
-tmatfloor 0.0001 1.000000e-004
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+003
-usewdphones no no
-uw 1.0 1.000000e+000
-var ./model_parameters/doheny.cd_cont_13/variances
-varfloor 0.0001 1.000000e-004
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 1.000000e-020
-wip 0.65 6.500000e-001
-wlen 0.025625 2.562500e-002
INFO: feat.c(848): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: mdef.c(520): Reading model definition: ./model_parameters/doheny.cd_cont_13/mdef
INFO: bin_mdef.c(173): Allocating 621 * 8 bytes (4 KiB) for CD tree
INFO: tmat.c(205): Reading HMM transition probability matrices: ./model_parameters/doheny.cd_cont_13/transition_matrices
INFO: acmod.c(117): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: ./model_parameters/doheny.cd_cont_13/means
INFO: ms_gauden.c(292): 237 codebook, 1 feature, size 8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: ./model_parameters/doheny.cd_cont_13/variances
INFO: ms_gauden.c(292): 237 codebook, 1 feature, size 8x39
INFO: ms_gauden.c(356): 994 variance values floored
INFO: acmod.c(119): Attempting to use PTHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: ./model_parameters/doheny.cd_cont_13/means
INFO: ms_gauden.c(292): 237 codebook, 1 feature, size 8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: ./model_parameters/doheny.cd_cont_13/variances
INFO: ms_gauden.c(292): 237 codebook, 1 feature, size 8x39
INFO: ms_gauden.c(356): 994 variance values floored
INFO: ptm_mgau.c(671): Reading mixture weights file './model_parameters/doheny.cd_cont_13/mixture_weights'
INFO: ptm_mgau.c(765): Read 237 x 1 x 8 mixture weights
INFO: ptm_mgau.c(831): Maximum top-N: 4
INFO: dict.c(294): Allocating 4111 * 20 bytes (80 KiB) for word entries
INFO: dict.c(306): Reading main dictionary: ./etc/doheny.tidigit.dic
INFO: dict.c(206): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(309): 12 words read
INFO: dict.c(314): Reading filler dictionary: ./etc/doheny.filler
INFO: dict.c(206): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(317): 3 words read
INFO: dict2pid.c(402): Building PID tables for dictionary
INFO: dict2pid.c(409): Allocating 15 * 4 bytes (0 KiB) for word-internal arrays
INFO: dict2pid.c(414): Allocating 40^3 * 2 bytes (125 KiB) for word-initial triphones
INFO: dict2pid.c(453): Allocating 19 entries of 2 bytes (0 KiB) for internal ssids
INFO: dict2pid.c(130): Allocated 19360 bytes (18 KiB) for word-final triphones
INFO: dict2pid.c(193): Allocated 19360 bytes (18 KiB) for single-phone word triphones
ERROR: "ngram_model_arpa.c", line 76: No \data\ mark in LM file
INFO: ngram_model_dmp.c(140): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(194): ngrams 1=14, 2=1, 3=0
INFO: ngram_model_dmp.c(240): 14 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(288): 1 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(337): 2 = LM.prob2 entries read
INFO: ngram_model_dmp.c(460): 14 = ascii word strings read
INFO: ngram_search_fwdtree.c(99): 11 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 4 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 4 single-phone words
INFO: ngram_search_fwdtree.c(324): after: max nonroot chan increased to 146
INFO: ngram_search_fwdtree.c(333): after: 11 root, 18 non-root channels, 3 single-phone words
INFO: ngram_search_fwdflat.c(153): fwdflat: min_ef_width = 4, max_sf_win = 25
Allocating 32 buffers of 2500 samples each
INFO: continuous.c(261): bin\pocketsphinx_continuous.exe COMPILED ON: Apr 19 2010, AT: 12:51:10
READY....
C:\Working\sphinx\tutorial\doheny>
Forgot to mention.
The following properties I used come from building the dictionary and training db with sphinxtrain:
-mdef ./model_parameters/doheny.cd_cont_13/mdef
-mean ./model_parameters/doheny.cd_cont_13/means
-var ./model_parameters/doheny.cd_cont_13/variances
-mixw ./model_parameters/doheny.cd_cont_13/mixture_weights
-tmat ./model_parameters/doheny.cd_cont_13/transition_matrices
I noticed that several types of model parameters were generated, so I used the ones that were used when performing the preliminary decode.
I am thinking it has to do with the way my lm file was created, because of the error "ERROR: "ngram_model_arpa.c", line 76: No \data\ mark in LM file", but I am unclear what exactly is wrong.
After looking at the source file I noticed that -lm was being read in the txt format and not the DMP format. I then changed the parameter to point to ./etc/doheny.ug.lm.
After that, I see that it can read the lm file properly (log below), but I still get the same win32 exception.
INFO: ngram_model_arpa.c(476): ngrams 1=14, 2=1, 3=0
INFO: ngram_model_arpa.c(135): Reading unigrams
INFO: ngram_model_arpa.c(515): 14 = #unigrams created
INFO: ngram_model_arpa.c(194): Reading bigrams
INFO: ngram_model_arpa.c(531): 1 = #bigrams created
INFO: ngram_model_arpa.c(532): 2 = #prob2 entries
INFO: ngram_search_fwdtree.c(99): 11 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 4 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 4 single-phone words
INFO: ngram_search_fwdtree.c(324): after: max nonroot chan increased to 146
INFO: ngram_search_fwdtree.c(333): after: 11 root, 18 non-root channels, 3 single-phone words
INFO: ngram_search_fwdflat.c(153): fwdflat: min_ef_width = 4, max_sf_win = 25
Allocating 32 buffers of 2500 samples each
INFO: continuous.c(261): bin\pocketsphinx_continuous.exe COMPILED ON: Apr 19 2010, AT: 12:51:10
READY....
Ok, got it to work! I realized that some of the options I was passing in must have been conflicting, so I reduced the options down to the basics and it works.
The problem, though, is that now when I run the application the results are completely off from the statistics I gathered during the preliminary decode. It is as though some dictionary words are not being recognized at all.
Any suggestions?
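For anyone following along: with a SphinxTrain layout like the one above, "the basics" can usually be expressed with little more than the model directory, dictionary and LM. The following is a guess at what such a minimal invocation might look like (the -hmm directory form assumes pocketsphinx can find mdef, means, variances, mixture_weights and transition_matrices inside it; paths are the ones from this thread):

```
bin\pocketsphinx_continuous.exe ^
    -hmm model_parameters\doheny.cd_cont_13 ^
    -dict etc\doheny.tidigit.dic ^
    -lm etc\doheny.ug.lm.DMP
```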
Sorry, it's hard to suggest anything since the problem is not clearly stated.
What's the problem exactly?
In order to get help with training problems, you need to share your training folder. Without it, it's hard to suggest anything.
Hello Nickolay,
I have messaged you the link to the training folder.
The problem I am having is that when I use pocketsphinx_continuous, I do not get the same detection that I got from the preliminary decode when creating the training data. I understand that my training data is small, and that it will pose problems, but I want to make sure I am running pocketsphinx_continuous with the proper parameters before I mature the training data. I expect that, since I am the sole speaker, I would see reasonable results when running pocketsphinx_continuous.
As of now, regardless of my attempts, it very rarely recognizes any of my speech. The results I get from pocketsphinx_continuous do not compare to the preliminary decode results. What configuration issues do I have to fix to achieve comparable results?
Second question: what mdef file should I use? Using the tutorial to create a training folder produced several mdef files, and I am confused about which one would be best for continuous, live speech. There are about a dozen continuous-related mdef files.
My initial attempt was to use sphinx3_livedecode, but I saw a posting that
recommended pocketsphinx_continuous instead.
The arguments that I use for pocketsphinx can be found in
batch/args.doheny.pocketsphinx.mod
I run pocketsphinx from the root folder using the following command:
bin\pocketsphinx_continuous.exe batch\args.doheny.pocketsphinx.mod
I hope I have stated my two questions clearly.
> I understand that my training data is small, and that it will pose problems, but I want to make sure I am running pocketsphinx_continuous with the proper parameters before I mature the training data.

You can't do that. It's impossible to run with small data; you need a large db from the very beginning.

> I have messaged you the link to the training folder.

I haven't received anything.

> Second question is: what mdef file should I use?

The one in model_parameters/yourmodel.cd_cont_yoursenones. Doing the tutorial could help you; in particular, don't skip its decoding stage.

> There are about a dozen continuous-related mdef files.

That's not true for sure.
The training folder can be found here
Should we not see similar results using pocketsphinx_continuous compared to our preliminary decode if we use similar data?
Thanks for your assistance.
The issue is that if you configured 1000 senones, like the default value, the model gets so specialized that it's very hard to find data similar to the 10 minutes you recorded.
I didn't check your files yet, but you need to make sure that the audio used for training has the proper sample rate and format. That's another common reason for screwed-up recognition.