I tried to use pocketsphinx_continuous with an FSG grammar, but it crashed with
a segmentation fault.
Here is my mdef:
http://dl.dropbox.com/u/5137777/download/mdef
Filler dictionary:
http://dl.dropbox.com/u/5137777/download/noisedict
Dictionary:
http://dl.dropbox.com/u/5137777/download/word.dic
FSG file:
http://dl.dropbox.com/u/5137777/download/word.fsg
Output log:
INFO: cmd_ln.c(559): Parsing command line:
pocketsphinx_continuous \
-hmm /media/KHOAND/workspaces/acousticmodel-building/Aug-2011/30_td_alldata_dither/model_parameters/30_td_alldata_dither.cd_cont_5000 \
-fsg word.fsg \
-dict word.dic
Current configuration:
-adcdev
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-argfile
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-bghist no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict word.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg word.fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /media/KHOAND/workspaces/acousticmodel-building/Aug-2011/30_td_alldata_dither/model_parameters/30_td_alldata_dither.cd_cont_5000
-infile
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm
-lmctl
-lmname default default
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf -1 -1
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-5 1.000000e-05
-pl_window 0 0
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-time no no
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-usewdphones no no
-uw 1.0 1.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: cmd_ln.c(559): Parsing command line:
\
-alpha 0.97 \
-doublebw no \
-nfilt 40 \
-ncep 13 \
-lowerf 1.3333334 \
-upperf 6855.4976 \
-nfft 512 \
-wlen 0.0256 \
-transform legacy \
-samprate 16000 \
-feat 1s_c_d_dd \
-agc none \
-cmn current \
-varnorm no
Current configuration:
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-dither no no
-doublebw no no
-feat 1s_c_d_dd 1s_c_d_dd
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 0
-logspec no no
-lowerf 133.33334 1.333333e+00
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-smoothspec no no
-svspec
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.560000e-02
INFO: acmod.c(238): Parsed model-specific feature parameters from
/media/KHOAND/workspaces/acousticmodel-building/Aug-2011/30_td_alldata_dither/
model_parameters/30_td_alldata_dither.cd_cont_5000/feat.params
INFO: feat.c(697): Initializing feature stream to type: '1s_c_d_dd',
ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean= 12.00, mean= 0.0
INFO: mdef.c(520): Reading model definition:
/media/KHOAND/workspaces/acousticmodel-building/Aug-2011/30_td_alldata_dither/model_parameters/30_td_alldata_dither.cd_cont_5000/mdef
INFO: bin_mdef.c(173): Allocating 423280 * 8 bytes (3306 KiB) for CD tree
INFO: tmat.c(205): Reading HMM transition probability matrices:
/media/KHOAND/workspaces/acousticmodel-building/Aug-2011/30_td_alldata_dither/
model_parameters/30_td_alldata_dither.cd_cont_5000/transition_matrices
INFO: acmod.c(117): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/media/KHOAND/workspaces/acousticmodel-building/Aug-2011/30_td_alldata_dither/
model_parameters/30_td_alldata_dither.cd_cont_5000/means
INFO: ms_gauden.c(292): 5384 codebook, 1 feature, size
8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/media/KHOAND/workspaces/acousticmodel-building/Aug-2011/30_td_alldata_dither/
model_parameters/30_td_alldata_dither.cd_cont_5000/variances
INFO: ms_gauden.c(292): 5384 codebook, 1 feature, size
8x39
INFO: ms_gauden.c(356): 121967 variance values floored
INFO: acmod.c(119): Attempting to use PTHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/media/KHOAND/workspaces/acousticmodel-building/Aug-2011/30_td_alldata_dither/
model_parameters/30_td_alldata_dither.cd_cont_5000/means
INFO: ms_gauden.c(292): 5384 codebook, 1 feature, size
8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/media/KHOAND/workspaces/acousticmodel-building/Aug-2011/30_td_alldata_dither/
model_parameters/30_td_alldata_dither.cd_cont_5000/variances
INFO: ms_gauden.c(292): 5384 codebook, 1 feature, size
8x39
INFO: ms_gauden.c(356): 121967 variance values floored
ERROR: "ptm_mgau.c", line 801: Number of codebooks exceeds 256: 5384
INFO: acmod.c(121): Falling back to general multi-stream GMM computation
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/media/KHOAND/workspaces/acousticmodel-building/Aug-2011/30_td_alldata_dither/
model_parameters/30_td_alldata_dither.cd_cont_5000/means
INFO: ms_gauden.c(292): 5384 codebook, 1 feature, size
8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/media/KHOAND/workspaces/acousticmodel-building/Aug-2011/30_td_alldata_dither/
model_parameters/30_td_alldata_dither.cd_cont_5000/variances
INFO: ms_gauden.c(292): 5384 codebook, 1 feature, size
8x39
INFO: ms_gauden.c(356): 121967 variance values floored
INFO: ms_senone.c(160): Reading senone mixture weights:
/media/KHOAND/workspaces/acousticmodel-building/Aug-2011/30_td_alldata_dither/
model_parameters/30_td_alldata_dither.cd_cont_5000/mixture_weights
INFO: ms_senone.c(211): Truncating senone logs3(pdf) values by 10 bits
INFO: ms_senone.c(218): Not transposing mixture weights in memory
ERROR: "ms_senone.c", line 265: Weight normalization failed for 6 senones
INFO: ms_senone.c(277): Read mixture weights for 5384 senones: 1 features x 8
codewords
INFO: ms_senone.c(331): Mapping senones to individual codebooks
INFO: ms_mgau.c(123): The value of topn: 4
INFO: dict.c(294): Allocating 4103 * 20 bytes (80 KiB) for word entries
INFO: dict.c(306): Reading main dictionary: word.dic
INFO: dict.c(206): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(309): 4 words read
INFO: dict.c(314): Reading filler dictionary:
/media/KHOAND/workspaces/acousticmodel-building/Aug-2011/30_td_alldata_dither/model_parameters/30_td_alldata_dither.cd_cont_5000/noisedict
INFO: dict.c(206): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(317): 3 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(405): Allocating 128^3 * 2 bytes (4096 KiB) for word-initial
triphones
INFO: dict2pid.c(131): Allocated 197120 bytes (192 KiB) for word-final
triphones
INFO: dict2pid.c(195): Allocated 197120 bytes (192 KiB) for single-phone word
triphones
INFO: fsg_search.c(139): FSG(beam: -1105112, pbeam: -1105112, wbeam: -648215;
wip: -25842, pip: 0)
INFO: fsg_model.c(678): FSG: 19 states, 5 unique words, 6 transitions (18
null)
INFO: fsg_model.c(213): Computing transitive closure for null transitions
INFO: fsg_model.c(264): 88 null transitions added
INFO: fsg_model.c(411): Adding silence transitions for <sil> to FSG
INFO: fsg_model.c(431): Added 19 silence word transitions
INFO: fsg_model.c(411): Adding silence transitions for sil to FSG
INFO: fsg_model.c(431): Added 19 silence word transitions
INFO: fsg_lextree.c(110): Allocated 4902 bytes (4 KiB) for left and right
context phones
Segmentation fault
My first guess was encoding, but you look OK there.
I can see 3 things wrong with that FSG.
The sequence 0-2-4-3-6-8-7-15-17-16-1 constitutes a short-circuit through the
grammar.
The exit probability for node 10 totals to 2: 10-6 at 1 and 10-7 at 1.
The sequence 6-9-11-10-6 constitutes a short-circuited loop.
I don't know if any of these things will bother pocketsphinx. Where did you
get this FSG? I assume it's not hand-made.
You also shouldn't need the optional beginning and ending sil nodes. Sphinx
adds those by itself, so the ugly structures at 2,4,5,3 and 15,16,17,18 can
go away.
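The three checks above can be automated before handing a grammar to the decoder. The following is only a rough sketch: it reads "short-circuit" as a path made entirely of null transitions, which is an assumption, and it runs on a hypothetical toy grammar (TOY_FSG), not on the word.fsg from this thread. The tuple layout and function names are made up for illustration and are not part of pocketsphinx.

```python
from collections import defaultdict

# Hypothetical toy grammar, NOT the poster's word.fsg.
# Each transition is (from_state, to_state, probability, word);
# word=None marks a null (epsilon) transition.
TOY_FSG = [
    (0, 1, 1.0, None),
    (1, 2, 0.5, "hello"),
    (1, 2, 0.5, None),
    (2, 3, 1.0, None),
    (2, 1, 1.0, None),  # null loop back to 1; also makes state 2 exit with total 2.0
]


def exit_totals(transitions):
    """Sum the outgoing transition probabilities of every source state."""
    totals = defaultdict(float)
    for src, _dst, prob, _word in transitions:
        totals[src] += prob
    return dict(totals)


def null_reachable(transitions, start):
    """States reachable from start using one or more null transitions only."""
    graph = defaultdict(set)
    for src, dst, _prob, word in transitions:
        if word is None:
            graph[src].add(dst)
    seen, stack = set(), [start]
    while stack:
        state = stack.pop()
        for nxt in graph[state]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def null_cycle_states(transitions):
    """States that can get back to themselves without consuming a word."""
    states = {s for t in transitions for s in t[:2]}
    return sorted(s for s in states if s in null_reachable(transitions, s))


if __name__ == "__main__":
    bad = {s: t for s, t in exit_totals(TOY_FSG).items() if abs(t - 1.0) > 1e-9}
    print("states whose exit probability is not 1:", bad)
    print("null-only path from start 0 to final 3:", 3 in null_reachable(TOY_FSG, 0))
    print("states on a null-only loop:", null_cycle_states(TOY_FSG))
```

Whether pocketsphinx actually objects to any of these conditions is not settled in this thread; the sketch only flags them.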
Sorry, without the model files it's very hard to reproduce and fix this problem.
You have only provided the mdef so far.
Here are our model parameters for the Vietnamese language:
http://dl.dropbox.com/u/5137777/download/30_td.cd_cont_5000/feat.params
http://dl.dropbox.com/u/5137777/download/30_td.cd_cont_5000/mdef
http://dl.dropbox.com/u/5137777/download/30_td.cd_cont_5000/means
http://dl.dropbox.com/u/5137777/download/30_td.cd_cont_5000/mixture_weights
http://dl.dropbox.com/u/5137777/download/30_td.cd_cont_5000/noisedict
http://dl.dropbox.com/u/5137777/download/30_td.cd_cont_5000/transition_matrices
http://dl.dropbox.com/u/5137777/download/30_td.cd_cont_5000/variances
A recent pocketsphinx_continuous doesn't produce a segmentation fault; it prints
a warning instead.
When FSG_PNODE_CTXT_BVSZ is increased it just works. I suggest you try a
newer version.
By "newer version", do you mean pocketsphinx-0.7?
I already used pocketsphinx 0.7 for this test.
By newer version I mean snapshot/subversion trunk. See
http://cmusphinx.sourceforge.net/wiki/download
I tried the pocketsphinx and sphinxbase snapshot versions, but I got the same
error.
I changed #define FSG_PNODE_CTXT_BVSZ 2 to #define FSG_PNODE_CTXT_BVSZ 4 in
fsg_lextree.h. The recognizer ran successfully, but recognition takes too long.
Why?
Well, you need to choose between a big phoneset and recognizer speed. The best
trade-off is probably somewhere in the middle.
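To make the phoneset-size side of that trade-off concrete, here is a small back-of-the-envelope sketch. It rests on two assumptions that are not confirmed anywhere in this thread: that FSG_PNODE_CTXT_BVSZ counts 32-bit words in a per-node phone-context bit vector (so it must cover one bit per phone), and that the 128 in the "Allocating 128^3 * 2 bytes" log line is the number of phones in this model. The helper names are hypothetical.

```python
BITS_PER_WORD = 32  # assumption: each bit-vector element is a 32-bit word


def min_ctxt_bvsz(n_phones):
    """Smallest number of 32-bit words holding one bit per phone (ceiling division)."""
    return -(-n_phones // BITS_PER_WORD)


def covers(bvsz, n_phones):
    """True if a bit vector of bvsz words has at least one bit per phone."""
    return bvsz * BITS_PER_WORD >= n_phones


if __name__ == "__main__":
    n_phones = 128  # assumption: taken from the "128^3" dict2pid log line
    print("minimum FSG_PNODE_CTXT_BVSZ:", min_ctxt_bvsz(n_phones))
    print("is 2 enough?", covers(2, n_phones))
    print("is 4 enough?", covers(4, n_phones))
```

Under those assumptions a value of 2 covers only 64 phones, which would fit the crash with the default and the successful run with 4. A smaller phoneset would need fewer words, which is one way to read the advice about finding a middle point.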