Sorry, but how did you create the mfc files? Did you just download the mfc files for an4 from the site? I suppose you should extract the Sphinx features from the raw files with ./scripts_pl/make_feats. The ones on the site are just mel-cepstral coefficients.
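Before re-extracting features, it may help to check that the existing .mfc files are at least structurally sound. The following is a minimal sketch, assuming the usual Sphinx cepstral file layout (a 4-byte integer header holding the number of 32-bit floats, followed by the float data) and the 13 cepstra per frame shown in the log (-ceplen 13). The function name check_mfc is made up for illustration and is not part of any Sphinx tool.

```python
import os
import struct


def check_mfc(path, ceplen=13):
    """Return (n_frames, byte_order) if the header agrees with the file size.

    Assumed layout: one int32 giving the number of float32 values,
    followed by exactly that many float32 values.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        header = f.read(4)
    if len(header) < 4:
        raise ValueError("file is too short to contain a header")

    # Try both byte orders; the one whose count matches the file size wins.
    for fmt, name in (("<i", "little"), (">i", "big")):
        (n_floats,) = struct.unpack(fmt, header)
        if n_floats > 0 and 4 + 4 * n_floats == size:
            if n_floats % ceplen != 0:
                raise ValueError(
                    "float count %d is not a multiple of ceplen %d"
                    % (n_floats, ceplen)
                )
            return n_floats // ceplen, name

    raise ValueError("header does not match the file size in either byte order")
```

Calling check_mfc on each file listed in the control file would flag truncated files or files whose byte order differs from the -input_endian setting. It does not tell you whether the cepstra were computed with parameters matching the model, so a file that passes can still decode badly.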
At first, I used make_feats.pl to convert the sph files to mfc files, but I got more errors, such as "can not get last state". So I edited the raw voice files to remove a little bit of noise, and after that they were OK. But I still get the errors I just posted here.
Should I change the sphinx_train.cfg file?
Thanks a lot.
That is very strange. Actually, files must have noise in order to be recognized properly. Can you please start with the original files and scripts and show the error you are getting?
I need help from a programming whiz out there to develop a tutorial that I think will eventually bring in big dollars.
I already produce a CD set of lessons called "Speak Australian" (SA).
It is very effective at helping Australian migrants modify their speech patterns and get jobs, or better jobs.
I plan to make my tutorial interactive, using speech recognition to evaluate enunciation and produce a score, just like "Typing Tutor".
This may seem like a daunting project, but when divided into its components it is not:
The main variants in speech are pitch modulation, speed of words and pauses, and accent (volume on a syllable), all of which (according to my limited understanding of audio recording) are quite measurable.
The market for tutorials to improve spoken Australian is huge, especially with the foreign help desks used by many companies these days. The principle is also applicable to other versions of English and to other languages: a really huge market.
Is anyone out there?
I would like to meet with you.
Best wishes,
Michael (director, Shine Institute)
email MS@shine-institute.com
Nope, 64 bit machines work fine. We have a lot of them here.
The Intel ones (Xeon 5100 and 5300 series) seem to be significantly faster than the Opterons for Sphinx, probably because they have larger caches and very good integer math performance.
Hi,
After downloading Sphinx3 and AN4, I installed them on Linux.
When I ran perl scripts_pl/RunAll.pl
I got many errors in 50.cd_hmm_tied.
When I ignored these errors and ran perl scripts_pl/decode/slave.pl
I got 100% error.
The following is the training error:
/sang/speech/sphinx/tutoial/an4/bin/norm \
-accumdir /sang/speech/sphinx/tutoial/an4/bwaccumdir/an4_buff_1 \
-mixwfn /sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000/mixture_weights \
-tmatfn /sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000/transition_matrices \
-meanfn /sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000/means \
-varfn /sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000/variances \
-fullvar no
[Switch] [Default] [Value]
-help no no
-example no no
-accumdir /sang/speech/sphinx/tutoial/an4/bwaccumdir/an4_buff_1
-oaccumdir
-tmatfn /sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000/transition_matrices
-mixwfn /sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000/mixture_weights
-meanfn /sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000/means
-varfn /sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000/variances
-regmatfn
-dcountfn
-inmixwfn
-inmeanfn
-invarfn
-fullvar no no
-tiedvar no no
INFO: main.c(230): Reading and accumulating counts from /sang/speech/sphinx/tutoial/an4/bwaccumdir/an4_buff_1
INFO: s3mixw_io.c(116): Read /sang/speech/sphinx/tutoial/an4/bwaccumdir/an4_buff_1/mixw_counts [1102x1x4 array]
INFO: s3tmat_io.c(115): Read /sang/speech/sphinx/tutoial/an4/bwaccumdir/an4_buff_1/tmat_counts [34x3x4 array]
INFO: s3gau_io.c(379): Read /sang/speech/sphinx/tutoial/an4/bwaccumdir/an4_buff_1/gauden_counts with means with vars [1102x1x4 vector arrays]
INFO: main.c(450): Normalizing mean for n_mgau= 1102, n_stream= 1, n_density= 4
INFO: main.c(474): Normalizing var
ERROR: "gauden.c", line 1700: wt_var[i][j][k][l]=0 var (mgau= 3198763378, feat= 202, density=0, component=0) < 0
ERROR: "gauden.c", line 1700: wt_var[i][j][k][l]=1073741824 var (mgau= 3191706276, feat= 202, density=0, component=0) < 0
ERROR: "gauden.c", line 1700: wt_var[i][j][k][l]=0 var (mgau= 3183089913, feat= 202, density=0, component=0) < 0
ERROR: "gauden.c", line 1700: wt_var[i][j][k][l]=0 var (mgau= 3191203889, feat= 202, density=0, component=0) < 0
ERROR: "gauden.c", line 1700: wt_var[i][j][k][l]=-2147483648 var (mgau= 3190419760, feat= 202, density=0, component=0) < 0
ERROR: "gauden.c", line 1700: wt_var[i][j][k][l]=0 var (mgau= 3189243536, feat= 202, density=0, component=0) < 0
ERROR: "gauden.c", line 1700: wt_var[i][j][k][l]=0 var (mgau= 3188719616, feat= 202, density=0, component=0) < 0
ERROR: "gauden.c", line 1700: wt_var[i][j][k][l]=0 var (mgau= 3186575360, feat= 202, density=0, component=0) < 0
ERROR: "gauden.c", line 1700: wt_var[i][j][k][l]=0 var (mgau= 3189758608, feat= 202, density=0, component=0) < 0
ERROR: "gauden.c", line 1700: wt_var[i][j][k][l]=0 var (mgau= 3189591140, feat= 202, density=0, component=0) < 0
ERROR: "gauden.c", line 1700: wt_var[i][j][k][l]=0 var (mgau= 3182846976, feat= 202, density=0, component=0) < 0
ERROR: "gauden.c", line 1700: wt_var[i][j][k][l]=0 var (mgau= 3186902016, feat= 202, density=0, component=0) < 0
ERROR: "gauden.c", line 1700: wt_var[i][j][k][l]=0 var (mgau= 3178860544, feat= 202, density=0, component=0) < 0
ERROR: "gauden.c", line 1700: wt_var[i][j][k][l]=-2147483648 var (mgau= 3190390532, feat= 202, density=0, component=0) < 0
ERROR: "gauden.c", line 1700: wt_var[i][j][k][l]=0 var (mgau= 3188410144, feat= 202, density=0, component=0) < 0
ERROR: "gauden.c", line 1700: wt_var[i][j][k][l]=0 var (mgau= 3182792704, feat= 202, density=0, component=0) < 0
ERROR: "gauden.c", line 1700: wt_var[i][j][k][l]=0 var (mgau= 3184421664, feat= 202, density=0, component=0) < 0
ERROR: "gauden.c", line 1700: wt_var[i][j][k][l]=0 var (mgau= 3186247680, feat= 202, density=0, component=0) < 0
ERROR: "gauden.c", line 1700: wt_var[i][j][k][l]=0 var (mgau= 3178827776, feat= 202, density=0, component=0) < 0
ERROR: "gauden.c", line 1700: wt_var[i][j][k][l]=0 var (mgau= 3175470208, feat= 202, density=0, component=0) < 0
ERROR: "gauden.c", line 1700: wt_var[i][j][k][l]=0 var (mgau= 3198763378, feat= 202, density=0, component=3) < 0
..................
..................
INFO: s3mixw_io.c(232): Wrote /sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000/mixture_weights [1102x1x4 array]
INFO: s3tmat_io.c(174): Wrote /sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000/transition_matrices [34x3x4 array]
INFO: s3gau_io.c(226): Wrote /sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000/means [1102x1x4 array]
INFO: s3gau_io.c(226): Wrote /sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000/variances [1102x1x4 array]
Mon Sep 10 18:47:23 2007
Current Overall Likelihood Per Frame = 4.15883884199732
I added the wt_var[i][j][k][l]= output to the error message myself, for debugging.
And here is the decode error:
INFO: info.c(66): Directory: '/sang/speech/sphinx/tutoial/an4'
INFO: info.c(70): /sang/speech/sphinx/tutoial/an4/bin/sphinx3_decode Compiled on: Aug 30 2007, AT: 13:46:04
INFO: cmd_ln.c(430): Parsing command line:
/sang/speech/sphinx/tutoial/an4/bin/sphinx3_decode \
-mdef /sang/speech/sphinx/tutoial/an4/model_architecture/an4.1000.mdef \
-senmgau .cont. \
-hmm /sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000 \
-lw 23 \
-feat 1s_c_d_dd \
-beam 1e-120 \
-wbeam 1e-80 \
-dict /sang/speech/sphinx/tutoial/an4/etc/an4.dic \
-fdict /sang/speech/sphinx/tutoial/an4/etc/an4.filler \
-lm /sang/speech/sphinx/tutoial/an4/etc/an4.ug.lm.DMP \
-wip 0.2 \
-ctl /sang/speech/sphinx/tutoial/an4/etc/an4_test.fileids \
-ctloffset 0 \
-ctlcount 1 \
-cepdir /sang/speech/sphinx/tutoial/an4/feat \
-cepext .mfc \
-hyp /sang/speech/sphinx/tutoial/an4/result/an4-1-1.match \
-agc none \
-varnorm no \
-cmn current
Current configuration:
[NAME] [DEFLT] [VALUE]
-adchdr 0 0
-adcin no no
-agc none none
-alpha 0.97 9.700000e-01
-backtrace yes yes
-beam 1.0e-55 1.000000e-120
-bestpath no no
-bestpathlw 0.000000e+00
-bestscoredir
-bestsenscrdir
-bghist no no
-bptbldir
-bptblsize 32768 32768
-cb2mllr .1cls. .1cls.
-cep2spec no no
-cepdir /sang/speech/sphinx/tutoial/an4/feat
-cepext .mfc .mfc
-ceplen 13 13
-ci_pbeam 1e-80 1.000000e-80
-cmn current current
-cond_ds no no
-ctl /sang/speech/sphinx/tutoial/an4/etc/an4_test.fileids
-ctlcount 1000000000 1
-ctloffset 0 0
-ctl_lm
-ctl_mllr
-dagfudge 2 2
-dict /sang/speech/sphinx/tutoial/an4/etc/an4.dic
-dist_ds no no
-dither no no
-doublebw no no
-ds 1 1
-epl 3 3
-fbtype mel_scale mel_scale
-fdict /sang/speech/sphinx/tutoial/an4/etc/an4.filler
-feat 1s_c_d_dd 1s_c_d_dd
-fillpen
-fillprob 0.1 1.000000e-01
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-gs
-gs4gs yes yes
-hmm /sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000
-hmmdump no no
-hmmdumpef 200000000 200000000
-hmmdumpsf 200000000 200000000
-hmmhistbinsize 5000 5000
-hyp /sang/speech/sphinx/tutoial/an4/result/an4-1-1.match
-hypseg
-hypsegscore_unscale yes yes
-inlatdir
-inlatwin 50 50
-input_endian little little
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latcompress yes yes
-latext lat.gz lat.gz
-lda
-ldadim 29 29
-lextreedump 0 0
-lifter 0 0
-lm /sang/speech/sphinx/tutoial/an4/etc/an4.ug.lm.DMP
-lmctlfn
-lmdumpdir
-lminmemory no no
-lmname
-log3table yes yes
-logbase 1.0003 1.000300e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lts_mismatch no no
-lw 9.5 2.300000e+01
-maxcdsenpf 100000 100000
-maxedge 2000000 2000000
-maxhistpf 100 100
-maxhmmpf 20000 20000
-maxlmop 100000000 100000000
-maxlpf 40000 40000
-maxppath 1000000 1000000
-maxwpf 20 20
-mdef /sang/speech/sphinx/tutoial/an4/model_architecture/an4.1000.mdef
-mean
-min_endfr 3 3
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mode fwdtree fwdtree
-nbest 200 200
-nbestdir
-nbestext nbest.gz nbest.gz
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-Nlextree 3 3
-Nstalextree 25 25
-op_mode -1 -1
-outlatdir
-outlatfmt s3 s3
-pbeam 1.0e-50 1.000000e-50
-pheurtype 0 0
-phonepen 1.0 1.000000e+00
-pl_beam 1.0e-80 1.000000e-80
-pl_window 1 1
-ppathdebug no no
-ptranskip 0 0
-remove_dc no no
-round_filters yes yes
-samprate 16000.0 1.600000e+04
-seed -1 -1
-senmgau .cont. .cont.
-silprob 0.1 1.000000e-01
-smoothspec no no
-spec2cep no no
-subvq
-subvqbeam 3.0e-3 3.000000e-03
-svq4svq no no
-tighten_factor 0.5 5.000000e-01
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-tracewhmm
-transform legacy legacy
-treeugprob yes yes
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-utt
-uw 0.7 7.000000e-01
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-vqeval 3 3
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 1.0e-35 1.000000e-80
-wend_beam 1.0e-80 1.000000e-80
-wip 0.7 2.000000e-01
-wlen 0.025625 2.562500e-02
-worddumpef 200000000 200000000
-worddumpsf 200000000 200000000
INFO: kbcore.c(404): Begin Initialization of Core Models:
INFO: cmd_ln.c(430): Parsing command line:
\
-alpha 0.97 \
-dither yes \
-doublebw no \
-nfilt 40 \
-ncep 13 \
-lowerf 133.33334 \
-upperf 6855.4976 \
-nfft 512 \
-wlen 0.0256 \
-transform legacy \
-feat 1s_c_d_dd \
-agc none \
-cmn current \
-varnorm no
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-alpha 0.97 9.700000e-01
-cep2spec no no
-ceplen 13 13
-cmn current current
-dither no yes
-doublebw no no
-fbtype mel_scale mel_scale
-feat 1s_c_d_dd 1s_c_d_dd
-frate 100 100
-input_endian little little
-lda
-ldadim 29 29
-lifter 0 0
-logspec no no
-lowerf 133.33334 1.333333e+02
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-remove_dc no no
-round_filters yes yes
-samprate 16000.0 1.600000e+04
-seed -1 -1
-smoothspec no no
-spec2cep no no
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.560000e-02
INFO: kbcore.c(422): Parsed model-specific feature parameters from /sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000/feat.params
INFO: logs3.c(151): Initializing logbase: 1.000300e+00 (add table: 1)
INFO: Initialization of the log add table
INFO: Log-Add table size = 29350
INFO:
INFO: feat.c(835): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: kbcore.c(446): .cont.
INFO: Initialization of feat_t, report:
INFO: Feature type = 1s_c_d_dd
INFO: Cepstral size = 13
INFO: Cepstral size Used = 13
INFO: Number of stream = 1
INFO: Vector size of stream[0]: 39
INFO: Whether CMN is used = 1
INFO: Whether AGC is used = 0
INFO: Whether variance is normalized = 0
INFO:
INFO: Reading HMM in Sphinx 3 Model format
INFO: Model Definition File: /sang/speech/sphinx/tutoial/an4/model_architecture/an4.1000.mdef
INFO: Mean File: /sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000/means
INFO: Variance File: /sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000/variances
INFO: Mixture Weight File: /sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000/mixture_weights
INFO: Transition Matrices File: /sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000/transition_matrices
INFO: mdef.c(679): Reading model definition: /sang/speech/sphinx/tutoial/an4/model_architecture/an4.1000.mdef
INFO: Initialization of mdef_t, report:
INFO: 34 CI-phone, 6372 CD-phone, 3 emitstate/phone, 102 CI-sen, 1102 Sen, 1676 Sen-Seq
INFO:
INFO: kbcore.c(282): Using optimized GMM computation for Continuous HMM, -topn will be ignored
INFO: cont_mgau.c(161): Reading mixture gaussian file '/sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000/means'
INFO: cont_mgau.c(417): 1102 mixture Gaussians, 8 components, 1 streams, veclen 39
INFO: cont_mgau.c(161): Reading mixture gaussian file '/sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000/variances'
INFO: cont_mgau.c(417): 1102 mixture Gaussians, 8 components, 1 streams, veclen 39
INFO: cont_mgau.c(505): Reading mixture weights file '/sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000/mixture_weights'
INFO: cont_mgau.c(657): Read 1102 x 8 mixture weights
INFO: cont_mgau.c(685): Removing uninitialized Gaussian densities
137 202 708 850 853 870 889 891 937
WARNING: "cont_mgau.c", line 760: 262 densities removed (9 mixtures removed entirely)
INFO: cont_mgau.c(776): Applying variance floor
INFO: cont_mgau.c(794): 1944 variance values floored
INFO: cont_mgau.c(842): Precomputing Mahalanobis distance invariants
INFO: tmat.c(167): Reading HMM transition probability matrices: /sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000/transition_matrices
INFO: Initialization of tmat_t, report:
INFO: Read 34 transition matrices of size 3x4
INFO:
INFO: dict.c(471): Reading main dictionary: /sang/speech/sphinx/tutoial/an4/etc/an4.dic
INFO: dict.c(474): 130 words read
INFO: dict.c(479): Reading filler dictionary: /sang/speech/sphinx/tutoial/an4/etc/an4.filler
INFO: dict.c(482): 3 words read
INFO: Initialization of dict_t, report:
INFO: No of CI phone: 0
INFO: Max word: 4229
INFO: No of word: 133
INFO:
INFO: lm.c(593): LM read('/sang/speech/sphinx/tutoial/an4/etc/an4.ug.lm.DMP', lw= 23.00, wip= 0.20, uw= 0.70)
INFO: lm.c(595): Reading LM file /sang/speech/sphinx/tutoial/an4/etc/an4.ug.lm.DMP (LM name "default")
INFO: lm_3g_dmp.c(628): Reading LM in 16 bits format
INFO: lm_3g_dmp.c(684): Read 101 unigrams [in memory]
INFO: lm_3g_dmp.c(757): 1 bigrams [on disk]
INFO: lm_3g_dmp.c(900): 2 bigram prob entries
INFO: lm_3g_dmp.c(1051): 101 word strings
INFO: lm.c(686): The LM routine is operating at 16 bits mode
INFO: Initialization of fillpen_t, report:
INFO: Language weight =23.000000
INFO: Word Insertion Penalty =0.200000
INFO: Silence probability =0.100000
INFO: Filler probability =0.100000
INFO:
INFO: dict2pid.c(577): Building PID tables for dictionary
INFO: Initialization of dict2pid_t, report:
INFO: Dict2pid is in composite triphone mode
INFO: 477 composite states; 189 composite sseq
INFO:
INFO: kbcore.c(602): Inside kbcore: Verifying models consistency ......
INFO: kbcore.c(624): End of Initialization of Core Models:
INFO: Initialization of beam_t, report:
INFO: Parameters used in Beam Pruning of Viterbi Search:
INFO: Beam=-921019
INFO: PBeam=-383758
INFO: WBeam=-614012 (Skip=0)
INFO: WEndBeam=-614012
INFO: No of CI Phone assumed=34
INFO:
INFO: Initialization of fast_gmm_t, report:
INFO: Parameters used in Fast GMM computation:
INFO: Frame-level: Down Sampling Ratio 1, Conditional Down Sampling? 0, Distance-based Down Sampling? 0
INFO: GMM-level: CI phone beam -614012. MAX CD 100000
INFO: Gaussian-level: GS map would be used for Gaussian Selection? =1, SVQ would be used as Gaussian Score? =0 SubVQ Beam -19363
INFO:
INFO: Initialization of pl_t, report:
INFO: Parameters used in phoneme lookahead:
INFO: Phoneme look-ahead type = 0
INFO: Phoneme look-ahead beam size = 65945
INFO: No of CI Phones assumed=34
INFO:
INFO: Initialization of ascr_t, report:
INFO: No. of CI senone =102
INFO: No. of senone = 1102
INFO: No. of composite senone = 477
INFO: No. of senone sequence = 1676
INFO: No. of composite senone sequence=189
INFO: Parameters used in phoneme lookahead:
INFO: Phoneme lookahead window = 1
INFO:
INFO: kb.c(306): SEARCH MODE INDEX 4
INFO: srch.c(372): Search Initialization.
WARNING: "srch_time_switch_tree.c", line 172: -Nstalextree is omitted in TST search.
INFO: lextree.c(221): Creating Unigram Table for lm (name: default)
INFO: lextree.c(234): Size of word table after unigram + words in class: 99.
INFO: lextree.c(243): Size of word table after adding alternative prons: 128.
INFO: lextree_t, report:
INFO: Parameters of the lexical tree.
INFO: Type of the tree 0 (0:unigram, 1: 2g, 2: 3g etc.)
INFO: Number of left contexts 20
INFO: Number of node 869
INFO: Number of links in the tree 3522
INFO: The previous word for this tree
INFO: The size of a node of the lexical tree 96
INFO: The size of a gnode_t 12
INFO:
INFO: srch_time_switch_tree.c(232): Lextrees (0) for lm 0, its name is default, it has 869 nodes(ug)
INFO: lextree.c(221): Creating Unigram Table for lm (name: default)
INFO: lextree.c(234): Size of word table after unigram + words in class: 99.
INFO: lextree.c(243): Size of word table after adding alternative prons: 128.
INFO: lextree_t, report:
INFO: Parameters of the lexical tree.
INFO: Type of the tree 0 (0:unigram, 1: 2g, 2: 3g etc.)
INFO: Number of left contexts 20
INFO: Number of node 869
INFO: Number of links in the tree 3522
INFO: The previous word for this tree
INFO: The size of a node of the lexical tree 96
INFO: The size of a gnode_t 12
INFO:
INFO: srch_time_switch_tree.c(232): Lextrees (1) for lm 0, its name is default, it has 869 nodes(ug)
INFO: lextree.c(221): Creating Unigram Table for lm (name: default)
INFO: lextree.c(234): Size of word table after unigram + words in class: 99.
INFO: lextree.c(243): Size of word table after adding alternative prons: 128.
INFO: lextree_t, report:
INFO: Parameters of the lexical tree.
INFO: Type of the tree 0 (0:unigram, 1: 2g, 2: 3g etc.)
INFO: Number of left contexts 20
INFO: Number of node 869
INFO: Number of links in the tree 3522
INFO: The previous word for this tree
INFO: The size of a node of the lexical tree 96
INFO: The size of a gnode_t 12
INFO:
INFO: srch_time_switch_tree.c(232): Lextrees (2) for lm 0, its name is default, it has 869 nodes(ug)
INFO: srch_time_switch_tree.c(239): Time for building trees, 0.0040 CPU 0.0031 Clk
INFO: srch_time_switch_tree.c(261): Lextrees(0), 1 nodes(filler)
INFO: srch_time_switch_tree.c(261): Lextrees(1), 1 nodes(filler)
INFO: srch_time_switch_tree.c(261): Lextrees(2), 1 nodes(filler)
INFO: vithist.c(167): Initializing Viterbi-history module
INFO: Initialization of srch_t, report:
INFO: Operation Mode = 4, Operation Name = fwdtree
INFO:
INFO: utt.c(196): Processing: an406-fcaw-b
INFO: feat.c(1139): At directory /sang/speech/sphinx/tutoial/an4/feat
INFO: feat.c(369): Reading mfc file: '/sang/speech/sphinx/tutoial/an4/feat/an4test_clstk/fcaw/an406-fcaw-b.mfc'[0..-1]
ERROR: "srch_time_switch_tree.c", line 737: ERROR Fr 0, best HMM score > 0 (1207777165); int32 wraparound?
.ERROR: "srch_time_switch_tree.c", line 737: ERROR Fr 1, best HMM score > 0 (1476394336); int32 wraparound?
ERROR: "lextree.c", line 1600: out.history==-1, error
ERROR: "srch_time_switch_tree.c", line 928: Propagation Failed for lextree_hmm_propagate_leave at tree 0
ERROR: "srch_time_switch_tree.c", line 737: ERROR Fr 2, best HMM score > 0 (268434958); int32 wraparound?
ERROR: "srch_time_switch_tree.c", line 737: ERROR Fr 3, best HMM score > 0 (1476394012); int32 wraparound?
ERROR: "srch_time_switch_tree.c", line 737: ERROR Fr 4, best HMM score > 0 (268435103); int32 wraparound?
ERROR: "lextree.c", line 1600: out.history==-1, error
ERROR: "srch_time_switch_tree.c", line 928: Propagation Failed for lextree_hmm_propagate_leave at tree 0
.......................................WARNING: "vithist.c", line 787: No word exit in frame 398, using exits from frame 3
ERROR: "vithist.c", line 814: No word exit in frame 398, using exits from frame 3
INFO: fast_algo_struct.c(398): HMMHist0..0: 399(100)
INFO: lm.c(945): 0 tg(), 0 tgcache, 0 bo; 0 fills, 0 in mem (0.0%)
INFO: lm.c(949): 2 bg(), 0 bo; 1 fills, 1 in mem (50.0%)
Backtrace(an406-fcaw-b)
FV:an406-fcaw-b> WORD SFrm EFrm AScr(UnNorm) LMScore AScr+LScr AScale
fv:an406-fcaw-b> <sil> 0 3 -670920344 -181889 -671102233 2147483645
fv:an406-fcaw-b> <sil> 4 398 -394 -181889 -182283 -394
FV:an406-fcaw-b> TOTAL -670920738 -363778
FWDVIT: (an406-fcaw-b)
FWDXCT: an406-fcaw-b S 2147483249 T -670936088 A -670920738 L -15350 0 -670920344 -7675 <sil> 4 -394 -7675 <sil> 399
INFO: stat.c(172): 399 frm; 0 cdsen/fr, 102 cisen/fr, 0 cdgau/fr, 815 cigau/fr, Sen 2.97, CPU 2.97 Clk [Ovrhd 2.97 CPU 2.97 Clk]; Search: 0.00 CPU 0.00 Clk (an406-fcaw-b)
INFO: corpus.c(647): an406-fcaw-b: 11.9 sec CPU, 11.9 sec Clk; TOT: 11.9 sec CPU, 11.9 sec Clk
INFO: stat.c(204): SUMMARY: 399 fr; 0 cdsen/fr, 102 cisen/fr, 0 cdgau/fr, 815 cigau/fr, 2.97 xCPU 2.97 xClk [Ovhrd 2.97 xCPU 3 xClk]; 0 hmm/fr, 0 wd/fr, 0.00 xCPU 0.00 xClk; tot: 2.97 xCPU, 2.97 xClk
root 32631 99.6 0.7 10744 3724 pts/6 R 19:23 0:11 /sang/speech/sphinx/tutoial/an4/bin/sphinx3_decode -mdef /sang/speech/sphinx/tutoial/an4/model_architecture/an4.1000.mdef -senmgau .cont. -hmm /sang/speech/sphinx/tutoial/an4/model_parameters/an4.cd_cont_1000 -lw 23 -feat 1s_c_d_dd -beam 1e-120 -wbeam 1e-80 -dict /sang/speech/sphinx/tutoial/an4/etc/an4.dic -fdict /sang/speech/sphinx/tutoial/an4/etc/an4.filler -lm /sang/speech/sphinx/tutoial/an4/etc/an4.ug.lm.DMP -wip 0.2 -ctl /sang/speech/sphinx/tutoial/an4/etc/an4_test.fileids -ctloffset 0 -ctlcount 1 -cepdir /sang/speech/sphinx/tutoial/an4/feat -cepext .mfc -hyp /sang/speech/sphinx/tutoial/an4/result/an4-1-1.match -agc none -varnorm no -cmn current
root 32632 0.0 0.1 4152 856 pts/6 R 19:24 0:00 sh -c ps aguxwww | grep sphinx3_decode
Mon Sep 10 19:24:00 2007
Any help will be appreciated.
Thanks a lot
Bob Sang
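For readers unfamiliar with the "int32 wraparound?" hint in the errors above, here is a minimal sketch of what that condition means. It is not a diagnosis of this run. The score values are made up, and the helper names `wrap_int32` and `add_int32` are hypothetical, not part of Sphinx. The sketch only shows how adding two large negative scores in a signed 32-bit integer can produce a positive result, which is the symptom the message is guessing at.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Reduce an arbitrary Python int to what a signed 32-bit integer would hold."""
    return (value - INT32_MIN) % 2**32 + INT32_MIN


def add_int32(a, b):
    """Add two scores the way two's-complement 32-bit arithmetic would."""
    return wrap_int32(a + b)


# Two large negative log scores (illustrative values only).
score = add_int32(-1_500_000_000, -1_500_000_000)
print(score)  # 1294967296: the true sum, -3000000000, does not fit in int32
```

A log score should never be positive, so a check such as "best HMM score > 0" is a cheap way to notice that something upstream produced values too extreme to accumulate safely.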
Sorry, but how did you create the mfc files? Did you just download the mfc files for an4 from the site? I suppose you should extract Sphinx features from the raw files with ./scripts_pl/make_feats. The ones on the site are just MEL-cepstral coefficients.
At least I've just trained an4 and got 50% WER and my .mfc file differs from yours.
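When comparing .mfc files, a quick sanity check can help. The sketch below assumes the usual Sphinx cepstrum layout as I understand it: a 4-byte int32 header giving the number of float32 values, followed by the values themselves. That layout is not stated in this thread. The function name `check_mfc` is hypothetical. The default `ceplen=13` matches the `-ceplen 13` setting shown in the log.

```python
import os
import struct


def check_mfc(path, ceplen=13):
    """Return (n_floats, n_frames, byte_order) if the header agrees with the file size.

    Assumed layout: int32 count of float32 values, then the values.
    Raises ValueError when the file does not fit that layout.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        header = f.read(4)
    if len(header) < 4:
        raise ValueError("file too short to hold a header")

    # Try both byte orders, since the file may come from another machine.
    for prefix, name in (("<", "little"), (">", "big")):
        (n_floats,) = struct.unpack(prefix + "i", header)
        if n_floats >= 0 and 4 + 4 * n_floats == size:
            if n_floats % ceplen != 0:
                raise ValueError("float count is not a multiple of ceplen")
            return n_floats, n_floats // ceplen, name

    raise ValueError("header does not match file size in either byte order")
```

For a 399-frame utterance with 13 coefficients per frame, a consistent file would report 5187 floats. A mismatch suggests the file was truncated, edited, or written in a different format.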
At first I used make_feats.pl to convert the sph files to mfc files, but I got more errors, such as not being able to get the last state. So I edited the raw voice files to remove a little bit of noise, and then they were OK. But I still get the errors I just posted here.
Should I change sphinx_train.cfg file?
Thanks a lot.
That is very strange. Actually, the files must contain noise in order to be recognized properly. Can you please start with the original files and scripts and show the error you are getting?
Are you running on a 64-bit machine? Which exact SphinxTrain version are you using?
I require help from a programming whiz out there to help me develop a tutorial that I think will eventually bring in big dollars.
I already produce a CD set of lessons called "Speak Australian" (SA).
It is very effective in helping Australian migrants modify their speech patterns & enabling them to get jobs or better jobs.
I plan to make my tutorial interactive, using speech recognition to evaluate enunciation & produce a score, just like "Typing Tutor".
This may seem like a daunting project, but when divided into its components it is not so:
The main variants in speech are pitch modulation, speed of words & pauses, & accent (volume on a syllable), all of which (according to my limited understanding of audio recording) are quite measurable.
The market for tutorials to improve spoken Australian is huge, especially with foreign help desks used by many companies these days. The principle is also applicable to other versions of English & other languages: a really huge market.
Is anyone out there?
I would like to meet with you.
Best wishes,
Michael (director, Shine Institute)
email MS@shine-institute.com
Are there any issues with 64-bit machines that we should be aware of?
I was thinking of getting one for sphinxTrain.
Nope, 64-bit machines work fine. We have a lot of them here.
The Intel ones (Xeon 5100 and 5300 series) seem to be significantly faster than the Opterons for Sphinx, probably because they have larger caches and very good integer math performance.
I downloaded the newest SphinxTrain nightly build 3 weeks ago.
GCC 3.4.5,
Kernel 2.6.21.5
CPU: P4 3.0G; I do not think it is 64-bit.
BTW, how do I know whether it is 64-bit or 32-bit?
Thanks a lot
Bob Sang
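One way to answer the 32-bit versus 64-bit question is sketched below, assuming a Python interpreter is available on the machine. The helper names `pointer_bits` and `describe_machine` are hypothetical. `platform.machine()` reports the architecture name the operating system exposes (the same kind of string `uname -m` prints, for example `i686` or `x86_64`). The pointer size only tells you how the interpreter itself was built.

```python
import platform
import struct


def pointer_bits():
    """Size of a C pointer in this Python build, in bits (typically 32 or 64)."""
    return struct.calcsize("P") * 8


def describe_machine():
    """Collect the architecture name and the interpreter's pointer width."""
    return {
        "machine": platform.machine(),
        "python_pointer_bits": pointer_bits(),
    }


print(describe_machine())
```

A 32-bit userland can run on a 64-bit capable CPU, so the two values need not agree. For Sphinx, what matters is how the binaries were compiled.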
Hm, strange. Then, can you upload the whole build dir with temporary files somewhere?
Hi,
everything is here:
http://www.webjb.org/sphinx/
I tried HTK before, but I want to use speech recognition on an ARM9 CPU. HTK
is too heavy to fit, so I changed to Sphinx.
I really appreciate your help.
Bob Sang