I created a new set of LM files containing only the numbers (i.e. one, two, three, etc.). Recognition with this engine is horrible. I also tried the default LM that Sphinx3 comes with, but it was just as bad.
Has anyone else had a different experience?
Here is the log from running Sphinx3 with my customized LM.
D:\speech\sphinx3\win32\batch>.\sphinx3-numbers
D:\speech\sphinx3\win32\batch>echo off
" "
"sphinx3-simple:"
" Demo CMU Sphinx-3 decoder called with command line arguments."
" "
"<executing $S3CONTINUOUS, please wait>"
INFO: d:\speech\sphinx3\src\libutil\cmd_ln.c(276): Parsing command line: \
-mdef ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/hub4opensrc.6000.mdef \
-fdict ./model/lm/numbers/filler.dict \
-dict ./model/lm/numbers/numbers.dic \
-mean ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/means \
-var ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/variances \
-mixw ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/mixture_weights \
-tmat ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/transition_matrices\
-upperf 6855.49756 \
-lowerf 133.33334 \
-nfilt 40 \
-feat 1s_c_d_dd \
-nfft 512 \
-wlen 0.025625 \
-samprate 16000 \
-agc none \
-varnorm no \
-cmn current \
-subvqbeam 1e-02 \
-epl 4 \
-fillprob 0.02 \
-lw 9.5 \
-maxwpf 1 \
-beam 1e-40 \
-pbeam 1e-30 \
-wbeam 1e-20 \
-maxhmmpf 1500 \
-wend_beam 1e-1 \
-ci_pbeam 1e-3 \
-ds 2 \
-lm ./model/lm/numbers/numbers.lm.DMP
Configuration in effect:
[NAME] [DEFLT] [VALUE]
-agc max none
-alpha 0.97 9.700000e-001
-beam 1.0e-55 1.000000e-040
-bghist 0 0
-bptbldir
-cepdir
-ci_pbeam 1e-80 1.000000e-003
-cmn current current
-cond_ds 0 0
-ctl
-ctlcount 1000000000 1000000000
-ctloffset 0 0
-ctl_lm
-dict ./model/lm/numbers/numbers.dic
-ds 1 2
-epl 3 4
-fdict ./model/lm/numbers/filler.dict
-feat 1s_c_d_dd 1s_c_d_dd
-fillpen
-fillprob 0.1 2.000000e-002
-frate 100 100
-gs
-gs4gs 1 1
-hmmdump 0 0
-hmmhistbinsize 5000 5000
-hyp
-hypseg
-latext lat.gz lat.gz
-lextreedump 0 0
-lm ./model/lm/numbers/numbers.lm.DMP
-lmctlfn
-lmdumpdir
-lminmemory 0 0
-log3table 1 1
-logbase 1.0003 1.000300e+000
-lowerf 200 1.333333e+002
-lw 8.5 9.500000e+000
-maxcepvecs 256 256
-maxhistpf 100 100
-maxhmmpf 20000 1500
-maxhyplen 1000 1000
-maxwpf 20 1
-mdef ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/hub4opensrc.6000.mdef
-mean ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/means
-mixw ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/mixture_weights
-mixwfloor 0.0000001 1.000000e-007
-nfft 256 512
-nfilt 31 40
-Nlextree 3 3
-outlatdir
-outlatoldfmt 1 1
-pbeam 1.0e-50 1.000000e-030
-pheurtype 0 0
-pl_beam 1.0e-80 0.000000e+000
-pl_window 1 1
-ptranskip 0 0
-samprate 8000 16000
-senmgau .cont. .cont.
-silprob 0.1 1.000000e-001
-subvq
-subvqbeam 3.0e-3 1.000000e-002
-svq4svq 0 0
-tmat ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/transition_matrices
-tmatfloor 0.0001 1.000000e-004
-treeugprob 1 1
-upperf 3500 6.855498e+003
-utt
-uw 0.7 7.000000e-001
-var ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/variances
-varfloor 0.0001 1.000000e-004
-varnorm no no
-vqeval 3 3
-wbeam 1.0e-35 1.000000e-020
-wend_beam 1.0e-80 1.000000e-001
-wip 0.7 7.000000e-001
-wlen 0.0256 2.562500e-002
INFO: d:\speech\sphinx3\src\libs3decoder\kbcore.c(95): Initializing core models:
INFO: d:\speech\sphinx3\src\libs3decoder\logs3.c(99): Initializing logbase: 1.000300e+000 (add table: 1)
INFO: d:\speech\sphinx3\src\libs3decoder\logs3.c(161): Log-Add table size = 2935
0
INFO: d:\speech\sphinx3\src\libs3decoder\feat.c(642): Initializing feature stream to type: '1s_c_d_dd', CMN='current', VARNORM='no', AGC='none'
INFO: d:\speech\sphinx3\src\libs3decoder\mdef.c(594): Reading model definition: ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/hub4opensrc.6000.mdef
INFO: d:\speech\sphinx3\src\libs3decoder\mdef.c(771): 48 CI-phone, 133500 CD-phone, 3 emitstate/phone, 144 CI-sen, 6144 Sen, 32639 Sen-Seq
INFO: d:\speech\sphinx3\src\libs3decoder\dict.c(358): Reading main dictionary: ./model/lm/numbers/numbers.dic
ERROR: "d:\speech\sphinx3\src\libs3decoder\dict.c", line 192: Line 7: Bad ciphone: AX; word SEVEN ignored
INFO: d:\speech\sphinx3\src\libs3decoder\dict.c(361): 11 words read
INFO: d:\speech\sphinx3\src\libs3decoder\dict.c(366): Reading filler dictionary: ./model/lm/numbers/filler.dict
INFO: d:\speech\sphinx3\src\libs3decoder\dict.c(369): 3 words read
INFO: d:\speech\sphinx3\src\libs3decoder\lm.c(739): LM read('./model/lm/numbers/numbers.lm.DMP', lw= 9.50, wip= -1188, uw= 0.70)
INFO: d:\speech\sphinx3\src\libs3decoder\lm.c(553): 12 ug
INFO: d:\speech\sphinx3\src\libs3decoder\lm.c(583): 20 bigrams [on disk]
INFO: d:\speech\sphinx3\src\libs3decoder\lm.c(591): 10 trigrams [on disk]
INFO: d:\speech\sphinx3\src\libs3decoder\lm.c(613): 3 bigram prob entries
INFO: d:\speech\sphinx3\src\libs3decoder\lm.c(631): 3 trigram bowt entries
INFO: d:\speech\sphinx3\src\libs3decoder\lm.c(647): 2 trigram prob entries
INFO: d:\speech\sphinx3\src\libs3decoder\lm.c(662): 1 trigram segtable entries (512 segsize)
INFO: d:\speech\sphinx3\src\libs3decoder\lm.c(696): 12 word strings
ERROR: "d:\speech\sphinx3\src\libs3decoder\wid.c", line 171: SEVEN is not a word in dictionary and it is not a class tag.
INFO: d:\speech\sphinx3\src\libs3decoder\wid.c(178): 1 LM words not in dictionary; ignored
INFO: d:\speech\sphinx3\src\libs3decoder\cont_mgau.c(92): Reading mixture gaussian file './model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/means'
INFO: d:\speech\sphinx3\src\libs3decoder\cont_mgau.c(244): 6144 mixture Gaussians, 8 components, veclen 26688544
INFO: d:\speech\sphinx3\src\libs3decoder\cont_mgau.c(92): Reading mixture gaussian file './model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/variances'
INFO: d:\speech\sphinx3\src\libs3decoder\cont_mgau.c(244): 6144 mixture Gaussians, 8 components, veclen 26688496
INFO: d:\speech\sphinx3\src\libs3decoder\cont_mgau.c(265): Reading mixture weights file './model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/mixture_weights'
ERROR: "d:\speech\sphinx3\src\libs3decoder\cont_mgau.c", line 346: Weight normalization failed for 3 senones
INFO: d:\speech\sphinx3\src\libs3decoder\cont_mgau.c(358): Read 6144 x 8 mixture weights
INFO: d:\speech\sphinx3\src\libs3decoder\cont_mgau.c(374): Removing uninitialized Gaussian densities 6 7 8
INFO: d:\speech\sphinx3\src\libs3decoder\cont_mgau.c(404): 24 densities removed
(3 mixtures removed entirely)
INFO: d:\speech\sphinx3\src\libs3decoder\cont_mgau.c(412): Applying variance floor
INFO: d:\speech\sphinx3\src\libs3decoder\cont_mgau.c(424): 0 variance values floored
INFO: d:\speech\sphinx3\src\libs3decoder\cont_mgau.c(470): Precomputing Mahalanobis distance invariants
INFO: d:\speech\sphinx3\src\libs3decoder\tmat.c(135): Reading HMM transition probability matrices: ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/transition_matrices
ERROR: "d:\speech\sphinx3\src\libs3decoder\tmat.c", line 197: Normalization failed for tmat 2 from state 0
ERROR: "d:\speech\sphinx3\src\libs3decoder\tmat.c", line 197: Normalization failed for tmat 2 from state 1
ERROR: "d:\speech\sphinx3\src\libs3decoder\tmat.c", line 197: Normalization failed for tmat 2 from state 2
INFO: d:\speech\sphinx3\src\libs3decoder\tmat.c(217): Read 48 transition matrices of size 3x4
INFO: d:\speech\sphinx3\src\libs3decoder\dict2pid.c(254): Building PID tables for dictionary
INFO: d:\speech\sphinx3\src\libs3decoder\dict2pid.c(422): 63 composite states; 21 composite sseq
INFO: d:\speech\sphinx3\src\libs3decoder\kbcore.c(225): Verifying models consistency:
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(197): Building lextrees
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(243): Creating Unigram Table
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(246): Size of word table after unigram + words in class: 9
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(263): Lextrees(3), 112 nodes(ug)
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(291): Lextrees(3), 1 nodes(filler)
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(317): Beam= -307006, PBeam= -230254, WBeam= -153503, SVQBeam= -15350
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(322): Down Sampling Ratio = 2
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(328): Conditional Down Sampling Parameter = 0
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(333): GS map would be used for Gaussian Selection? = 1
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(336): SVQ would be used as Gaussian Score ?= 0
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(339): CI phone beam to prune the number of parent CI phones in CI-base GMM Selection = 23025
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(345): Word-end pruning beam: 7675
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(348): Phoneme look-ahead window size = 1
WARNING: "d:\speech\sphinx3\src\libs3decoder\logs3.c", line 203: logs3 argument: 0.000000e+000; using S3_LOGPROB_ZERO
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(353): Phoneme look-ahead beam = -939524096
INFO: d:\speech\sphinx3\src\libs3decoder\vithist.c(77): Initializing Viterbi-history module
Allocating 32 buffers of 2500 samples each
System will listen for ~ 5.0 sec of speech
Hit <cr> before speaking:
INFO: d:\speech\sphinx3\src\libs3decoder\feat.c(971): Feature buffers initialized to 256 vectors
INFO: d:\speech\sphinx3\src\libs3decoder\cmn_prior.c(72): mean[0]= 12.00, mean[1..12]= 0.0
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 11
INFO: d:\speech\sphinx3\src\libs3decoder\approx_cont_mgau.c(328): Re-normalizing the previous score
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 15
INFO: d:\speech\sphinx3\src\programs\main_live_example.c(128): PARTIAL HYP: <sil>
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 16
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 15
INFO: d:\speech\sphinx3\src\libs3decoder\approx_cont_mgau.c(328): Re-normalizing the previous score
INFO: d:\speech\sphinx3\src\programs\main_live_example.c(128): PARTIAL HYP: <sil>
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 16
ERROR: "d:\speech\sphinx3\src\libs3decoder\vithist.c", line 599: No word exits from last frame in block 72
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 16
INFO: d:\speech\sphinx3\src\programs\main_live_example.c(128): PARTIAL HYP: <sil> NINE
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 15
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 16
INFO: d:\speech\sphinx3\src\programs\main_live_example.c(128): PARTIAL HYP: <sil> NINE <sil> EIGHT
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 16
INFO: d:\speech\sphinx3\src\libs3decoder\approx_cont_mgau.c(328): Re-normalizing the previous score
INFO: d:\speech\sphinx3\src\libs3decoder\approx_cont_mgau.c(328): Re-normalizing the previous score
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 15
INFO: d:\speech\sphinx3\src\programs\main_live_example.c(128): PARTIAL HYP: <sil> NINE <sil> EIGHT TWO TWO
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 16
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 15
INFO: d:\speech\sphinx3\src\programs\main_live_example.c(128): PARTIAL HYP: <sil> NINE <sil> EIGHT TWO TWO
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 16
INFO: d:\speech\sphinx3\src\libs3decoder\approx_cont_mgau.c(328): Re-normalizing the previous score
INFO: d:\speech\sphinx3\src\libs3decoder\approx_cont_mgau.c(328): Re-normalizing the previous score
INFO: d:\speech\sphinx3\src\libs3decoder\approx_cont_mgau.c(328): Re-normalizing the previous score
INFO: d:\speech\sphinx3\src\libs3decoder\approx_cont_mgau.c(328): Re-normalizing the previous score
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 16
INFO: d:\speech\sphinx3\src\programs\main_live_example.c(128): PARTIAL HYP: <sil> NINE <sil> EIGHT TWO TWO <sil>
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 15
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 16
INFO: d:\speech\sphinx3\src\programs\main_live_example.c(128): PARTIAL HYP: <sil> NINE <sil> EIGHT TWO TWO <sil>
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 16
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 15
INFO: d:\speech\sphinx3\src\programs\main_live_example.c(128): PARTIAL HYP: <sil> NINE <sil> EIGHT TWO TWO <sil>
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 16
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 18
Backtrace(null)
LatID SFrm EFrm AScr LScr Type
57 0 58 -844730 -74100 -1 <sil>
83 59 90 -461667 -96036 0 NINE
98 91 105 -417859 -74100 -1 <sil>
110 106 117 -261291 -120128 0 EIGHT
128 118 135 -405479 -120128 0 TWO
181 136 168 -370310 -120128 0 TWO
348 169 309 -1464553 -74100 -1 <sil>
350 310 310 0 -23123 0 </s>
0 310 -4225889 -701843 (Total)
FWDVIT: NINE EIGHT TWO TWO (null)
FWDXCT: null S 0 T -4927732 A -4225889 L -701843 0 -844730 -74100 <sil> 59 -461667 -96036 NINE 91 -417859 -74100 <sil> 106 -261291 -120128 EIGHT 118 -405479 -120128 TWO 136 -370310 -120128 TWO 169 -1464553 -74100 <sil> 310
INFO: d:\speech\sphinx3\src\libs3decoder\utt.c(281): 310 frm; 120 sen, 946gau/fr, Sen 0.10 CPU 0.11 Clk [Ovrhd 0.00 CPU 0.00 Clk]; 55 hmm, 1 wd/fr,0.10 CPU 0.10 Clk (null)
INFO: d:\speech\sphinx3\src\libs3decoder\utt.c(295): HMMHist[0..0](null): 18(5)
INFO: d:\speech\sphinx3\src\libs3decoder\lm.c(823): 440 tg(), 425 tgcache, 14 bo; 6 fills, 1 in mem (9.1%)
INFO: d:\speech\sphinx3\src\libs3decoder\lm.c(826): 127 bg(), 14 bo; 5 fills, 14 in mem (66.7%)
INFO: d:\speech\sphinx3\src\programs\main_live_example.c(114):
FINAL HYP: <sil> NINE <sil> EIGHT TWO TWO <sil> </s>
D:\speech\sphinx3>
By the way, I said "Nine Seven Two".
OK, I see why SEVEN isn't recognized: the AX in the generated pronunciation S EH V AX N doesn't exist in the hub4opensrc.6000.mdef file, so the dictionary entry is dropped.
Do I need to generate this file as well for my LM? Or where can I get an updated model definition file?
thanks,
mike
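The mismatch described above (a dictionary phone that the model definition does not know) can be caught before starting the decoder. The following is only a minimal sketch, not part of Sphinx: the function names and the sample data are hypothetical, and it assumes a text-format mdef in which context-independent phone rows carry "-" in the left-context, right-context and word-position columns, plus a dictionary with one `WORD PHONE PHONE ...` entry per line.

```python
def read_ci_phones(mdef_lines):
    """Collect context-independent (CI) phone names from text-format mdef lines.

    Assumption: a CI row looks like "AH - - - n/a 0 0 1 2 N", i.e. the three
    columns after the base phone are all "-". Lines starting with "#" are
    treated as comments; short header lines are skipped by the length check.
    """
    phones = set()
    for line in mdef_lines:
        fields = line.split()
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        if fields[1:4] == ["-", "-", "-"]:
            phones.add(fields[0])
    return phones


def find_bad_entries(dict_lines, ci_phones):
    """Return (line_number, word, unknown_phones) for every dictionary entry
    that uses a phone missing from ci_phones."""
    bad = []
    for lineno, line in enumerate(dict_lines, start=1):
        fields = line.split()
        if len(fields) < 2:  # blank line or word without a pronunciation
            continue
        word, phones = fields[0], fields[1:]
        unknown = [p for p in phones if p not in ci_phones]
        if unknown:
            bad.append((lineno, word, unknown))
    return bad


# Hypothetical excerpts for illustration only; not the real hub4 files.
SAMPLE_MDEF = """\
# hypothetical mdef excerpt
0.3
9 n_base
AH - - - n/a 0 0 1 2 N
EH - - - n/a 1 3 4 5 N
N - - - n/a 2 6 7 8 N
S - - - n/a 3 9 10 11 N
SIL - - - filler 4 12 13 14 N
T - - - n/a 5 15 16 17 N
UW - - - n/a 6 18 19 20 N
V - - - n/a 7 21 22 23 N
W - - - n/a 8 24 25 26 N
AH W N i n/a 0 27 28 29 N
"""

SAMPLE_DICT = """\
ONE W AH N
TWO T UW
SEVEN S EH V AX N
"""

if __name__ == "__main__":
    ci = read_ci_phones(SAMPLE_MDEF.splitlines())
    for lineno, word, unknown in find_bad_entries(SAMPLE_DICT.splitlines(), ci):
        print(f"line {lineno}: {word} uses unknown phone(s): {' '.join(unknown)}")
```

With real files, the same functions can be fed open file handles instead of the sample strings, since both simply iterate over lines. Any entry reported this way would need its pronunciation rewritten with phones the acoustic model actually defines, or a model trained with the dictionary's phone set.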
Hi Mike,
The included model is a general-purpose acoustic model. It was trained on broadcast-news-style language and speech, so it is much "flatter" than a digit-specific HMM model would be. That's why it doesn't work well for you.
The model shipped with Sphinx 3 is basically just for automated testing of the software. If you use it to build an application, you will always run into a lot of problems. We still recommend that you train your own models for your applications.
We have given a lot of disclaimers in the README and on the web pages. However, this is still not common knowledge among users. Hopefully we can come up with something later to remind users of this important fact.
Arthur