CMU Sphinx / Forums / Help: Sphinx3

I created a new set of LM files containing only the number words (one, two, three, etc.). Recognition with this setup is horrible. I also tried the default LM that Sphinx3 ships with, but it was just as bad.

Has anyone else had a different experience?

Here is the log from running Sphinx3 with my customized LM.

D:\speech\sphinx3\win32\batch>.\sphinx3-numbers

D:\speech\sphinx3\win32\batch>echo off
" "
"sphinx3-simple:"
" Demo CMU Sphinx-3 decoder called with command line arguments."
" "
"<executing $S3CONTINUOUS, please wait>"
INFO: d:\speech\sphinx3\src\libutil\cmd_ln.c(276): Parsing command line: \
        -mdef ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/hub4opensrc.6000.mdef \
        -fdict ./model/lm/numbers/filler.dict \
        -dict ./model/lm/numbers/numbers.dic \
        -mean ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/means \
        -var ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/variances \
        -mixw ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/mixture_weights \
        -tmat ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/transition_matrices\
        -upperf 6855.49756 \
        -lowerf 133.33334 \
        -nfilt 40 \
        -feat 1s_c_d_dd \
        -nfft 512 \
        -wlen 0.025625 \
        -samprate 16000 \
        -agc none \
        -varnorm no \
        -cmn current \
        -subvqbeam 1e-02 \
        -epl 4 \
        -fillprob 0.02 \
        -lw 9.5 \
        -maxwpf 1 \
        -beam 1e-40 \
        -pbeam 1e-30 \
        -wbeam 1e-20 \
        -maxhmmpf 1500 \
        -wend_beam 1e-1 \
        -ci_pbeam 1e-3 \
        -ds 2 \
        -lm ./model/lm/numbers/numbers.lm.DMP

Configuration in effect:
[NAME]          [DEFLT]         [VALUE]
-agc            max             none
-alpha          0.97            9.700000e-001
-beam           1.0e-55         1.000000e-040
-bghist         0               0
-bptbldir
-cepdir
-ci_pbeam       1e-80           1.000000e-003
-cmn            current         current
-cond_ds        0               0
-ctl
-ctlcount       1000000000      1000000000
-ctloffset      0               0
-ctl_lm
-dict                           ./model/lm/numbers/numbers.dic
-ds             1               2
-epl            3               4
-fdict                          ./model/lm/numbers/filler.dict
-feat           1s_c_d_dd       1s_c_d_dd
-fillpen
-fillprob       0.1             2.000000e-002
-frate          100             100
-gs
-gs4gs          1               1
-hmmdump        0               0
-hmmhistbinsize 5000            5000
-hyp
-hypseg
-latext         lat.gz          lat.gz
-lextreedump    0               0
-lm                             ./model/lm/numbers/numbers.lm.DMP
-lmctlfn
-lmdumpdir
-lminmemory     0               0
-log3table      1               1
-logbase        1.0003          1.000300e+000
-lowerf         200             1.333333e+002
-lw             8.5             9.500000e+000
-maxcepvecs     256             256
-maxhistpf      100             100
-maxhmmpf       20000           1500
-maxhyplen      1000            1000
-maxwpf         20              1
-mdef                           ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/hub4opensrc.6000.mdef
-mean                           ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/means
-mixw                           ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/mixture_weights
-mixwfloor      0.0000001       1.000000e-007
-nfft           256             512
-nfilt          31              40
-Nlextree       3               3
-outlatdir
-outlatoldfmt   1               1
-pbeam          1.0e-50         1.000000e-030
-pheurtype      0               0
-pl_beam        1.0e-80         0.000000e+000
-pl_window      1               1
-ptranskip      0               0
-samprate       8000            16000
-senmgau        .cont.          .cont.
-silprob        0.1             1.000000e-001
-subvq
-subvqbeam      3.0e-3          1.000000e-002
-svq4svq        0               0
-tmat                           ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/transition_matrices
-tmatfloor      0.0001          1.000000e-004
-treeugprob     1               1
-upperf         3500            6.855498e+003
-utt
-uw             0.7             7.000000e-001
-var                            ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/variances
-varfloor       0.0001          1.000000e-004
-varnorm        no              no
-vqeval         3               3
-wbeam          1.0e-35         1.000000e-020
-wend_beam      1.0e-80         1.000000e-001
-wip            0.7             7.000000e-001
-wlen           0.0256          2.562500e-002

INFO: d:\speech\sphinx3\src\libs3decoder\kbcore.c(95): Initializing core models:
INFO: d:\speech\sphinx3\src\libs3decoder\logs3.c(99): Initializing logbase: 1.000300e+000 (add table: 1)
INFO: d:\speech\sphinx3\src\libs3decoder\logs3.c(161): Log-Add table size = 29350
INFO: d:\speech\sphinx3\src\libs3decoder\feat.c(642): Initializing feature stream to type: '1s_c_d_dd', CMN='current', VARNORM='no', AGC='none'
INFO: d:\speech\sphinx3\src\libs3decoder\mdef.c(594): Reading model definition: ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/hub4opensrc.6000.mdef
INFO: d:\speech\sphinx3\src\libs3decoder\mdef.c(771): 48 CI-phone, 133500 CD-phone, 3 emitstate/phone, 144 CI-sen, 6144 Sen, 32639 Sen-Seq
INFO: d:\speech\sphinx3\src\libs3decoder\dict.c(358): Reading main dictionary: ./model/lm/numbers/numbers.dic
ERROR: "d:\speech\sphinx3\src\libs3decoder\dict.c", line 192: Line 7: Bad ciphone: AX; word SEVEN ignored
INFO: d:\speech\sphinx3\src\libs3decoder\dict.c(361): 11 words read
INFO: d:\speech\sphinx3\src\libs3decoder\dict.c(366): Reading filler dictionary: ./model/lm/numbers/filler.dict
INFO: d:\speech\sphinx3\src\libs3decoder\dict.c(369): 3 words read
INFO: d:\speech\sphinx3\src\libs3decoder\lm.c(739): LM read('./model/lm/numbers/numbers.lm.DMP', lw= 9.50, wip= -1188, uw= 0.70)
INFO: d:\speech\sphinx3\src\libs3decoder\lm.c(553):       12 ug
INFO: d:\speech\sphinx3\src\libs3decoder\lm.c(583):       20 bigrams [on disk]
INFO: d:\speech\sphinx3\src\libs3decoder\lm.c(591):       10 trigrams [on disk]
INFO: d:\speech\sphinx3\src\libs3decoder\lm.c(613):        3 bigram prob entries
INFO: d:\speech\sphinx3\src\libs3decoder\lm.c(631):        3 trigram bowt entries
INFO: d:\speech\sphinx3\src\libs3decoder\lm.c(647):        2 trigram prob entries
INFO: d:\speech\sphinx3\src\libs3decoder\lm.c(662):        1 trigram segtable entries (512 segsize)
INFO: d:\speech\sphinx3\src\libs3decoder\lm.c(696):       12 word strings
ERROR: "d:\speech\sphinx3\src\libs3decoder\wid.c", line 171: SEVEN is not a word in dictionary and it is not a class tag.
INFO: d:\speech\sphinx3\src\libs3decoder\wid.c(178): 1 LM words not in dictionary; ignored
INFO: d:\speech\sphinx3\src\libs3decoder\cont_mgau.c(92): Reading mixture gaussian file './model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/means'
INFO: d:\speech\sphinx3\src\libs3decoder\cont_mgau.c(244): 6144 mixture Gaussians, 8 components, veclen 26688544
INFO: d:\speech\sphinx3\src\libs3decoder\cont_mgau.c(92): Reading mixture gaussian file './model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/variances'
INFO: d:\speech\sphinx3\src\libs3decoder\cont_mgau.c(244): 6144 mixture Gaussians, 8 components, veclen 26688496
INFO: d:\speech\sphinx3\src\libs3decoder\cont_mgau.c(265): Reading mixture weights file './model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/mixture_weights'
ERROR: "d:\speech\sphinx3\src\libs3decoder\cont_mgau.c", line 346: Weight normalization failed for 3 senones
INFO: d:\speech\sphinx3\src\libs3decoder\cont_mgau.c(358): Read 6144 x 8 mixture weights
INFO: d:\speech\sphinx3\src\libs3decoder\cont_mgau.c(374): Removing uninitialized Gaussian densities 6 7 8
INFO: d:\speech\sphinx3\src\libs3decoder\cont_mgau.c(404): 24 densities removed (3 mixtures removed entirely)
INFO: d:\speech\sphinx3\src\libs3decoder\cont_mgau.c(412): Applying variance floor
INFO: d:\speech\sphinx3\src\libs3decoder\cont_mgau.c(424): 0 variance values floored
INFO: d:\speech\sphinx3\src\libs3decoder\cont_mgau.c(470): Precomputing Mahalanobis distance invariants
INFO: d:\speech\sphinx3\src\libs3decoder\tmat.c(135): Reading HMM transition probability matrices: ./model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/transition_matrices
ERROR: "d:\speech\sphinx3\src\libs3decoder\tmat.c", line 197: Normalization failed for tmat 2 from state 0
ERROR: "d:\speech\sphinx3\src\libs3decoder\tmat.c", line 197: Normalization failed for tmat 2 from state 1
ERROR: "d:\speech\sphinx3\src\libs3decoder\tmat.c", line 197: Normalization failed for tmat 2 from state 2
INFO: d:\speech\sphinx3\src\libs3decoder\tmat.c(217): Read 48 transition matrices of size 3x4
INFO: d:\speech\sphinx3\src\libs3decoder\dict2pid.c(254): Building PID tables for dictionary
INFO: d:\speech\sphinx3\src\libs3decoder\dict2pid.c(422): 63 composite states; 21 composite sseq
INFO: d:\speech\sphinx3\src\libs3decoder\kbcore.c(225): Verifying models consistency:
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(197): Building lextrees
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(243): Creating Unigram Table
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(246): Size of word table after unigram + words in class: 9
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(263): Lextrees(3), 112 nodes(ug)
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(291): Lextrees(3), 1 nodes(filler)

INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(317): Beam= -307006, PBeam= -230254, WBeam= -153503, SVQBeam= -15350
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(322): Down Sampling Ratio = 2
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(328): Conditional Down Sampling Parameter = 0
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(333): GS map would be used for Gaussian Selection? = 1
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(336): SVQ would be used as Gaussian Score ?= 0
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(339): CI phone beam to prune the number of parent CI phones in CI-base GMM Selection = 23025
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(345): Word-end pruning beam: 7675
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(348): Phoneme look-ahead window size = 1
WARNING: "d:\speech\sphinx3\src\libs3decoder\logs3.c", line 203: logs3 argument: 0.000000e+000; using S3_LOGPROB_ZERO
INFO: d:\speech\sphinx3\src\libs3decoder\kb.c(353): Phoneme look-ahead beam = -939524096
INFO: d:\speech\sphinx3\src\libs3decoder\vithist.c(77): Initializing Viterbi-history module
Allocating 32 buffers of 2500 samples each

System will listen for ~ 5.0 sec of speech
Hit <cr> before speaking:
INFO: d:\speech\sphinx3\src\libs3decoder\feat.c(971): Feature buffers initialized to 256 vectors
INFO: d:\speech\sphinx3\src\libs3decoder\cmn_prior.c(72): mean[0]= 12.00, mean[1..12]= 0.0
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 11
INFO: d:\speech\sphinx3\src\libs3decoder\approx_cont_mgau.c(328): Re-normalizing the previous score
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 15
INFO: d:\speech\sphinx3\src\programs\main_live_example.c(128): PARTIAL HYP: <sil>
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 16
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 15
INFO: d:\speech\sphinx3\src\libs3decoder\approx_cont_mgau.c(328): Re-normalizing the previous score
INFO: d:\speech\sphinx3\src\programs\main_live_example.c(128): PARTIAL HYP: <sil>
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 16
ERROR: "d:\speech\sphinx3\src\libs3decoder\vithist.c", line 599: No word exits from last frame in block 72
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 16
INFO: d:\speech\sphinx3\src\programs\main_live_example.c(128): PARTIAL HYP: <sil> NINE
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 15
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 16
INFO: d:\speech\sphinx3\src\programs\main_live_example.c(128): PARTIAL HYP: <sil> NINE <sil> EIGHT
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 16
INFO: d:\speech\sphinx3\src\libs3decoder\approx_cont_mgau.c(328): Re-normalizing the previous score
INFO: d:\speech\sphinx3\src\libs3decoder\approx_cont_mgau.c(328): Re-normalizing the previous score
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 15
INFO: d:\speech\sphinx3\src\programs\main_live_example.c(128): PARTIAL HYP: <sil> NINE <sil> EIGHT TWO TWO
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 16
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 15
INFO: d:\speech\sphinx3\src\programs\main_live_example.c(128): PARTIAL HYP: <sil> NINE <sil> EIGHT TWO TWO
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 16
INFO: d:\speech\sphinx3\src\libs3decoder\approx_cont_mgau.c(328): Re-normalizing the previous score
INFO: d:\speech\sphinx3\src\libs3decoder\approx_cont_mgau.c(328): Re-normalizing the previous score
INFO: d:\speech\sphinx3\src\libs3decoder\approx_cont_mgau.c(328): Re-normalizing the previous score
INFO: d:\speech\sphinx3\src\libs3decoder\approx_cont_mgau.c(328): Re-normalizing the previous score
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 16
INFO: d:\speech\sphinx3\src\programs\main_live_example.c(128): PARTIAL HYP: <sil> NINE <sil> EIGHT TWO TWO <sil>
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 15
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 16
INFO: d:\speech\sphinx3\src\programs\main_live_example.c(128): PARTIAL HYP: <sil> NINE <sil> EIGHT TWO TWO <sil>
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 16
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 15
INFO: d:\speech\sphinx3\src\programs\main_live_example.c(128): PARTIAL HYP: <sil> NINE <sil> EIGHT TWO TWO <sil>
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 16
INFO: d:\speech\sphinx3\src\programs\live.c(268): live_nfeatvec: 18

Backtrace(null)
LatID SFrm EFrm        AScr     LScr Type
    57     0    58     -844730   -74100   -1 <sil>
    83    59    90     -461667   -96036    0 NINE
    98    91   105     -417859   -74100   -1 <sil>
   110   106   117     -261291 -120128    0 EIGHT
   128   118   135     -405479 -120128    0 TWO
   181   136   168     -370310 -120128    0 TWO
   348   169   309    -1464553   -74100   -1 <sil>
   350   310   310           0   -23123    0 </s>
           0   310    -4225889 -701843 (Total)

FWDVIT: NINE EIGHT TWO TWO (null)

FWDXCT: null S 0 T -4927732 A -4225889 L -701843 0 -844730 -74100 <sil> 59 -461667 -96036 NINE 91 -417859 -74100 <sil> 106 -261291 -120128 EIGHT 118 -405479 -120128 TWO 136 -370310 -120128 TWO 169 -1464553 -74100 <sil> 310

INFO: d:\speech\sphinx3\src\libs3decoder\utt.c(281): 310 frm;   120 sen,   946gau/fr, Sen 0.10 CPU 0.11 Clk [Ovrhd 0.00 CPU 0.00 Clk];     55 hmm,   1 wd/fr,0.10 CPU 0.10 Clk (null)
INFO: d:\speech\sphinx3\src\libs3decoder\utt.c(295): HMMHist[0..0](null): 18(5)
INFO: d:\speech\sphinx3\src\libs3decoder\lm.c(823):       440 tg(),       425 tgcache,       14 bo;     6 fills,        1 in mem (9.1%)
INFO: d:\speech\sphinx3\src\libs3decoder\lm.c(826):      127 bg(),       14 bo;    5 fills,       14 in mem (66.7%)
INFO: d:\speech\sphinx3\src\programs\main_live_example.c(114):

FINAL HYP: <sil> NINE <sil> EIGHT TWO TWO <sil> </s>
D:\speech\sphinx3>
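
One thing the log does show: dict.c rejected line 7 of numbers.dic ("Bad ciphone: AX; word SEVEN ignored"), and wid.c then dropped SEVEN from the LM as well, so SEVEN cannot be recognized at all in this run. A check like the one below could catch that kind of mismatch before starting the decoder. It is only a minimal sketch: the function name, the sample pronunciations, and the sample phone set are all made up for illustration and are not the real contents of numbers.dic or of the hub4 model's 48 CI phones. It assumes one "WORD PHONE PHONE ..." entry per line and does not try to handle comments or alternate-pronunciation markers.

```python
def find_bad_pronunciations(dict_lines, allowed_phones):
    """Return (line_number, word, unknown_phones) for every dictionary
    entry that uses a phone outside allowed_phones."""
    problems = []
    for lineno, raw in enumerate(dict_lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        word, *phones = line.split()
        unknown = [p for p in phones if p not in allowed_phones]
        if unknown:
            problems.append((lineno, word, unknown))
    return problems


# Hypothetical dictionary lines (NOT the poster's actual file).
SAMPLE_DICT = [
    "ONE W AH N",
    "TWO T UW",
    "THREE TH R IY",
    "FOUR F AO R",
    "FIVE F AY V",
    "SIX S IH K S",
    "SEVEN S EH V AX N",
]

# Hypothetical phone set (NOT the real phone list from the .mdef file).
SAMPLE_PHONES = {
    "W", "AH", "N", "T", "UW", "TH", "R", "IY",
    "F", "AO", "AY", "V", "S", "IH", "K", "EH",
}

if __name__ == "__main__":
    for lineno, word, unknown in find_bad_pronunciations(SAMPLE_DICT, SAMPLE_PHONES):
        print(f"Line {lineno}: word {word} uses unknown phone(s): {' '.join(unknown)}")
```

To use it on a real setup, the phone set would have to come from the acoustic model actually being loaded, and the dictionary lines from the file passed to -dict.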

Sphinx3 - My experience
