Hi all,
I'm trying to build an AM for spelling recognition using the OGI speech corpus
(8 kHz).
During training in modules 30 and 50, I get errors like
"utt> 7804 call3386.alphabet 3746 0 264 25 ERROR: "backward.c", line 431:
final state not reached
ERROR: "baum_welch.c", line 331: call3386/call3386.alphabet ignored"
What do they mean? That the source audio files are not good? Can I ignore
these errors?
Also, when I try to run the decoding it always fails, and I see that the sample
rate used is 16 kHz. How can I change the decoding samprate to 8000?
Thanks for the help.
Such errors mean that the trainer failed to align the transcription with the
audio contents.
That the source audio files are not good?
Most likely, yes.
Can I ignore these errors?
Yes, you can.
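If you would rather prune the failing utterances than just ignore the messages, the skipped ones are listed in the Baum-Welch logs. A minimal sketch, assuming the default SphinxTrain layout with logs under logdir/ (the line number 331 comes from the message quoted above and may differ in other versions):

grep "ignored" logdir/*/*bw*.log \
  | sed -e 's/.*line 331: //' -e 's/ ignored.*//' \
  | sort -u > bad_utts.txt
# then drop these ids from etc/*_train.fileids and the matching lines
# from etc/*_train.transcription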
How can I change the decoding samprate to 8000?
Sample rate parameters are configured in the feat.params file. See the
tutorial for details.
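For reference, an 8 kHz feat.params can look like the following; these particular values are the ones that show up later in this thread's decoding log, so treat them as one working example rather than required settings:

-alpha 0.97
-dither yes
-samprate 8000.0
-doublebw no
-nfilt 31
-ncep 13
-lowerf 200.00
-upperf 3500.00
-nfft 512
-wlen 0.0256
-transform legacy
-feat 1s_c_d_dd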
OK, I removed those files and I no longer get those errors.
But now I get a lot of these:
ERROR: "gauden.c", line 1700: var (mgau= 22, feat= 0, density=3, component=20)
< 0
Is it possible to understand which file causes it? Or which word?
This error is caused by insufficient training data. You don't have enough data
to train the model for Gaussian number 22. You can check in the mdef file
which phone this Gaussian corresponds to.
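A rough sketch of that lookup, assuming a continuous model in which mixture 22 is tied state (senone) 22, and the usual mdef row layout of base/left/right/position/attrib/tmat followed by the state ids (pick whichever mdef matches the stage that failed, and adjust the column numbers to your topology):

awk '$7 == 22 || $8 == 22 || $9 == 22' model_architecture/spelling-db.ci.mdef
# prints the phone row(s) whose state ids include 22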
Thanks for your suggestions!! I followed your indications and I succeeded in
training the AM without errors or warnings.
Now I'm trying to decode; I changed the samprate and now it's correct.
During decoding I get this error:
ERROR: "ptm_mgau.c", line 801: Number of codebooks exceeds 256: 368
and a lot of these warnings:
INFO: ngram_search_fwdtree.c(1513): 15863 words recognized (13/fr)
INFO: ngram_search_fwdtree.c(1515): 68058 senones evaluated (57/fr)
INFO: ngram_search_fwdtree.c(1517): 25074 channels searched (20/fr), 0 1st,
25074 last
INFO: ngram_search_fwdtree.c(1521): 25074 words for which last channels
evaluated (20/fr)
INFO: ngram_search_fwdtree.c(1524): 0 candidate words for entering last phone
(0/fr)
INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 1 words
INFO: ngram_search_fwdflat.c(912): 12434 words recognized (10/fr)
INFO: ngram_search_fwdflat.c(914): 68067 senones evaluated (57/fr)
INFO: ngram_search_fwdflat.c(916): 23113 channels searched (19/fr)
INFO: ngram_search_fwdflat.c(918): 23113 words searched (19/fr)
INFO: ngram_search_fwdflat.c(920): 50 word transitions (0/fr)
WARNING: "ngram_search.c", line 1087: </s> not found in last frame, using
<sil> instead
INFO: ngram_search.c(1137): lattice start node <s>.0 end node </s>.989
INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(</s>:989:1196) = -952757
INFO: ps_lattice.c(1266): Joint P(O,S) = -952757 P(S|O) = 0
INFO: batch.c(661): call1013/call1013.spell_lname_pause: 11.97 seconds speech,
0.28 seconds CPU, 0.27 seconds wall
INFO: batch.c(663): call1013/call1013.spell_lname_pause: 0.02 xRT (CPU), 0.02
xRT (elapsed)
INFO: cmn.c(175): CMN: 8.30 -0.57 -0.35 -0.14 -0.35 -0.16 -0.22 -0.18 -0.14
-0.12 -0.11 -0.09 -0.10
The decoding fails and the WER is 100%.
To build up the LM I've used the online LM tool, giving it a txt file
containing the list of all the words I would like to recognize:
a
accent
again
and
apostrophe
as
b
baby
baker
...
I then converted the resulting .lm file to a .lm.DMP file using the
sphinx_lmconvert script. But if I try to use the .DMP LM, the decoding says
that it is not able to find the /data/ in the LM. If I simply use the .lm LM,
it works. What am I doing wrong?
Thanks for the help again.
The information you provided is not sufficient to understand what you have
done wrong. You need to provide more information or check everything yourself.
During decoding I get this error: ERROR: "ptm_mgau.c", line 801: Number
of codebooks exceeds 256: 368
In recent versions it's not an error. It looks like you are using an older
pocketsphinx.
and a lot of these warnings: INFO: ngram
They are information messages, not errors.
If I try to use the .DMP LM, the decoding says that it is not able to find the /data/ in the LM.
You can ignore that.
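For the conversion itself, the tool shipped with pocketsphinx is sphinx_lm_convert; the usual ARPA-to-DMP invocation is sketched below, with the file name taken from this thread:

sphinx_lm_convert -i spelling-db.lm -o spelling-db.lm.DMP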
I'm using pocketsphinx 0.6.1, is that not the latest one?
The warning is: WARNING: "ngram_search.c", line 1087: </s> not found in last
frame, using <sil> instead
In each transcription there is </s> in the last frame, so why is it not able to
find it? Or maybe it is referring to another </s>?
This is the first part of the decoding log... maybe there is a configuration
error:
INFO: cmd_ln.c(512): Parsing command line:
/home/tinnaboo/Desktop/SPELLING-AM/20110128-training/spelling-db/bin/.libs/lt-pocketsphinx_batch \
-hmm /home/tinnaboo/Desktop/SPELLING-AM/20110128-training/spelling-db/model_parameters/spelling-db.cd_cont_200 \
-lw 10 \
-feat 1s_c_d_dd \
-beam 1e-80 \
-wbeam 1e-40 \
-dict /home/tinnaboo/Desktop/SPELLING-AM/20110128-training/spelling-db/etc/spelling-db.dic \
-lm /home/tinnaboo/Desktop/SPELLING-AM/20110128-training/spelling-db/etc/spelling-db.lm \
-wip 0.2 \
-ctl /home/tinnaboo/Desktop/SPELLING-AM/20110128-training/spelling-db/etc/spelling-db_test.fileids \
-ctloffset 0 \
-ctlcount 1241 \
-cepdir /home/tinnaboo/Desktop/SPELLING-AM/20110128-training/spelling-db/feat \
-cepext .mfc \
-hyp /home/tinnaboo/Desktop/SPELLING-AM/20110128-training/spelling-db/result/spelling-db-1-1.match \
-agc none \
-varnorm no \
-cmn current
Current configuration:
-adchdr 0 0
-adcin no no
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-argfile
-ascale 20.0 2.000000e+01
-backtrace no no
-beam 1e-48 1.000000e-80
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-bghist no no
-build_outdirs yes yes
-cepdir /home/tinnaboo/Desktop/SPELLING-AM/20110128-training/spelling-db/feat
-cepext .mfc .mfc
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-ctl /home/tinnaboo/Desktop/SPELLING-AM/20110128-training/spelling-db/etc/spelling-db_test.fileids
-ctlcount -1 1241
-ctlincr 1 1
-ctloffset 0 0
-ctm
-debug 0
-dict /home/tinnaboo/Desktop/SPELLING-AM/20110128-training/spelling-db/etc/spelling-db.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgctl
-fsgdir
-fsgext
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /home/tinnaboo/Desktop/SPELLING-AM/20110128-training/spelling-db/model_parameters/spelling-db.cd_cont_200
-hyp /home/tinnaboo/Desktop/SPELLING-AM/20110128-training/spelling-db/result/spelling-db-1-1.match
-hypseg
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm /home/tinnaboo/Desktop/SPELLING-AM/20110128-training/spelling-db/etc/spelling-db.lm
-lmctl
-lmname default default
-lmnamectl
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 1.000000e+01
-maxhmmpf -1 -1
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mllrctl
-mllrdir
-mllrext
-mmap yes yes
-nbest 0 0
-nbestdir
-nbestext .hyp .hyp
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-outlatdir
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-5 1.000000e-05
-pl_window 0 0
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-usewdphones no no
-uw 1.0 1.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 1.000000e-40
-wip 0.65 2.000000e-01
-wlen 0.025625 2.562500e-02
INFO: cmd_ln.c(512): Parsing command line:
\
-alpha 0.97 \
-dither yes \
-samprate 8000.0 \
-doublebw no \
-nfilt 31 \
-ncep 13 \
-lowerf 200.00 \
-upperf 3500.00 \
-nfft 512 \
-wlen 0.0256 \
-transform legacy \
-feat 1s_c_d_dd \
-agc none \
-cmn current \
-varnorm no
Current configuration:
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-dither no yes
-doublebw no no
-feat 1s_c_d_dd 1s_c_d_dd
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 0
-logspec no no
-lowerf 133.33334 2.000000e+02
-ncep 13 13
-nfft 512 512
-nfilt 40 31
-remove_dc no no
-round_filters yes yes
-samprate 16000 8.000000e+03
-seed -1 -1
-smoothspec no no
-svspec
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 3.500000e+03
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.560000e-02
INFO: acmod.c(238): Parsed model-specific feature parameters from
/home/tinnaboo/Desktop/SPELLING-AM/20110128-training/spelling-
db/model_parameters/spelling-db.cd_cont_200/feat.params
INFO: fe_interface.c(288): You are using the internal mechanism to generate
the seed.
INFO: feat.c(848): Initializing feature stream to type: '1s_c_d_dd',
ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: mdef.c(520): Reading model definition: /home/tinnaboo/Desktop/SPELLING-
AM/20110128-training/spelling-db/model_parameters/spelling-db.cd_cont_200/mdef
INFO: bin_mdef.c(173): Allocating 14047 * 8 bytes (109 KiB) for CD tree
INFO: tmat.c(205): Reading HMM transition probability matrices:
/home/tinnaboo/Desktop/SPELLING-AM/20110128-training/spelling-
db/model_parameters/spelling-db.cd_cont_200/transition_matrices
INFO: acmod.c(117): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/home/tinnaboo/Desktop/SPELLING-AM/20110128-training/spelling-
db/model_parameters/spelling-db.cd_cont_200/means
INFO: ms_gauden.c(292): 368 codebook, 1 feature, size
8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/home/tinnaboo/Desktop/SPELLING-AM/20110128-training/spelling-
db/model_parameters/spelling-db.cd_cont_200/variances
INFO: ms_gauden.c(292): 368 codebook, 1 feature, size
8x39
INFO: ms_gauden.c(356): 1305 variance values floored
INFO: acmod.c(119): Attempting to use PTHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/home/tinnaboo/Desktop/SPELLING-AM/20110128-training/spelling-
db/model_parameters/spelling-db.cd_cont_200/means
INFO: ms_gauden.c(292): 368 codebook, 1 feature, size
8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/home/tinnaboo/Desktop/SPELLING-AM/20110128-training/spelling-
db/model_parameters/spelling-db.cd_cont_200/variances
INFO: ms_gauden.c(292): 368 codebook, 1 feature, size
8x39
INFO: ms_gauden.c(356): 1305 variance values floored
ERROR: "ptm_mgau.c", line 801: Number of codebooks exceeds 256: 368
INFO: acmod.c(121): Falling back to general multi-stream GMM computation
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/home/tinnaboo/Desktop/SPELLING-AM/20110128-training/spelling-
db/model_parameters/spelling-db.cd_cont_200/means
INFO: ms_gauden.c(292): 368 codebook, 1 feature, size
8x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/home/tinnaboo/Desktop/SPELLING-AM/20110128-training/spelling-
db/model_parameters/spelling-db.cd_cont_200/variances
INFO: ms_gauden.c(292): 368 codebook, 1 feature, size
8x39
INFO: ms_gauden.c(356): 1305 variance values floored
INFO: ms_senone.c(160): Reading senone mixture weights: /home/tinnaboo/Desktop
/SPELLING-AM/20110128-training/spelling-db/model_parameters/spelling-
db.cd_cont_200/mixture_weights
INFO: ms_senone.c(211): Truncating senone logs3(pdf) values by 10 bits
INFO: ms_senone.c(218): Not transposing mixture weights in memory
INFO: ms_senone.c(277): Read mixture weights for 368 senones: 1 features x 8
codewords
INFO: ms_senone.c(331): Mapping senones to individual codebooks
INFO: ms_mgau.c(123): The value of topn: 4
INFO: dict.c(294): Allocating 4302 * 32 bytes (134 KiB) for word entries
INFO: dict.c(306): Reading main dictionary: /home/tinnaboo/Desktop/SPELLING-
AM/20110128-training/spelling-db/etc/spelling-db.dic
INFO: dict.c(206): Allocated 0 KiB for strings, 1 KiB for phones
INFO: dict.c(309): 185 words read
INFO: dict.c(314): Reading filler dictionary: /home/tinnaboo/Desktop/SPELLING-
AM/20110128-training/spelling-db/model_parameters/spelling-
db.cd_cont_200/noisedict
INFO: dict.c(206): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(317): 21 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(405): Allocating 56^3 * 2 bytes (343 KiB) for word-initial
triphones
INFO: dict2pid.c(131): Allocated 75712 bytes (73 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 75712 bytes (73 KiB) for single-phone word
triphones
INFO: ngram_model_arpa.c(476): ngrams 1=186, 2=368, 3=184
INFO: ngram_model_arpa.c(135): Reading unigrams
INFO: ngram_model_arpa.c(515): 186 = #unigrams created
INFO: ngram_model_arpa.c(194): Reading bigrams
INFO: ngram_model_arpa.c(531): 368 = #bigrams created
INFO: ngram_model_arpa.c(532): 3 = #prob2 entries
INFO: ngram_model_arpa.c(539): 3 = #bo_wt2 entries
INFO: ngram_model_arpa.c(291): Reading trigrams
INFO: ngram_model_arpa.c(552): 184 = #trigrams created
INFO: ngram_model_arpa.c(553): 2 = #prob3 entries
INFO: ngram_search_fwdtree.c(99): 137 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 26 single-
phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 26
single-phone words
INFO: ngram_search_fwdtree.c(324): after: max nonroot chan increased to 128
INFO: ngram_search_fwdtree.c(333): after: 0 root, 0 non-root channels, 21
single-phone words
INFO: ngram_search_fwdflat.c(153): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: cmn.c(175): CMN: 4.41 -0.59 0.04 -0.12 -0.05 -0.07 -0.16 0.02 -0.02
-0.00 -0.05 0.02 -0.04
INFO: ngram_search.c(407): Resized backpointer table to 10000 entries
INFO: ngram_search_fwdtree.c(1513): 5825 words recognized (13/fr)
INFO: ngram_search_fwdtree.c(1515): 25659 senones evaluated (56/fr)
INFO: ngram_search_fwdtree.c(1517): 9474 channels searched (20/fr), 0 1st,
9474 last
INFO: ngram_search_fwdtree.c(1521): 9474 words for which last channels
evaluated (20/fr)
INFO: ngram_search_fwdtree.c(1524): 0 candidate words for entering last phone
(0/fr)
INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 1 words
INFO: ngram_search_fwdflat.c(912): 4324 words recognized (10/fr)
INFO: ngram_search_fwdflat.c(914): 25716 senones evaluated (57/fr)
INFO: ngram_search_fwdflat.c(916): 8988 channels searched (19/fr)
INFO: ngram_search_fwdflat.c(918): 8988 words searched (19/fr)
INFO: ngram_search_fwdflat.c(920): 50 word transitions (0/fr)
WARNING: "ngram_search.c", line 1087: </s> not found in last frame, using
<sil> instead
INFO: ngram_search.c(1137): lattice start node <s>.0 end node </s>.415
INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(</s>:415:453) = -423689
INFO: ps_lattice.c(1266): Joint P(O,S) = -423689 P(S|O) = 0
........
Other information:
The acoustic model has been built using audio files at 8 kHz, MS WAV, with
about 22 hours of speech. I've used 200 senones (too few? see the sketch below).
The dictionary contains all the alphabet letters and some other words (not
spelled); about 180 words in the dictionary.
The LM has been built using the online LM tool, giving as input the list of
the letters and of the words in the dictionary.
Each audio file contains a spelled word.
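For what it's worth, in a standard SphinxTrain setup the senone count mentioned above is set in etc/sphinx_train.cfg; a quick way to see (and then edit) it:

grep N_TIED_STATES etc/sphinx_train.cfg
# typically prints something like: $CFG_N_TIED_STATES = 200;

Whether 200 is too few for roughly 22 hours of speech can only be settled empirically, by retraining with a larger value and comparing WER on held-out data.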
Thanks for the help. Please let me know what other information you need.
Nadia
OK, the error was in the LM. I have solved it now, but the accuracy I get is
low for the recognition of spelled names (WER 39%, SER 82%). Do you have any
suggestions to improve these values?
Also, I continually get warnings like WARNING: "ngram_search.c", line 1087:
</s> not found in last frame, using <sil> instead during decoding.
It seems that the transcriptions for the files used in the test do not
contain the </s> tag, but they do contain it (name-db/etc/name-
db_test.transcriptions). Where does the decoder look for the transcriptions of
the test files?
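A quick sanity check on the transcription side, using the path mentioned above, is to count the lines that lack the </s> token entirely:

grep -cv '</s>' name-db/etc/name-db_test.transcriptions
# 0 means every line contains </s>

Note, though, that as far as I can tell this warning comes from the decoder's search not reaching the language model's </s> token in the final frame of audio; the transcription file is only used afterwards for scoring, so it is not what the warning refers to.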
Thanks for the help!
Nadia
Hi nshmyrev,
Sorry for the continuous help requests.
I'm trying to improve accuracy by training the AM with the LDA/MLLT option.
During training I get this error:
ERROR: "s3gau_full_io.c", line 129: Failed to read full covariance file
/home/tinnaboo/Desktop/SPELLING-AM/final-training-lda/spelling-
db/model_parameters/spelling-db.ci_mllt/variances (expected 90828 values, got
3132)
but at the end I find the message "LDA Training complete".
Why do I get this error? Did the LDA training complete correctly or not? Can I
ignore the error?
Thanks!
You don't have python configured properly.
No.
No, not if you want to train an MLLT model.
I thought I had followed all the instructions to set up python correctly.
How can I verify the python configuration?
Thanks!
Nadia
Yes, everyone thinks that.
Try to run the commands in the logs manually.
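Concretely, two quick checks may help (a sketch; numpy and scipy are the modules this thread suggests the LDA script needs). First, make sure they import in the same python that SphinxTrain invokes (python 2 syntax, matching the era of this thread):

python -c "import numpy, scipy; print numpy.__version__, scipy.__version__"

If that fails or prints unexpected versions, that is the misconfiguration. The second check, replaying the exact command echoed at the top of each log, is sketched further down.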
Sorry, which commands in which log file?
Sorry,
I verified that both numpy and scipy have been correctly installed on my Linux
machine.
The file spelling-db.lda is created.
The file spelling-db.lda_train.log contains the following lines:
Sw:
[
...,
]
Sb:
[
...,
]
u:
v:
[
...,
]
Thu Feb 10 15:36:51 2011
LDA training complete
The spelling-db.N-1.bw.log contains the following lines:
\
-topn 8 \
-abeam 1e-90 \
-bbeam 1e-10 \
-agc none \
-cmn current \
-varnorm no \
-meanreest yes \
-varreest yes -2passvar yes \
-fullvar yes \
-diagfull yes \
-feat 1s_c_d_dd \
-ceplen 13 \
-timing no
-help no no
-example no no
-hmmdir
-moddeffn /home/tinnaboo/Desktop/SPELLING-AM/final-training-lda/spelling-db/model_architecture/spelling-db.ci.mdef
-tmatfn /home/tinnaboo/Desktop/SPELLING-AM/final-training-lda/spelling-db/model_parameters/spelling-db.ci_lda/transition_matrices
-mixwfn /home/tinnaboo/Desktop/SPELLING-AM/final-training-lda/spelling-db/model_parameters/spelling-db.ci_lda/mixture_weights
-meanfn /home/tinnaboo/Desktop/SPELLING-AM/final-training-lda/spelling-db/model_parameters/spelling-db.ci_lda/means
-varfn /home/tinnaboo/Desktop/SPELLING-AM/final-training-lda/spelling-db/model_parameters/spelling-db.ci_lda/variances
-fullvar no yes
-diagfull no yes
-mwfloor 0.00001 1.000000e-05
-tpfloor 0.0001 1.000000e-05
-varfloor 0.00001 1.000000e-04
-topn 4 8
-dictfn /home/tinnaboo/Desktop/SPELLING-AM/final-training-lda/spelling-db/etc/spelling-db.dic
-fdictfn /home/tinnaboo/Desktop/SPELLING-AM/final-training-lda/spelling-db/etc/spelling-db.filler
-ltsoov no no
-ctlfn /home/tinnaboo/Desktop/SPELLING-AM/final-training-lda/spelling-db/etc/spelling-db_train.fileids
-nskip
-runlen -1 -1
-part 1
-npart 1
-cepext mfc mfc
-cepdir /home/tinnaboo/Desktop/SPELLING-AM/final-training-lda/spelling-db/feat
-phsegext phseg phseg
-phsegdir
-outphsegdir
-sentdir
-sentext sent sent
-lsnfn /home/tinnaboo/Desktop/SPELLING-AM/final-training-lda/spelling-db/etc/spelling-db_train.transcription
-accumdir /home/tinnaboo/Desktop/SPELLING-AM/final-training-lda/spelling-db/bwaccumdir/spelling-db_buff_1
-ceplen 13 13
-cepwin 0 0
-agc max none
-cmn current current
-varnorm no no
-silcomp none none
-sildel no no
-siltag SIL SIL
-abeam 1e-100 1.000000e-90
-bbeam 1e-100 1.000000e-10
-varreest yes yes
-meanreest yes yes
-mixwreest yes yes
-tmatreest yes yes
-mllrmat
-cb2mllrfn .1cls. .1cls.
-ts2cbfn .cont.
-feat 1s_c_d_dd 1s_c_d_dd
-svspec
-ldafn
-ldadim 29 29
-ldaaccum no no
-timing yes no
-viterbi no no
-2passvar no yes
-sildelfn
-spthresh 0.0 0.000000e+00
-maxuttlen 0 0
-ckptintv
-outputfullpath no no
-fullsuffixmatch no no
-pdumpdir
INFO: main.c(255): Reading /home/tinnaboo/Desktop/SPELLING-AM/final-training-
lda/spelling-db/model_architecture/spelling-db.ci.mdef
INFO: model_def_io.c(587): Model definition info:
INFO: model_def_io.c(588): 36 total models defined (36 base, 0 tri)
INFO: model_def_io.c(589): 144 total states
INFO: model_def_io.c(590): 108 total tied states
INFO: model_def_io.c(591): 108 total tied CI states
INFO: model_def_io.c(592): 36 total tied transition matrices
INFO: model_def_io.c(593): 4 max state/model
INFO: model_def_io.c(594): 4 min state/model
INFO: s3mixw_io.c(116): Read /home/tinnaboo/Desktop/SPELLING-AM/final-
training-lda/spelling-db/model_parameters/spelling-db.ci_lda/mixture_weights
INFO: s3tmat_io.c(115): Read /home/tinnaboo/Desktop/SPELLING-AM/final-
training-lda/spelling-db/model_parameters/spelling-
db.ci_lda/transition_matrices
INFO: mod_inv.c(297): inserting tprob floor 1.000000e-05 and renormalizing
INFO: s3gau_io.c(166): Read /home/tinnaboo/Desktop/SPELLING-AM/final-training-
lda/spelling-db/model_parameters/spelling-db.ci_lda/means
ERROR: "s3gau_full_io.c", line 129: Failed to read full covariance file
/home/tinnaboo/Desktop/SPELLING-AM/final-training-lda/spelling-
db/model_parameters/spelling-db.ci_lda/variances (expected 164268 values, got
4212)
INFO: s3gau_io.c(166): Read /home/tinnaboo/Desktop/SPELLING-AM/final-training-
lda/spelling-db/model_parameters/spelling-db.ci_lda/means
INFO: s3gau_io.c(166): Read /home/tinnaboo/Desktop/SPELLING-AM/final-training-
lda/spelling-db/model_parameters/spelling-db.ci_lda/variances
INFO: gauden.c(181): 108 total mgau
INFO: gauden.c(155): 1 feature streams (|0|=39 )
INFO: gauden.c(192): 1 total densities
INFO: gauden.c(98): min_var=1.000000e-04
WARNING: "mod_inv.c", line 257: n_top 8 > n_density 1. n_top <- 1
INFO: gauden.c(170): compute 1 densities/frame
INFO: main.c(363): Will reestimate mixing weights.
INFO: main.c(365): Will reestimate means.
INFO: main.c(367): Will reestimate variances.
INFO: main.c(369): WIll NOT optionally delete silence in Baum Welch or
Viterbi.
INFO: main.c(377): Will reestimate transition matrices
INFO: main.c(390): Reading main lexicon: /home/tinnaboo/Desktop/SPELLING-AM
/final-training-lda/spelling-db/etc/spelling-db.dic
INFO: lexicon.c(233): 30 entries added from /home/tinnaboo/Desktop/SPELLING-AM
/final-training-lda/spelling-db/etc/spelling-db.dic
INFO: main.c(402): Reading filler lexicon: /home/tinnaboo/Desktop/SPELLING-AM
/final-training-lda/spelling-db/etc/spelling-db.filler
INFO: lexicon.c(233): 11 entries added from /home/tinnaboo/Desktop/SPELLING-AM
/final-training-lda/spelling-db/etc/spelling-db.filler
INFO: main.c(423): Silence Tag SIL
INFO: corpus.c(1343): Will process all remaining utts starting at 0
INFO: main.c(622): Reestimation: Baum-Welch
etc.....
Please help me solve this issue!
Nadia
All commands in all log files; there aren't many of them. For example, you need
to check the invocation of init_gau on MLLT stage 0.2.
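Each SphinxTrain log begins by echoing the exact command line it ran (the "Parsing command line:" blocks quoted above), so replaying one is mostly copy and paste. A sketch, with a hypothetical log file name (the logdir path is the one discussed just below):

grep -rl init_gau logdir/
# then inspect the head of the log it finds, e.g. (hypothetical name):
head -40 logdir/06.mllt_train/spelling-db.init_gau.log
# copy the echoed command into a shell and run it; a python traceback or a
# truncated output file points at the real failure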
Sorry, it's not clear... Do I have to check in the logdir/06.mllt_train folder?
What type of log line should I look for?
Also, did you see my last post about the python installation and the LDA
training results I obtained?
Thanks a lot.
Excuse me, I found in this old post (2010-07-28) that you said that the error
"Failed to read full covariance file" can be ignored. Am I misunderstanding
it?
http://sourceforge.net/projects/cmusphinx/forums/forum/5471/topic/3787435
Thanks for your time and help!
Nadia
If I indeed wrote so then you can ignore it.