From: Zibo M. <mzb...@gm...> - 2014-08-05 01:00:34
Hi all,

I have built the segments file like:

S002-U-000300-001280 S002-U 0.30 1.28
S002-U-001370-002780 S002-U 1.37 2.78
S002-U-005640-006030 S002-U 5.64 6.03
S002-U-006090-008680 S002-U 6.09 8.68

and the feats.lengths is like:

S002-U-000300-001280 96
S002-U-001370-002780 139
S002-U-005640-006030 37
S002-U-006090-008680 257

as well as the wav.scp file, which is like:

S002-U ~/002.wav

Other settings are as before. The result is still bad:

%WER 100.46 [ 652 / 649, 16 ins, 250 del, 386 sub ] [PARTIAL] exp/mono.4k/decode/wer_9

Any suggestions? Tell me if you need more information. Thank you very much.

Zibo

On Mon, Aug 4, 2014 at 3:37 PM, Daniel Povey <dp...@gm...> wrote:

> The problem could be what Karel mentioned: that your training sentences
> are too long and too few in number, and your model initialization failed
> to converge well.
> Or it could be a scoring problem. That output actually looked reasonable
> to me. It could be that your segments and wav.scp files are wrong, similar
> to what Karel was saying: your data has a segmentation of the wav files
> into utterances, but you are somehow getting the data preparation wrong
> and making the utterances correspond to entire wav files.
>
> Dan
>
> On Mon, Aug 4, 2014 at 3:34 PM, Zibo Meng <mzb...@gm...> wrote:
>
>> Dear Dr.
>> Povey,
>>
>> I fixed the problem you mentioned by adding a space between the <s> and
>> the sentence, and between the sentence and the </s>, but the result is
>> still bad, as follows:
>>
>> %WER 899.39 [ 5900 / 656, 5361 ins, 7 del, 532 sub ] exp/mono_1k/decode/wer_13
>>
>> The decoding took about 19 hours, and here are the first few lines from
>> the log file for your consideration:
>>
>> # gmm-latgen-faster --max-active=7000 --beam=13.0 --lattice-beam=6.0
>>   --acoustic-scale=0.083333 --allow-partial=true
>>   --word-symbol-table=exp/mono_1k/graph/words.txt exp/mono_1k/final.mdl
>>   exp/mono_1k/graph/HCLG.fst "ark,s,cs:apply-cmvn
>>   --utt2spk=ark:data/test/split1/1/utt2spk scp:data/test/split1/1/cmvn.scp
>>   scp:data/test/split1/1/feats.scp ark:- | add-deltas ark:- ark:- |"
>>   "ark:|gzip -c > exp/mono_1k/decode/lat.1.gz"
>> # Started at Sun Aug 3 19:40:54 EDT 2014
>> #
>> gmm-latgen-faster --max-active=7000 --beam=13.0 --lattice-beam=6.0
>>   --acoustic-scale=0.083333 --allow-partial=true
>>   --word-symbol-table=exp/mono_1k/graph/words.txt exp/mono_1k/final.mdl
>>   exp/mono_1k/graph/HCLG.fst 'ark,s,cs:apply-cmvn
>>   --utt2spk=ark:data/test/split1/1/utt2spk scp:data/test/split1/1/cmvn.scp
>>   scp:data/test/split1/1/feats.scp ark:- | add-deltas ark:- ark:- |'
>>   'ark:|gzip -c > exp/mono_1k/decode/lat.1.gz'
>> add-deltas ark:- ark:-
>> apply-cmvn --utt2spk=ark:data/test/split1/1/utt2spk
>>   scp:data/test/split1/1/cmvn.scp scp:data/test/split1/1/feats.scp ark:-
>> S002-U-000300-001280 I JUST WANT TO SPEAK TO SOMEONE ELSE IS GOING ON IN
>> YOUR APPROACH TO YOUR EMOTIONS CAN OFTEN DRIVE POSITIVE SIDE OF THINGS MAKE
>> YOU FEEL A BIT OF A JOKER ARE YOU FROM AMERICA OR AUSTRALIA OR A BIT OF A
>> LOT A LOT OF ADVERTISEMENTS BUT ALSO I'LL BE ABLE TO ORDER THINGS OFF A
>> LOG (gmm-latgen-faster:RebuildRepository():determinize-lattice-pruned.cc:289) Rebuilding repository.
>> LOG (gmm-latgen-faster:RebuildRepository():determinize-lattice-pruned.cc:289) Rebuilding repository.
>> LOG (gmm-latgen-faster:RebuildRepository():determinize-lattice-pruned.cc:289) Rebuilding repository.
>> WARNING (gmm-latgen-faster:CheckMemoryUsage():determinize-lattice-pruned.cc:322)
>>   Did not reach requested beam in determinize-lattice: size exceeds maximum
>>   50000000 bytes; (repo,arcs,elems) = (38313888,123232,11676216), after
>>   rebuilding, repo size was 29163264, effective beam was 3.11803 vs.
>>   requested beam 6
>> WARNING (gmm-latgen-faster:DecodeUtteranceLatticeFaster():lattice-faster-decoder.cc:968)
>>   Determinization finished earlier than the beam for utterance S002-U-000300-001280
>> LOG (gmm-latgen-faster:DecodeUtteranceLatticeFaster():lattice-faster-decoder.cc:980)
>>   Log-like per frame for utterance S002-U-000300-001280 is -8.46 over 30725 frames.
>>
>> Thank you so much for your help.
>>
>> Best,
>>
>> Zibo
>>
>> On Sun, Aug 3, 2014 at 6:23 PM, Daniel Povey <dp...@gm...> wrote:
>>
>>> Something that I can see is wrong is that you don't have a space between
>>> the <s> and the text, or between the text and the </s>, when preparing
>>> the language modeling data.
>>> If it still doesn't work, it will help if you show us what things are
>>> being decoded as (e.g. the first 20 lines or so of one of the
>>> decode.*.log files).
>>>
>>> Dan
>>>
>>> On Sun, Aug 3, 2014 at 6:18 PM, Zibo Meng <mzb...@gm...> wrote:
>>>
>>>> Hi,
>>>>
>>>> Sorry to bother you, but I got very bad decoding results; here are the
>>>> steps and files I used to train and decode. Any suggestions will be
>>>> appreciated.
>>>>
>>>> 1. Data preparation:
>>>> 1) Training and testing data:
>>>> I have 282 wav files whose properties are like:
>>>> Duration: 00:07:40.31, bitrate: 256 kb/s
>>>> Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
>>>> I used all of them as the training data and one of them as the test data.
>>>> 2) utterance file:
>>>> The first few lines of the utterance file (text),
>>>> for training data:
>>>> S002-O-010830-011320 IT WON'T
>>>> S002-O-011390-012880 ACTUALLY MAKE ANYTHING
>>>> S002-O-012940-014010 WORSE IN THE LONG RUN
>>>> S002-O-014100-015240 EVERYTHING WILL TURN OUT
>>>> S002-O-015260-016470 FINE YOU'LL SEE
>>>> for testing data:
>>>> S002-U-016710-020300 IT WOULD NEED TO IT'S TAKEN US WHAT AN HOUR TO GET THIS SET UP AGAIN
>>>> S002-U-024250-025580 I SUPPOSE THAT'S RIGHT
>>>> S002-U-025600-027630 POPPY AND IT IS A FRIDAY AFTERNOON
>>>> S002-U-027910-029580 AND A THERE STILL
>>>> S002-U-029640-031620 TIME FOR ME TO GO CHRISTMAS SHOPPING
>>>> 3) utterance-to-speaker file:
>>>> The first lines of the utt2spk file,
>>>> for training data:
>>>> S002-O-003080-003530 002-O
>>>> S002-O-003620-004510 002-O
>>>> S002-O-004700-005730 002-O
>>>> S002-O-009260-010480 002-O
>>>> S002-O-010830-011320 002-O
>>>> for testing data:
>>>> S002-U-000300-001280 002-U
>>>> S002-U-001370-002780 002-U
>>>> S002-U-005640-006030 002-U
>>>> S002-U-006090-008680 002-U
>>>> S002-U-016710-020300 002-U
>>>> 4) scp file:
>>>> S002-O-003080-003530 ~/data/train/001 O.wav
>>>> S002-O-003620-004510 ~/data/train/001 O.wav
>>>> S002-O-004700-005730 ~/data/train/001 O.wav
>>>> S002-O-009260-010480 ~/data/train/001 O.wav
>>>> S002-O-010830-011320 ~/data/train/001 O.wav
>>>> 5) files in dict:
>>>> extra_questions.txt -- empty
>>>> lexicon.txt:
>>>> I'D AY1 D
>>>> LIKE L AY1 K
>>>> TO T UW1
>>>> TO T IH0
>>>> TO T AH0
>>>> nonsilence_phones.txt:
>>>> AA AA0 AA1 AA2
>>>> AE AE0 AE1 AE2
>>>> AH AH0 AH1 AH2
>>>> AO AO0 AO1 AO2
>>>> AW AW0 AW1 AW2
>>>> optional_silence.txt:
>>>> SIL
>>>> silence_phones.txt:
>>>> SIL
>>>> LAU
>>>> COU
>>>> BRT
>>>> SIG
>>>> 6) language model:
>>>> I build the file including all the utterances, with start and end signs:
>>>> <s>IT WOULD NEED TO IT'S TAKEN US WHAT AN HOUR TO GET THIS SET UP AGAIN</s>
>>>> <s>I SUPPOSE THAT'S RIGHT</s>
>>>> <s>POPPY AND IT IS A FRIDAY AFTERNOON</s>
>>>> <s>AND A THERE STILL</s>
>>>> <s>TIME FOR ME TO GO CHRISTMAS SHOPPING</s>
>>>> using:
>>>> export IRSTLM=../../../tools/irstlm
>>>> ../../../tools/irstlm/bin/build-lm.sh -i data/local/dict/sentence -o train.lm
>>>> ../../../tools/irstlm/bin/compile-lm train.lm.gz train.arpa
>>>> gzip -c train.arpa > train.arpa.gz
>>>> A few lines from the language model:
>>>> -4.52257 <s>WHICH -0.425969
>>>> -3.44339 SAID -0.357511
>>>> -2.93711 HE -0.391207
>>>> -4.22155 LUNCH -0.39794
>>>> -4.22155 CATHY</s>
>>>> -4.52257 <s>AS -0.514105
>>>> -3.86936 OUGHT -0.954243
>>>> -2.95437 WHO -0.390935
>>>> -4.22155 CATHY -0.30103
>>>> And then using:
>>>> gunzip -c train.arpa.gz | utils/find_arpa_oovs.pl data/lang/words.txt > tmp/oovs.txt
>>>> gunzip -c train.arpa.gz | grep -v '<s> <s>' | grep -v '</s> <s>' |
>>>>   grep -v '</s> </s>' | ../../../src/bin/arpa2fst - | fstprint |
>>>>   utils/remove_oovs.pl ./tmp/oovs.txt | utils/eps2disambig.pl | utils/s2eps.pl |
>>>>   fstcompile --isymbols=./data/lang/words.txt --osymbols=./data/lang/words.txt
>>>>   --keep_isymbols=false --keep_osymbols=false | fstrmepsilon > data/lang/G.fst
>>>> to get G.fst.
>>>> Finally, using:
>>>> utils/prepare_lang.sh data/local/dict "<LAUGH>" data/local/lang data/lang
>>>> 7) extract features:
>>>> using:
>>>> steps/make_mfcc.sh --nj 20 data/train exp/make_mfcc/train mfcc
>>>> steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train mfcc
>>>> steps/make_mfcc.sh --nj 20 data/test exp/make_mfcc/test mfcc
>>>> steps/compute_cmvn_stats.sh data/test exp/make_mfcc/test mfcc
>>>>
>>>> 2.
>>>> Training:
>>>> utils/subset_data_dir.sh data/train 1000 data/train_1k
>>>> steps/train_mono.sh --nj 10 data/train_1k data/lang exp/mono_1k
>>>> and I got this output:
>>>> 205 warnings in exp/mono_1k/log/update.*.log
>>>> 452 warnings in exp/mono_1k/log/align.*.*.log
>>>> 26 warnings in exp/mono_1k/log/acc.*.*.log
>>>> Done
>>>> I checked the log files; here are some examples:
>>>> acc.1.3.log: WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79)
>>>>   No alignment for utterance S035-U-113800-119310
>>>> align.5.5.log: WARNING (gmm-boost-silence:main():gmm-boost-silence.cc:82)
>>>>   The pdfs for the silence phones may be shared by other phones (note:
>>>>   this probably does not matter.)
>>>> update.10.log: WARNING (gmm-est:MleDiagGmmUpdate():mle-diag-gmm.cc:362)
>>>>   Gaussian has too little data but not removing it because it is the last
>>>>   Gaussian: i = 0, occ = 0, weight = 1
>>>>   (the same gmm-est warning is repeated four more times)
>>>>
>>>> 3. Decoding:
>>>> using:
>>>> utils/mkgraph.sh --mono data/lang exp/mono_1k exp/mono_1k/graph
>>>> steps/decode.sh exp/mono_1k/graph data/test exp/mono_1k/decode
>>>> and all parameters follow the default configuration in the rm/s5 recipe.
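A note on checking the segmentation discussed in this thread: Dan's hypothesis that the utterances may end up corresponding to entire wav files can be tested mechanically by comparing each segment's duration in the segments file against its frame count in feats.lengths. This sketch is not from the thread; it assumes Kaldi's default MFCC front end (25 ms window, 10 ms shift, snip-edges behavior) and the two file formats quoted here:

```python
# Cross-check a Kaldi 'segments' file against a 'feats.lengths' file.
# Assumes default MFCC settings: 25 ms window, 10 ms shift, and
# snip-edges behavior (only frames that fit entirely inside the segment).

def expected_frames(start, end, shift=0.01, window=0.025):
    """Expected number of feature frames for a segment [start, end) in seconds."""
    return int((end - start - window) / shift) + 1

def check(segments_lines, lengths_lines, tol=2):
    """Return utterance IDs whose frame count is off by more than tol frames."""
    durations = {}
    for line in segments_lines:
        utt, _recording, start, end = line.split()
        durations[utt] = (float(start), float(end))
    bad = []
    for line in lengths_lines:
        utt, nframes = line.split()
        start, end = durations[utt]
        if abs(int(nframes) - expected_frames(start, end)) > tol:
            bad.append(utt)
    return bad
```

Applied to the four test segments quoted at the top of this thread, the expected counts come out to 96, 139, 37, and 257, matching feats.lengths exactly; by contrast, the earlier decode log reporting 30725 frames for the 0.98-second utterance S002-U-000300-001280 would fail this check immediately, which is the signature of whole-file decoding.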
>>>> Here is the result:
>>>> %WER 830.95 [ 5451 / 656, 4914 ins, 5 del, 532 sub ] exp/mono_1k/decode/wer_13
>>>> It seems something definitely went wrong.
>>>>
>>>> Can you please help me out here? Thank you so much for your time, and
>>>> sorry for the long message.
>>>>
>>>> Please tell me if you need more information.
>>>>
>>>> Again, thank you so much.
>>>>
>>>> Best regards,
>>>>
>>>> Zibo
>>>>
>>>> _______________________________________________
>>>> Kaldi-users mailing list
>>>> Kal...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users
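A closing note on reading the %WER lines quoted throughout this thread: Kaldi's scoring reports errors over the number of reference words, so the percentage can exceed 100 when insertions dominate, which is exactly what the huge insertion counts here show. A minimal sketch of the arithmetic (not from the thread), verified against the figures quoted above:

```python
def wer_percent(ins, dels, subs, ref_words):
    # %WER as reported in Kaldi's "%WER X [ errs / ref, ... ]" lines:
    # 100 * (insertions + deletions + substitutions) / reference words.
    # Insertions are unbounded, so the figure can exceed 100%.
    return 100.0 * (ins + dels + subs) / ref_words

print(round(wer_percent(4914, 5, 532, 656), 2))  # → 830.95, as quoted above
print(round(wer_percent(16, 250, 386, 649), 2))  # → 100.46, as quoted above
```

Note the error profiles differ: the 830.95% and 899.39% results are almost all insertions (the decoder producing far more words than the short reference utterances), while the 100.46% result is mostly deletions and substitutions, a different failure mode.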