From: Zibo M. <mzb...@gm...> - 2014-08-05 01:00:34
Hi all,

I have built the segments file like:

S002-U-000300-001280 S002-U 0.30 1.28
S002-U-001370-002780 S002-U 1.37 2.78
S002-U-005640-006030 S002-U 5.64 6.03
S002-U-006090-008680 S002-U 6.09 8.68

and the feats.lengths is like:

S002-U-000300-001280 96
S002-U-001370-002780 139
S002-U-005640-006030 37
S002-U-006090-008680 257

as well as the wav.scp file, which is like:

S002-U ~/002.wav

Other settings are as before. The result is still bad:

%WER 100.46 [ 652 / 649, 16 ins, 250 del, 386 sub ] [PARTIAL] exp/mono.4k/decode/wer_9

Any suggestions? Tell me if you need more information. Thank you very much.

Zibo

On Mon, Aug 4, 2014 at 3:37 PM, Daniel Povey <dp...@gm...> wrote:

> The problem could be what Karel mentioned: that your training sentences
> are too long and too few in number, and your model initialization failed
> to converge well.
> Or it could be a scoring problem. That output actually looked reasonable
> to me. It could be that your segments and wav.scp files are wrong, similar
> to what Karel was saying: your data has a segmentation of the wav files
> into utterances, but you are somehow getting the data preparation wrong
> and making the utterances correspond to entire wav files.
>
> Dan
>
> On Mon, Aug 4, 2014 at 3:34 PM, Zibo Meng <mzb...@gm...> wrote:
>
>> Dear Dr.
>> Povey,
>>
>> I fixed the problem you mentioned by adding a space between the <s> and
>> the sentence, and between the sentence and the </s>, but the result is
>> still bad, as follows:
>>
>> %WER 899.39 [ 5900 / 656, 5361 ins, 7 del, 532 sub ] exp/mono_1k/decode/wer_13
>>
>> The decoding took about 19 hours, and here are the first few lines from
>> the log file for your consideration:
>>
>> # gmm-latgen-faster --max-active=7000 --beam=13.0 --lattice-beam=6.0
>>   --acoustic-scale=0.083333 --allow-partial=true
>>   --word-symbol-table=exp/mono_1k/graph/words.txt exp/mono_1k/final.mdl
>>   exp/mono_1k/graph/HCLG.fst "ark,s,cs:apply-cmvn
>>   --utt2spk=ark:data/test/split1/1/utt2spk scp:data/test/split1/1/cmvn.scp
>>   scp:data/test/split1/1/feats.scp ark:- | add-deltas ark:- ark:- |"
>>   "ark:|gzip -c > exp/mono_1k/decode/lat.1.gz"
>> # Started at Sun Aug 3 19:40:54 EDT 2014
>> #
>> gmm-latgen-faster --max-active=7000 --beam=13.0 --lattice-beam=6.0
>>   --acoustic-scale=0.083333 --allow-partial=true
>>   --word-symbol-table=exp/mono_1k/graph/words.txt exp/mono_1k/final.mdl
>>   exp/mono_1k/graph/HCLG.fst 'ark,s,cs:apply-cmvn
>>   --utt2spk=ark:data/test/split1/1/utt2spk scp:data/test/split1/1/cmvn.scp
>>   scp:data/test/split1/1/feats.scp ark:- | add-deltas ark:- ark:- |'
>>   'ark:|gzip -c > exp/mono_1k/decode/lat.1.gz'
>> add-deltas ark:- ark:-
>> apply-cmvn --utt2spk=ark:data/test/split1/1/utt2spk
>>   scp:data/test/split1/1/cmvn.scp scp:data/test/split1/1/feats.scp ark:-
>> S002-U-000300-001280 I JUST WANT TO SPEAK TO SOMEONE ELSE IS GOING ON IN
>> YOUR APPROACH TO YOUR EMOTIONS CAN OFTEN DRIVE POSITIVE SIDE OF THINGS MAKE
>> YOU FEEL A BIT OF A JOKER ARE YOU FROM AMERICA OR AUSTRALIA OR A BIT OF A
>> LOT A LOT OF ADVERTISEMENTS BUT ALSO I'LL BE ABLE TO ORDER THINGS OFF A
>> LOG (gmm-latgen-faster:RebuildRepository():determinize-lattice-pruned.cc:289) Rebuilding repository.
>> LOG (gmm-latgen-faster:RebuildRepository():determinize-lattice-pruned.cc:289) Rebuilding repository.
>> LOG (gmm-latgen-faster:RebuildRepository():determinize-lattice-pruned.cc:289) Rebuilding repository.
>> WARNING (gmm-latgen-faster:CheckMemoryUsage():determinize-lattice-pruned.cc:322)
>>   Did not reach requested beam in determinize-lattice: size exceeds maximum
>>   50000000 bytes; (repo,arcs,elems) = (38313888,123232,11676216), after
>>   rebuilding, repo size was 29163264, effective beam was 3.11803 vs.
>>   requested beam 6
>> WARNING (gmm-latgen-faster:DecodeUtteranceLatticeFaster():lattice-faster-decoder.cc:968)
>>   Determinization finished earlier than the beam for utterance S002-U-000300-001280
>> LOG (gmm-latgen-faster:DecodeUtteranceLatticeFaster():lattice-faster-decoder.cc:980)
>>   Log-like per frame for utterance S002-U-000300-001280 is -8.46 over 30725 frames.
>>
>> Thank you so much for your help.
>>
>> Best,
>>
>> Zibo
>>
>> On Sun, Aug 3, 2014 at 6:23 PM, Daniel Povey <dp...@gm...> wrote:
>>
>>> Something that I can see is wrong is that you don't have a space between
>>> the <s> and the text, or between the text and the </s>, when preparing
>>> the language modeling data.
>>> If it still doesn't work, it will help if you show us what things are
>>> being decoded as (e.g. the first 20 lines or so of one of the
>>> decode.*.log files).
>>>
>>> Dan
>>>
>>> On Sun, Aug 3, 2014 at 6:18 PM, Zibo Meng <mzb...@gm...> wrote:
>>>
>>>> Hi,
>>>>
>>>> Sorry to bother you, but I got very bad decoding results; here are the
>>>> steps and files I used to train and decode. Any suggestions will be
>>>> appreciated.
>>>>
>>>> 1. Data preparation:
>>>> 1) Training and testing data:
>>>> I have 282 wav files whose properties are like:
>>>> Duration: 00:07:40.31, bitrate: 256 kb/s
>>>> Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
>>>> I used all of them as the training data and one of them as the test data.
>>>> 2) utterance file:
>>>> The first few lines of the utterance file (text),
>>>> for training data:
>>>> S002-O-010830-011320 IT WON'T
>>>> S002-O-011390-012880 ACTUALLY MAKE ANYTHING
>>>> S002-O-012940-014010 WORSE IN THE LONG RUN
>>>> S002-O-014100-015240 EVERYTHING WILL TURN OUT
>>>> S002-O-015260-016470 FINE YOU'LL SEE
>>>> for testing data:
>>>> S002-U-016710-020300 IT WOULD NEED TO IT'S TAKEN US WHAT AN HOUR TO GET THIS SET UP AGAIN
>>>> S002-U-024250-025580 I SUPPOSE THAT'S RIGHT
>>>> S002-U-025600-027630 POPPY AND IT IS A FRIDAY AFTERNOON
>>>> S002-U-027910-029580 AND A THERE STILL
>>>> S002-U-029640-031620 TIME FOR ME TO GO CHRISTMAS SHOPPING
>>>> 3) utterance-to-speaker file:
>>>> The first lines of the utt2spk file,
>>>> for training data:
>>>> S002-O-003080-003530 002-O
>>>> S002-O-003620-004510 002-O
>>>> S002-O-004700-005730 002-O
>>>> S002-O-009260-010480 002-O
>>>> S002-O-010830-011320 002-O
>>>> for testing data:
>>>> S002-U-000300-001280 002-U
>>>> S002-U-001370-002780 002-U
>>>> S002-U-005640-006030 002-U
>>>> S002-U-006090-008680 002-U
>>>> S002-U-016710-020300 002-U
>>>> 4) scp file:
>>>> S002-O-003080-003530 ~/data/train/001 O.wav
>>>> S002-O-003620-004510 ~/data/train/001 O.wav
>>>> S002-O-004700-005730 ~/data/train/001 O.wav
>>>> S002-O-009260-010480 ~/data/train/001 O.wav
>>>> S002-O-010830-011320 ~/data/train/001 O.wav
>>>> 5) files in dict:
>>>> extra_questions.txt -- empty
>>>> lexicon.txt:
>>>> I'D AY1 D
>>>> LIKE L AY1 K
>>>> TO T UW1
>>>> TO T IH0
>>>> TO T AH0
>>>> nonsilence_phones.txt:
>>>> AA AA0 AA1 AA2
>>>> AE AE0 AE1 AE2
>>>> AH AH0 AH1 AH2
>>>> AO AO0 AO1 AO2
>>>> AW AW0 AW1 AW2
>>>> optional_silence.txt:
>>>> SIL
>>>> silence_phones.txt:
>>>> SIL
>>>> LAU
>>>> COU
>>>> BRT
>>>> SIG
>>>> 6) language model:
>>>> I build the file including all the utterances, with start and end signs:
>>>> <s>IT WOULD NEED TO IT'S TAKEN US WHAT AN HOUR TO GET THIS SET UP AGAIN</s>
>>>> <s>I SUPPOSE THAT'S RIGHT</s>
>>>> <s>POPPY AND IT IS A FRIDAY AFTERNOON</s>
>>>> <s>AND A THERE STILL</s>
>>>> <s>TIME FOR ME TO GO CHRISTMAS SHOPPING</s>
>>>> using:
>>>> export IRSTLM=../../../tools/irstlm
>>>> ../../../tools/irstlm/bin/build-lm.sh -i data/local/dict/sentence -o train.lm
>>>> ../../../tools/irstlm/bin/compile-lm train.lm.gz train.arpa
>>>> gzip -c train.arpa > train.arpa.gz
>>>> A few lines from the language model:
>>>> -4.52257 <s>WHICH -0.425969
>>>> -3.44339 SAID -0.357511
>>>> -2.93711 HE -0.391207
>>>> -4.22155 LUNCH -0.39794
>>>> -4.22155 CATHY</s>
>>>> -4.52257 <s>AS -0.514105
>>>> -3.86936 OUGHT -0.954243
>>>> -2.95437 WHO -0.390935
>>>> -4.22155 CATHY -0.30103
>>>> And then using:
>>>> gunzip -c train.arpa.gz | utils/find_arpa_oovs.pl data/lang/words.txt > tmp/oovs.txt
>>>> gunzip -c train.arpa.gz | grep -v '<s> <s>' | grep -v '</s> <s>' |
>>>>   grep -v '</s> </s>' | ../../../src/bin/arpa2fst - | fstprint |
>>>>   utils/remove_oovs.pl ./tmp/oovs.txt | utils/eps2disambig.pl | utils/s2eps.pl |
>>>>   fstcompile --isymbols=./data/lang/words.txt --osymbols=./data/lang/words.txt
>>>>   --keep_isymbols=false --keep_osymbols=false | fstrmepsilon > data/lang/G.fst
>>>> to get G.fst.
>>>> Finally, using:
>>>> utils/prepare_lang.sh data/local/dict "<LAUGH>" data/local/lang data/lang
>>>> 7) extract features:
>>>> using:
>>>> steps/make_mfcc.sh --nj 20 data/train exp/make_mfcc/train mfcc
>>>> steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train mfcc
>>>> steps/make_mfcc.sh --nj 20 data/test exp/make_mfcc/test mfcc
>>>> steps/compute_cmvn_stats.sh data/test exp/make_mfcc/test mfcc
>>>>
>>>> 2.
>>>> Training:
>>>> utils/subset_data_dir.sh data/train 1000 data/train_1k
>>>> steps/train_mono.sh --nj 10 data/train_1k data/lang exp/mono_1k
>>>> and I got this output:
>>>> 205 warnings in exp/mono_1k/log/update.*.log
>>>> 452 warnings in exp/mono_1k/log/align.*.*.log
>>>> 26 warnings in exp/mono_1k/log/acc.*.*.log
>>>> Done
>>>> I checked the log files; here are some examples:
>>>> acc.1.3.log: WARNING (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:79)
>>>>   No alignment for utterance S035-U-113800-119310
>>>> align.5.5.log: WARNING (gmm-boost-silence:main():gmm-boost-silence.cc:82)
>>>>   The pdfs for the silence phones may be shared by other phones (note:
>>>>   this probably does not matter.)
>>>> update.10.log: WARNING (gmm-est:MleDiagGmmUpdate():mle-diag-gmm.cc:362)
>>>>   Gaussian has too little data but not removing it because it is the last
>>>>   Gaussian: i = 0, occ = 0, weight = 1
>>>>   (the same gmm-est warning is repeated four more times)
>>>>
>>>> 3. Decoding:
>>>> using:
>>>> utils/mkgraph.sh --mono data/lang exp/mono_1k exp/mono_1k/graph
>>>> steps/decode.sh exp/mono_1k/graph data/test exp/mono_1k/decode
>>>> and all parameters follow the default configuration in the rm/s5 recipe.
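A note on checking the segmentation discussed in this thread: Dan's hypothesis that the utterances may end up corresponding to entire wav files can be tested mechanically by comparing each segment's duration in the segments file against its frame count in feats.lengths. This sketch is not from the thread; it assumes Kaldi's default MFCC front end (25 ms window, 10 ms shift, snip-edges behavior) and the two file formats quoted here:

```python
# Cross-check a Kaldi 'segments' file against a 'feats.lengths' file.
# Assumes default MFCC settings: 25 ms window, 10 ms shift, and
# snip-edges behavior (only frames that fit entirely inside the segment).

def expected_frames(start, end, shift=0.01, window=0.025):
    """Expected number of feature frames for a segment [start, end) in seconds."""
    return int((end - start - window) / shift) + 1

def check(segments_lines, lengths_lines, tol=2):
    """Return utterance IDs whose frame count is off by more than tol frames."""
    durations = {}
    for line in segments_lines:
        utt, _recording, start, end = line.split()
        durations[utt] = (float(start), float(end))
    bad = []
    for line in lengths_lines:
        utt, nframes = line.split()
        start, end = durations[utt]
        if abs(int(nframes) - expected_frames(start, end)) > tol:
            bad.append(utt)
    return bad
```

Applied to the four test segments quoted at the top of this thread, the expected counts come out to 96, 139, 37, and 257, matching feats.lengths exactly; by contrast, the earlier decode log reporting 30725 frames for the 0.98-second utterance S002-U-000300-001280 would fail this check immediately, which is the signature of whole-file decoding.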
>>>> Here is the result:
>>>> %WER 830.95 [ 5451 / 656, 4914 ins, 5 del, 532 sub ] exp/mono_1k/decode/wer_13
>>>> It seems something definitely went wrong.
>>>>
>>>> Can you please help me out here? Thank you so much for your time, and
>>>> sorry for the long message.
>>>>
>>>> Please tell me if you need more information.
>>>>
>>>> Again, thank you so much.
>>>>
>>>> Best regards,
>>>>
>>>> Zibo
>>>>
>>>> _______________________________________________
>>>> Kaldi-users mailing list
>>>> Kal...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users
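A closing note on reading the %WER lines quoted throughout this thread: Kaldi's scoring reports errors over the number of reference words, so the percentage can exceed 100 when insertions dominate, which is exactly what the huge insertion counts here show. A minimal sketch of the arithmetic (not from the thread), verified against the figures quoted above:

```python
def wer_percent(ins, dels, subs, ref_words):
    # %WER as reported in Kaldi's "%WER X [ errs / ref, ... ]" lines:
    # 100 * (insertions + deletions + substitutions) / reference words.
    # Insertions are unbounded, so the figure can exceed 100%.
    return 100.0 * (ins + dels + subs) / ref_words

print(round(wer_percent(4914, 5, 532, 656), 2))  # → 830.95, as quoted above
print(round(wer_percent(16, 250, 386, 649), 2))  # → 100.46, as quoted above
```

Note the error profiles differ: the 830.95% and 899.39% results are almost all insertions (the decoder producing far more words than the short reference utterances), while the 100.46% result is mostly deletions and substitutions, a different failure mode.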