From: Daniel P. <dp...@gm...> - 2012-09-02 15:49:40
I notice it says

  Not creating raw N-gram counts ngrams.gz and heldout_ngrams.gz since they already exist in /data/gaoxinglong/kaldi/trunk/egs/timit/s3/local/lm/biphone

I'm concerned that one of these files might be empty (after unzipping), as one of the programs is reporting zero words. Perhaps if you

  rm -r /data/gaoxinglong/kaldi/trunk/egs/timit/s3/local/lm

and then run it again, it might work. Perhaps you created those files at a time when your training data didn't exist, or something like that.

Also, there is nothing special about the kaldi_lm toolkit. If you use SRILM, it will also produce an ARPA-format LM that Kaldi can use.

Dan

On Sat, Sep 1, 2012 at 10:20 AM, xinglong gao <gao...@gm...> wrote:

> Hello,
> Thank you very much first, and when I use timit database as train and test
> database and I have gotten such basic data for lm training:
> phones.txt , lexicon.txt and train_trans.txt as appendix this email
> and when I use kaldi_lm to train biphone lm, and some thing wrong happened
> as below:
>
>
> this is the detailed log:
>
> Not installing the kaldi_lm toolkit since it is already there.
> Creating phones file, and monophone lexicon (mapping phones to itself).
> Creating biphone model
> Training biphone language model in folder
> /data/gaoxinglong/kaldi/trunk/egs/timit/s3/local/lm
> Creating directory
> /data/gaoxinglong/kaldi/trunk/egs/timit/s3/local/lm/biphone
> Not creating raw N-gram counts ngrams.gz and heldout_ngrams.gz since they
> already exist in /data/gaoxinglong/kaldi/trunk/egs/timit/s3/local/lm/biphone
> (remove them if you want them regenerated)
> Iteration 1/7 of optimizing discounting parameters
> discount_ngrams: for n-gram order 1, D=0.400000, tau=0.675000 phi=2.000000
> discount_ngrams: for n-gram order 2, D=0.600000, tau=0.675000 phi=2.000000
> discount_ngrams: for n-gram order 3, D=0.800000, tau=0.825000 phi=2.000000
> interpolate_ngrams: 60 words in wordslist
> Perplexity over 0.000000 words is nan
> Perplexity over 0.000000 words (excluding 0.000000 OOVs) is nan
>
> real 0m0.017s
> user 0m0.000s
> sys 0m0.024s
> discount_ngrams: for n-gram order 1, D=0.400000, tau=0.900000 phi=2.000000
> discount_ngrams: for n-gram order 2, D=0.600000, tau=0.900000 phi=2.000000
> discount_ngrams: for n-gram order 3, D=0.800000, tau=1.100000 phi=2.000000
> interpolate_ngrams: 60 words in wordslist
> Perplexity over 0.000000 words is nan
> Perplexity over 0.000000 words (excluding 0.000000 OOVs) is nan
> discount_ngrams: for n-gram order 1, D=0.400000, tau=1.215000 phi=2.000000
> discount_ngrams: for n-gram order 2, D=0.600000, tau=1.215000 phi=2.000000
> discount_ngrams: for n-gram order 3, D=0.800000, tau=1.485000 phi=2.000000
>
> real 0m0.019s
> user 0m0.000s
> sys 0m0.032s
> interpolate_ngrams: 60 words in wordslist
> Perplexity over 0.000000 words is nan
> Perplexity over 0.000000 words (excluding 0.000000 OOVs) is nan
>
> real 0m0.016s
> user 0m0.008s
> sys 0m0.020s
> Bad perplexities . at
> /data/gaoxinglong/kaldi/trunk/egs/timit/s3/local/kaldi_lm/
> optimize_alpha.pl line 30.
>
>
> and I have checked the value of perplexities, and its value is : "nan", I
> don't know what is happened ?
>
> and I think the word_map may be wrong is it true?
>
> thanks
> best regards!
>
> Xinglong Gao
>
> _______________________________________________
> Kaldi-developers mailing list
> Kal...@li...
> https://lists.sourceforge.net/lists/listinfo/kaldi-developers
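
Before removing the lm directory, the suspicion in the reply at the top of this message (that ngrams.gz or heldout_ngrams.gz is empty once unzipped) can be checked directly. The following is only a rough sketch, not part of kaldi_lm: the helper names count_gz_lines and find_empty_counts are made up for illustration, and it assumes the count files are line-oriented text, so that zero lines means zero n-grams. The demo uses placeholder files in a temporary directory rather than real counts.

```python
import gzip
import os
import tempfile


def count_gz_lines(path):
    """Return the number of text lines in a gzip-compressed file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        return sum(1 for _ in f)


def find_empty_counts(lm_dir, names=("ngrams.gz", "heldout_ngrams.gz")):
    """Return the file names in lm_dir that are missing or unzip to zero lines.

    The default names are the two count files mentioned in the log;
    the "zero lines means empty" test is an assumption of this sketch.
    """
    empty = []
    for name in names:
        path = os.path.join(lm_dir, name)
        if not os.path.exists(path) or count_gz_lines(path) == 0:
            empty.append(name)
    return empty


if __name__ == "__main__":
    # Demo on stand-in files; the line written below is a placeholder,
    # not the real kaldi_lm count format.
    with tempfile.TemporaryDirectory() as demo_dir:
        with gzip.open(os.path.join(demo_dir, "ngrams.gz"), "wt") as f:
            f.write("placeholder line\n")
        with gzip.open(os.path.join(demo_dir, "heldout_ngrams.gz"), "wt") as f:
            pass  # deliberately left empty
        print("Empty or missing:", find_empty_counts(demo_dir))
```

To check the real files, one would call find_empty_counts("/data/gaoxinglong/kaldi/trunk/egs/timit/s3/local/lm/biphone"). If either name comes back, removing the directory and rerunning, as suggested, should regenerate the counts, provided the training transcripts exist at that point.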