From: Andrew R. <an...@cs...> - 2012-06-24 13:29:02
Quite right. I was stepping through run.sh line by line, and must have missed path.sh that time. (What a time sink!) Thank you. (I'll update my version of the repo now to get s5.)

-Andrew

On Sun, Jun 24, 2012 at 4:41 AM, Vassil Panayotov <vas...@gm...> wrote:
> Hi,
>
> I don't have much experience with wsj/s3 myself, but as far as I can
> see the version in Kaldi's trunk has this environment variable set in
> egs/wsj/s3/path.sh . I think if you "source" (. ./path.sh) this script
> before running the rest of the recipe LC_ALL should be already set for
> you.
> By the way the currently recommended version of WSJ recipe is "s5",
> which I think is stable already.
>
> Vassil
>
> On Sat, Jun 23, 2012 at 5:14 PM, Andrew Rosenberg <an...@cs...> wrote:
>> Hi all,
>>
>> I've run into a problem with language model training in the wsj
>> training recipe s3. During the LM training an error shows up that I'm
>> not quite sure how to fix.
>>
>> Train LM 3gram min count
>> Getting raw N-gram counts
>> generating n grams
>> discount_ngrams: for n-gram order 1, D=0.000000, tau=0.000000 phi=1.000000
>> discount_ngrams: for n-gram order 2, D=0.000000, tau=0.000000 phi=1.000000
>> discount_ngrams: for n-gram order 3, D=1.000000, tau=0.000000 phi=1.000000
>> error: histories are not in sorted order, "?? ۰" > "? ??"
>> merge_ngrams: merge_ngrams.cc:141: void process_line(char*): Assertion
>> `comp > 0 || (comp == 0 &&
>> entry.predicted.compare(stack.back().predicted) >= 0)' failed.
>>
>> Looking at the "error: ..." line in emacs rather than the console, to
>> see what the '?' characters actually were, it was clear that the issue
>> was with the sorting of the ngram tokens.
>>
>> The recipe runs error free until
>> local/wsj_train_lms.sh
>>
>> Within this, the line that gives the problem (first) is
>>
>> train_lm.sh --arpa --lmtype 3gram-mincount $dir
>>
>> Digging deeper into train_lm.sh, the line that generates the error is
>>
>> gunzip -c $dir/train.gz | tail -n +$heldout_sents | \
>> get_raw_ngrams 3 | sort | uniq -c | uniq_to_ngrams | \
>> sort | discount_ngrams $subdir/config.get_ngrams | \
>> sort | merge_ngrams | gzip -c > $subdir/ngrams.gz
>>
>> Suspecting this is a problem with how sort is working, I tried sort
>> -n to see if it would fix the issue, to no avail.
>>
>> Ultimately the fix was to "export LC_ALL=C" to ensure POSIX style sorting.
>>
>> This is clearly an environment problem, but I figured you guys would
>> want to know about it. I'm doing this in bash on CentOS. (If more
>> environment information would be useful, let me know.)
>>
>> Thanks very much for putting this tool together. I'm really enjoying
>> getting to know it.
>>
>> Andrew
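
For anyone hitting the same assertion: the fix described in this thread relies on `sort` using plain byte order once `LC_ALL=C` is set. The snippet below is a minimal sketch of that behaviour, not part of the Kaldi recipe. The sample tokens and the variable names `sample` and `c_sorted` are made up for illustration, and the comment about what `merge_ngrams` expects is an inference from the assertion message, not something checked against its source.

```shell
#!/bin/sh
# Hypothetical sample tokens (not taken from the WSJ data).
sample=$(printf 'b\nA\na\nB\n')

# With LC_ALL=C, sort compares raw bytes, so uppercase ASCII letters
# come before lowercase ones. Other locales may interleave the cases or
# order non-ASCII text differently, which is presumably what tripped the
# sortedness check in merge_ngrams.
c_sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort)
printf '%s\n' "$c_sorted"

# sort -c only checks the order: it exits 0 if the input is already
# sorted under the locale in effect, non-zero otherwise.
if printf '%s\n' "$c_sorted" | LC_ALL=C sort -c 2>/dev/null; then
  echo "input is sorted in the C locale"
fi
```

Setting `export LC_ALL=C` once at the top of the shell session (or sourcing path.sh, as Vassil suggests) applies the same ordering to every `sort` in the pipeline, which avoids having to prefix each call individually.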