From: Xavier A. <xan...@gm...> - 2013-12-29 18:31:46
Hi,
I double-checked that the svn update was done correctly (if it was not before, it certainly is now), and everything looks the same, with the same problem I reported above.
Comparing the error I get with another (successful) run on another database, I am suspicious of the LG.fst file. Is there a way to check it?

Thanks,

X.

On Sat, Dec 28, 2013 at 9:37 PM, Daniel Povey <dp...@gm...> wrote:

> Could it be that you did "svn up" only in the fstbin directory and not in
> src? Do "svn up" in src/, and see if you get further updates.
> Dan
>
> On Sat, Dec 28, 2013 at 12:35 PM, Xavier Anguera <xan...@gm...> wrote:
>
>> Sure, here it is:
>>
>> The error:
>> # utils/mkgraph.sh --mono data/lang_test_phn-mono exp/mono exp/mono/graph_phn
>> # Started at Sat Dec 28 20:47:57 CET 2013
>> #
>> fstminimizeencoded
>> fsttablecompose data/lang_test_phn-mono/L_disambig.fst data/lang_test_phn-mono/G.fst
>> fstdeterminizestar --use-log=true
>> fstisstochastic data/lang_test_phn-mono/tmp/LG.fst
>> 0.000358155 -0.000356635
>> fstcomposecontext --context-size=1 --central-position=0 --read-disambig-syms=data/lang_test_phn-mono/phones/disambig.int --write-disambig-syms=data/lang_test_phn-mono/tmp/disambig_ilabels_1_0.int data/lang_test_phn-mono/tmp/ilabels_1_0
>> WARNING (fstcomposecontext:main():fstcomposecontext.cc:130) Disambiguation symbols list is empty; this likely indicates an error in data preparation.
>> fstcomposecontext: ../fstext/context-fst-inl.h:105: fst::ContextFstImpl<Arc, LabelT>::ContextFstImpl(typename Arc::Label, const std::vector<B, std::allocator<_T2> >&, const std::vector<B, std::allocator<_T2> >&, int, int) [with Arc = fst::ArcTpl<fst::TropicalWeightTpl<float> >, LabelT = int]: Assertion `subsequential_symbol != 0 && disambig_syms_.count(subsequential_symbol) == 0 && phone_syms_.count(subsequential_symbol) == 0' failed.
>> utils/mkgraph.sh: line 76: 7661 Aborted fstcomposecontext --context-size=$N --central-position=$P --read-disambig-syms=$lang/phones/disambig.int --write-disambig-syms=$lang/tmp/disambig_ilabels_${N}_${P}.int $lang/tmp/ilabels_${N}_${P} < $lang/tmp/LG.fst > $clg
>> fstisstochastic data/lang_test_phn-mono/tmp/CLG_1_0.fst
>> ERROR: FstHeader::Read: Bad FST header: data/lang_test_phn-mono/tmp/CLG_1_0.fst
>> ERROR (fstisstochastic:ReadFstKaldi():fstext/fstext-utils-inl.h:1183) Reading FST: error reading FST header from data/lang_test_phn-mono/tmp/CLG_1_0.fst
>> ERROR (fstisstochastic:ReadFstKaldi():fstext/fstext-utils-inl.h:1183) Reading FST: error reading FST header from data/lang_test_phn-mono/tmp/CLG_1_0.fst
>>
>> The execution of gdb:
>> (gdb) where
>> #0 0x00007ffff6be9475 in *__GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
>> #1 0x00007ffff6bec6f0 in *__GI_abort () at abort.c:92
>> #2 0x00007ffff6be2621 in *__GI___assert_fail (
>>     assertion=0x498448 "subsequential_symbol != 0 && disambig_syms_.count(subsequential_symbol) == 0 && phone_syms_.count(subsequential_symbol) == 0",
>>     file=<optimized out>, line=105,
>>     function=0x499700 "fst::ContextFstImpl<Arc, LabelT>::ContextFstImpl(typename Arc::Label, const std::vector<B, std::allocator<_T2> >&, const std::vector<B, std::allocator<_T2> >&, int, int) [with Arc = fst::ArcTpl<fst::T"...)
>>     at assert.c:81
>> #3 0x000000000045b419 in fst::ContextFstImpl<fst::ArcTpl<fst::TropicalWeightTpl<float> >, int>::ContextFstImpl (this=0x6bd520, subsequential_symbol=97, phone_syms=..., disambig_syms=..., N=1, P=0) at ../fstext/context-fst-inl.h:103
>> #4 0x0000000000457610 in fst::ContextFst<fst::ArcTpl<fst::TropicalWeightTpl<float> >, int>::ContextFst (this=0x7fffffffd100, subsequential_symbol=97, phones=..., disambig_syms=..., N=1, P=0) at ../fstext/context-fst.h:223
>> #5 0x0000000000455b95 in fst::ComposeContext (disambig_syms_in=..., N=1, P=0, ifst=0x6c5be0, ofst=0x7fffffffd390, ilabels_out=0x7fffffffd3a0) at ../fstext/context-fst-inl.h:522
>> #6 0x00000000004522a3 in main (argc=7, argv=0x7fffffffdaa8) at fstcomposecontext.cc:138
>> (gdb) up
>> #1 0x00007ffff6bec6f0 in *__GI_abort () at abort.c:92
>> 92 abort.c: No such file or directory.
>> (gdb) up
>> #2 0x00007ffff6be2621 in *__GI___assert_fail (
>>     assertion=0x498448 "subsequential_symbol != 0 && disambig_syms_.count(subsequential_symbol) == 0 && phone_syms_.count(subsequential_symbol) == 0",
>>     file=<optimized out>, line=105,
>>     function=0x499700 "fst::ContextFstImpl<Arc, LabelT>::ContextFstImpl(typename Arc::Label, const std::vector<B, std::allocator<_T2> >&, const std::vector<B, std::allocator<_T2> >&, int, int) [with Arc = fst::ArcTpl<fst::T"...) at assert.c:81
>> 81 assert.c: No such file or directory.
>> (gdb) p subsequential_symbol
>> No symbol "subsequential_symbol" in current context.
>> (gdb) up
>> #3 0x000000000045b419 in fst::ContextFstImpl<fst::ArcTpl<fst::TropicalWeightTpl<float> >, int>::ContextFstImpl (this=0x6bd520, subsequential_symbol=97, phone_syms=..., disambig_syms=..., N=1, P=0) at ../fstext/context-fst-inl.h:103
>> 103 assert(subsequential_symbol != 0
>> (gdb) p subsequential_symbol
>> $1 = 97
>> (gdb) p disambig_syms_.count(subsequential_symbol)
>> $2 = 0
>> (gdb) p phone_syms_.count(subsequential_symbol)
>> $3 = 1
>> (gdb) p phone_syms_.size()
>> $4 = 78
>> (gdb) p disambig_syms_.size()
>> $5 = 0
>>
>> Thanks
>>
>> X.
>>
>> On Sat, Dec 28, 2013 at 9:01 PM, Daniel Povey <dp...@gm...> wrote:
>>
>>> The same error should not have happened. Can you please do the same steps in gdb as last time, and paste the screen from gdb?
>>> Dan
>>>
>>> On Sat, Dec 28, 2013 at 11:49 AM, Xavier Anguera <xan...@gm...> wrote:
>>>
>>>> Dan,
>>>> the same error occurred, just that now I got the extra Warning you inserted.
>>>> Should I maybe modify the make_phone_bigram_lang.sh script to copy the current disambig.* files into the new lang directory?
>>>>
>>>> Thanks,
>>>>
>>>> X.
>>>>
>>>> On Sat, Dec 28, 2013 at 8:03 PM, Daniel Povey <dp...@gm...> wrote:
>>>>
>>>>> OK, then try running the script with the code fix I checked in. I forgot about the existence of that script. Possibly it will work. I'll have to modify validate_lang.pl in that case.
>>>>> Dan
>>>>>
>>>>> On Sat, Dec 28, 2013 at 7:02 AM, Xavier Anguera <xan...@gm...> wrote:
>>>>>
>>>>>> Dan,
>>>>>> there must be something I do not do correctly in my current setup, or you did not understand where my problem is.
>>>>>> I am currently calling the script mkgraph.sh (that is crashing) in the following context:
>>>>>>
>>>>>> # Create phone-bigram grammar (unsmoothed) estimated from alignments
>>>>>> utils/make_phone_bigram_lang.sh data/lang exp/mono_ali_all data/lang_test_phn-mono || exit 1;
>>>>>> # Create phone recognition graph
>>>>>> $train_cmd exp/mono/graph/mkgraph_phn.log \
>>>>>>   utils/mkgraph.sh --mono data/lang_test_phn-mono exp/mono exp/mono/graph_phn || exit 1
>>>>>>
>>>>>> As you can see, first the script make_phone_bigram_lang.sh is called, which takes a lang directory as input and creates a "test" lang directory. Looking into this script, I see that the disambig.* files are left empty on purpose in the new directory (they are not empty in the original lang directory; in fact, they have the #0 and #1 values you proposed in the previous email).
>>>>>> Then, when calling the mkgraph.sh script with this test_lang directory, it complains as stated in my previous emails.
>>>>>> The question is then whether I should modify make_phone_bigram_lang.sh to copy the original disambig.* files, or pass the original lang directory to the mkgraph.sh script, or am I doing something else very wrong?
>>>>>>
>>>>>> Thanks for your help.
>>>>>>
>>>>>> Xavier Anguera
>>>>>>
>>>>>> On Sat, Dec 28, 2013 at 1:43 AM, Daniel Povey <dp...@gm...> wrote:
>>>>>>
>>>>>>> OK, I just committed a fix because it should not have crashed at that particular point in the code, but the underlying error is with your lang directory. You do need to have the disambiguation symbols "disambig.txt", with at least #0 and #1. You should probably be creating the lang directory with the prepare_lang.sh script, and if not, at least you should validate it with the validate_lang.pl script.
>>>>>>> Also, there is no reason to have a separate "lang" directory for the monophone setup; the same directory is valid for monophone or triphone setups.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>> On Fri, Dec 27, 2013 at 4:18 PM, Xavier Anguera <xan...@gm...> wrote:
>>>>>>>
>>>>>>>> Dear Dan,
>>>>>>>> thank you for your help.
>>>>>>>> Next are the tests you asked me to perform:
>>>>>>>>
>>>>>>>> Running utils/validate_lang.pl data/lang_test_phn-mono/ gives:
>>>>>>>>
>>>>>>>> Checking data/lang_test_phn-mono//phones/roots.{txt, int} ...
>>>>>>>> --> 30 entry/entries in data/lang_test_phn-mono//phones/roots.txt
>>>>>>>> --> data/lang_test_phn-mono//phones/roots.int corresponds to data/lang_test_phn-mono//phones/roots.txt
>>>>>>>> --> data/lang_test_phn-mono//phones/roots.{txt, int} are OK
>>>>>>>>
>>>>>>>> Checking data/lang_test_phn-mono//phones/sets.{txt, int} ...
>>>>>>>> --> 30 entry/entries in data/lang_test_phn-mono//phones/sets.txt
>>>>>>>> --> data/lang_test_phn-mono//phones/sets.int corresponds to data/lang_test_phn-mono//phones/sets.txt
>>>>>>>> --> data/lang_test_phn-mono//phones/sets.{txt, int} are OK
>>>>>>>>
>>>>>>>> Checking data/lang_test_phn-mono//phones/extra_questions.{txt, int} ...
>>>>>>>> --> 9 entry/entries in data/lang_test_phn-mono//phones/extra_questions.txt
>>>>>>>> --> data/lang_test_phn-mono//phones/extra_questions.int corresponds to data/lang_test_phn-mono//phones/extra_questions.txt
>>>>>>>> --> data/lang_test_phn-mono//phones/extra_questions.{txt, int} are OK
>>>>>>>>
>>>>>>>> Checking disjoint: silence.txt, nosilenct.txt, disambig.txt ...
>>>>>>>> --> silence.txt and nonsilence.txt are disjoint
>>>>>>>> --> silence.txt and disambig.txt are disjoint
>>>>>>>> --> disambig.txt and nonsilence.txt are disjoint
>>>>>>>> --> disjoint property is OK
>>>>>>>>
>>>>>>>> Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...
>>>>>>>> --> ERROR: data/lang_test_phn-mono//phones/disambig.txt is empty or not exists
>>>>>>>>
>>>>>>>> Checking optional_silence.txt ...
>>>>>>>> --> reading data/lang_test_phn-mono//phones/optional_silence.txt
>>>>>>>> --> data/lang_test_phn-mono//phones/optional_silence.txt is OK
>>>>>>>>
>>>>>>>> Checking disambiguation symbols: #0 and #1
>>>>>>>> --> ERROR: data/lang_test_phn-mono//phones/disambig.txt is empty or not exists
>>>>>>>> --> ERROR: data/lang_test_phn-mono//phones/disambig.txt doesn't have "#0" or "#1"
>>>>>>>> Checking topo ...
>>>>>>>> --> data/lang_test_phn-mono//topo's nonsilence section is OK
>>>>>>>> --> data/lang_test_phn-mono//topo's silence section is OK
>>>>>>>> --> data/lang_test_phn-mono//topo is OK
>>>>>>>>
>>>>>>>> Checking data/lang_test_phn-mono//oov.{txt, int} ...
>>>>>>>> --> ERROR: fail to open data/lang_test_phn-mono//oov.txt
>>>>>>>>
>>>>>>>> --> ERROR
>>>>>>>>
>>>>>>>> Apparently I have neither oov.txt nor disambig.txt.
>>>>>>>> Probably the test data I am using does not have any OOV in it. I can add it artificially, but I guess this is not the main problem here...
>>>>>>>> Regarding the disambig.txt file, what should it contain?
>>>>>>>>
>>>>>>>> I did run gdb as you indicated (thank you for such detailed info) and it gives me:
>>>>>>>> (gdb) p subsequential_symbol
>>>>>>>> $1 = 97
>>>>>>>> (gdb) p disambig_syms_.count(subsequential_symbol)
>>>>>>>> $2 = 0
>>>>>>>> (gdb) p phone_syms_.count(subsequential_symbol)
>>>>>>>> $3 = 1
>>>>>>>> (gdb) p phone_syms_.size()
>>>>>>>> $4 = 78
>>>>>>>> (gdb) p disambig_syms_.size()
>>>>>>>> $5 = 0
>>>>>>>>
>>>>>>>> Finally, the output of cat data/lang_test_phn-mono/phones/disambig.int is also empty.
>>>>>>>>
>>>>>>>> Thanks again for your help!
>>>>>>>>
>>>>>>>> yours,
>>>>>>>>
>>>>>>>> Xavier Anguera
>>>>>>>>
>>>>>>>> On Fri, Dec 27, 2013 at 10:26 PM, Daniel Povey <dp...@gm...> wrote:
>>>>>>>>
>>>>>>>>> Could you please do the following. [apologies if you already know gdb]
>>>>>>>>>
>>>>>>>>> First do utils/validate_lang.pl data/lang_test_phn-mono/
>>>>>>>>> and let me know if it fails.
>>>>>>>>> If it doesn't fail, do:
>>>>>>>>>
>>>>>>>>> gdb --args fstcomposecontext --context-size=1 --central-position=0 --read-disambig-syms=data/lang_test_phn-mono/phones/disambig.int --write-disambig-syms=data/lang_test_phn-mono/tmp/disambig_ilabels_1_0.int data/lang_test_phn-mono/tmp/ilabels_1_0 data/lang_test_phn-mono/tmp/LG.fst
>>>>>>>>>
>>>>>>>>> (gdb) r
>>>>>>>>> # wait till it crashes
>>>>>>>>> # go up the stack by typing "up" until you get to the right frame; type "down" if you go too far
>>>>>>>>>
>>>>>>>>> (gdb) p subsequential_symbol
>>>>>>>>> (gdb) p disambig_syms_.count(subsequential_symbol)
>>>>>>>>> (gdb) p phone_syms_.count(subsequential_symbol)
>>>>>>>>> (gdb) p phone_syms_.size()
>>>>>>>>> (gdb) p disambig_syms_.size()
>>>>>>>>> (gdb) quit
>>>>>>>>>
>>>>>>>>> [I hope this works; sometimes it will fail because functions are inlined].
>>>>>>>>> Anyway, send the output, and also
>>>>>>>>> cat data/lang_test_phn-mono/phones/disambig.int
>>>>>>>>> and show me that output too.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>> On Fri, Dec 27, 2013 at 10:23 AM, Xavier Anguera <xan...@gm...> wrote:
>>>>>>>>>
>>>>>>>>>> Dear all,
>>>>>>>>>> I am encountering a problem when training mono-state NN using a recipe adapted from the SWBD S5 recipe. I am able to train, decode and phone-align a GMM system, but when I use these results to train the NN I get the following error (see below).
>>>>>>>>>> I have used this recipe in the past to successfully train one ASR system, and now the only difference is that I am trying to train a similar system using graphemes as phonemes (for which I have assigned the graphemes of the words as transcriptions to each word).
>>>>>>>>>> Any help is appreciated.
>>>>>>>>>>
>>>>>>>>>> This is the beginning of the file exp/mono/graph/mkgraph_phn.log:
>>>>>>>>>>
>>>>>>>>>> # utils/mkgraph.sh --mono data/lang_test_phn-mono exp/mono exp/mono/graph_phn
>>>>>>>>>> # Started at Fri Dec 27 18:57:19 CET 2013
>>>>>>>>>> #
>>>>>>>>>> fsttablecompose data/lang_test_phn-mono/L_disambig.fst data/lang_test_phn-mono/G.fst
>>>>>>>>>> fstdeterminizestar --use-log=true
>>>>>>>>>> fstminimizeencoded
>>>>>>>>>> fstisstochastic data/lang_test_phn-mono/tmp/LG.fst
>>>>>>>>>> 0.000358155 -0.000356635
>>>>>>>>>> fstcomposecontext --context-size=1 --central-position=0 --read-disambig-syms=data/lang_test_phn-mono/phones/disambig.int --write-disambig-syms=data/lang_test_phn-mono/tmp/disambig_ilabels_1_0.int data/lang_test_phn-mono/tmp/ilabels_1_0
>>>>>>>>>> fstcomposecontext: ../fstext/context-fst-inl.h:105: fst::ContextFstImpl<Arc, LabelT>::ContextFstImpl(typename Arc::Label, const std::vector<B, std::allocator<_T2> >&, const std::vector<B, std::allocator<_T2> >&, int, int) [with Arc = fst::ArcTpl<fst::TropicalWeightTpl<float> >, LabelT = int]: Assertion `subsequential_symbol != 0 && disambig_syms_.count(subsequential_symbol) == 0 && phone_syms_.count(subsequential_symbol) == 0' failed.
>>>>>>>>>> utils/mkgraph.sh: line 76: 6263 Aborted fstcomposecontext --context-size=$N --central-position=$P --read-disambig-syms=$lang/phones/disambig.int --write-disambig-syms=$lang/tmp/disambig_ilabels_${N}_${P}.int $lang/tmp/ilabels_${N}_${P} < $lang/tmp/LG.fst > $clg
>>>>>>>>>> fstisstochastic data/lang_test_phn-mono/tmp/CLG_1_0.fst
>>>>>>>>>> ERROR: FstHeader::Read: Bad FST header: data/lang_test_phn-mono/tmp/CLG_1_0.fst
>>>>>>>>>> ERROR (fstisstochastic:ReadFstKaldi():fstext/fstext-utils-inl.h:1183) Reading FST: error reading FST header from data/lang_test_phn-mono/tmp/CLG_1_0.fst
>>>>>>>>>> ERROR (fstisstochastic:ReadFstKaldi():fstext/fstext-utils-inl.h:1183) Reading FST: error reading FST header from data/lang_test_phn-mono/tmp/CLG_1_0.fst
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Kaldi-developers mailing list
>>>>>>>>>> Kal...@li...
>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/kaldi-developers
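
For anyone reading the thread later: the three conditions in the failed assertion can be restated as a small Python sketch. This is not Kaldi code. The function name and the concrete phone ids are hypothetical; only the values printed by gdb above (subsequential_symbol = 97, 78 phone symbols that include 97, an empty disambiguation list) come from the thread.

```python
def violated_conditions(subsequential_symbol, phone_syms, disambig_syms):
    """Return the parts of the assertion that do NOT hold (hypothetical helper)."""
    problems = []
    if subsequential_symbol == 0:
        problems.append("subsequential_symbol is 0")
    if subsequential_symbol in disambig_syms:
        problems.append("subsequential_symbol is in disambig_syms")
    if subsequential_symbol in phone_syms:
        problems.append("subsequential_symbol is in phone_syms")
    return problems


# Stand-in data: gdb only shows the set sizes and the count for 97,
# so the actual ids below are an assumption made for illustration.
phone_syms = set(range(20, 98))   # 78 ids, and 97 is among them
disambig_syms = set()             # empty, as in disambig.int

print(violated_conditions(97, phone_syms, disambig_syms))
```

With these stand-in values, only the third condition is reported as violated, which matches the gdb output ($2 = 0, $3 = 1).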