decode_biglm.sh No arc available in LM

Help
L-A
2015-07-11
2015-07-13
  • L-A

    L-A - 2015-07-11

    Hi,
    I am having trouble using the script "decode_biglm.sh".


    time ./decode_biglm.sh.org tri2b_mmi_1200/graph_80k_173 lang_80k_tg173/G.fst lang_80k_tg172/G.fst test_utt tri2b_mmi_1200/test
    WARNING (gmm-latgen-biglm-faster:PropagateLm():decoder/lattice-biglm-faster-decoder.h:673) No arc available in LM (unlikely to be correct if a statistical language model); will not warn again
    KALDI_ASSERT: at gmm-latgen-biglm-faster:PruneForwardLinks:decoder/lattice-biglm-faster-decoder.h:394, failed: link_extra_cost == link_extra_cost


    The only difference between the new LM and the old LM is the cutoff size; the vocabulary size is the same.
    Should I remove the disambiguation symbol #0 after making G.fst?
    Please give me some suggestions. Thanks!

     
    • Gilles Boulianne

      Does your new LM contain all the unigrams that were present in the old LM?
      Or maybe your words.txt file was changed inadvertently between compiling the old and the new LM?
      The decoder is looking for a word probability (given a history) and cannot find it in the new LM.
      A correct new LM is supposed to provide such a probability for any word that appears in the old LM.
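
One way to check the unigram question directly is to compare the `\1-grams:` sections of the two ARPA files. The following is a minimal sketch, assuming both LMs are available as plain-text ARPA files; the function names `arpa_unigrams` and `missing_unigrams` and the file names in the usage note are hypothetical, not part of Kaldi.

```shell
#!/bin/sh
# Sketch: list words that have a unigram entry in the old ARPA LM but not in
# the new one. Function names are illustrative assumptions, not Kaldi tools.

# Print the words listed in the \1-grams: section of an ARPA file, sorted.
arpa_unigrams() {
  awk '
    /^\\1-grams:/ { in_uni = 1; next }   # start of the unigram section
    /^\\/         { in_uni = 0 }         # any other section marker ends it
    in_uni && NF >= 2 { print $2 }       # fields: logprob word [backoff]
  ' "$1" | LC_ALL=C sort -u
}

# Print the words present in the old LM ($1) but absent from the new LM ($2).
missing_unigrams() {
  old_words=$(mktemp) || return 1
  new_words=$(mktemp) || return 1
  arpa_unigrams "$1" > "$old_words"
  arpa_unigrams "$2" > "$new_words"
  # comm -23 keeps the lines that appear only in the first sorted file.
  LC_ALL=C comm -23 "$old_words" "$new_words"
  rm -f "$old_words" "$new_words"
}
```

Called as `missing_unigrams old_lm.arpa new_lm.arpa`, this prints nothing when every unigram of the old LM also appears in the new one. It only inspects unigrams, so it cannot rule out other causes of the warning.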


       
  • L-A

    L-A - 2015-07-12

    Hi,

    Thanks for your replies.
    I'm sure that the new LM contains all the unigrams that are present in the old LM.
    The words.txt is the same.
    I have successfully decoded with the new LM and the old LM separately, using the same words.txt and "gmm-latgen-faster".
    But I still have trouble with "gmm-latgen-biglm-faster".

     
    • Daniel Povey

      Daniel Povey - 2015-07-12

      How did you create the LM?
      Also, you might want to run it in a debugger and figure out which word
      it was complaining about. (It will be an integer, but you can look it up in
      words.txt.)
      Dan
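
Once the debugger has given an integer word id, it can be mapped back to the word with a one-line lookup. This is a small sketch, assuming the usual symbol-table layout of one `word id` pair per line in words.txt; the function name `word_for_id` is hypothetical.

```shell
#!/bin/sh
# Sketch: print the word whose integer id is $1 in the symbol table $2.
# Assumes each line of the table has the form "<word> <integer-id>".
# Returns a non-zero status if the id is not found.
word_for_id() {
  awk -v id="$1" '
    $2 == id { print $1; found = 1; exit }  # stop at the first match
    END      { exit !found }                # status 1 when nothing matched
  ' "$2"
}
```

For example, `word_for_id 1234 words.txt` prints the word with id 1234, where 1234 stands for whatever value the debugger reports.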


       
      • L-A

        L-A - 2015-07-13

        Hi,

        The LM is created in ARPA format and converted to an FST with the following steps.

        cat lm.arpa | \
          grep -v ' ' | \
          grep -v ' ' | \
          grep -v ' ' | \
          arpa2fst - | \
          fstprint | \
          eps2disambig.pl | \
          s2eps.pl | \
          fstcompile --isymbols=words.txt \
            --osymbols=words.txt \
            --keep_isymbols=false --keep_osymbols=false | \
          fstrmepsilon > G.fst

         
        • Daniel Povey

          Daniel Povey - 2015-07-13

          Yes, what I meant is, how did you create the ARPA-format LM? Also,
          make sure your Kaldi code is fully up to date; we do more checking
          recently in arpa2fst which might catch certain problems.
          You should run it in a debugger and figure out which word is involved. Run
          gdb --args (program) (args)
          The commands you will need include:
          run (r)
          catch throw
          continue (c)
          up
          list
          print (e.g. print some-c++-expression)

          Dan
