
LM format issues?

2013-07-24
2013-07-28
  • Sundararajan Srinivasan

    I think Bavieca is a cool project, with a lot of potential.

    I was able to install it fairly easily. However, when I run decode.pl on a toy setup adapted from $BAVIECA/tasks/wsj/scripts/test/20k/, I get the following error:

    dynamicdecoder: ../common/base/LMFSM.cpp:545: int Bavieca::LMFSM::updateLMState(int, int, float*): Assertion `arcBackoff->iLexUnit == 2147483647' failed.
    Aborted (core dumped)

    Details:
    AM=WSJ SI acoustic model from this site
    phoneSet=$BAVIECA/tasks/wsj/scripts/test/20k/languageModel/CMU/phoneset.txt
    fileLexicon=$BAVIECA/tasks/wsj/scripts/test/20k/languageModel/CMU/lexicon20k.txt
    fileLM=$BAVIECA/tasks/wsj/scripts/test/20k/languageModel/digit.arpa

    Content of digit.arpa:

    \data\
    ngram 1=14

    \1-grams:
    -0.373581 </s>
    -99 <s>
    -1.318063 EIGHT
    -1.318063 FIVE
    -1.318063 FOUR
    -1.318063 NINE
    -1.318063 OH
    -1.318063 ONE
    -1.318063 SEVEN
    -1.318063 SIX
    -1.318063 TEN
    -1.318063 THREE
    -1.318063 TWO
    -1.318063 ZERO

    \end\
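As a sanity check on the file itself: its \data\ header declares 14 unigrams, and the \1-grams: section lists exactly 14 entries, including the <s> and </s> markers. Below is a minimal Python sketch (not part of Bavieca) that reads the unigram section of an ARPA file and returns the declared count alongside the listed words. The helper name read_arpa_unigrams is hypothetical, and it assumes the standard ARPA layout of an 'ngram 1=N' header line followed by 'log10prob word [backoff]' entries.

```python
import re


def read_arpa_unigrams(text):
    """Return (declared_count, words) for the unigram section of an ARPA LM.

    Assumes the usual ARPA layout: a \\data\\ header containing an
    'ngram 1=N' line, followed by a \\1-grams: section whose entries are
    'log10prob word [backoff]'.
    """
    declared = None
    words = []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = re.fullmatch(r"ngram\s+1\s*=\s*(\d+)", line)
        if header:
            declared = int(header.group(1))
        elif line.startswith("\\"):
            # Section markers such as \data\, \1-grams:, \2-grams:, \end\
            section = line
        elif section == "\\1-grams:":
            # First field is the log10 probability, second is the word.
            words.append(line.split()[1])
    return declared, words


# The digit.arpa file from the question, one ARPA line per list item.
DIGIT_ARPA = "\n".join([
    "\\data\\",
    "ngram 1=14",
    "",
    "\\1-grams:",
    "-0.373581 </s>",
    "-99 <s>",
] + [f"-1.318063 {w}" for w in
     ["EIGHT", "FIVE", "FOUR", "NINE", "OH", "ONE", "SEVEN",
      "SIX", "TEN", "THREE", "TWO", "ZERO"]] + [
    "",
    "\\end\\",
])

if __name__ == "__main__":
    declared, words = read_arpa_unigrams(DIGIT_ARPA)
    print(declared, len(words))  # 14 14
```

Since the counts agree, the ARPA file is internally consistent; the word list it returns is what the lexicon has to be compared against.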

    I also compiled this grammar with lmfsm and used it with languageModel.format=FSM, but got a different error.

    Am I doing something wrong here?

    Thanks,
    -Sundar.

     
  • Daniel Bolanos

    Daniel Bolanos - 2013-07-27

    Hello Sundar,

    The problem is that your lexicon does not match the language model (you need to have the same words in the lexicon and in the language model). Ideally the toolkit would check that, or would ignore words in the lexicon that are not represented by unigrams in the language model. I will add that feature to the next release.

    For now, you can just change your lexicon to match the words in digit.arpa.
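The workaround amounts to restricting the lexicon to the LM vocabulary. Below is a minimal Python sketch of that step; the helper names filter_lexicon and strip_variant are hypothetical and not part of Bavieca. It assumes a lexicon with one entry per line (word first, then its phones, whitespace-separated) and CMUdict-style "(2)" suffixes on alternative pronunciations, and it assumes <s> and </s> have no lexicon entries. Check these assumptions against your own lexicon20k.txt. The sketch also lists LM words that have no pronunciation, so both directions of the mismatch can be checked.

```python
import re


def strip_variant(word):
    """Map a pronunciation-variant label like 'ONE(2)' to 'ONE'.

    Assumption: variants use the CMUdict-style '(n)' suffix.
    """
    return re.sub(r"\(\d+\)$", "", word)


def filter_lexicon(lexicon_lines, lm_words, markers=("<s>", "</s>")):
    """Keep only lexicon entries whose word is a unigram in the LM.

    Assumed lexicon format: one entry per line, the word first, then its
    phones, all whitespace-separated. Sentence markers are assumed to have
    no lexicon entry, so they are removed from the LM vocabulary.

    Returns (kept_lines, dropped_words, lm_words_without_pronunciation).
    """
    vocab = set(lm_words) - set(markers)
    kept, dropped, covered = [], [], set()
    for line in lexicon_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word = strip_variant(fields[0])
        if word in vocab:
            kept.append(line)
            covered.add(word)
        else:
            dropped.append(fields[0])
    return kept, dropped, sorted(vocab - covered)


if __name__ == "__main__":
    lexicon = ["EIGHT EY T", "HELLO HH AH L OW", "ONE W AH N", "ONE(2) HH W AH N"]
    lm = ["</s>", "<s>", "EIGHT", "ONE", "ZERO"]
    kept, dropped, missing = filter_lexicon(lexicon, lm)
    print(kept)     # ['EIGHT EY T', 'ONE W AH N', 'ONE(2) HH W AH N']
    print(dropped)  # ['HELLO']
    print(missing)  # ['ZERO']
```

Writing the kept lines to a new file gives a lexicon containing only LM words; any word reported as missing still needs a pronunciation added by hand.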

    Dani

     
    • Sundararajan Srinivasan

      Hi Dani,

      Great! Once I modified the lexicon to contain all and only the unigrams in the LM, the decoder worked.

      Thanks,
      -Sundar.

       
  • Daniel Bolanos

    Daniel Bolanos - 2013-07-28

    I'm glad to hear that! :)

    Dani

     
