
Custom language model on already-built online-nnet2

Help
Orest
2015-06-12
2015-06-16
  • Orest

    Orest - 2015-06-12

    Hi, I'm familiarizing myself with the Kaldi toolkit: I'm running the voxforge s5 training scripts on the voxforge data, and I tried the pre-built model linked at the end of http://kaldi.sourceforge.net/online_decoding.html on my own audio.

    My question is: can I use that model with my own language model, or would that involve re-training the model?

    I see that in the graph/ folder there is a file words.txt. Does it make sense to create an ARPA language model (using only the words contained in words.txt), convert it to finite-state-transducer form, and proceed with decoding?

    I see 3 FST files in the graph/ folder (Ha.fst, HCLGa.fst, HCLG.fst); are all three used in decoding?

     
    • Jan "yenda" Trmal

      If you have just a new language model (in ARPA format) but the lexicon is
      the same, then you can just use the script called arpa2G.sh (you can find
      it in some of the recipes in egs/, Babel for example).
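At its core, such a script converts the ARPA file with Kaldi's arpa2fst. A minimal sketch, assuming a standard lang dir with a words.txt and the #0 disambiguation symbol (all paths below are placeholders, not from this thread, and the exact arpa2fst options vary by Kaldi version; older recipes pipe through fstcompile instead):

```shell
# Hedged sketch: build G.fst from an ARPA LM using the existing word list.
# "lang/" and "lm.arpa" are placeholder paths for illustration.
arpa2fst --disambig-symbol='#0' \
         --read-symbol-table=lang/words.txt \
         lm.arpa lang/G.fst

# Sanity check: G.fst should be close to stochastic.
fstisstochastic lang/G.fst
```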

      If the lexicon changed (but the set of phonemes is the same), then you will
      have to first regenerate L.fst, while making sure that the indices of the
      phonemes are the same as in the original L.fst (and perhaps there are some
      other conditions to fulfill as well). I'm not sure if there is a script
      which would help you with that (or which you could use as an example), I
      think I'd just start with make_lang.sh. After generating the L.fst you can
      go ahead and generate G.fst. It's been a while since I needed this, so I
      might not be completely right.
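One cheap way to check the "same phoneme indices" condition is to compare the two phone symbol tables directly. A toy illustration (the old_lang/new_lang paths and their contents are made up for demonstration):

```shell
# Toy illustration with made-up contents: every phone must map to the same
# integer index in both tables, or graphs built from the new lang dir will
# not correspond to the acoustic model's transition-ids.
mkdir -p old_lang new_lang
printf '<eps> 0\nSIL 1\nAA 2\n' > old_lang/phones.txt
printf '<eps> 0\nSIL 1\nAA 2\n' > new_lang/phones.txt

if diff -q old_lang/phones.txt new_lang/phones.txt >/dev/null; then
  echo "phones.txt match"
else
  echo "MISMATCH: regenerate the lang dir against the original phone table" >&2
fi
```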

      If the lexicon adds new phones, then you would have to retrain.

      After generating G.fst (and possibly L.fst), you will have to re-generate
      the decoding graphs and decode the audio again.

      y.

      • Daniel Povey

        Daniel Povey - 2015-06-12

        Yes, prepare_lang.sh can be used for this; you need to use the
        --phone-symbol-table option to ensure the generated phones.txt is
        compatible.
        Of course, the lexicon needs to have been prepared with the same
        conventions as the baseline lexicon: this means you can add new words
        and maybe take away words, but not change the pronunciations of
        existing words, and new entries must follow the same conventions as
        the existing words.
        Dan
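A sketch of this suggestion (all directory names are placeholders; dict_new would hold the extended lexicon, prepared with the same conventions as the original):

```shell
# Hedged sketch: regenerate the lang dir while pinning the phone indices
# to the original model's phone table. All paths are illustrative only.
utils/prepare_lang.sh --phone-symbol-table lang_orig/phones.txt \
  data/local/dict_new "<unk>" data/local/lang_tmp data/lang_new
```

After this, data/lang_new/L.fst (together with a G.fst built against the same words.txt) can be fed to graph compilation.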

  • Orest

    Orest - 2015-06-16

    Thanks for your replies, Jan "yenda" Trmal and Dan. I tried the arpa2G.sh script (with the idea of keeping the same lexicon), but I am having some issues:

    I created a very small ARPA language model with MITLM (random words, just for testing), using only words contained in words.txt; this is the ARPA LM:

    http://pastebin.com/JftmN4yr

    I then proceeded with:

    ./arpa2G.sh textARPA.lm /path-to-already-built-online-nnet2/graph /Destination-Directory-where-I-Wanted-G.fst
    

    The script outputs:

    arpa2fst - 
    Processing 1-grams
    Processing 2-grams
    Processing 3-grams
    Connected 0 states without outgoing arcs.
    fstisstochastic /home/TransformArpaToFST2/G.fst 
    -8.56817e-07 -0.169189
    

    Which looks like a reasonable output to me. I ran fstprint on the newly created G.fst and got:

    http://pastebin.com/5frRpfEq

    I then decoded my wav file with the command suggested at the end of http://kaldi.sourceforge.net/online_decoding.html:

    ~/kaldi-online/src/online2bin/online2-wav-nnet2-latgen-faster --do-endpointing=false \
        --online=false \
        --config=nnet_a_gpu_online/conf/online_nnet2_decoding.conf \
        --max-active=7000 --beam=15.0 --lattice-beam=6.0 \
        --acoustic-scale=0.1 --word-symbol-table=graph/words.txt \
       nnet_a_gpu_online/smbr_epoch2.mdl graph/HCLG.fst "ark:echo utterance-id1 utterance-id1|" "scp:echo utterance-id1 ENG_M.wav|" \
       ark:/dev/null
    

    So, this time, instead of passing graph/HCLG.fst I used the path to my newly created G.fst, and I get:

    LOG (online2-wav-nnet2-latgen-faster:ComputeDerivedVars():ivector-extractor.cc:180) Computing derived variables for iVector extractor
    LOG (online2-wav-nnet2-latgen-faster:ComputeDerivedVars():ivector-extractor.cc:201) Done.
    KALDI_ASSERT: at online2-wav-nnet2-latgen-faster:TransitionIdToPdf:hmm/transition-model.h:316, failed: static_cast<size_t>(trans_id) < id2state_.size() && "Likely graph/model mismatch (graph built from wrong model?)"
    Stack trace is:
    kaldi::KaldiGetStackTrace()
    kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
    kaldi::TransitionModel::TransitionIdToPdf(int) const
    kaldi::nnet2::DecodableNnet2Online::LogLikelihood(int, int)
    kaldi::LatticeFasterOnlineDecoder::ProcessEmitting(kaldi::DecodableInterface*)
    kaldi::LatticeFasterOnlineDecoder::AdvanceDecoding(kaldi::DecodableInterface*, int)
    kaldi::SingleUtteranceNnet2Decoder::AdvanceDecoding()
    /home/sites/Kaldi2Attempt/kaldi-trunk/src/online2bin/online2-wav-nnet2-latgen-faster(main+0xb67) [0x702c2d]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f4861da6ead]
    /home/sites/Kaldi2Attempt/kaldi-trunk/src/online2bin/online2-wav-nnet2-latgen-faster() [0x701ad5]
    WARNING (online2-wav-nnet2-latgen-faster:~HashList():util/hash-list-inl.h:116) Possible memory leak: 1023 != 1024: you might have forgotten to call Delete on some Elems
    KALDI_ASSERT: at online2-wav-nnet2-latgen-faster:TransitionIdToPdf:hmm/transition-model.h:316, failed: static_cast<size_t>(trans_id) < id2state_.size() && "Likely graph/model mismatch (graph built from wrong model?)"
    Stack trace is:
    kaldi::KaldiGetStackTrace()
    kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
    kaldi::TransitionModel::TransitionIdToPdf(int) const
    kaldi::nnet2::DecodableNnet2Online::LogLikelihood(int, int)
    kaldi::LatticeFasterOnlineDecoder::ProcessEmitting(kaldi::DecodableInterface*)
    kaldi::LatticeFasterOnlineDecoder::AdvanceDecoding(kaldi::DecodableInterface*, int)
    kaldi::SingleUtteranceNnet2Decoder::AdvanceDecoding()
    /home/sites/Kaldi2Attempt/kaldi-trunk/src/online2bin/online2-wav-nnet2-latgen-faster(main+0xb67) [0x702c2d]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f4861da6ead]
    /home/sites/Kaldi2Attempt/kaldi-trunk/src/online2bin/online2-wav-nnet2-latgen-faster() [0x701ad5]
    

    Does anyone know what I did wrong?

    EDIT: The ARPA language model I posted looks a bit ambiguous; I tried a bigger one that includes back-off weights for 1-grams and 2-grams, but I get the same error.

     

    Last edit: Orest 2015-06-16
    • Daniel Povey

      Daniel Povey - 2015-06-16

      You are treating G.fst as HCLG.fst; they are different types of graph.
      G.fst contains only words, while HCLG.fst is compiled with the context
      dependency, the lexicon, and so on. Have a look at hbka.pdf (search
      online) for an intro to that stuff.
      You need to run the mkgraph script.
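A sketch of that step (paths are placeholders: a lang dir containing your new G.fst, plus the model/tree directory the graph must match):

```shell
# Hedged sketch: compose H, C, L and G into a single decoding graph.
# The resulting graph_new/HCLG.fst is what the online decoder expects
# in place of the original graph/HCLG.fst. Paths are illustrative only.
utils/mkgraph.sh data/lang_new exp/nnet2_online/nnet_a \
  exp/nnet2_online/nnet_a/graph_new
```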

      Dan
