
Custom language model on already-built online-nnet2

Help
Orest
2015-06-12
2015-06-16
  • Orest

    Orest - 2015-06-12

    Hi, I'm familiarizing myself with the Kaldi toolkit: I'm running the voxforge s5 training scripts on the voxforge data, and I tried the pre-built model linked at the end of http://kaldi.sourceforge.net/online_decoding.html on my own audio.

    My question is: can I use that model with my own language model, or would that involve re-training the model?

    I see that in the graph/ folder there is a file words.txt. Does it make sense to create an ARPA language model (using only the words contained in words.txt), convert it to finite-state-transducer form, and proceed with decoding?

    I see 3 FST files in the graph/ folder (Ha.fst, HCLGa.fst, HCLG.fst); are all three used in decoding?

     
    • Jan "yenda" Trmal

      If you have just a new language model (in ARPA format) but the lexicon is
      the same, then you can just use the script called arpa2G.sh (you can find
      it in some of the recipes in egs/, Babel for example).
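At its core, such a script converts the ARPA file with Kaldi's arpa2fst. A minimal sketch, assuming a standard lang dir with a words.txt and the #0 disambiguation symbol (all paths below are placeholders, not from this thread, and the exact arpa2fst options vary by Kaldi version; older recipes pipe through fstcompile instead):

```shell
# Hedged sketch: build G.fst from an ARPA LM using the existing word list.
# "lang/" and "lm.arpa" are placeholder paths for illustration.
arpa2fst --disambig-symbol='#0' \
         --read-symbol-table=lang/words.txt \
         lm.arpa lang/G.fst

# Sanity check: G.fst should be close to stochastic.
fstisstochastic lang/G.fst
```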

      If the lexicon changed (but the set of phonemes is the same), then you will
      have to first regenerate L.fst, while making sure that the indices of the
      phonemes are the same as in the original L.fst (and perhaps there are some
      other conditions to fulfill as well). I'm not sure if there is a script
      which would help you with that (or which you could use as an example), I
      think I'd just start with make_lang.sh. After generating the L.fst you can
      go ahead and generate G.fst. It's been a while since I needed this, so I
      might not be completely right.
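One cheap way to check the "same phoneme indices" condition is to compare the two phone symbol tables directly. A toy illustration (the old_lang/new_lang paths and their contents are made up for demonstration):

```shell
# Toy illustration with made-up contents: every phone must map to the same
# integer index in both tables, or graphs built from the new lang dir will
# not correspond to the acoustic model's transition-ids.
mkdir -p old_lang new_lang
printf '<eps> 0\nSIL 1\nAA 2\n' > old_lang/phones.txt
printf '<eps> 0\nSIL 1\nAA 2\n' > new_lang/phones.txt

if diff -q old_lang/phones.txt new_lang/phones.txt >/dev/null; then
  echo "phones.txt match"
else
  echo "MISMATCH: regenerate the lang dir against the original phone table" >&2
fi
```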

      If the lexicon adds new phones, then you would have to retrain.

      After generating G.fst (and possibly L.fst), you will have to re-generate
      the decoding graphs and decode the audio again.

      y.

      • Daniel Povey

        Daniel Povey - 2015-06-12

        Yes, prepare_lang.sh can be used for this; you need to use the
        --phone-symbol-table option to ensure the generated phones.txt is
        compatible.
        Of course, the lexicon needs to have been prepared with the same
        conventions as the baseline lexicon: this means you can add new words
        and maybe take away words, but not change the pronunciations of
        existing words, and new entries must follow the same conventions as
        the existing words.
        Dan
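A sketch of this suggestion (all directory names are placeholders; dict_new would hold the extended lexicon, prepared with the same conventions as the original):

```shell
# Hedged sketch: regenerate the lang dir while pinning the phone indices
# to the original model's phone table. All paths are illustrative only.
utils/prepare_lang.sh --phone-symbol-table lang_orig/phones.txt \
  data/local/dict_new "<unk>" data/local/lang_tmp data/lang_new
```

After this, data/lang_new/L.fst (together with a G.fst built against the same words.txt) can be fed to graph compilation.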

  • Orest

    Orest - 2015-06-16

    Thanks for your replies, Jan "yenda" Trmal and Dan. I tried the arpa2G.sh script (with the idea of keeping the same lexicon), but I am having some issues:

    I created a very small ARPA language model with MITLM (random words, just for testing), using only words contained in words.txt; this is the ARPA LM:

    http://pastebin.com/JftmN4yr

    I then proceeded with:

    ./arpa2G.sh textARPA.lm /path-to-already-built-online-nnet2/graph /Destination-Directory-where-I-Wanted-G.fst
    

    The script outputs:

    arpa2fst - 
    Processing 1-grams
    Processing 2-grams
    Processing 3-grams
    Connected 0 states without outgoing arcs.
    fstisstochastic /home/TransformArpaToFST2/G.fst 
    -8.56817e-07 -0.169189
    

    Which looks like a reasonable output to me. I ran fstprint on the newly created G.fst and got:

    http://pastebin.com/5frRpfEq

    I then decoded my wav file with the command suggested at the end of http://kaldi.sourceforge.net/online_decoding.html:

    ~/kaldi-online/src/online2bin/online2-wav-nnet2-latgen-faster --do-endpointing=false \
        --online=false \
        --config=nnet_a_gpu_online/conf/online_nnet2_decoding.conf \
        --max-active=7000 --beam=15.0 --lattice-beam=6.0 \
        --acoustic-scale=0.1 --word-symbol-table=graph/words.txt \
       nnet_a_gpu_online/smbr_epoch2.mdl graph/HCLG.fst "ark:echo utterance-id1 utterance-id1|" "scp:echo utterance-id1 ENG_M.wav|" \
       ark:/dev/null
    

    So, this time, instead of passing graph/HCLG.fst I used the path to my newly created G.fst, and I get:

    LOG (online2-wav-nnet2-latgen-faster:ComputeDerivedVars():ivector-extractor.cc:180) Computing derived variables for iVector extractor
    LOG (online2-wav-nnet2-latgen-faster:ComputeDerivedVars():ivector-extractor.cc:201) Done.
    KALDI_ASSERT: at online2-wav-nnet2-latgen-faster:TransitionIdToPdf:hmm/transition-model.h:316, failed: static_cast<size_t>(trans_id) < id2state_.size() && "Likely graph/model mismatch (graph built from wrong model?)"
    Stack trace is:
    kaldi::KaldiGetStackTrace()
    kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
    kaldi::TransitionModel::TransitionIdToPdf(int) const
    kaldi::nnet2::DecodableNnet2Online::LogLikelihood(int, int)
    kaldi::LatticeFasterOnlineDecoder::ProcessEmitting(kaldi::DecodableInterface*)
    kaldi::LatticeFasterOnlineDecoder::AdvanceDecoding(kaldi::DecodableInterface*, int)
    kaldi::SingleUtteranceNnet2Decoder::AdvanceDecoding()
    /home/sites/Kaldi2Attempt/kaldi-trunk/src/online2bin/online2-wav-nnet2-latgen-faster(main+0xb67) [0x702c2d]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f4861da6ead]
    /home/sites/Kaldi2Attempt/kaldi-trunk/src/online2bin/online2-wav-nnet2-latgen-faster() [0x701ad5]
    WARNING (online2-wav-nnet2-latgen-faster:~HashList():util/hash-list-inl.h:116) Possible memory leak: 1023 != 1024: you might have forgotten to call Delete on some Elems
    KALDI_ASSERT: at online2-wav-nnet2-latgen-faster:TransitionIdToPdf:hmm/transition-model.h:316, failed: static_cast<size_t>(trans_id) < id2state_.size() && "Likely graph/model mismatch (graph built from wrong model?)"
    Stack trace is:
    kaldi::KaldiGetStackTrace()
    kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
    kaldi::TransitionModel::TransitionIdToPdf(int) const
    kaldi::nnet2::DecodableNnet2Online::LogLikelihood(int, int)
    kaldi::LatticeFasterOnlineDecoder::ProcessEmitting(kaldi::DecodableInterface*)
    kaldi::LatticeFasterOnlineDecoder::AdvanceDecoding(kaldi::DecodableInterface*, int)
    kaldi::SingleUtteranceNnet2Decoder::AdvanceDecoding()
    /home/sites/Kaldi2Attempt/kaldi-trunk/src/online2bin/online2-wav-nnet2-latgen-faster(main+0xb67) [0x702c2d]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f4861da6ead]
    /home/sites/Kaldi2Attempt/kaldi-trunk/src/online2bin/online2-wav-nnet2-latgen-faster() [0x701ad5]
    

    Does anyone know what I did wrong?

    EDIT: The ARPA language model I posted looks a bit ambiguous; I tried a bigger one that includes back-off weights for 1-grams and 2-grams, but I get the same error.

     

    Last edit: Orest 2015-06-16
    • Daniel Povey

      Daniel Povey - 2015-06-16

      You are treating G.fst as HCLG.fst; they are different types of graph.
      G.fst contains only words, while HCLG.fst is compiled with the context
      dependency, the lexicon, and so on. Have a look at hbka.pdf (search
      online) for an intro to that stuff.
      You need to run the mkgraph script.
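A sketch of that step (paths are placeholders: a lang dir containing your new G.fst, plus the model/tree directory the graph must match):

```shell
# Hedged sketch: compose H, C, L and G into a single decoding graph.
# The resulting graph_new/HCLG.fst is what the online decoder expects
# in place of the original graph/HCLG.fst. Paths are illustrative only.
utils/mkgraph.sh data/lang_new exp/nnet2_online/nnet_a \
  exp/nnet2_online/nnet_a/graph_new
```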

      Dan
