Hi, I'm trying to familiarize myself with the Kaldi toolkit, so I'm running the voxforge s5 training scripts on the voxforge data, and I have tried the pre-built model linked at the end of http://kaldi.sourceforge.net/online_decoding.html on my own audio.
My question is: can I use that model with my own language model, or would that involve re-training the model?
I see that in the graph/ folder there is a file words.txt. Does it make sense to create an ARPA language model (that uses only the words contained in words.txt), convert it to Finite State Transducer form, and proceed with decoding?
I also see 3 FST files in the graph/ folder (Ha.fst, HCLGa.fst, HCLG.fst); are all 3 of those files used in decoding?
If you have just a new language model (in ARPA format) but the lexicon is the same, then you can just use the script called arpa2G.sh (you can find that script in some recipes in egs/, Babel for example).
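For what it's worth, the core of what those scripts do is roughly the following (a sketch only; lm.arpa and the graph/words.txt path are placeholders, and arpa2G.sh itself adds some Babel-specific handling around this):

    # Build G.fst from an ARPA LM, reusing the existing words.txt.
    # Drop illegal n-grams involving <s>/</s>, convert to FST text form,
    # remap the special symbols Kaldi expects, and compile against words.txt.
    cat lm.arpa | \
      grep -v '<s> <s>' | grep -v '</s> <s>' | grep -v '</s> </s>' | \
      arpa2fst - | fstprint | \
      utils/eps2disambig.pl | utils/s2eps.pl | \
      fstcompile --isymbols=graph/words.txt --osymbols=graph/words.txt \
        --keep_isymbols=false --keep_osymbols=false | \
      fstrmepsilon | fstarcsort --sort_type=ilabel > G.fst

Every word in the ARPA file has to be present in words.txt, otherwise fstcompile will fail on the unknown symbol (the full recipes also run utils/find_arpa_oovs.pl and utils/remove_oovs.pl to strip n-grams containing words missing from words.txt).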
If the lexicon changed (but the set of phonemes is the same), then you will have to regenerate L.fst first, while making sure that the indices of the phonemes are the same as in the original L.fst (and perhaps there are some other conditions to fulfill as well). I'm not sure if there is a script which would help you with that (or which you could use as an example); I think I'd just start with prepare_lang.sh. After generating L.fst you can go ahead and generate G.fst. It's been a while since I needed this, so I might not be completely right.
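One quick sanity check for that, assuming the pre-built model ships the phones.txt it was built with (e.g. next to the graph): the real phone symbols and their indices in the regenerated lang directory should line up exactly with the original table.

    # Placeholder paths: data/lang_new is the regenerated lang directory,
    # graph/phones.txt is the table shipped with the pre-built model.
    # Compare only the real phones; the disambiguation symbols (#0, #1, ...)
    # at the end of the table may legitimately differ.
    diff <(grep -v '^#' data/lang_new/phones.txt) \
         <(grep -v '^#' graph/phones.txt) \
      && echo "phone symbol tables match"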
If the lexicon adds new phones, then you would have to retrain.
After generating G.fst (and possibly L.fst), you will have to re-generate the decoding graph and then decode the audio again.
Yes, prepare_lang.sh can be used for this; you need to use the --phone-symbol-table option to ensure the generated phones.txt is compatible.
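For example, something along these lines (a sketch; all paths are placeholders, and the OOV word must match whatever the original dictionary used):

    # data/local/dict_new: the edited dictionary directory (lexicon.txt etc.),
    # prepared with the same conventions as the baseline lexicon.
    # graph/phones.txt: the phone symbol table shipped with the pre-built model.
    utils/prepare_lang.sh --phone-symbol-table graph/phones.txt \
      data/local/dict_new "<unk>" data/local/lang_tmp_new data/lang_new

This makes prepare_lang.sh assign phone indices according to the given table instead of numbering them from scratch, which is what keeps the new L.fst compatible with the existing acoustic model.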
Of course, the lexicon needs to have been prepared with the same conventions as the baseline lexicon. This means you can add new words and maybe take away words, but you cannot change the pronunciations of existing words, and any new entries must follow the same conventions as the existing ones.
Dan
Thanks for your replies, Jan "yenda" Trmal and Dan. I tried the arpa2G.sh script (with the idea of using the same lexicon), but I am having some issues.

I created a very small ARPA language model with MITLM (random words, just for testing), using only words contained in words.txt. This is the ARPA LM:

http://pastebin.com/JftmN4yr

I then ran the script on it, and its output looks reasonable to me. Running fstprint on the newly created G.fst gives:

http://pastebin.com/5frRpfEq

I then decoded my wav file with the command suggested at the end of http://kaldi.sourceforge.net/online_decoding.html, except that instead of passing graph/HCLG.fst I passed the path to my newly created G.fst, and I get:

    LOG (online2-wav-nnet2-latgen-faster:ComputeDerivedVars():ivector-extractor.cc:180) Computing derived variables for iVector extractor
    LOG (online2-wav-nnet2-latgen-faster:ComputeDerivedVars():ivector-extractor.cc:201) Done.
    KALDI_ASSERT: at online2-wav-nnet2-latgen-faster:TransitionIdToPdf:hmm/transition-model.h:316, failed: static_cast<size_t>(trans_id) < id2state_.size() && "Likely graph/model mismatch (graph built from wrong model?)"
    Stack trace is:
    kaldi::KaldiGetStackTrace()
    kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
    kaldi::TransitionModel::TransitionIdToPdf(int) const
    kaldi::nnet2::DecodableNnet2Online::LogLikelihood(int, int)
    kaldi::LatticeFasterOnlineDecoder::ProcessEmitting(kaldi::DecodableInterface*)
    kaldi::LatticeFasterOnlineDecoder::AdvanceDecoding(kaldi::DecodableInterface*, int)
    kaldi::SingleUtteranceNnet2Decoder::AdvanceDecoding()
    /home/sites/Kaldi2Attempt/kaldi-trunk/src/online2bin/online2-wav-nnet2-latgen-faster(main+0xb67) [0x702c2d]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f4861da6ead]
    /home/sites/Kaldi2Attempt/kaldi-trunk/src/online2bin/online2-wav-nnet2-latgen-faster() [0x701ad5]
    WARNING (online2-wav-nnet2-latgen-faster:~HashList():util/hash-list-inl.h:116) Possible memory leak: 1023 != 1024: you might have forgotten to call Delete on some Elems
    (the same KALDI_ASSERT and stack trace are then printed a second time)

Does anyone know what I did wrong?

EDIT: The ARPA language model I posted looks a bit ambiguous; I tried a bigger one that includes back-off weights for 1-grams and 2-grams, but I get the same error.
Last edit: Orest 2015-06-16
You are treating G.fst as if it were HCLG.fst, but they are different types of graph: G.fst contains only the word-level grammar, while HCLG.fst is compiled together with the lexicon, context dependency, and HMM structure. Have a look at hbka.pdf (search online) for an introduction to that material.
You need to run the mkgraph script.
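Concretely, that step might look something like this (a sketch only; directory names are placeholders and depend on how the pre-built model and lang directory are laid out on disk):

    # Put the new G.fst into a copy of a lang directory that matches the model
    # (either the original one, or one regenerated with prepare_lang.sh as above).
    cp -r data/lang data/lang_mylm
    cp G.fst data/lang_mylm/G.fst

    # Recompile the full decoding graph against the same tree and final.mdl.
    # exp/nnet_a_online is a placeholder for the directory holding the model.
    utils/mkgraph.sh data/lang_mylm exp/nnet_a_online exp/nnet_a_online/graph_mylm

    # Then decode as in the online-decoding example, but point it at the new
    # graph: pass exp/nnet_a_online/graph_mylm/HCLG.fst (not G.fst) to
    # online2-wav-nnet2-latgen-faster, along with the matching
    # graph_mylm/words.txt for --word-symbol-table.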