Hi,
I'm having some trouble using the script "decode_biglm.sh":
time ./decode_biglm.sh.org tri2b_mmi_1200/graph_80k_173 lang_80k_tg173/G.fst lang_80k_tg172/G.fst test_utt tri2b_mmi_1200/test
WARNING (gmm-latgen-biglm-faster:PropagateLm():decoder/lattice-biglm-faster-decoder.h:673) No arc available in LM (unlikely to be correct if a statistical language model); will not warn again
KALDI_ASSERT: at gmm-latgen-biglm-faster:PruneForwardLinks:decoder/lattice-biglm-faster-decoder.h:394, failed: link_extra_cost == link_extra_cost
The new LM and the old LM differ in the cutoff size; the vocabulary size is the same.
Should I remove the disambiguation symbol #0 after making G.fst?
Please give me some suggestions, thanks!
Does your new LM contain all the unigrams that were present in the old LM?
Or maybe your words.txt file was changed inadvertently between compiling the old and the new LM?
The decoder is looking up a word probability (following a given history) and cannot find it in the new LM.
A correct LM is supposed to provide such a probability for any word that appears in the old LM.
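One way to answer the first question concretely is to compare the `\1-grams:` sections of the two ARPA files. Below is a minimal sketch, assuming both LMs are available as uncompressed plain-text ARPA files; the function names `arpa_unigrams` and `missing_unigrams` are hypothetical helpers, not Kaldi scripts.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Kaldi): compare the unigram lists of two ARPA LMs.

# arpa_unigrams FILE: print the words listed in the \1-grams: section, sorted and unique.
arpa_unigrams() {
  awk '/^\\1-grams:/  { in1 = 1; next }   # start of the unigram section
       /^\\/          { in1 = 0 }         # any other \section\ line ends it
       in1 && NF >= 2 { print $2 }        # "logprob word [backoff]": field 2 is the word
      ' "$1" | LC_ALL=C sort -u
}

# missing_unigrams OLD NEW: print words that are unigrams in OLD but not in NEW.
missing_unigrams() {
  new_words=$(mktemp)
  arpa_unigrams "$2" > "$new_words"
  # -F fixed strings, -x whole-line match, -v keep only lines with no match
  arpa_unigrams "$1" | grep -Fxv -f "$new_words" || true
  rm -f "$new_words"
}
```

For example, `missing_unigrams old.arpa new.arpa` should print nothing if every unigram of the old LM survived the cutoff. For the second question, `cmp` on the two words.txt files is enough.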
Thanks for your replies.
I'm sure that the new LM contains all the unigrams that are present in the old LM.
The words.txt is the same.
I have successfully decoded with the new LM and with the old LM separately, using the same words.txt and "gmm-latgen-faster".
But "gmm-latgen-biglm-faster" still fails.
How did you create the LM?
Also, you might want to run it in a debugger and figure out which word
it was complaining about. (It will be an integer, but you can look it up in
words.txt.)
Dan
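The integer mentioned above is a symbol ID, and words.txt maps each word to its ID with one `word id` pair per line. A minimal lookup sketch, assuming that format; the function name `id2word` is hypothetical (Kaldi's `utils/int2sym.pl` does the same mapping for whole files):

```shell
#!/bin/sh
# Hypothetical helper: id2word WORDS_TXT ID prints the word whose integer ID is ID.
# Assumes the OpenFst/Kaldi symbol-table format: "<word> <integer-id>" per line.
# Exits non-zero if no line has that ID.
id2word() {
  awk -v id="$2" '$2 == id { print $1; found = 1 } END { exit !found }' "$1"
}
```

For example, `id2word words.txt 12345` prints the word with ID 12345, if there is one.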
Yes, what I meant is: how did you create the ARPA-format LM? Also,
make sure your Kaldi code is fully up to date; we have recently added more
checking to arpa2fst, which might catch certain problems.
You should run it in a debugger and figure out which word is involved. Do
  gdb --args (program) (args)
Commands you will need include:
  run (r)
  catch throw
  continue (c)
  up
  list
  print (e.g. print some-c++-expression)
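For a first look, the same gdb commands can also be run non-interactively to get a backtrace. A sketch, assuming gdb is installed; `gdb_backtrace` is a hypothetical wrapper name, and the program and arguments are whatever decode_biglm.sh passes to gmm-latgen-biglm-faster (the exact command line can usually be found at the top of the decoding log).

```shell
#!/bin/sh
# Hypothetical wrapper: gdb_backtrace PROGRAM [ARGS...]
# Runs PROGRAM under gdb in batch mode. "catch throw" stops at the first C++ throw;
# gdb also stops if the program is killed by a signal (e.g. SIGABRT from abort()).
# "bt" then prints the call stack of the stopped program, and gdb exits.
gdb_backtrace() {
  gdb -batch -ex 'catch throw' -ex 'run' -ex 'bt' --args "$@"
}
```

Once the failing frame is known, an interactive session with `up`, `list` and `print` can be used to inspect the variables there, such as the word ID being looked up.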
On 2015-07-11 at 10:54, L-A laou@users.sf.net wrote:
On Sat, Jul 11, 2015 at 10:28 PM, L-A laou@users.sf.net wrote:
Hi,
The LM is created in ARPA format and converted into an FST with the following steps:
cat lm.arpa | \
  grep -v '<s> <s>' | \
  grep -v '</s> <s>' | \
  grep -v '</s> </s>' | \
  arpa2fst - | \
  fstprint | \
  eps2disambig.pl | \
  s2eps.pl | \
  fstcompile --isymbols=words.txt --osymbols=words.txt \
    --keep_isymbols=false --keep_osymbols=false | \
  fstrmepsilon > G.fst
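For reference, Kaldi's LM-formatting recipes of this period filter the ARPA text before arpa2fst with three `grep -v` patterns, dropping n-grams that contain `<s> <s>`, `</s> <s>` or `</s> </s>` (sentence boundaries in positions that cannot occur in a real sentence). A minimal sketch of that filter as a reusable function; the name `filter_bad_ngrams` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read ARPA text on stdin and drop n-gram lines containing
# "<s> <s>", "</s> <s>" or "</s> </s>"; all other lines pass through unchanged.
filter_bad_ngrams() {
  grep -v -e '<s> <s>' -e '</s> <s>' -e '</s> </s>'
}
```

It can stand in for the three greps, e.g. `filter_bad_ngrams < lm.arpa | arpa2fst - | fstprint | ...`.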
On Mon, Jul 13, 2015 at 5:21 AM, L-A laou@users.sf.net wrote: