I have been trying to use train_mmi.sh to train a model with boosted MMI. Without the --boost option it seems to work properly, but once I gave it a value the following error came up:
lattice-boost-ali --b=0.3 --silence-phones=1:2:3:4:5:6:7:8:9:10:11:12:13:14:15 exp/tri4b_dnn_multi_ali/final.mdl scp:exp/tri4b_dnn_multi_bmmi/lat.scp 'ark,p:gunzip -c exp/tri4b_dnn_multi_ali/ali..gz |' ark:-
WARNING (lattice-boost-ali:LoadCurrent():util/kaldi-table-inl.h:224) TableReader: failed to open file gunzip
ERROR (lattice-boost-ali:Value():util/kaldi-table-inl.h:143) TableReader: failed to load object from gunzip (to suppress this error, add the permissive (p, ) option to the rspecifier.
WARNING (lattice-boost-ali:Close():kaldi-io.cc:446) Pipe gunzip -c exp/tri4b_dnn_multi_ali/ali..gz | had nonzero return status 13
ERROR (lattice-boost-ali:Value():util/kaldi-table-inl.h:143) TableReader: failed to load object from gunzip (to suppress this error, add the permissive (p, ) option to the rspecifier.
At first I didn't have the p option in the ark:gunzip ... rspecifier. I added it afterwards, but that didn't change anything. I also double-checked that the alignments didn't contain any errors.
Could anyone please tell me what I am doing wrong?
Cheers,
Angel
Do you have gunzip on PATH?
y.
On Wed, Jul 15, 2015 at 7:29 AM, Angel Castro angel-castro@users.sf.net
wrote:
I don't think his issue is coming from his archive that starts with
"gunzip". I think it's more likely that his file
exp/tri4b_dnn_multi_bmmi/lat.scp contains something like "foo gunzip"
as one of its lines.
Dan
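One way to check this hypothesis is sketched below. In a Kaldi scp file each line is a key followed by a file name or a pipe command, and a value is only treated as a pipe when it ends in "|", so a bare "gunzip" would be opened as a file. The sample entries and the temporary file are made up for illustration; on a real setup the awk filter would be pointed at the actual lat.scp.

```shell
#!/usr/bin/env bash
# Hypothetical scp contents: utt1 has a truncated value, utt2 looks normal.
tmp=$(mktemp -d)
printf 'utt1 gunzip\nutt2 lat.1.ark:42\n' > "$tmp/lat.scp"

# List the keys whose entire value is the bare word "gunzip".
suspect=$(awk 'NF == 2 && $2 == "gunzip" { print $1 }' "$tmp/lat.scp")

echo "suspect entries: $suspect"
rm -rf "$tmp"
```

If the filter prints any keys, the scp file itself is damaged and the alignment archive is not the cause.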
On Wed, Jul 15, 2015 at 9:05 AM, Jan jtrmal@users.sf.net wrote:
Hi Dan,
You were absolutely right. The problem was that the $dir/lat.scp produced by the train_mmi.sh script when the --boost option had a non-zero value contained lines like:
UTT-ID gunzip:
After further inspection, the bug comes from:
if [[ "$boost" != "0.0" && "$boost" != 0 ]]; then
#make lattice scp with same order as the shuffled feature scp
awk '{ if(r==0) { latH[$1]=$2; }
if(r==1) { if(latH[$1] != "") { print $1" "latH[$1] } }
}' $denlatdir/lat.scp r=1 $dir/train.scp > $dir/lat.scp
So the awk command was copying only $2 and not the rest, while each line actually has up to five fields ($1 to $5) because of the default field separator.
I made this small change and now it works:
awk '{ if(r==0) { latH[$1]=substr($0,length($1) + 2); }
if(r==1) { if(latH[$1] != "") { print $1" "latH[$1] } }
}' $denlatdir/lat.scp r=1 $dir/train.scp > $dir/lat.scp
The substr($0,length($1) + 2) replacement keeps the whole line except the first column and the space after it: substr positions start at 1, so $1 occupies positions 1 to length($1), the space is at length($1) + 1, and the rest of the line starts at length($1) + 2.
Thank you Dan and Yenda for your help
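The difference between the two awk versions can be reproduced on toy data. This is a minimal sketch: the utterance ID, the pipe command and the feature entry are invented, and it assumes exactly one separator character between the key and the value, which is what the length($1) + 2 offset relies on.

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for $denlatdir/lat.scp and $dir/train.scp.
tmp=$(mktemp -d)
printf 'utt1 gunzip -c lat.1.gz |\n' > "$tmp/denlat.scp"
printf 'utt1 feats.ark:10\n' > "$tmp/train.scp"

# Original logic: only the second field is stored, so the value is cut at the first space.
buggy=$(awk '{ if (r==0) { latH[$1]=$2 }
               if (r==1) { if (latH[$1] != "") { print $1" "latH[$1] } }
             }' "$tmp/denlat.scp" r=1 "$tmp/train.scp")

# Fixed logic: store everything after the key and its single separator character.
fixed=$(awk '{ if (r==0) { latH[$1]=substr($0, length($1) + 2) }
               if (r==1) { if (latH[$1] != "") { print $1" "latH[$1] } }
             }' "$tmp/denlat.scp" r=1 "$tmp/train.scp")

echo "buggy: $buggy"
echo "fixed: $fixed"
rm -rf "$tmp"
```

With this input the first version yields "utt1 gunzip", which matches the truncated lat.scp lines described above, while the second keeps the full pipe command.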
Karel, could you please fix this?
I think a comment explaining what the "r=1" thing is doing would be
helpful, too; that seems like a quite obscure feature of awk.
Dan
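For readers unfamiliar with the "r=1" operand: awk processes its command-line operands from left to right, and an operand of the form var=value is executed as an assignment at the point where it appears, that is, after the preceding file has been read and before the next one is opened. The sketch below uses two made-up files to show the effect; the variable name tag is only for illustration.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/first.txt"
printf 'c\n' > "$tmp/second.txt"

# r is unset (compares equal to 0) while first.txt is read,
# and becomes 1 just before second.txt is opened.
out=$(awk '{ tag = (r == 1) ? "second" : "first"; print tag ": " $0 }' \
      "$tmp/first.txt" r=1 "$tmp/second.txt")

echo "$out"
rm -rf "$tmp"
```

In train_mmi.sh the same mechanism lets one awk program fill the latH table from the first file and then emit lines in the order of the second. A common alternative for the two-file case is the NR==FNR test, which is true only while the first file is being read.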
On Wed, Jul 15, 2015 at 1:09 PM, Angel Castro angel-castro@users.sf.net
wrote:
Ok, I'll fix that. Thanks for finding the bug!
K.
On 15. 7. 2015 at 13:32, Daniel Povey wrote:
Yes, it is actually a very cool feature of awk that lets you control the parsing of different files; I didn't know you could do that. Thanks, Karel, for the lesson.
Yes it is. We should both thank 'Lukas Burget'; he is the original
author of the trick ;)
K.
On 15. 7. 2015 at 15:04, Angel Castro wrote:
Hi Yenda,
Yes, gunzip is on the path. I even unzipped the files into a single one and tried to parse it directly, and it still shows the same message.
Then check the scp file as Dan suggested.
Also, just an idea... How many files are there? There is a built-in
limit on the length of the command line in the kernel, or there might
be a problem with the wildcard substitution. Please try just one file
(without wildcards) to see if it relates to this.
y.
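A rough way to compare a wildcard expansion against the kernel limit is sketched below. The directory and the part.N.gz file names are invented for the example, and the byte count is only an approximation of what the kernel accounts for, since the environment also counts toward the limit.

```shell
#!/usr/bin/env bash
# Hypothetical files standing in for a set of gzipped archives.
tmp=$(mktemp -d)
touch "$tmp/part.1.gz" "$tmp/part.2.gz" "$tmp/part.3.gz"

# Upper bound, in bytes, on the arguments plus environment passed to a new process.
arg_max=$(getconf ARG_MAX)

# Expand the wildcard and measure how many files and bytes it produces.
set -- "$tmp"/part.*.gz
n_files=$#
expanded_bytes=$(printf '%s ' "$@" | wc -c | tr -d ' ')

echo "files: $n_files, expanded bytes: $expanded_bytes, ARG_MAX: $arg_max"
rm -rf "$tmp"
```

If the expanded size is far below ARG_MAX, the command-line limit is unlikely to be the cause, and testing with a single file name isolates the wildcard question.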
On Wed, Jul 15, 2015 at 2:16 PM, Angel Castro angel-castro@users.sf.net
wrote:
Hi, it should be working well now! Thanks for finding the bug!
K.
No problem, thanks for fixing it.