Hi All,
I am a new user of Kaldi and have little background knowledge of the toolkit. Right now I am setting up a benchmark to compare Kaldi with our own DNN training toolkit, so I am considering the following quick plan for the comparison:
1) convert our data alignment files to Kaldi format
2) do DNN training with Kaldi
3) convert Kaldi-trained DNN model back to our own DNN format for testing
However, when I took a look at the DNN training scripts (for Dan's implementation) in swbd, 'steps/nnet2/train_pnorm_accel2.sh', I noticed that the initialization of the shallow network below does not seem to be done from the alignments, and that a tree is needed:
nnet-am-init $alidir/tree $lang/topo "nnet-init --srand=$srand $dir/nnet.config -|" $dir/0.mdl
I am wondering, for Dan's DNN implementation, how the DNN model is initialized on the algorithm side in Kaldi, and why the tree is needed. The "Dan's DNN implementation" section on the Kaldi homepage does not have enough information about this.
Thanks,
Yan
It needs that stuff because the .nnet files contain the
transition-model as well as the actual neural net. In most situations
the transition model is not used though. Getting rid of this might
require writing new binaries.
Also the nnet2 setup uses nonlinearity types that probably do not
exist in your setup (p-norm, normalize-layer, splicing layers). If it
is a speech task it would probably be much less work to just train a
Kaldi acoustic model, and the performance will probably be better
also.
Dan
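For reference, both parts mentioned here are visible if you dump an nnet2 model to text with the standard nnet2 tools; a minimal sketch, assuming an illustrative model path:

nnet-am-copy --binary=false exp/nnet2/final.mdl - | head -40
nnet-am-info exp/nnet2/final.mdl

The text dump begins with the <TransitionModel> block, followed by the <Nnet> itself; nnet-am-info prints a per-component summary of the net.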
Thanks Dan!
What are 'splicing layers'? They do not seem to be mentioned in the documentation; maybe that is my misunderstanding.
SpliceComponent. For this type of thing you will have to search the
code, not the documentation; the documentation is only very
high-level.
Dan
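For illustration, a SpliceComponent line in an nnet2 nnet.config looks something like the following (the dimension and context widths are invented for this example; check the SpliceComponent code for the exact option names):

SpliceComponent input-dim=40 left-context=4 right-context=4

This expands each 40-dimensional input frame to 40 * (4 + 1 + 4) = 360 dimensions by concatenating 4 frames of context on each side.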
On Tue, Jun 2, 2015 at 5:00 PM, Yan Yin riyijiye1976@users.sf.net wrote:
Thanks Dan.
Does your implementation only support the configurations you mentioned (p-norm, normalization layer, splicing layers), or does it also support a pretty standard DNN configuration?
Yan
It does support more standard configurations but the performance of
those is not always quite as good, and it hasn't been tuned as
recently. Actually ReLUs sometimes give better performance than
p-norm, but we always train them with the normalization layer to
ensure stability during training, and you can't test without that
layer being included. So without adding that to your toolkit you
wouldn't be able to do the comparison. Anyway that would probably be
the least of your problems.
Dan
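To make the pairing concrete (the layer dimension here is invented), the ReLU nonlinearity in an nnet2 nnet.config is followed directly by the normalization component:

RectifiedLinearComponent dim=1024
NormalizeComponent dim=1024

The NormalizeComponent rescales each frame's activation vector to a fixed norm, which is the stabilization trick referred to above.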
On Thu, Jun 4, 2015 at 12:53 PM, Yan Yin riyijiye1976@users.sf.net wrote:
Thanks Dan.
So in the case of ReLU, you are saying that without normalization the training is not stable. We have been training ReLU nets without any normalization and have not seen stability issues. We also tried mean-normalized SGD, and it did not turn out to help. So does the stability issue without the normalization layer have something to do with your parallelization and optimization methods (parameter averaging and natural gradient)?
I am OK with a ReLU net with a normalization layer, decoding with our decoder. In decoding, the normalization layer should be treated as a standard layer, with no extra support needed on our decoder side. Is there any existing recipe with a ReLU net? It is OK if it is not well tuned.
Regarding splicing layers, I know these handle the left and right feature context, but looking at nnet.config this still looks confusing to me. We use the pretty standard approach of directly feeding features with context to the input layer. At this point I am trying to quickly get some idea without needing to look into the source code (I am pretty new to Kaldi); in particular, I want to see whether extra support is needed from our decoder for splicing layers. Overall, my first goal is to set up a plan quickly; moving forward, I will certainly need to look into more code details.
By the way, do you have a sample run with all output dirs for either your WSJ or Switchboard DNN recipe somewhere that I can access?
Thanks,
Yan
Not really. The natural gradient actually improves the stability.
People who train ReLUs with many layers usually have to resort to some
kind of trick to stabilize it, this happens to be the trick we have
chosen.
In the nnet2 code the splicing is done internally to the network but
you could just discard the SpliceComponent and do it externally.
However the current ReLU recipes that we are using (e.g.
steps/nnet2/train_multisplice_accel2.sh if you set --pnorm-input-dim
and --pnorm-output-dim to the same value) actually also do splicing at
intermediate layers so your framework wouldn't be able to handle it.
We don't currently have any ReLU recipes that don't do that.
You can look on kaldi-asr.org and see if there is something.
You obviously have a lot of questions, because you've chosen to use
Kaldi in a way that is inherently quite difficult. I'm a busy person
and I'm not going to be able to hold your hand and take you through
all the things you need to do.
Dan
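As a sketch of what discarding the SpliceComponent and splicing externally could look like on the Kaldi feature side (paths and context widths are illustrative), the same context expansion exists as a feature-level operation:

splice-feats --left-context=4 --right-context=4 scp:data/train/feats.scp ark:spliced.ark

The network's input dimension would then have to match the spliced feature dimension, with no SpliceComponent in nnet.config.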
Thanks Dan.
I understand; I will not expect that much help when I really start playing with the tool. Currently, all these general questions are just to estimate the effort the work will require.
<< However the current ReLU recipes that we are using (e.g.
<< steps/nnet2/train_multisplice_accel2.sh if you set --pnorm-input-dim
<< and --pnorm-output-dim to the same value) actually also do splicing at
<< intermediate layers so your framework wouldn't be able to handle it.
<< We don't currently have any ReLU recipes that don't do that.
Just to confirm: I expect the modification of the ReLU multi-splice recipe (so that the internal splicing at intermediate layers is not done) to be just at the shell-script level, via configuration changes. Is that right, or is it at the C++ code level as well?
Yan
Yes, the changes are at the command-line level; you would just remove
all the splicing specifications that say layer1/xxx and layer2/xxx and
so on, leaving only the layer0 one.
Dan
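A sketch of the kind of edit being described, with invented splice-index values rather than the recipe's actual defaults: train_multisplice_accel2.sh takes a --splice-indexes option with one entry per spliced layer, and the change is to drop everything except the layer0 entry.

--splice-indexes "layer0/-2:-1:0:1:2 layer1/-1:2 layer3/-3:3"
--splice-indexes "layer0/-2:-1:0:1:2"

The first form splices at hidden layers too; the second does input-only splicing.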
On Thu, Jun 4, 2015 at 4:50 PM, Yan Yin riyijiye1976@users.sf.net wrote:
Hi Dan,
Regarding setting the ReLU activation component instead of p-norm: from what you mentioned earlier in this thread (--pnorm-input-dim and --pnorm-output-dim set to the same value), I can add the line below in nnet.config:
PnormComponent input-dim=$pnorm_input_dim output-dim=$pnorm_input_dim p=?
I believe the value of p does not really matter in this case; or do I not even need to specify p=? in the above line?
In the meantime, I am wondering how such a p-norm setting (same input and output dim, i.e. group size 1) would end up being the same as the ReLU activation, given that y = max(0, x) for ReLU while y = (|x|^p)^(1/p) = |x| for such a p-norm.
Thanks,
Yan
No, what I was talking about relates to the TDNN scripts, which use the RectifiedLinearComponent in that case. You have to use the RectifiedLinearComponent if you want ReLU.
Dan
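In nnet.config terms, the substitution asked about above would therefore use the ReLU component rather than a group-size-1 PnormComponent (reusing the dimension variable from the proposed line):

RectifiedLinearComponent dim=$pnorm_input_dim

The two are genuinely different functions: with input-dim equal to output-dim the p-norm reduces to y = |x| regardless of p, so for x = -3 it gives 3, whereas ReLU gives max(0, -3) = 0.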