
question about DNN model initialization

Yan Yin
2015-06-02
2015-07-07
  • Yan Yin

    Yan Yin - 2015-06-02

    Hi All,

    I am a new user of Kaldi and have little background knowledge of the toolkit. Right now I am setting up a benchmark to compare Kaldi with our own DNN training toolkit, so my quick plan for the comparison is:
    1) convert our alignment files to Kaldi format (see the sketch right after this list)
    2) do the DNN training with Kaldi
    3) convert the Kaldi-trained DNN model back to our own DNN format for testing
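
    For step 1, a quick way to see what the Kaldi alignment format looks like (a sketch only; the example path is made up, but copy-int-vector ships with a standard Kaldi build):

    # each output line is: <utterance-id> followed by one transition-id per frame
    copy-int-vector "ark:gunzip -c exp/tri4b_ali/ali.1.gz |" ark,t:- | head -n 1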

    However, when I looked at the DNN training script (for Dan's implementation) in the swbd recipe, 'steps/nnet2/train_pnorm_accel2.sh', I noticed that the initialization of the shallow network below does not seem to be done from the alignments alone; a tree is also needed.

    nnet-am-init $alidir/tree $lang/topo "nnet-init --srand=$srand $dir/nnet.config -|" $dir/0.mdl

    I am wondering, for Dan's DNN implementation, how the DNN model is initialized on the algorithm side in Kaldi, and why the tree is needed. The "Dan's DNN implementation" section on the Kaldi homepage does not have enough information about this.

    Thanks,
    Yan

     
    • Daniel Povey

      Daniel Povey - 2015-06-02

      It needs that stuff because the model files (.mdl) contain the
      transition model as well as the actual neural net. In most situations
      the transition model is not used, though. Getting rid of this might
      require writing new binaries.
      Also the nnet2 setup uses nonlinearity types that probably do not
      exist in your setup (p-norm, normalize-layer, splicing layers). If it
      is a speech task it would probably be much less work to just train a
      Kaldi acoustic model, and the performance will probably be better
      also.
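
      If it helps to see what is actually inside 0.mdl, a rough sketch
      (assuming a standard nnet2 build; check each binary's --help first,
      since nnet-to-raw-nnet in particular may not exist in older checkouts):

      # print a summary of the model in 0.mdl (components, dims, etc.)
      nnet-am-info $dir/0.mdl
      # strip the transition model, keeping only the raw neural net
      nnet-to-raw-nnet $dir/0.mdl $dir/0.raw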

      Dan


       
  • Yan Yin

    Yan Yin - 2015-06-02

    Thanks Dan!

    What are 'splicing layers'? They don't seem to be mentioned in the documentation, or maybe I am misunderstanding something.

     
    • Daniel Povey

      Daniel Povey - 2015-06-02

      SpliceComponent. For this type of thing you will have to search the
      code, not the documentation; the documentation is only very
      high-level.
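
      For reference, a SpliceComponent line in nnet.config looks roughly like
      the following (a sketch; the dims are made up, and you should check
      SpliceComponent::InitFromString in nnet-component.cc for the exact
      field names):

      # splice 4 frames of left and right context onto 40-dim input features
      SpliceComponent input-dim=40 left-context=4 right-context=4
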
      Dan


       
  • Yan Yin

    Yan Yin - 2015-06-04

    Thanks Dan.

    Does your implementation only support the configurations you mentioned (p-norm, normalization layer, splicing layer), or does it also support a fairly standard DNN configuration?

    Yan

     
    • Daniel Povey

      Daniel Povey - 2015-06-04

      It does support more standard configurations but the performance of
      those is not always quite as good, and it hasn't been tuned as
      recently. Actually ReLUs sometimes give better performance than
      p-norm, but we always train them with the normalization layer to
      ensure stability during training, and you can't test without that
      layer being included. So without adding that to your toolkit you
      wouldn't be able to do the comparison. Anyway that would probably be
      the least of your problems.
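
      To make that concrete, a ReLU hidden layer with the normalization layer
      in an nnet2 nnet.config would look roughly like the lines below (a
      sketch only: the dims and learning rate are made up, and the exact
      option names should be checked against each component's InitFromString
      code):

      AffineComponent input-dim=1024 output-dim=1024 learning-rate=0.01 param-stddev=0.02 bias-stddev=0.5
      RectifiedLinearComponent dim=1024
      NormalizeComponent dim=1024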

      Dan


       
      • Yan Yin

        Yan Yin - 2015-06-04

        Thanks Dan.

        So in the case of ReLU, you are saying that without normalization the training is not stable. We have been training ReLU nets without any normalization and have not seen stability issues; we also tried mean-normalized SGD, but it did not turn out to help. So does the stability issue without the normalization layer have something to do with your parallelization and optimization methods (parameter averaging and natural gradient)?

        I am OK with a ReLU net with the normalization layer, decoding with our decoder. In decoding, the normalization layer should be treatable as a standard layer, without extra support needed on our decoder side. Is there any existing recipe with a ReLU net? It is OK if it is not well tuned.

        Regarding splicing layers, I know this is to handle the left and right feature context, but looking at nnet.config it still looks confusing to me. We use a pretty standard approach of directly feeding features with context to the input layer. At this point I am trying to quickly get an idea without having to look into the source code (I am pretty new to Kaldi); I want to see whether extra support is needed in our decoder for splicing layers. Overall my first goal is to set up the plan quickly. Moving forward, I will of course need to look into more of the code details.

        By the way, do you have a sample run with all output dirs for either your wsj or switchboard DNN recipe somewhere that I can access?

        thanks,
        Yan

         
        • Daniel Povey

          Daniel Povey - 2015-06-04

          << So in the case of ReLU, you are saying that without normalization
          << the training is not stable. We have been training ReLU nets
          << without any normalization and have not seen stability issues; we
          << also tried mean-normalized SGD, but it did not turn out to help.
          << So does the stability issue without the normalization layer have
          << something to do with your parallelization and optimization
          << methods (parameter averaging and natural gradient)?

          Not really. The natural gradient actually improves the stability.
          People who train ReLUs with many layers usually have to resort to
          some kind of trick to stabilize it; this happens to be the trick we
          have chosen.
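
          (On the decoder-side question below: as far as I recall, nnet2's
          NormalizeComponent has no trainable parameters and simply rescales
          each activation vector to unit RMS, roughly

          y_i = x_i / sqrt( (1/D) * sum_j x_j^2 )

          for a D-dimensional vector x, so it is straightforward to
          reimplement; check NormalizeComponent in nnet-component.cc to
          confirm the exact behaviour, including the flooring near zero
          input.)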

          << I am OK with a ReLU net with the normalization layer, decoding
          << with our decoder. In decoding, the normalization layer should be
          << treatable as a standard layer, without extra support needed on
          << our decoder side. Is there any existing recipe with a ReLU net?
          << It is OK if it is not well tuned.

          << Regarding splicing layers, I know this is to handle the left and
          << right feature context, but looking at nnet.config it still looks
          << confusing to me. We use a pretty standard approach of directly
          << feeding features with context to the input layer. At this point
          << I am trying to quickly get an idea without having to look into
          << the source code (I am pretty new to Kaldi); I want to see whether
          << extra support is needed in our decoder for splicing layers.
          << Overall my first goal is to set up the plan quickly. Moving
          << forward, I will of course need to look into more of the code
          << details.

          In the nnet2 code the splicing is done internally to the network,
          but you could just discard the SpliceComponent and do it externally.
          However, the current ReLU recipes that we are using (e.g.
          steps/nnet2/train_multisplice_accel2.sh if you set --pnorm-input-dim
          and --pnorm-output-dim to the same value) actually also do splicing
          at intermediate layers, so your framework wouldn't be able to handle
          it. We don't currently have any ReLU recipes that don't do that.

          << By the way, do you have a sample run with all output dirs for
          << either your wsj or switchboard DNN recipe somewhere that I can
          << access?

          You can look on kaldi-asr.org and see if there is something.

          You obviously have a lot of questions, because you've chosen to use
          Kaldi in a way that is inherently quite difficult. I'm a busy person
          and I'm not going to be able to hold your hand and take you through
          all the things you need to do.

          Dan


           
  • Yan Yin

    Yan Yin - 2015-06-04

    Thanks Dan.

    I understand; I will not expect that much help when I really start playing with the toolkit. Right now all these general questions are just to estimate the effort we will need for this work.

    << However, the current ReLU recipes that we are using (e.g.
    << steps/nnet2/train_multisplice_accel2.sh if you set --pnorm-input-dim
    << and --pnorm-output-dim to the same value) actually also do splicing at
    << intermediate layers, so your framework wouldn't be able to handle it.
    << We don't currently have any ReLU recipes that don't do that.

    Just to confirm: I expect the modification of the ReLU multi-splice recipe (so that the internal splicing at the intermediate layers is not done) to be just at the shell-script level, with configuration changes. Is that right, or is it at the C++ code level as well?

    Yan

     
    • Daniel Povey

      Daniel Povey - 2015-06-04

      Yes, the changes are at the command-line level; you would just remove
      all the splicing specifications that say layer1/xxx and layer2/xxx and
      so on, leaving only the layer0 one.
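
      For illustration, the splice specification passed to the script would
      go from something like the first line below to the second (a sketch:
      the option name and index values here are from memory, so check the
      --splice-indexes default in steps/nnet2/train_multisplice_accel2.sh):

      --splice-indexes "layer0/-2:-1:0:1:2 layer1/-1:2 layer3/-3:3"
      --splice-indexes "layer0/-2:-1:0:1:2"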

      Dan


       
  • Yan Yin

    Yan Yin - 2015-07-07

    Hi Dan,

    Regarding using a ReLU activation component instead of p-norm: from what you mentioned earlier in this thread (setting --pnorm-input-dim and --pnorm-output-dim to the same value), I can add the line below in nnet.config.

    PnormComponent input-dim=$pnorm_input_dim output-dim=$pnorm_input_dim p=?

    I believe the exact p value does not really matter in this case, or do I not even need to specify p=? in the line above?

    In the meantime, I am wondering how such a p-norm setting (same input and output dim) ends up being the same as a ReLU activation, given that y = max(0, x) for ReLU while y = (|x|^p)^(1/p) for such a p-norm?

    thanks,
    Yan

     
    • Daniel Povey

      Daniel Povey - 2015-07-07


      No, what I was talking about relates to the TDNN scripts, which use the
      RectifiedLinearComponent in that case. You have to use the
      RectifiedLinearComponent if you want ReLU.
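
      (Note that with output-dim equal to input-dim each p-norm group has
      size one, so the p-norm reduces to y = |x|, which is not the same as
      max(0, x).) In other words, instead of a PnormComponent line you would
      put something like the following in nnet.config, together with the
      normalization layer discussed earlier (a sketch; the dim is made up):

      RectifiedLinearComponent dim=1024
      NormalizeComponent dim=1024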

      Dan

