
Copied Network Weights Change after running a matrix through the network

Dave
2020-11-16
2020-11-18
  • Dave

    Dave - 2020-11-16

    Davis,

    I've noticed an odd behavior after copying network weights from one network to another.

    This might just be my own ignorance about how the network works when data first runs through it. Here's what I'm doing: I'm playing around with a simple autoencoder, and I want to split the original network into its encoder and decoder halves and work with each half independently.

    Original Net:

    layer<0>        loss_mean_squared_multioutput
    layer<1>        fc       (num_outputs=2048) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
    layer<2>        prelu    (initial_param_value=0.25)
    layer<3>        fc       (num_outputs=256) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
    layer<4>        prelu    (initial_param_value=0.25)
    layer<5>        fc       (num_outputs=128) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
    layer<6>        fc       (num_outputs=16) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
    layer<7>        prelu    (initial_param_value=0.25)
    layer<8>        fc       (num_outputs=128) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
    layer<9>        prelu    (initial_param_value=0.25)
    layer<10>       fc       (num_outputs=256) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
    layer<11>       input<matrix>
    

    Encoder Net:

    layer<0>        loss_mean_squared_multioutput
    layer<1>        fc       (num_outputs=16) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
    layer<2>        prelu    (initial_param_value=0.25)
    layer<3>        fc       (num_outputs=128) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
    layer<4>        prelu    (initial_param_value=0.25)
    layer<5>        fc       (num_outputs=256) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
    layer<6>        input<matrix>
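
    For reference, the encoder net printed above corresponds to a dlib network type roughly like the following; the alias name is just for illustration, and the original net is declared the same way with its extra layers:

    // Hypothetical type alias matching the printed encoder architecture.
    using encoder_net_type = dlib::loss_mean_squared_multioutput<
        dlib::fc<16,
        dlib::prelu<
        dlib::fc<128,
        dlib::prelu<
        dlib::fc<256,
        dlib::input<dlib::matrix<float>>
        >>>>>>;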
    

    I train the original net without issue and then copy layers 6-10 of the original network to layers 1-5 of the Encoder Net. I can verify that everything copies correctly. But when I run a random input through the encoder network, its weights change.

    dlib::matrix<float> td = dlib::matrix_cast<float>(dlib::randm(2048, 1));
    
    auto result_en = encoder_net(td);
    

    However, if I first run an input through the encoder, then copy the weights from the original to the encoder, and then run another random input through the encoder, the weights don't change. Is this behavior expected, or am I doing something wrong when copying the network (am I missing some flag?).

    My copying is just recursive templated functions that do basically this:

    to.layer_details() = from.layer_details();
    

    I thought this copied everything. Any help would be greatly appreciated.
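
    In case it helps, the full recursive helper is roughly the following; the function name and the way I pass the layer indices are just illustrative (and it uses C++17's if constexpr to terminate the recursion):

    // Copy layer_details() (weights, biases, etc.) of source layer I+OFFSET
    // into destination layer I, then recurse until I == LAST.
    template <size_t I, size_t OFFSET, size_t LAST, typename FROM_NET, typename TO_NET>
    void copy_layer_details(const FROM_NET& from, TO_NET& to)
    {
        dlib::layer<I>(to).layer_details() = dlib::layer<I + OFFSET>(from).layer_details();
        if constexpr (I < LAST)
            copy_layer_details<I + 1, OFFSET, LAST>(from, to);
    }

    // e.g. copy original layers 6-10 into encoder layers 1-5:
    // copy_layer_details<1, 5, 5>(original_net, encoder_net);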

    Thanks,
    Dave

     
    • Davis

      Davis - 2020-11-18

      This is expected. The first time you run a network it calls the .setup()
      function on each layer, and setup() (re)initializes that layer's
      parameters. If you had copied the whole layer state you would also have
      copied the flag that says setup was already run, but if you copy only
      some subset of a layer's state, like only the parameter tensor, then this
      is the expected outcome.
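
      In other words, what happens on that first forward pass is roughly this (a simplified view of dlib's add_layer; the real code is in dlib/dnn/core.h):

      // Before computing the layer's output, add_layer checks whether setup()
      // has run yet; if not, it calls it, and setup() (re)initializes the
      // layer's parameters, clobbering anything copied in beforehand.
      if (!this_layer_setup_called)
      {
          details.setup(subnet);
          this_layer_setup_called = true;
      }
      details.forward(subnet, output);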


       
  • Dave

    Dave - 2020-11-18

    OK. I see it now:

    this_layer_setup_called
    

    is being tracked and used to call setup() if needed. So it looks like copying the layer details doesn't copy everything for the layer, but at least I've got an easy workaround for now.
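
    For anyone who hits the same thing, the workaround is just a matter of ordering: push one throwaway input through the freshly constructed encoder so setup() runs on every layer, and only then copy the trained layer details. Roughly (using the copy helper sketched earlier):

    // Run a dummy input through the new encoder first, which triggers
    // setup() on every layer and sets this_layer_setup_called...
    dlib::matrix<float> dummy = dlib::matrix_cast<float>(dlib::randm(2048, 1));
    encoder_net(dummy);

    // ...then copy the trained weights; they are no longer overwritten
    // by setup() on the next forward pass.
    copy_layer_details<1, 5, 5>(original_net, encoder_net);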

    Thanks,
    Dave

     
