
Copied Network Weights Change after running a matrix through the network

Dave
2020-11-16
2020-11-18
  • Dave

    Dave - 2020-11-16

    Davis,

    I've noticed an odd behavior after copying network weights from one network to another.

    This might just be my own ignorance about how the network works when data first runs through it. Here's what I'm doing: I'm playing around with a simple autoencoder, and I want to split the original network into its encoder and decoder halves and work with each half independently.

    Original Net:

    layer<0>        loss_mean_squared_multioutput
    layer<1>        fc       (num_outputs=2048) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
    layer<2>        prelu    (initial_param_value=0.25)
    layer<3>        fc       (num_outputs=256) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
    layer<4>        prelu    (initial_param_value=0.25)
    layer<5>        fc       (num_outputs=128) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
    layer<6>        fc       (num_outputs=16) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
    layer<7>        prelu    (initial_param_value=0.25)
    layer<8>        fc       (num_outputs=128) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
    layer<9>        prelu    (initial_param_value=0.25)
    layer<10>       fc       (num_outputs=256) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
    layer<11>       input<matrix>
    

    Encoder Net:

    layer<0>        loss_mean_squared_multioutput
    layer<1>        fc       (num_outputs=16) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
    layer<2>        prelu    (initial_param_value=0.25)
    layer<3>        fc       (num_outputs=128) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
    layer<4>        prelu    (initial_param_value=0.25)
    layer<5>        fc       (num_outputs=256) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
    layer<6>        input<matrix>
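
    For reference, the encoder net printed above corresponds to a dlib network type roughly like the following; the alias name is just for illustration, and the original net is declared the same way with its extra layers:

    // Hypothetical type alias matching the printed encoder architecture.
    using encoder_net_type = dlib::loss_mean_squared_multioutput<
        dlib::fc<16,
        dlib::prelu<
        dlib::fc<128,
        dlib::prelu<
        dlib::fc<256,
        dlib::input<dlib::matrix<float>>
        >>>>>>;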
    

    I train the original net without issue and then copy layers 6-10 of the original network to layers 1-5 of the Encoder Net. I can verify that everything copies correctly. But when I run a random input through the encoder network, its weights change.

    dlib::matrix<float> td = dlib::matrix_cast<float>(dlib::randm(2048, 1));
    
    auto result_en = encoder_net(td);
    

    However, if I first run an input through the encoder, then copy the weights from the original to the encoder, and then run another random input through the encoder, the weights don't change. Is this behavior expected, or am I doing something wrong when copying the network (am I missing some flag?).

    My copying is just recursive templated functions that do basically this:

    to.layer_details() = from.layer_details();
    

    I thought this copied everything. Any help would be greatly appreciated.
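
    In case it helps, the full recursive helper is roughly the following; the function name and the way I pass the layer indices are just illustrative (and it uses C++17's if constexpr to terminate the recursion):

    // Copy layer_details() (weights, biases, etc.) of source layer I+OFFSET
    // into destination layer I, then recurse until I == LAST.
    template <size_t I, size_t OFFSET, size_t LAST, typename FROM_NET, typename TO_NET>
    void copy_layer_details(const FROM_NET& from, TO_NET& to)
    {
        dlib::layer<I>(to).layer_details() = dlib::layer<I + OFFSET>(from).layer_details();
        if constexpr (I < LAST)
            copy_layer_details<I + 1, OFFSET, LAST>(from, to);
    }

    // e.g. copy original layers 6-10 into encoder layers 1-5:
    // copy_layer_details<1, 5, 5>(original_net, encoder_net);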

    Thanks,
    Dave

     
    • Davis

      Davis - 2020-11-18

      This is expected. The first time you run a network it calls the .setup()
      function on each layer, and setup() (re)initializes that layer's
      parameters. If you had copied the whole layer state you would also have
      copied the flag that says setup was already run, but if you copy only
      some subset of a layer's state, like only the parameter tensor, then this
      is the expected outcome.
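
      In other words, what happens on that first forward pass is roughly this (a simplified view of dlib's add_layer; the real code is in dlib/dnn/core.h):

      // Before computing the layer's output, add_layer checks whether setup()
      // has run yet; if not, it calls it, and setup() (re)initializes the
      // layer's parameters, clobbering anything copied in beforehand.
      if (!this_layer_setup_called)
      {
          details.setup(subnet);
          this_layer_setup_called = true;
      }
      details.forward(subnet, output);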


       
  • Dave

    Dave - 2020-11-18

    OK. I see it now:

    this_layer_setup_called
    

    is being tracked and used to call setup() if needed. So it looks like copying the layer details doesn't copy everything for the layer, but at least I've got an easy workaround for now.
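
    For anyone who hits the same thing, the workaround is just a matter of ordering: push one throwaway input through the freshly constructed encoder so setup() runs on every layer, and only then copy the trained layer details. Roughly (using the copy helper sketched earlier):

    // Run a dummy input through the new encoder first, which triggers
    // setup() on every layer and sets this_layer_setup_called...
    dlib::matrix<float> dummy = dlib::matrix_cast<float>(dlib::randm(2048, 1));
    encoder_net(dummy);

    // ...then copy the trained weights; they are no longer overwritten
    // by setup() on the next forward pass.
    copy_layer_details<1, 5, 5>(original_net, encoder_net);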

    Thanks,
    Dave

     
