I've noticed an odd behavior after copying network weights from one network to another.
This might just be my own ignorance about what happens when data first runs through a network. Here's what I'm doing: I'm playing around with a simple autoencoder and I want to split the original network into its encoder/decoder halves and work with each half independently.
Original Net:
~~~
layer<0>  loss_mean_squared_multioutput
layer<1>  fc (num_outputs=2048) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
layer<2>  prelu (initial_param_value=0.25)
layer<3>  fc (num_outputs=256) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
layer<4>  prelu (initial_param_value=0.25)
layer<5>  fc (num_outputs=128) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
layer<6>  fc (num_outputs=16) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
layer<7>  prelu (initial_param_value=0.25)
layer<8>  fc (num_outputs=128) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
layer<9>  prelu (initial_param_value=0.25)
layer<10> fc (num_outputs=256) learning_rate_mult=1 weight_decay_mult=1 bias_learning_rate_mult=1 bias_weight_decay_mult=0
layer<11> input<matrix>
~~~
I train the original net without issue and then copy layers 6-10 of the original network to layers 1-5 of the Encoder Net. I can verify that everything copies correctly, but when I run a random input through the encoder network, its weights change.
However, if I first run an input through the encoder, then copy the weights from the original to the encoder, and then run another random input through the encoder, the weights don't change. Is this behavior expected, or am I doing something wrong with the copying of the network (missing some flag?).
My copying is just recursive templated functions that do basically this:
~~~
to.layer_details() = from.layer_details();
~~~
I thought this copied everything. Any help would be greatly appreciated.
Thanks,
Dave
This is expected. The first time you run a network it calls the
.setup() function on each layer. If you copied the whole layer state you
would also copy the flag that says setup has already run, but if you copy
only some subset of a layer's state, like only the parameter tensor, then
this is the expected outcome.
Davis,
OK. I see it now:
~~~
this_layer_setup_called
~~~
is being tracked and used to call setup if needed. So it looks like copying the layer details doesn't copy everything for the layer, but at least I've got an easy workaround for now.
Thanks,
Dave