
combine feedforward layer with lstm layer

Xingyu Na
2015-01-20
2015-01-22
  • Xingyu Na

    Xingyu Na - 2015-01-20

    Hi all,

    Has anyone tried training a combination of feedforward and LSTM
    layers using CURRENNT?
    My base network has 5 hidden layers of 500 blstm units each:
    (0) input [size: 117]
    (1) blstm [size: 500, bias: 1.0, weights: 737500]
    (2) blstm [size: 500, bias: 1.0, weights: 1503500]
    (3) blstm [size: 500, bias: 1.0, weights: 1503500]
    (4) blstm [size: 500, bias: 1.0, weights: 1503500]
    (5) blstm [size: 500, bias: 1.0, weights: 1503500]
    (6) softmax [size: 6245, bias: 1.0, weights: 3128745]
    (7) multiclass_classification [size: 6245]
    It worked fine. Now I replace the first and last blstm layers with
    feedforward layers, i.e.
    (0) input [size: 117]
    (1) feedforward_logistic [size: 2048, bias: 1.0, weights: 241664]
    (2) blstm [size: 500, bias: 1.0, weights: 4599500]
    (3) blstm [size: 500, bias: 1.0, weights: 1503500]
    (4) blstm [size: 500, bias: 1.0, weights: 1503500]
    (5) feedforward_logistic [size: 2048, bias: 1.0, weights: 1026048]
    (6) softmax [size: 6245, bias: 1.0, weights: 12796005]
    (7) multiclass_classification [size: 6245]
    All other configurations remain unchanged. However, the first iteration
    gives a validation error of 99.99%, which is obviously wrong, so there is
    no point in proceeding. Do you have any clue about this? How should I
    change the configuration?

    Best,
    Xingyu
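
    As a sanity check on the two listings above: the printed weight counts are
    consistent with the usual parameterizations (a feedforward or softmax layer
    has one weight per input per unit plus a bias, and each blstm direction has
    four gates plus three peephole weights per cell, with half the units
    running in each direction). A minimal Python sketch that reproduces the
    numbers, assuming exactly that parameterization:

    def feedforward_weights(prev, size):
        # weight matrix plus one bias per unit (same count for softmax)
        return size * prev + size

    def blstm_weights(prev, size):
        # half the units run forward, half backward
        n = size // 2
        # per direction: 4 gates x (input weights + recurrent weights + bias),
        # plus 3 peephole weights per cell
        per_direction = 4 * (n * prev + n * n + n) + 3 * n
        return 2 * per_direction

    # base network (first listing)
    print(blstm_weights(117, 500))          # 737500
    print(blstm_weights(500, 500))          # 1503500
    print(feedforward_weights(500, 6245))   # 3128745 (softmax output layer)

    # modified network (second listing)
    print(feedforward_weights(117, 2048))   # 241664
    print(blstm_weights(2048, 500))         # 4599500
    print(feedforward_weights(500, 2048))   # 1026048
    print(feedforward_weights(2048, 6245))  # 12796005

    Since both sets of counts come out as expected, the layers themselves
    appear to be wired up as intended.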

     
  • florian.e

    florian.e - 2015-01-22

    Have you tried whether this also happens if you change only one of the
    middle layers to feedforward, or only the first or the last one? Also try
    feedforward_tanh (it produces output in the range -1 to +1, while logistic
    produces 0 to 1), although that shouldn't really matter. It would also help
    if you could post the training logs for both cases somewhere, including the
    errors over the first few epochs, to see whether they change.

    Another possibility is to run cuda-memcheck on both cases (slow!) to see if there are any memory errors in the feedforward case that could corrupt the training.
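
    One way to set up these isolation experiments is to generate one network
    file per single-layer substitution and keep everything else identical. The
    sketch below is only illustrative: it assumes a CURRENNT-style network.jsn
    layout (a top-level "layers" list with size/name/bias/type entries), and
    the layer names, file names, and the choice to keep the substituted layer
    at size 500 are arbitrary.

    import copy
    import json

    # all-blstm base network from the original post (layer names are made up)
    base_layers = [
        {"size": 117,  "name": "input",      "type": "input"},
        {"size": 500,  "name": "hidden_1",   "bias": 1.0, "type": "blstm"},
        {"size": 500,  "name": "hidden_2",   "bias": 1.0, "type": "blstm"},
        {"size": 500,  "name": "hidden_3",   "bias": 1.0, "type": "blstm"},
        {"size": 500,  "name": "hidden_4",   "bias": 1.0, "type": "blstm"},
        {"size": 500,  "name": "hidden_5",   "bias": 1.0, "type": "blstm"},
        {"size": 6245, "name": "output",     "bias": 1.0, "type": "softmax"},
        {"size": 6245, "name": "postoutput", "type": "multiclass_classification"},
    ]

    # one experiment per substitution: (tag, index of layer to change, new type)
    experiments = [
        ("first_logistic",  1, "feedforward_logistic"),
        ("middle_logistic", 3, "feedforward_logistic"),
        ("last_logistic",   5, "feedforward_logistic"),
        ("first_tanh",      1, "feedforward_tanh"),
    ]

    for tag, idx, new_type in experiments:
        layers = copy.deepcopy(base_layers)
        layers[idx]["type"] = new_type
        with open("network_%s.jsn" % tag, "w") as f:
            json.dump({"layers": layers}, f, indent=4)

    Comparing the first-epoch validation errors of these runs against the
    all-blstm baseline should narrow down whether the problem comes from a
    particular layer position or from the logistic activation.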

     
