
combine feedforward layer with lstm layer

Xingyu Na
2015-01-20
2015-01-22
  • Xingyu Na

    Xingyu Na - 2015-01-20

    Hi all,

    Has anyone tried training a combination of feedforward and LSTM
    layers using CURRENNT?
    My base network has 5 hidden layers of 500 blstm units each:
    (0) input [size: 117]
    (1) blstm [size: 500, bias: 1.0, weights: 737500]
    (2) blstm [size: 500, bias: 1.0, weights: 1503500]
    (3) blstm [size: 500, bias: 1.0, weights: 1503500]
    (4) blstm [size: 500, bias: 1.0, weights: 1503500]
    (5) blstm [size: 500, bias: 1.0, weights: 1503500]
    (6) softmax [size: 6245, bias: 1.0, weights: 3128745]
    (7) multiclass_classification [size: 6245]
    It worked fine. Now I replace the first and last blstm layers with
    feedforward layers, i.e.
    (0) input [size: 117]
    (1) feedforward_logistic [size: 2048, bias: 1.0, weights: 241664]
    (2) blstm [size: 500, bias: 1.0, weights: 4599500]
    (3) blstm [size: 500, bias: 1.0, weights: 1503500]
    (4) blstm [size: 500, bias: 1.0, weights: 1503500]
    (5) feedforward_logistic [size: 2048, bias: 1.0, weights: 1026048]
    (6) softmax [size: 6245, bias: 1.0, weights: 12796005]
    (7) multiclass_classification [size: 6245]
    All other configurations remain unchanged. However, the first iteration
    gives a validation error of 99.99%, which is obviously wrong, so there is
    no point in proceeding. Do you have any clue about this? How should I
    change the configuration?

    Best,
    Xingyu
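
    As a sanity check on the two listings above: the printed weight counts are
    consistent with the usual parameterizations (a feedforward or softmax layer
    has one weight per input per unit plus a bias, and each blstm direction has
    four gates plus three peephole weights per cell, with half the units
    running in each direction). A minimal Python sketch that reproduces the
    numbers, assuming exactly that parameterization:

    def feedforward_weights(prev, size):
        # weight matrix plus one bias per unit (same count for softmax)
        return size * prev + size

    def blstm_weights(prev, size):
        # half the units run forward, half backward
        n = size // 2
        # per direction: 4 gates x (input weights + recurrent weights + bias),
        # plus 3 peephole weights per cell
        per_direction = 4 * (n * prev + n * n + n) + 3 * n
        return 2 * per_direction

    # base network (first listing)
    print(blstm_weights(117, 500))          # 737500
    print(blstm_weights(500, 500))          # 1503500
    print(feedforward_weights(500, 6245))   # 3128745 (softmax output layer)

    # modified network (second listing)
    print(feedforward_weights(117, 2048))   # 241664
    print(blstm_weights(2048, 500))         # 4599500
    print(feedforward_weights(500, 2048))   # 1026048
    print(feedforward_weights(2048, 6245))  # 12796005

    Since both sets of counts come out as expected, the layers themselves
    appear to be wired up as intended.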

     
  • florian.e

    florian.e - 2015-01-22

    Have you tried whether this also happens if you change only one of the
    middle layers to feedforward, or only the first or the last one? Also try
    feedforward_tanh (it produces output in the range -1 to +1, while logistic
    produces 0 to 1), although that shouldn't really matter. It would also help
    if you could post the training logs for both cases somewhere, including the
    errors over the first few epochs, to see whether they change.

    Another possibility is to run cuda-memcheck on both cases (slow!) to see if there are any memory errors in the feedforward case that could corrupt the training.
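
    One way to set up these isolation experiments is to generate one network
    file per single-layer substitution and keep everything else identical. The
    sketch below is only illustrative: it assumes a CURRENNT-style network.jsn
    layout (a top-level "layers" list with size/name/bias/type entries), and
    the layer names, file names, and the choice to keep the substituted layer
    at size 500 are arbitrary.

    import copy
    import json

    # all-blstm base network from the original post (layer names are made up)
    base_layers = [
        {"size": 117,  "name": "input",      "type": "input"},
        {"size": 500,  "name": "hidden_1",   "bias": 1.0, "type": "blstm"},
        {"size": 500,  "name": "hidden_2",   "bias": 1.0, "type": "blstm"},
        {"size": 500,  "name": "hidden_3",   "bias": 1.0, "type": "blstm"},
        {"size": 500,  "name": "hidden_4",   "bias": 1.0, "type": "blstm"},
        {"size": 500,  "name": "hidden_5",   "bias": 1.0, "type": "blstm"},
        {"size": 6245, "name": "output",     "bias": 1.0, "type": "softmax"},
        {"size": 6245, "name": "postoutput", "type": "multiclass_classification"},
    ]

    # one experiment per substitution: (tag, index of layer to change, new type)
    experiments = [
        ("first_logistic",  1, "feedforward_logistic"),
        ("middle_logistic", 3, "feedforward_logistic"),
        ("last_logistic",   5, "feedforward_logistic"),
        ("first_tanh",      1, "feedforward_tanh"),
    ]

    for tag, idx, new_type in experiments:
        layers = copy.deepcopy(base_layers)
        layers[idx]["type"] = new_type
        with open("network_%s.jsn" % tag, "w") as f:
            json.dump({"layers": layers}, f, indent=4)

    Comparing the first-epoch validation errors of these runs against the
    all-blstm baseline should narrow down whether the problem comes from a
    particular layer position or from the logistic activation.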

     
