Neuroph defines Tanh(x) as (1 - E(-x)) / (1 + E(-x)), where E(x) denotes e^x.
Neuroph defines the derivative Dt[Tanh(x), x] = 1 - Tanh(x)*Tanh(x), where Tanh is as defined above.
The derivative of (1 - E(-x)) / (1 + E(-x)), however, is 2*E(x) / (1 + E(x))^2.
The actual computation of tanh(x) is (-1 + E(2x)) / (1 + E(2x)).
1. Change the calculation of Tanh(x) to (-1 + E(2x)) / (1 + E(2x)). The previous calculation is incorrect for tanh: (1 - E(-x)) / (1 + E(-x)) actually computes tanh(x/2).
2. Change the calculation of the derivative to (1 + Tanh(x)) * (1 - Tanh(x)). Note that this factors to exactly 1 - Tanh(x)*Tanh(x); the previous derivative was wrong only because the old Tanh computed tanh(x/2), whose true derivative is (1/2) * (1 - tanh(x/2)^2), not 1 - tanh(x/2)^2. With the corrected Tanh, either form of the derivative is correct.
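As a sanity check on the two formulas above, here is a small standalone sketch (not Neuroph code; E(x) is written as Math.exp(x)) showing that the old expression computes tanh(x/2) while the proposed one computes tanh(x):

```java
public class TanhCheck {
    // Old Neuroph forward formula: (1 - E(-x)) / (1 + E(-x))
    static double oldTanh(double x) {
        return (1 - Math.exp(-x)) / (1 + Math.exp(-x));
    }

    // Proposed forward formula: (-1 + E(2x)) / (1 + E(2x))
    static double newTanh(double x) {
        return (-1 + Math.exp(2 * x)) / (1 + Math.exp(2 * x));
    }

    public static void main(String[] args) {
        double x = 0.7;
        // The old formula agrees with tanh(x/2), not tanh(x):
        System.out.println(oldTanh(x) - Math.tanh(x / 2)); // within rounding error of 0
        // The proposed formula agrees with tanh(x):
        System.out.println(newTanh(x) - Math.tanh(x));     // within rounding error of 0
    }
}
```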
3. To avoid NaN errors, add conditional logic to Sigmoid() and Tanh() along the following lines (this is for Tanh(); Sigmoid() would return 0 if net < -100). The choice of 100 is fairly arbitrary; any sufficiently large cutoff works:
if (net > 100) {
    return 1.0;
}
if (net < -100) {
    return -1.0;
}
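Putting the clamp together with the corrected formula, a guarded transfer function might look like the following sketch (the class and method names SafeTanh/getOutput are illustrative, not actual Neuroph API):

```java
public class SafeTanh {
    public static double getOutput(double net) {
        // Clamp to avoid NaN from exp() overflow; 100 is an arbitrary
        // large cutoff, as noted above.
        if (net > 100) {
            return 1.0;
        }
        if (net < -100) {
            return -1.0;
        }
        // Corrected tanh: (-1 + E(2x)) / (1 + E(2x))
        double e2x = Math.exp(2 * net);
        return (-1 + e2x) / (1 + e2x);
    }
}
```

Without the clamp, Math.exp(2 * net) overflows to infinity for very large net and the division yields infinity/infinity = NaN.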
4. I had a problem with the existing learning-rate code. The total errorChange can be greater than 1, so with the default learningRateChange and momentumRateChange values of 0.99926, the existing code could increase or decrease the learning rate by errorChange itself, i.e. by 1 or more. To make that scheme work, errorChange would need to be proportional to something like the total possible error. I therefore reverted the code to something similar to the linear model, but made it drop to the minimum when performance got worse. This is now a greedy algorithm.
protected void adjustLearningRate() {
    // Bigger error -> smaller learning rate, to limit error growth;
    // smaller error -> bigger learning rate, to converge faster.
    if (this.totalNetworkError >= this.previousEpochError) {
        // If going the wrong way, drop to the minimum learning rate and
        // work our way back up. This way we accelerate as we improve.
        learningRate = minLearningRate;
    } else {
        // Grow by a small factor, about 1.00074 with the 0.99926 default.
        this.learningRate = this.learningRate * (1 + (1 - this.learningRateChange));
        if (this.learningRate > this.maxLearningRate)
            this.learningRate = this.maxLearningRate;
    }
}
protected void adjustMomentum() {
    // Could also drop momentum straight to its minimum value here,
    // as is done for the learning rate.
    if (this.totalNetworkError >= this.previousEpochError) {
        momentum = momentum * momentumChange;
        if (momentum < minMomentum)
            momentum = minMomentum;
    } else {
        // Grow by a small factor, about 1.00074 with a 0.99926 momentumChange.
        momentum = momentum * (1 + (1 - momentumChange));
        if (momentum > maxMomentum)
            momentum = maxMomentum;
    }
}
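To illustrate the greedy behavior outside of Neuroph, here is a minimal standalone sketch of the learning-rate schedule (the field names and the 0.99926 default come from the code above; the error values are made up):

```java
public class GreedyRateDemo {
    static double learningRate = 0.2;
    static double minLearningRate = 0.01;
    static double maxLearningRate = 0.9;
    static double learningRateChange = 0.99926;

    static void adjust(double previousEpochError, double totalNetworkError) {
        if (totalNetworkError >= previousEpochError) {
            // Performance got worse: drop to the minimum and re-accelerate.
            learningRate = minLearningRate;
        } else {
            // Performance improved: grow by a small multiplicative factor
            // (about 1.00074 with the 0.99926 default).
            learningRate = learningRate * (1 + (1 - learningRateChange));
            if (learningRate > maxLearningRate) {
                learningRate = maxLearningRate;
            }
        }
    }

    public static void main(String[] args) {
        adjust(0.50, 0.45); // error dropped -> rate grows slightly
        System.out.println(learningRate);
        adjust(0.45, 0.60); // error grew -> rate resets to the minimum
        System.out.println(learningRate);
    }
}
```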
5. The most significant change I made was to the gradient calculation for backpropagation, to avoid getting stuck in flat spots. There are two parts to this.
a. When updating weights in LMS.updateNeuronWeights() and MomentumBackpropagation, I use Math.tanh(neuron.getError()) instead of the raw error. This limits the impact of large error values, which can cause network instability.
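The idea in (a) can be sketched in isolation like this (updateWeight and its parameters are illustrative, not the actual LMS.updateNeuronWeights() signature):

```java
public class SquashedUpdate {
    static double updateWeight(double weight, double learningRate,
                               double neuronError, double input) {
        // Math.tanh bounds the effective error to [-1, 1], so one huge
        // error value cannot destabilize the weights; small errors pass
        // through almost unchanged since tanh(e) ~ e near 0.
        double squashedError = Math.tanh(neuronError);
        return weight + learningRate * squashedError * input;
    }
}
```

With learningRate 0.1 and input 1.0, an error of 1e6 moves the weight by at most 0.1, while an error of 0.001 moves it by about 0.0001, which is essentially the plain LMS step.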
b. In BackPropagation.calculateDelta() and SigmoidDeltaRule.adjustOutputNeurons(), I call a new method, getEffectiveGradient(neuron, outputError), which calculates the gradient used by the subsequent code. For example:
// WAS: double delta = outputError * transferFunction.getDerivative(neuronInput);
double gradient = getEffectiveGradient(neuron, outputError);
double delta = outputError * gradient;
neuron.setError(delta);
this.updateNeuronWeights(neuron);
The code to calculate the effective gradient avoids returning 0 when trying to get off a flat spot:
// In SigmoidDeltaRule.java
protected double getEffectiveGradient(Neuron neuron, double outputError) {
    TransferFunction transferFunction = neuron.getTransferFunction();
    double neuronInput = neuron.getNetInput();
    double gradient = transferFunction.getDerivative(neuronInput);
    // If the error would move the neuron input further in its current
    // direction, we may be moving onto a flat spot: use the given gradient.
    if (outputError * neuronInput > 0) {
        return gradient;
    }
    // Otherwise blend the gradient toward |outputError|. If the error is
    // small, the gradient is nearly unchanged; if the error is large, the
    // effective gradient is large too, so we move off the flat spot quickly.
    double alpha = Math.abs(outputError);
    gradient = (1 - alpha) * gradient + alpha;
    return gradient;
}
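To see why this helps, here is a standalone sketch of the same blending logic applied to a saturated tanh unit (plain math, no Neuroph dependencies; the derivative uses the corrected tanh):

```java
public class FlatSpotDemo {
    static double derivative(double netInput) {
        double t = Math.tanh(netInput);
        return (1 + t) * (1 - t); // = 1 - tanh(x)^2
    }

    static double effectiveGradient(double netInput, double outputError) {
        double gradient = derivative(netInput);
        // Error pushes the input further in its current direction:
        // keep the true gradient.
        if (outputError * netInput > 0) {
            return gradient;
        }
        // Otherwise blend toward |outputError| so a flat spot cannot
        // zero out the update.
        double alpha = Math.abs(outputError);
        return (1 - alpha) * gradient + alpha;
    }

    public static void main(String[] args) {
        // Deeply saturated unit: the plain derivative is essentially 0,
        // so the standard delta rule would barely move the weights...
        System.out.println(derivative(20.0));
        // ...but with a large opposing error the effective gradient stays large.
        System.out.println(effectiveGradient(20.0, -0.9));
    }
}
```

One caveat: if |outputError| can exceed 1, alpha > 1 makes the (1 - alpha) term negative. The tanh squashing of the error in (a) should prevent that, but clamping alpha defensively may still be worthwhile.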
Thank you very much for these suggestions; I will look into them in more detail. Very interesting ideas, and they all seem to make sense.