Recent changes to support-requests (http://sourceforge.net/p/neuroph/support-requests/), updated 2010-12-06T17:06:46Z
Flatspot avoidance (David, http://sourceforge.net/u/ddreyfus/, 2010-12-06T17:06:46Z)
<div class="markdown_content"><p>Neuroph defines Tanh(x) as (1 - E(-x)) / (1 + E(-x)).<br />
Neuroph defines Dt[Tanh(x),x] = 1-Tanh(x)*Tanh(x), where Tanh is as defined above.<br />
The derivative of (1 - E(-x) )/ (1 + E(-x)), however, is 2*E(x)/(1+E(x))^2.<br />
The actual computation of Tanh(x) is: (-1+E(2x))/(1+E(2x))</p>
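<p>A quick numerical check of the formulas above may help here. The sketch below is not Neuroph code; the class and method names (TanhCheck, neurophTanh, trueTanh, neurophTanhDerivative) are made up for illustration. It suggests that (1 - E(-x)) / (1 + E(-x)) is tanh(x/2) rather than tanh(x), and that its derivative 2*E(x)/(1+E(x))^2 equals 0.5*(1 - t*t), where t is the value of that formula. Under that reading, the mismatch with 1-Tanh(x)*Tanh(x) is the missing factor of 1/2 that comes from the x/2 argument.</p>

```java
public class TanhCheck {
    // The formula the post attributes to Neuroph's Tanh.
    static double neurophTanh(double x) {
        return (1 - Math.exp(-x)) / (1 + Math.exp(-x));
    }

    // Exponential form of the true hyperbolic tangent.
    static double trueTanh(double x) {
        return (Math.exp(2 * x) - 1) / (Math.exp(2 * x) + 1);
    }

    // Derivative of neurophTanh as given in the post: 2*E(x)/(1+E(x))^2.
    static double neurophTanhDerivative(double x) {
        double e = Math.exp(x);
        return 2 * e / ((1 + e) * (1 + e));
    }

    public static void main(String[] args) {
        for (double x : new double[] {-2.0, -0.5, 0.0, 0.5, 2.0}) {
            double t = neurophTanh(x);
            // Compare the formula against tanh(x/2) and tanh(x).
            System.out.printf("x=%5.2f  formula=%.6f  tanh(x/2)=%.6f  tanh(x)=%.6f%n",
                    x, t, Math.tanh(x / 2), trueTanh(x));
            // Compare the analytic derivative against 0.5 * (1 - t^2).
            System.out.printf("         derivative=%.6f  0.5*(1-t*t)=%.6f%n",
                    neurophTanhDerivative(x), 0.5 * (1 - t * t));
        }
    }
}
```

<p>Note that (1+t)(1-t) and 1-t*t are algebraically the same expression, so either form gives the correct derivative once t is the true tanh(x).</p>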
<p>1. Change calculation of Tanh(x) to: (-1+E(2x))/(1+E(2x)). Previous calculation is incorrect for tanh. What is (1-E(-x))/(1+E(-x)) actually calculating?<br />
2. Change calculation of derivative to: (1+Tanh(x))(1-Tanh(x)). Previous calculation is incorrect. What is (1-tanh(x)*tanh(x)) calculating?<br />
3. To avoid NaN errors, add conditional logic to Sigmoid() and Tanh() similar to the following. This version is for Tanh(); Sigmoid() would return 0 if net < -100. The choice of 100 is fairly arbitrary; all that is needed is a large number:<br />
if (net > 100) {<br />
return 1.0;<br />
}<br />
if (net < -100) {<br />
return -1.0;<br />
}<br />
4. I had a problem with the existing learning rate code. The total errorChange could be a number greater than 1. With the default learningRateChange and momentumRateChange values of 0.99926, the existing code would increase or decrease the learning rate by the errorChange (or at least 1). It seems to me that errorChange would need to be made proportional to something, such as the total possible error, for this to work. Therefore, I reverted the code to something similar to the linear model, except that it drops to the minimum learning rate when performance gets worse. This is now a greedy algorithm.<br />
protected void adjustLearningRate() {<br />
// 1. First approach - probably the best<br />
// bigger error -> smaller learning rate; minimize the error growth<br />
// smaller error -> bigger learning rate; converge faster<br />
// the amount of learning rate change is proportional to error change - by using errorChange</p>
<p>double errorChange = this.previousEpochError - this.totalNetworkError;</p>
<p>if (this.totalNetworkError >= this.previousEpochError) {<br />
// If going wrong way, drop to minimum learning and work our way back up.<br />
// This way we accelerate as we improve.<br />
this.learningRate = this.minLearningRate;<br />
} else {<br />
this.learningRate = this.learningRate * (1 + (1 - this.learningRateChange)); // e.g. *1.01 when learningRateChange is 0.99</p>
<p>if (this.learningRate > this.maxLearningRate)<br />
this.learningRate = this.maxLearningRate;</p>
<p>}<br />
}</p>
<p>protected void adjustMomentum() {<br />
// Probably want to drop momentum to minimum value.<br />
if (this.totalNetworkError >= this.previousEpochError) {<br />
momentum = momentum * momentumChange;</p>
<p>if (momentum < minMomentum)<br />
momentum = minMomentum;</p>
<p>} else {<br />
momentum = momentum * (1 + (1 - momentumChange)); // e.g. *1.01 when momentumChange is 0.99</p>
<p>if (momentum > maxMomentum)<br />
momentum = maxMomentum;</p>
<p>}<br />
}<br />
5. The most significant change I made was to gradient classification for backpropagation to avoid getting stuck in flatspots. There are two changes for this.<br />
a. When updating weights in LMS.updateNeuronWeights() and MomentumBackpropagation(), I use Math.tanh(neuron.getError()). This minimizes the impact of big error values, which can cause network instability.<br />
b. In BackPropagation.calculateDelta() and SigmoidDeltaRule.adjustOutputNeurons(), I call a new method, getEffectiveGradient(neuron, outputError), which calculates the gradient used in the subsequent code. For example:<br />
//WAS: double delta = outputError * transferFunction.getDerivative(neuronInput);<br />
double gradient = getEffectiveGradient(neuron, outputError);<br />
double delta = outputError * gradient;<br />
neuron.setError(delta);<br />
this.updateNeuronWeights(neuron);</p>
<p>The code to calculate the effective gradient avoids returning a 0 when trying to get off a flatspot:<br />
// In SigmoidDeltaRule.java<br />
protected double getEffectiveGradient(Neuron neuron, double outputError) {<br />
TransferFunction transferFunction = neuron.getTransferFunction();<br />
double neuronInput = neuron.getNetInput();<br />
double gradient = transferFunction.getDerivative(neuronInput);<br />
// If the error is large, we want a large gradient so the neuron can get out of flat spots.<br />
// If the error would push the neuron input further in its current direction, we<br />
// are moving the input toward a potential flatspot, so use the given gradient.<br />
// Otherwise we may be on a flatspot and want to move off it quickly.<br />
if (outputError * neuronInput > 0) {<br />
return gradient;<br />
}<br />
double alpha = Math.abs(outputError);<br />
// This calculates the modified gradient. If the error is small, the gradient is unchanged.<br />
// If the error is large, the gradient is big too.<br />
gradient = (1 - alpha) * gradient + alpha;<br />
return gradient;<br />
}</p></div>
training serially correlated training data (David, http://sourceforge.net/u/ddreyfus/, 2010-12-02T16:40:10Z)
<div class="markdown_content"><p>My training data is serially correlated; each row has overlapping data with the prior row. When I train the neural net, the weights are adjusted after each input record. Would this explain why I get a low error when training? When I save the network, the last set of weights is saved on disk, right? When I use this network and test it in non-training mode (programmatically) against the same training data set, I find that my total network error is very high. Although the network appeared trained, it wasn't trained well at all. What do you suggest for training such a network? Should the weights really be changed after each record in the training set? Perhaps the weights should be changed after all rows have been evaluated? Alternatively, should I randomize the training set to solve this problem?</p></div>
Java errors reading network from disk (David, http://sourceforge.net/u/ddreyfus/, 2010-12-02T14:48:45Z)
<div class="markdown_content"><p>When I read a network in nxml format from disk into easy neurons and then calculate the network, Java generates a NullPointerException when the observers are called. I traced the problem to null entries in the network's observers vector. Null entries are added when the network is created because that's what's written to disk when the network is saved. I modified loadNeuralNetwork as shown below to clear the observers vector.</p>
<p>public NeuralNetwork loadNeuralNetwork(String filePath) {<br />
NeuralNetwork nnet;<br />
String fileExtension = FileUtils.getExtension(filePath);<br />
if (fileExtension == null) {<br />
fileExtension = FileUtils.nn;<br />
}</p>
<p>try {<br />
if (fileExtension.equals(FileUtils.nn)) {<br />
nnet = NeuralNetwork.load(filePath);<br />
return nnet;<br />
} else if (fileExtension.equals(FileUtils.nxml)) {<br />
NeuralNetworkXmlFile xmlFile = new NeuralNetworkXmlFile();<br />
nnet = xmlFile.load(filePath);<br />
// Delete observers recorded in the on disk version of the net.<br />
nnet.deleteObservers();<br />
return nnet;<br />
}<br />
} catch (Exception e) {<br />
e.printStackTrace();<br />
}</p>
<p>return null;<br />
}</p></div>
NullPointerException in doLearningEpoch (David, http://sourceforge.net/u/ddreyfus/, 2010-12-02T13:17:04Z)
<div class="markdown_content"><p>In SupervisedLearning.java, doLearningEpoch()<br />
Please consider adding the following line to the top of the function:</p>
<p>if (trainingSet==null) return;</p>
<p>This will stop a NullPointerException if the Train button is pressed before a training set has been assigned to a network.</p></div>
totalNetworkError (David, http://sourceforge.net/u/ddreyfus/, 2010-12-01T17:21:35Z)
<div class="markdown_content"><p>UpdateTotalNetworkError() is called for each record in the training set. totalNetworkError is incremented by sqrErrorSum/(2*patternError.size()). If there is one desired output, totalNetworkError sums to half the sum of squared errors. Does this seem correct?</p>
<p>The size of the training set affects the totalNetworkError; thus, a large training set will show a larger totalNetworkError than the same network will show on a smaller training set. This affects subsequent calculations for learning rate, momentum, and stopping conditions. Would it not make more sense to use a mean sum of squared errors (MSE) rather than the (1/2) sum of squared errors (SSE) as a measure of network accuracy?</p></div>
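<p>The size dependence described above can be illustrated with a small sketch. This is not Neuroph code: the class ErrorMeasures and its methods are hypothetical, and the per-pattern increment simply follows the sqrErrorSum/(2*patternError.size()) expression quoted in the post. With the same per-pattern error, the accumulated value grows with the number of patterns, while dividing by the pattern count gives a value that does not depend on set size.</p>

```java
public class ErrorMeasures {
    // Accumulated error as described in the post: each pattern adds
    // sqrErrorSum / (2 * numberOfOutputs); nothing is divided by the pattern count.
    static double accumulatedError(double[][] patternErrors) {
        double total = 0.0;
        for (double[] errors : patternErrors) {
            double sqrErrorSum = 0.0;
            for (double e : errors) {
                sqrErrorSum += e * e;
            }
            total += sqrErrorSum / (2 * errors.length);
        }
        return total;
    }

    // Mean variant: the same sum divided by the number of patterns.
    // Assumes a non-empty training set.
    static double meanError(double[][] patternErrors) {
        return accumulatedError(patternErrors) / patternErrors.length;
    }

    // Helper for the demo: n patterns, each with one output error of size e.
    static double[][] uniformErrors(int n, double e) {
        double[][] result = new double[n][1];
        for (int i = 0; i < n; i++) {
            result[i][0] = e;
        }
        return result;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 1000}) {
            double[][] errors = uniformErrors(n, 0.1);
            System.out.printf("patterns=%d  accumulated=%.6f  mean=%.6f%n",
                    n, accumulatedError(errors), meanError(errors));
        }
    }
}
```

<p>If stopping conditions or learning rate adjustments compare this value against a fixed threshold, the mean variant keeps that threshold meaningful across training sets of different sizes.</p>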