I've been contemplating a lightweight joone core engine for a long time and finally got around to coding some pieces of it. After doing the NodesAndWeights, which breaks the joone layers down to the node level, my idea has always been to try and glue these together again so that you can use it to calculate the output given an input pattern. I'm ready to test this now, and the engine also has a number of nice features, such as making it very easy to create all types of nodes (sigmoid, tanh, sine, etc.) and catering for recurrency. The way in which joone orders layers has been a bit of a sticky issue to me, and now I am ordering the nodes themselves, which I think bodes well for catering for recurrency.
Anyhow, the idea is not to replace anything at present and I think a typical user will probably never even use this lightweight engine, but it may be a place to easily enhance and expand joone in future. For now, I am trying to plug it right back into the joone environment as a Layer, so that all the usual joone stuff works.
So why all this effort if it does not add new functionality nor change the way things are done?
Well, I'm hoping for performance, but that remains to be seen. The way I am doing it now certainly has less overhead than joone's layers, but still, a layer could have e.g. 100 nodes (rows in joone), and then its heavy overhead may turn out to be less than that of 100 lightweight nodes.
Another idea I have that goes hand in hand with performance is multithreading on a machine with more than one processor. The RTRL does this already, but that is tied to the way in which the RTRL updates. If a network is not recurrent and is trained in batch mode on a machine with, say, 4 processors, then it makes sense to split the patterns up into 4 batches and let each processor chase its batch through the network. At the end, the weight deltas from the 4 threads are amalgamated, each network is updated, and another batch is started.
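To make the batch-splitting idea concrete, here is a minimal sketch (my own illustration, not joone code): each worker thread accumulates weight deltas over its share of the patterns, and the deltas are amalgamated before the weights are updated once. The per-pattern "delta" below is a stand-in; a real network would backpropagate each pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of batch-parallel training: split the patterns over N threads,
// let each thread sum its weight deltas, then apply all deltas at once.
class ParallelBatch {
    static double[] weights = {0.1, 0.2, 0.3};

    // Dummy per-pattern delta - in reality this would come from backprop.
    static double[] deltaFor(double[] pattern) {
        double[] d = new double[weights.length];
        for (int i = 0; i < d.length; i++) d[i] = 0.01 * pattern[i % pattern.length];
        return d;
    }

    static void trainBatch(List<double[]> patterns, int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            int chunk = (patterns.size() + nThreads - 1) / nThreads;
            List<Future<double[]>> futures = new ArrayList<>();
            for (int t = 0; t < nThreads; t++) {
                List<double[]> slice = patterns.subList(
                        Math.min(t * chunk, patterns.size()),
                        Math.min((t + 1) * chunk, patterns.size()));
                futures.add(pool.submit(() -> {
                    double[] acc = new double[weights.length];
                    for (double[] p : slice) {
                        double[] d = deltaFor(p);
                        for (int i = 0; i < acc.length; i++) acc[i] += d[i];
                    }
                    return acc;
                }));
            }
            // Amalgamate the deltas from all threads, then update once.
            for (Future<double[]> f : futures) {
                double[] d = f.get();
                for (int i = 0; i < weights.length; i++) weights[i] += d[i];
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the deltas are only combined at the end, the result is the same as a single-threaded batch pass, just spread over the processors.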
Next, it should be much easier to train such a lightweight network, although I think the NodesAndWeights sort of does the job already.
Network flexibility is another consideration. Working on the node level certainly is more flexible, if more cumbersome, than working on the layer level.
So this is really a R&D effort at the moment, but it ploughs back into the normal joone stuff as well. The Monitor class, for example, is screaming for a getPatternNumber message and I'm going to add that real soon. The multithreaded stuff could easily be done in joone, and maybe I'll carry it over once I get it right in the lightweight engine. Another way forward could be to simply use an approach like the NodesAndWeights, where the whole network is wrapped and changes to weights are made directly in the bias matrix itself.
Not as radical as rewriting joone, and given the duplication low on the todo list, but maybe breeding ground for future ideas. I'll commit it hopefully soon and then you'll see what it is all about.
Hey, new lightweight joone is working and I like it a lot!
Much, much simpler than joone proper, and yet so easy to extend. The performance is good, albeit not the fireworks I was hoping for. It took 132s on average to ram 100,000,000 patterns through the lightweight version, while proper joone took 340s. Speed can be improved further, but it will probably get worse instead: it currently has a lot of checks and balances in it, and these will probably grow.
All of this happens inside the normal joone stuff. The whole network is wrapped into a joone layer, and the normal joone synapses can then be connected to it. Just take your normal network, use it to instantiate a lightweight network, and you can do all of the same stuff as before. No training yet, but just watch this space. I will commit soon so you can see what it's all about. Just a handful of classes; I will discuss more when it is in CVS.
For now, it only picks up sigmoid, tanh and linear layers correctly from the base network - the others, or even brand new ones, are soooo easy to add. It does not yet handle recurrency or fixed weights correctly - it actually has support for recurrency but does not pick it up correctly from the base joone network; same with fixed weights.
you keep the project alive. thumbs up!!!
just one point: before you commit the new stuff, i propose to make a last branch of the old stuff (name it joone_stable or something like that). so we can start cleaning up the head and we still have a stable version that provides full backward compatibility. once the head is stable we can still write a wrapper around it.
as usual: just my two cents ;-)
Thanks - but I just committed and now logged in to explain. Apologies, I did not get the message in time.
Still, it is much less dramatic than it appears. The old joone still works, and I've made precious few changes to it. I only added a message to the Network class to return the current pattern number, which in the end I did not even utilise myself. Then, I thought I had committed the multiprocessor RTRL before but found the one in CVS to be the old one, so I also committed RTRL and RTRLLearnerPluging.
As for the new stuff, it is all in org.joone.structure. If you don't know about it, you'll never use it, and it does not affect the way joone works; rather, it copies that and is also able to construct a lightweight network from a proper joone network. A quick review of the classes (let me know if I should rather write this up on the wiki):
- Node.java : The node interface. The lightweight network operates on the node level and this is the basic building block to use
- AbstractNode.java : Typical node will subclass this
- InputNode.java : A node that does not do internal calcs but simply passes on input; allows for setting of input
- BiasNode.java : A special input node that passes on a constant value, typically 1
- ContextNode.java : A context node that blends prior output with current output and introduces a lag
- NodeFactory.java : This shows how to construct a sigmoid, tanh and linear node and also how terribly easy it is to add new types of nodes
- NetworkLayer.java : Here it all comes together. This is instantiated with a normal Network and will then create a lightweight network using the classes mentioned above. Only FFNs with sigmoid, tanh and linear layers will work at present. It also will not correctly handle fixed weights, but should be able to handle 'typical' networks and is so easy to extend to cater for the rest.
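To give a flavour of the node-level idea, here is a rough sketch in the same spirit (my own reconstruction with illustrative names, not the committed classes): a node pulls the outputs of its input nodes, applies its weights, and squashes the sum; a subclass only supplies the squashing function, which is what makes new node types so easy to add.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative node-level sketch, not the committed org.joone.structure code.
interface Node {
    double getOutput();
    void calculate();
}

abstract class AbstractNode implements Node {
    protected final List<Node> inputs = new ArrayList<>();
    protected final List<Double> weights = new ArrayList<>();
    protected double output;

    void connect(Node source, double weight) {
        inputs.add(source);
        weights.add(weight);
    }

    public void calculate() {
        double net = 0.0;
        for (int i = 0; i < inputs.size(); i++)
            net += weights.get(i) * inputs.get(i).getOutput();
        output = activation(net);
    }

    public double getOutput() { return output; }

    // Subclasses only supply the squashing function - adding a new node
    // type (sigmoid, tanh, sine, ...) is a one-liner.
    protected abstract double activation(double net);
}

class SigmoidNode extends AbstractNode {
    protected double activation(double net) { return 1.0 / (1.0 + Math.exp(-net)); }
}

// Plays the role of the InputNode described above: no internal calcs,
// it simply passes on a value set from outside.
class LightInputNode extends AbstractNode {
    private double value;
    void setInput(double v) { value = v; }
    public void calculate() { output = value; }
    protected double activation(double net) { return net; } // unused
}
```

A BiasNode would be the same idea with a constant output of 1, and a ContextNode would blend its previous output into the new one.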
The beauty is that this extends joone's normal Layer, so you can use all of joone's fancy input layers, then link them up to this, and link from this to get the network's output. This class contains a main method at present with an example where I take a normal joone network, create such a lightweight layer, and then hook it up to the network's input layer with a DirectSynapse. So as the joone network is fed patterns, the lightweight one also gets them, and I attach a DirectSynapse to the lightweight one to catch the outputs and later on compare them with what joone produced. It is the same, more or less, as what joone spat out. The reason for the difference is that I use the tanh function while joone has a sped-up implementation of it, and it is only visible in the decimals far to the right of the output. This main method also does a performance comparison between the lightweight and proper networks.
This class is probably the place to start. Its constructor shows how the joone network is used to create a lightweight version, and its forward message shows how the lightweight network handles a new pattern. Its backward method is still blank, but implementing it will allow the network to be trained, and that is probably the next step from my side.
This is such an exciting development to me personally as it simplifies joone quite a bit without throwing away some of the nice things such as JDBC inputs or the GUI. What lies ahead I'm not sure myself, and if this will eventually replace joone proper is debatable, but if you want to try new ideas around training or nodes or network structure, all of it within the framework joone provides, then this is the place to do it.
no problem. as long as we know when your commits have happened we can still go back and branch later on. fortunately the weekend starts soon so i have the time to check out all the stuff. really excited ;-). are there also some running mini examples of how to use the new stuff in the code?
Yes. But wait, there is more. I've made a couple of changes already - simplifying and optimising a bit more - and am working on the training, which should really not take long. I'll commit within the next few minutes in any case to have the current stuff up to date. Each node previously had weight / input node pairs that were used to calculate the inputs, but I've now combined those into a Connection class. This is also in preparation for training the lightweight network, as well as to cater for fixed-weight connections.
The best place to start is the main method in the NeuralLayer class, which takes a network from the command line and runs a few patterns through it. It then checks those against the output of a lightweight network (a NeuralLayer) and compares them.
OK, I'm committing at present, and will again commit when the training is in.
The picture I have in mind is of the normal joone plug and play framework, with all the fancy joone stuff around the network. You can get inputs from all sorts of places, send outputs to charts and so forth, but the network itself can be wrapped into what is now called a NetworkLayer. This is lightweight, fast, and easy to train or extend. So to develop new training algorithms or create new types of layers is now much easier than before - and it was even easy then. Also, training is more powerful since it is on the node level and much faster since there is less overhead.
PS : The new simplification sees a performance boost. As reported above, it used to take 132s but now takes only 88s for the lightweight network to handle 100,000,000 patterns.
The latest change is really a clean-up: previously each node had an array of double weights and an array of input nodes, and these had to be kept in sync. Now it is all in one new class, Connection, which is easier to handle and faster to loop over.
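The Connection idea can be sketched like so (my own reconstruction, not the committed class; the `fixed` flag is my guess at how fixed-weight links would be skipped during training): one object per incoming link instead of two parallel arrays.

```java
import java.util.List;
import java.util.function.DoubleSupplier;

// Sketch of the Connection refactoring: each incoming link carries its
// source and its weight together, so nothing has to be kept in sync.
class Connection {
    final DoubleSupplier source; // stand-in for the input node's getOutput()
    double weight;
    final boolean fixed;         // hypothetical: training would skip these

    Connection(DoubleSupplier source, double weight, boolean fixed) {
        this.source = source;
        this.weight = weight;
        this.fixed = fixed;
    }

    double weightedInput() {
        return weight * source.getAsDouble();
    }

    // Looping one list of Connections replaces looping two parallel arrays.
    static double netInput(List<Connection> incoming) {
        double net = 0.0;
        for (Connection c : incoming) net += c.weightedInput();
        return net;
    }
}
```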
Another commit is on the way - just some bug fixes, and it now supports recurrency and fixed weights. Also changes to NodesAndWeights to cater for these and to prepare for a backprop implementation.
i assume most of your changes are now committed. is that correct?
Swamped at the moment and wading through month end, but still working on it daily, specifically the bias nodes, which joone treats in a very funny way.
Anyhow, the stuff is in if you want to take a look at it.
Still struggling with this bias problem. I also bumped my head against it when developing the NodesAndWeights class. I would often use a tanh layer as an input layer, but since then I rather use one of the linear layers.
Anyhow, suppose you create a new network, and you want the inputs to travel through a tanh squashing function. So in joone you create a tanh layer, and you think that if an input of, say, 0.5 comes from your input file, joone will take that, apply tanh, and send 0.46 to the next (hidden) layer.
In fact, the tanh layer has a bias; say this is 0.2 at the moment. Then joone will take your 0.5 input, add 0.2 to it, apply tanh to 0.7, and pass 0.60 on to the next layer. Very strange, and I wonder what the implications for training will be. When you train the network, this bias is also optimised, and I think a huge bias in the input layer has some meaning - maybe that particular input is useless, or maybe useful; I haven't tried to figure it out, but it must tell you something.
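The arithmetic above, spelled out (this class is just an illustration of the behaviour described, not joone code): what you would expect the tanh "input" layer to emit, versus what it emits once its bias (0.2 here) is added first.

```java
// Demonstrates the input-layer bias effect described in the post.
class BiasDemo {
    static double expectedOutput(double input) {
        return Math.tanh(input);          // tanh(0.5) ~ 0.46
    }
    static double jooneOutput(double input, double bias) {
        return Math.tanh(input + bias);   // tanh(0.5 + 0.2) = tanh(0.7) ~ 0.60
    }
}
```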
The NodesAndWeights network has the ability to reset and disable these biases, and I've used such a network to test the lightweight implementation until now. Since I've found that networks where these biases are retained perform much better in practice, I've sort of learned to live with it. This could simply be because such networks have a lot more parameters: each input node adds an additional weight to the bias input, and thus I suppose it should do better. In terms of the lightweight implementation, the whole thing came apart when I used a *real* bias-enabled network to test.
Replicating this on the node level was easy in NodesAndWeights, since joone still did all the calculations. Now, replicating it in the lightweight engine is turning into a headache, since inputs are no longer inputs but actually a new node in the structure that feeds off a *real* input plus a bias. As such, NodesAndWeights and the new implementation are torn apart.
Has anybody else run into this at any stage?
Oh well, I've addressed this insofar as the lightweight implementation now agrees with joone if you use e.g. a tanh layer as input (I haven't really tested much more than tanh...). A new commit is coming soon.
The new, fixed source has been checked in. Also, in order to debug a bit, I had to create some eye candy: a network viewer that presents a network either as a JTree or as a plot on a Canvas.
The JTree starts with its root at the output nodes and works backwards, with the leaves being the weights and the tree underneath each leaf the nodes that fire into that node. Self-explanatory, I hope, as I am having difficulty describing it.
Then there is also a node plot of the network, not unlike how I've seen JavaSNNS present networks. It is a bit messy though, since the bias node (in blue) is connected to almost everything. Input nodes are white, context nodes yellow, and fixed connections appear in magenta.
If you call this from the command line, it will take the network passed in and then you can view that. To do that, use something like
java -cp <classpath> org.joone.structure.NetworkViewer NameOfNetworkToView.snet
OK, the commit is in, and, as usual, now I remember what is still outstanding: only tanh layers are supported correctly at present (and, I think, linear layers - not tested).
Anyhow, enjoy toying with the NetworkViewer...
== Insert clutter apology here ==
New commit just finished.
Have added, but not tested, support for more layers. The following *should* work in lightweight joone: gauss, linear and biased linear, logarithmic, sigmoid, sine and tanh. See org.joone.structure.NodeFactory to add more support, verify the existing ones, or easily add new and exotic ones (hint hint).
Also dealing with a funny bug in the NodeViewer, where scrolling a large network messes up the display. I have tried throwing some doLayout()s at it, but to no avail. Any GUI expert comments on that are welcome.
Hi, I have added backprop to the lightweight implementation, which I am going to term 'july' from now on to save some space. I am itching to test it, especially to see if it is a lot faster than joone proper.
July can also save to and restore from XML at present. All of this is bare bones at present, but it is getting there. This is just an update; the next commit is still some time off.
Rewrite your rewrites.....
Anyhow, I've rewritten most of the lightweight network after bumping my head into some stuff when implementing the backprop algo. It now works just like joone and can be embedded into a standard joone network (org.joone.net.NeuralNet), feeding off the standard joone input and output synapses and teachers, which often makes me wonder if it was really worth it. At least it takes one's mind off the market....
Then again, it takes about half the time to train a small network compared to joone proper, has a much more flexible design, and can create much more flexible and recurrent networks - I still need to test this a bit. It can be saved to and loaded from XML - still need to test this as well - and when simply running it, with no training, it is 2 - 3 times faster. It already has the ability to have its input neurons bolted onto a standard double array, which will buy even more performance.
Next commit coming soon.
so i assume by embedding it into the standard joone one can use all existing features (transfer functions, learning algorithms, optimizations ...)?
btw. have you ever looked at the reinforcement learning approach? i stumbled on this (www.icsi.berkeley.edu/~moody/JForecastMoodyWu.pdf) paper while researching for a paper i have to write. if you would like to make a comment about it, i'm very interested .... ;-)
Yes, you can use the standard joone stuff: input and output synapses, teachers to calculate and graph the error, setting the pattern counts and cycles in the monitor, and so on. Training no longer works the joone way, which catered specifically for Layer and Synapse training. The former simply refers to a node's bias and the latter to the weights, something that gave rise to a lot of confusion from my perspective. July trains itself, but the backprop algo gives the same result.
I did look at that paper - nice and academic - using monthly data and Sharpe ratios....
I bumped into reinforcement learning when doing the Kalman filter earlier, but I am not really familiar with it. Since joone has a bias for (almost) every layer, even if it is an input layer, it seems to approach reinforcement learning in some way, with the input values adjusted by having a bias added to them. I've often tried to figure out if this is a good or bad thing, but I have no real answer.
What I'm looking for is some kind of a catalogue of neural network problems to test the lightweight version with. First up is going to be the XOR problem. But is there a similar problem for say a recurrent net?
From the EKF days, I still have this link in my favourites. I haven't used it at all, but maybe it can help you?
With july, since you operate on individual nodes and weights, I am thinking of toying a bit with pruning.
With joone's bias-in-the-input-layer issue, I was wondering if one could interpret that bias as indicative of the input itself being useful or not.
These two are related. Maybe to prune a network it is best to look at the weight from the bias, the distribution of node outputs, the distribution of the node's errors....
Another bunch of bugs fixed - I'm very thankful to have joone around to compare against - and now the lightweight network training is more or less done. It takes 8 seconds to train the XOR network with 1,000,000 cycles.
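For reference, here is a free-standing backprop XOR trainer in the same spirit, written from scratch for illustration (this is not july's code): 2 inputs, a small sigmoid hidden layer with bias weights, 1 sigmoid output, plain online backprop.

```java
import java.util.Random;

// Minimal 2-h-1 sigmoid network trained on XOR with online backprop.
class XorNet {
    static final double[][] X = {{0,0},{0,1},{1,0},{1,1}};
    static final double[]   T = {0,1,1,0};
    final int hidden;
    final double[][] w1; // hidden x (2 inputs + bias)
    final double[]   w2; // hidden weights + output bias
    XorNet(int hidden, long seed) {
        this.hidden = hidden;
        Random r = new Random(seed);
        w1 = new double[hidden][3];
        w2 = new double[hidden + 1];
        for (double[] row : w1) for (int i = 0; i < 3; i++) row[i] = r.nextDouble() - 0.5;
        for (int i = 0; i < w2.length; i++) w2[i] = r.nextDouble() - 0.5;
    }
    static double sig(double x) { return 1.0 / (1.0 + Math.exp(-x)); }
    double forward(double[] x, double[] h) {
        for (int j = 0; j < hidden; j++)
            h[j] = sig(w1[j][0]*x[0] + w1[j][1]*x[1] + w1[j][2]); // last weight is the bias
        double net = w2[hidden]; // output bias
        for (int j = 0; j < hidden; j++) net += w2[j]*h[j];
        return sig(net);
    }
    // One pass over the four patterns; returns the summed squared error.
    double epoch(double lr) {
        double[] h = new double[hidden];
        double err = 0;
        for (int p = 0; p < 4; p++) {
            double y = forward(X[p], h);
            double e = T[p] - y;
            err += e*e;
            double dOut = e * y * (1-y);
            for (int j = 0; j < hidden; j++) {
                double dHid = dOut * w2[j] * h[j] * (1-h[j]); // use old w2[j]
                w2[j] += lr * dOut * h[j];
                w1[j][0] += lr * dHid * X[p][0];
                w1[j][1] += lr * dHid * X[p][1];
                w1[j][2] += lr * dHid;
            }
            w2[hidden] += lr * dOut;
        }
        return err;
    }
}
```

This is the per-pattern shape of the computation: forward pass, output delta, hidden deltas, weight updates including the biases.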
Have been testing the training using different neuron types, e.g. training with sigmoid, tanh, log, gauss, etc. I mostly use tanh, sometimes linear and maybe sigmoid. But some of the others that I never use, such as gauss and logarithmic, actually converged pretty quickly, making me think maybe I should use these types as well. Any comments on these 'other' types?
Anyhow, now to test the XML load and save....
- i'm not sure whether there are tests regarding recurrent networks. my suggestion would be to test a recurrent network without lagged input against an autoregressive timeseries with a trend, e.g. produced by a sine function.
- thanks for the link (i have a hardcopy of the book from the library).
- regarding the pruning issue: i'm not sure whether the bias is a good indicator. for a sigmoid transfer function it is certainly a valid approach (because the bias shifts the boundaries sideways), but for example in a linear transfer function the bias is just used to compensate for the difference between the mean value of the output target and the average weighted input, so there's no reason to assume that the size of the bias can be used as an indicator of whether or not the node is "important".
- logarithmic: from the little bit i know about econometrics, sometimes a logarithmic transformation helps to fit the data with a simpler model (e.g. linear regression). but since one usually has no assumptions about the model when using a neural network, i'm not sure whether it really helps (maybe one could use pruning algorithms to figure that out ;-)).
- gauss: quite useful to classify the data. i usually think of it as a two-sided sigmoid function.
keep up the great work
Thanks for the feedback. Also thanks for all your inputs.
I have tested the XML and it works, so I'll probably commit within a few days. I've split the lightweight network into an independent class and a subclass that integrates it into joone, to keep things tidy. I will try recurrent networks with your suggestion of a sine function and probably commit thereafter.
I now understand the XOR problem a lot better (networks are even muddier than before!)
I read that an XOR network with 1 hidden node, and with the output node also connected to the input nodes, can solve the XOR problem and has only 1 solution, the rest being saddle points. This is exactly the type of network I want to build in the lightweight engine, where you can easily connect nodes to one another. (Constructing this in joone is not difficult either, but it goes against its layered approach.)
Anyhow, despite the excitement, I could not train that network. I *think* it is because the theoretical solution uses threshold nodes, which fire 0 or 1 on either side of the threshold.
The closest to this is probably a sigmoid function, and maybe the thing needs a couple of billion iterations to train, but mine got stuck somewhere and stopped making progress. Another approach, which is also easy in the lightweight engine, is to create a new type of node that defines such a threshold node. The problem is that its derivative is not continuous, and while I have not done the math, I'm almost sure it will create problems when training.
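To make the derivative problem concrete (an illustration, not july code): the step function's slope is zero everywhere it is defined, so a backprop delta, which multiplies by that slope, vanishes through such a node; a steep sigmoid is the usual differentiable stand-in.

```java
// Why a hard threshold node breaks gradient training.
class ThresholdNode {
    // Hard threshold: fires 1 above the threshold, 0 below.
    static double step(double net, double threshold) {
        return net >= threshold ? 1.0 : 0.0;
    }
    // Its derivative is 0 everywhere it exists (and undefined exactly at
    // the threshold), so the backprop delta through this node is always 0.
    static double stepDerivative(double net, double threshold) {
        return 0.0;
    }
    // The usual smooth stand-in: a steep sigmoid approaches the step shape
    // as the gain grows, while keeping a nonzero derivative to train with.
    static double steepSigmoid(double net, double threshold, double gain) {
        return 1.0 / (1.0 + Math.exp(-gain * (net - threshold)));
    }
}
```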
Joone again is excellent, if only for providing a catalogue of network nodes and how to define them. But do you perhaps know of a reference where all the 'exotic' node types are enumerated?