I've uploaded the pristine 2.0.0RC2 as well as the EKF and RTRL samples and a bit of a manual for your perusal.
When I get more time and/or a functional CVS client I'll figure out how to do all of this properly via CVS. For now, those who are interested in any of these can just visit that page.
very nice! will try it out.
- why start a new branch for rtrl/ekf? in my opinion this is a huge improvement for the entire engine, so why not develop it in the trunk and just branch (or just tag) 2.0.2?
anyway it's your decision.
- concerning a gui-cvs client: i like the eclipse cvs client. it's really neat :-) smartcvs is ok as well (even though nothing compares to subversion with tortoise :-( ).
- concerning the dev-ide: i just think it is easier to have one common ide (especially for things like unit-tests, findbugs etc.), so for me netbeans is ok as well. there would also be the opportunity to get intellij for free: http://www.jetbrains.com/idea/opensource/license.html
any other suggestions?
Sounds good, will check it out.
I agree with hofi85, I think it is best to branch to 2.0.2.
About the CVS client, I use TortoiseCVS, and it works fine for me; I haven't tried any other GUIs.
Concerning the dev-ide, it would probably be easier if we all used the same IDE, but then again we can't be sure everybody will.
Wow guys, thanks for the feedback.
- new branch
In order to get recurrency to work properly, I had to make a lot of fundamental changes to the joone engine. Previously, a network would send a stop pattern at the end, and everything that needed to do cleanup would watch out for that. Now, in order to get the recurrency to work, I do a similar thing, but at the start. Context layers have been given an 'initial state' which can be changed and - eventually - optimised. Every time the network is initialised, the initial state is reset, and this happens at the start. It can't be done at the end, because that would mess up the state that gets fired from the context layers when the next pattern is pushed through the network, say to project the first out-of-sample point.
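To make the mechanics concrete, here is a minimal sketch in plain Java of the idea (class and method names are mine, not joone's actual API): a context layer with a decayed feedback state that is reset from a settable initial state at the start of each run, rather than cleaned up at the end.

```java
// Illustrative sketch only - not joone's real ContextLayer.
// state(t) = input(t) + decay * state(t-1), with a resettable initial state.
public class ContextLayerSketch {
    private final double decay;          // joone's default decay is 0.5
    private final double[] initialState; // settable and, eventually, optimisable
    private double[] state;              // the context fed back each step

    public ContextLayerSketch(double[] initialState, double decay) {
        this.initialState = initialState.clone();
        this.decay = decay;
        reset();
    }

    /** Called at the START of each run, so the state left over at the end
     *  is still available to project, e.g., the first out-of-sample point. */
    public void reset() {
        state = initialState.clone();
    }

    /** One step: mix the new input with the decayed previous state. */
    public double[] step(double[] input) {
        for (int i = 0; i < state.length; i++) {
            state[i] = input[i] + decay * state[i];
        }
        return state.clone();
    }
}
```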
So I've made quite a few potentially far-reaching changes to joone; what used to work may no longer, and I thought it best to keep it separate for now. My initial idea was also to work on just one branch, but I did not expect to make such dramatic changes.
I typically use netbeans's built in cvs, but have been using cervisia for joone until I upgraded to Fedora 9, which seems to have some issues with Cervisia. Now I use tkcvs, but I'll take a look at the ones you mention.
I've used netbeans for ages and it's really cool, but I wouldn't mind using eclipse for this, even if just to see what the other half uses. This shouldn't be an issue at all, though; otherwise we'll run into it whenever somebody new joins.
so i tracked your rtrl implementation ferra and i like the way you've done it. great job :-).
now i have some questions too:
- from the testcase and the documentation i assume the RTRLLearnerFactory will be removed, is that correct?
- did you have any concrete recurrent network architecture in mind? using the context layer one can implement elman/jordan networks, but fully recurrent networks like zipser networks aren't possible so far, right? of course this isn't the "fault" of your rtrl implementation but of the layered design of joone, which is quite rigid... maybe we have to make the network construction more flexible.
- i also like the way the weight management is "outsourced", it's much cleaner now. using this approach it also seems much easier to implement bptt. have you already thought about / started implementing it? too ineffective? otherwise i will have a try at implementing it.
anyway: thumbs up!
Yes, initially I tried to implement RTRL as a learner factory, having read about this in the joone manual. After a long struggle, the NodesAndWeights class came to life and more or less replaced all of that. As the RTRL/EKFFFN/EKFRNN-LearnerPlugin shows, it is much easier to add learners using NodesAndWeights than via the normal joone method. Joone's learner mechanism caters for all flavours of backprop, lets you easily fiddle with the weights after each cycle, and really lets you intervene at all places in backprop; but you can't get away from the backpropagation itself.
This NodesAndWeights is also where you should look if you want to start from scratch. An idea I have is to use the joone GUI to build networks but then wrap it into a sort of cut down NodesAndWeights that will be fast and lightweight. I think that will also easily allow for zipser/others.
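For anyone curious, the "outsourced weight management" idea can be sketched roughly like this (names are invented for illustration, not the real NodesAndWeights interface): the network exposes its weights as one flat vector, and any learner - RTRL, EKF, DE, whatever - becomes just an update rule on that vector, with no backprop machinery assumed.

```java
// Illustrative only: invented names, not joone's real NodesAndWeights API.
public class FlatWeightsSketch {
    /** A learner is just a rule mapping the current flat weight vector
     *  (plus whatever state it keeps internally) to a new one. */
    public interface Learner {
        double[] update(double[] weights);
    }

    private double[] weights;

    public FlatWeightsSketch(double[] initial) {
        this.weights = initial.clone();
    }

    public double[] get() {
        return weights.clone();
    }

    /** One training cycle: hand the flat vector to the learner. */
    public void apply(Learner learner) {
        this.weights = learner.update(get());
    }
}
```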
I know very little about networks in general or recurrent ones in particular. I came to networks out of frustration with the accuracy of econometric methods, and to recurrent ones out of frustration with the accuracy of a specific network. I haven't yet made much progress; I've built a couple of recurrent networks but nothing that outperforms the FFN that I use as well. So I'm still frustrated...
I've read about Elman and Jordan, but in joone this is basically a context layer connected from the output to either the hidden or input layer? I have no idea what zipser is. I have quite a few references on BPTT but never really wanted to go that way; it seems like a lot of admin and memory is required to do it. If you can implement it, that will be great. I'd suggest using NodesAndWeights as much as possible and, if it does not do the job, either tweak it or subclass and tweak it.
PS : I am preparing stuff to train a couple of networks on the Amazon elastic compute cloud. If this works it will really be great.
Wow, smartcvs is great, as is the Amazon compute cloud.
hi ferra, thanks for your answers
> I came to networks out of frustration with the accuracy of econometric methods and to recurrent ones out of frustration
> with the accuracy of a specific network. I haven't yet made much progress, have built a couple of recurrent networks but
> nothing that outperforms the FFN that I use as well. So I'm still frustrated...
i just started reading a paper: "Results for out of sample show that the feedforward model is relatively accurate in forecasting both price levels and price direction, despite being quite simple and easy to use. However, the recurrent network forecast performance was lower than that of the feedforward model. This may be because feed forward models must pass the data from back to forward as well as forward to back, and can sometimes become confused or unstable. Both the feedforward and recurrent models performed better than the ARIMA benchmark model." :-(((((
> I've read about Elman and Jordan but in joone this is basically a context layer connected from the output to either the hidden or input layer?
that's how i understood it too. but i'm not an expert myself...
> Have no idea what zipser is?
Basically all neurons are connected to one another, even the ones in the same layer.
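a toy sketch of what i mean (plain java with tanh units, nothing taken from joone): one time step of a fully recurrent net is just y(t) = tanh(W * y(t-1) + input(t)) with a full n x n weight matrix, so every neuron feeds every other one, layer-mates and itself included.

```java
// Toy fully recurrent ("zipser-style") step - illustrative only.
public class FullyRecurrentSketch {
    /** y(t) = tanh(W * y(t-1) + input(t)); w is n x n, all-to-all. */
    public static double[] step(double[][] w, double[] prevY, double[] input) {
        int n = prevY.length;
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            double net = input[i];
            for (int j = 0; j < n; j++) {
                net += w[i][j] * prevY[j]; // includes i == j: self-recurrence
            }
            y[i] = Math.tanh(net);
        }
        return y;
    }
}
```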
> I have quite a few references on BPTT but never really wanted to go that way, seems like a lot of admin and memory is required to do it.
That was my concern too.
hopefully i'll find another paper proving that rnns can outperform ffns :-). otherwise the only way to improve the performance would be better data preprocessing, dimensionality reduction... boooooooooooring :-((((
Oh well, long ago I tested some ARIMA models on the local index. AR models struggled but ARMA models did quite well... Seems an error term e(t-2) helped quite a bit. In my mind AR models and FFNs are more or less the same, while ARMA and RNNs are more or less the same. But there is a lot to experiment with. The decay - default of 0.5 in joone - in the context layer maybe should be optimised for best results. The feedback layer can be connected to a number of different layers, and can even originate in a number of different places.
If you create joone layers with just one row (just one neuron per layer) then the network is built on the neuron level and I *think* even zipser would be possible. The zipser neurons, are they directly connected to one another or via some kind of a context layer, btw?
Yes, the preprocessing seems to be very important.
Thanks for the pointer to smartcvs - it's great thus far.
Finally, should this recurrency stuff be shifted back into joone or stay in july? Maybe after some more testing we can merge the two branches?
> If you create joone layers with just one row (just one neuron per layer) then the network is built on the neuron level and I *think* even zipser would be possible.
i also thought about this but i don't know whether joone can manage to invoke all the layers in the right order. but to be honest, i never looked at it closely enough to be sure.
> The zipser neurons, are they directly connected to one another of via some kind of a context layer btw?
my understanding: the neurons are connected directly to each other (synapse with weight), but their recursive connection to themselves is done using a context unit. not sure whether this is of any help to you...
> Finally, should this recurrency stuff be shifted back into joone or stay in july? Maybe after some more testing we can merge the two branches?
to put in my two pennies: in my opinion you should shift it back asap. including your testcases. merging is usually a huge pita (even with smartcvs ;-))
just wondering: for what kind of timeseries prediction do you use joone? trend, absolute values, ...? is there any concrete benchmark you try to beat?
Would love to have some kind of a benchmark. I created a few test networks and tried to train networks with a similar structure but random weights to produce the same answers. This I could do, but the weights were often wildly different from those of the benchmark network. Also, RTRL assumes a p matrix filled with zeroes initially and, while this is typically not a big assumption, it made a huge difference for small networks with a few weights. It took me a long time to figure out that this was the reason why I could not get correct convergence with small networks. The real-life benchmark is for profits to exceed costs + losses.....
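For the curious, the p-matrix effect is easy to see on a toy one-neuron version (my own illustration, not the joone code): for a single self-recurrent unit y(t) = tanh(w * y(t-1) + u(t)), the sensitivity p(t) = dy(t)/dw obeys the exact recursion p(t) = (1 - y(t)^2) * (y(t-1) + w * p(t-1)), and RTRL simply starts it at p(0) = 0. That is harmless on big networks, but it is a real assumption when there are only a few weights.

```java
// Toy one-weight RTRL sensitivity - illustrative only, not the joone code.
public class RtrlSensitivitySketch {
    /** Runs y(t) = tanh(w*y(t-1) + u(t)) over the inputs u,
     *  tracking p(t) = dy/dw. Returns {finalY, finalP}. */
    public static double[] run(double w, double y0, double[] u) {
        double y = y0;
        double p = 0.0; // p(0) = 0: the standard RTRL starting assumption
        for (double ut : u) {
            double yPrev = y;
            y = Math.tanh(w * yPrev + ut);
            // exact chain rule, given that y0 does not depend on w:
            p = (1.0 - y * y) * (yPrev + w * p);
        }
        return new double[] { y, p };
    }
}
```

The test below checks the recursion against a central finite difference, which only agrees because y0 itself is held fixed; if the initial state also depended on w, starting from p(0) = 0 would bias the gradient.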
Hi, I have concocted some form of multiprocessor support for RTRL. Will test and clean up a bit and then try to commit it using smartcvs. The performance looks very good thus far - a lot better on a quad core than the default, which uses only 1 core (and amazon makes 8 core virtual machines available on the cloud ;-)
the ec2 stuff sounds promising ;-). do you use joone's dte for that? would be cool if we had a ready-to-use ec2 image using the dte.
Nah, I overrated the multiprocessing. There is a good overview of it on wikipedia under the topic scalability, and the numbers in the example there more or less agree with my experience.
Using a quad core processor I get a significant speedup. Still, top shows that only around 70% of each processor's time is utilised. When I run it on ec2, using a virtual 8 core machine, half of the processors are sitting at 100% idle. There are probably some fancy java command line arguments to improve the performance somewhat. What I did was to fire off 3 additional networks, this time requesting RTRL to utilise 4 processors for each. Thus 1 network utilising 8 and 3 networks utilising 4 processors running at the same time, but top still shows that only around 30% of each processor's time is used.
I've uploaded the files to the website if you want to have a look at the multiprocessing RTRL. If I commit, it introduces new dependencies on oat and colt, so I am still hesitating to do that. Any comments...
To some extent I use the network as one monster dimensionality reducer. I want to again experiment a bit with much smaller networks, but have lately drifted towards bigger and bigger ones.
- i don't trust amazon ;-)
- use the server vm (-server) option. thereby the parallel gc is used by default (when you use java 1.5+; i'm quite sure you do because of the new for loop you use ;-)) and you shouldn't run into a heap space bottleneck.
- of course there are other optimizing strategies like native compilation, or using another vm (eg. jrockit seems to be very fast and, since oracle bought bea, it is free, i think)... but i'm not sure whether any of those is worth the effort.
concerning the dependencies:
- uh... do you still use oat for your current version? anyway, imho i don't have any concerns about using third-party libraries.
OK, I've committed the latest changes to CVS. I also added the colt libraries, the (altered) OAT source code and the *july* documentation. Just in case, I've also added the pristine 2.0.0RC2 source to the tree as well. I wanted to add the gnuplot error and weight writer, but need to disentangle it a bit more from my own stuff before I can do that. But that is probably coming soon.
hofi85, I got your message - when I replied before I got some invalid host issues, which I suspect is because sf just relays the mail. Anyhow, see if the stop pattern issue has been addressed in the new code; otherwise, maybe start a new thread with some info and I'll look into it.
now that i know that your code is committed, i will probably be able to fix it myself (if it was a bug at all / is still there).
btw. do you mind if i use your documentation in the new joone documentation?
No problem, you are welcome to use the docs.
I also had difficulty with the buffered output synapses, and have typically implemented my own. An example is the various learner plugins, where some anonymous DirectSynapse gets subclassed to update the training algos. Elsewhere I subclass the TeacherSynapse - this to calculate diagnostics.
Apologies! The missing dependencies are now available and it should compile. These relate to OAT (optalgtoolkit).
Why is OAT in joone?
Well, the differential evolution (DE) method should certainly be used for smallish networks, say 50 or fewer weights (if you have the hardware, I'd push that up to 200 weights). Other OAT methods can also be tried - based on the success of DE I've created my own concoction called systematic differential (SD), which tries to be as good as or better than DE. In practice the results are more or less the same.
When using OAT (or RTRL or EKF), one important addition is that the maximum weight magnitude can be - and must be - specified. I've found a value of 200 suffices in most cases, but sometimes I will use something really small, like 1, to prevent overfitting.
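To make the clamp concrete, here is a hedged sketch of one DE-style mutation step with the maximum-weight-magnitude cap applied (my own simplified code, not the actual OAT/joone implementation; the names are illustrative):

```java
// Illustrative DE/rand/1 mutation with a weight-magnitude clamp -
// not the actual OAT/joone code.
public class DeClampSketch {
    /** trial[i] = a[i] + f * (b[i] - c[i]), then clamped to [-maxWeight, maxWeight].
     *  a, b, c are three distinct population members; f is the scale factor. */
    public static double[] mutate(double[] a, double[] b, double[] c,
                                  double f, double maxWeight) {
        double[] trial = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            double v = a[i] + f * (b[i] - c[i]);
            trial[i] = Math.max(-maxWeight, Math.min(maxWeight, v)); // the cap
        }
        return trial;
    }
}
```

The cap is what keeps the search honest: with maxWeight around 1 the trial vectors simply cannot wander into the huge-weight, overfitted regions.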
awesome, works perfectly. thx.
one last question: is it necessary to have the entire source code in the joone project structure? otherwise i will move it into another project, so that it is not necessary to rebuild the entire oat code every time one does a clean-up in the joone project.
is that ok for you ferra?
What you suggest is how it is supposed to work, so please go ahead and do it.
The OAT stuff should actually be an external lib and kept separate, but since the project is dead I merged my changes into it and made it part of the joone source. A bit messy, but where else to put them?
You have done a very good job, MG, and there is a lot of work behind it. There is still work to be done, but I would like you to know that you have done a good job.
New version with a small change to differential evolution and systematic differential to prevent overflow when working with large networks.