[Geneticd-devel] Ladies & Gentlemen, the GEMaster is here.

And it is really working on my local network, on 3 PCs!
The last CVS commit is stable enough to run for hours without human
intervention.

The changes I've made are so deep and interesting that this letter will take
a little while to write. I hope you will be patient enough to read it.

First of all, let me say that the GEMaster is completely operative and,
having a lot of automated checks, it largely replaces human intervention
in managing a GD cluster... AMAZING. The only thing I still have to implement
is the "serial_size" method of GEMaster, which is useless anyway until we
have a master running a master, or a software client.

I had to solve a lot of nasty problems. I would like to have your opinion
about the things I've done.

1) The GEMaster is a kind of "server-client" engine that remotely manages
other engines and displays them and their results as if they were a single
engine. This means that the GEMaster logs onto other GDs, creates a suitable
GEInABox engine (of the same type as itself, and using the parameters it
was given when born, that is, before the PREP stage), and runs it, making
sure that everything is going fine.

2) The first problem was: how to create those remote "components" of the
master? The same mechanism I described in the old mails is still valid. I
just want to remind you that compiled learning sets can now be moved around
with the lset command.

3) But now, when restoring the master engine from a saved file and creating
new instances of the slaves, there is a need for a reliable way to address a
client engine. I created the UNID (universal ID): a random string that
identifies a certain engine in a server (or in a cluster). Any command using
<eng-id> (e.g. "dump 1", which dumps the 2nd engine you've created) can be
written as "cmd *<eng-UNID> ...", with a star (*) in front of the UNID. This
can also be done to route agents in the cluster (e.g. "dage *aasdfae234
*UNIDasdf3.my-agent-generation-0"). In this last case, since the numeric
slave id cannot be changed easily, it can still be safe to use the numeric
address (e.g. "dage 4 2.0").
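
To make the addressing rule concrete, here is a minimal C++ sketch of how an
argument could be resolved either as a numeric id or as a "*UNID". It is not
the actual geneticd code: EngineEntry and resolve_engine are names I made up
for the example, and numeric ids are assumed to be 0-based indexes (so that
"1" is the 2nd engine).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an engine record held by a GD server.
struct EngineEntry {
    std::string unid;  // universal ID, e.g. "aasdfae234"
};

// Resolve a command argument to an index into `engines`.
// "*<unid>" is looked up by UNID; anything else must be a decimal index.
// Returns -1 when the argument does not name an existing engine.
int resolve_engine(const std::vector<EngineEntry>& engines,
                   const std::string& arg) {
    if (arg.empty()) return -1;
    if (arg[0] == '*') {
        const std::string unid = arg.substr(1);
        for (std::size_t i = 0; i < engines.size(); ++i)
            if (engines[i].unid == unid) return static_cast<int>(i);
        return -1;
    }
    std::size_t idx = 0;
    for (char c : arg) {
        if (c < '0' || c > '9') return -1;          // not a number
        idx = idx * 10 + static_cast<std::size_t>(c - '0');
        if (idx >= engines.size()) return -1;       // out of range
    }
    assert(idx < engines.size());
    return static_cast<int>(idx);
}
```

With two engines loaded, "1" and "*<unid of the 2nd engine>" resolve to the
same entry, while an unknown UNID or an out-of-range number gives -1.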

4) No two engines can exist with the same UNID on the same GD. The UNID is
saved in the engine file, and restored when the engine is loaded. If you want
to load 2 copies of the same engine, you'll have to change the UNID of the
first of them with the command "reid <eng-id>".
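
The uniqueness rule and the effect of "reid" can be sketched like this. It is
only an illustration under my own assumptions: UnidRegistry is a hypothetical
class, and the 10-character lowercase/digit format of the random string is
invented (the mail only says the UNID is a random string).

```cpp
#include <cassert>
#include <map>
#include <random>
#include <string>

// Hypothetical per-GD table: UNID -> engine name.
class UnidRegistry {
public:
    // Registers an engine; refuses a UNID that is already in use.
    bool add(const std::string& unid, const std::string& engine_name) {
        assert(!unid.empty());
        return engines_.insert(std::make_pair(unid, engine_name)).second;
    }

    // Sketch of "reid": gives the engine a fresh random UNID.
    // Returns the new UNID, or an empty string if old_unid is unknown.
    std::string reid(const std::string& old_unid) {
        std::map<std::string, std::string>::iterator it = engines_.find(old_unid);
        if (it == engines_.end()) return "";
        const std::string name = it->second;
        engines_.erase(it);
        std::string fresh;
        do {
            fresh = random_unid();
        } while (fresh == old_unid || engines_.count(fresh) != 0);
        engines_[fresh] = name;
        return fresh;
    }

    bool contains(const std::string& unid) const {
        return engines_.count(unid) != 0;
    }

private:
    // Assumed format: 10 random characters from [a-z0-9].
    std::string random_unid() {
        static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz0123456789";
        std::uniform_int_distribution<int> pick(
            0, static_cast<int>(sizeof(alphabet)) - 2);
        std::string s;
        for (int i = 0; i < 10; ++i) s += alphabet[pick(rng_)];
        return s;
    }

    std::map<std::string, std::string> engines_;
    std::mt19937 rng_{std::random_device{}()};
};
```

Loading a second copy then amounts to: add() fails for the duplicate, reid()
frees the old UNID, and the second add() succeeds.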

5) There can be TONS of reasons why a slave GD might not work well: errors
in the genetic algorithm causing a segmentation fault or a division by zero
(as ... emm... in the gfunc :-( ), temporary network failures, a server
going down, etc. The master engine HAS to keep working. So I created some
self-preserving mechanisms that can be useful:

a - GDClient ping-pong automation: every time you "print" or "write"
something to a GDClient class (a descendant of GStream), a "ping" command is
issued to the target GD. Either the server replies "+200 pong" and the
conversation goes on, or there is an error. In the latter case, the print
method tries to reconnect several times. This ping-pong handshaking can be
turned off if you are reasonably certain that the connection is still alive,
or that, if it is not, you can't bring it back so easily. This can happen
when you issue a burst of commands in a short time range. This workaround is
specifically designed to "reanimate" the connection when it has dropped due
to network inactivity, in a transparent way.
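
Here is a rough C++ sketch of the ping-before-write idea, just to show the
control flow. It is not the real GDClient: Transport, PingingClient and
FlakyTransport are hypothetical names, the retry count is a parameter I
invented, and FlakyTransport is only a fake used to try the logic without a
network.

```cpp
#include <cassert>
#include <string>

// Hypothetical transport layer (the real GDClient descends from GStream).
struct Transport {
    virtual ~Transport() {}
    virtual bool reconnect() = 0;
    // Sends one command line, returns the reply ("" on failure).
    virtual std::string request(const std::string& line) = 0;
    virtual bool send(const std::string& data) = 0;
};

class PingingClient {
public:
    PingingClient(Transport& t, int max_retries)
        : t_(t), max_retries_(max_retries), ping_enabled_(true) {
        assert(max_retries >= 0);
    }

    // The handshake can be switched off, e.g. during a burst of commands.
    void set_ping(bool on) { ping_enabled_ = on; }

    // Pings first; on failure reconnects up to max_retries times, then sends.
    bool print(const std::string& data) {
        if (ping_enabled_ && !alive()) {
            bool ok = false;
            for (int i = 0; i < max_retries_ && !ok; ++i)
                ok = t_.reconnect() && alive();
            if (!ok) return false;
        }
        return t_.send(data);
    }

private:
    // The server is expected to answer "+200 pong".
    bool alive() { return t_.request("ping").compare(0, 4, "+200") == 0; }

    Transport& t_;
    int max_retries_;
    bool ping_enabled_;
};

// Fake transport: the first `failures` pings get no answer.
struct FlakyTransport : Transport {
    int failures_left;
    int reconnects;
    std::string last_sent;
    explicit FlakyTransport(int failures)
        : failures_left(failures), reconnects(0) {}
    bool reconnect() { ++reconnects; return true; }
    std::string request(const std::string& line) {
        if (line != "ping") return "";
        if (failures_left > 0) { --failures_left; return ""; }
        return "+200 pong";
    }
    bool send(const std::string& data) { last_sent = data; return true; }
};
```

With FlakyTransport(2) and 3 retries, print() succeeds after two reconnects;
with set_ping(false) the data is sent without any handshake at all.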

b - resynchronization mechanism: often, the master engine assumes that
the slaves are in a certain state. E.g. while the master is running, the
master *hopes* that all the slaves are running with it. If that is not the
case, it doesn't matter as long as the engines are responding as the master
expects. When even this is not guaranteed, the slaves are marked "out of
sync". They are not deleted from the list of slaves, but the master will try
to bring them back in line. This is done with a combination of activities,
the core of which is the "resync" method of the SlaveDef class (the class
that stores the representation of the slaves for the master engine).
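
A very simplified picture of the "out of sync" bookkeeping, as a C++ sketch.
This is not the real SlaveDef: SlaveSketch and check_slaves are invented
names, the RUNNING and STOPPED states are assumptions (only PREP is a stage
named in this mail), and the real resync is a combination of several
activities rather than one assignment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// PREP comes from the mail; RUNNING and STOPPED are assumed for the example.
enum EngineState { PREP, RUNNING, STOPPED };

// Hypothetical, much reduced stand-in for SlaveDef.
struct SlaveSketch {
    std::string unid;
    EngineState reported;  // state the slave last reported
    bool reachable;        // whether the slave GD answers at all
    bool in_sync;

    // Tries to bring the slave to the wanted state.
    bool resync(EngineState wanted) {
        if (!reachable) { in_sync = false; return false; }
        reported = wanted;  // stands in for the commands sent to the slave
        in_sync = true;
        return true;
    }
};

// One checking pass of the master. Slaves are never removed here; the
// return value is the number of slaves still out of sync afterwards.
std::size_t check_slaves(std::vector<SlaveSketch>& slaves,
                         EngineState expected) {
    std::size_t out = 0;
    for (SlaveSketch& s : slaves) {
        if (s.reachable && s.reported == expected) {
            s.in_sync = true;
            continue;
        }
        s.in_sync = false;
        if (!s.resync(expected)) ++out;
    }
    assert(out <= slaves.size());
    return out;
}
```

The point of the design is that a misbehaving slave stays in the list with
in_sync == false, so a later pass can pick it up again once it answers.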

6) Serialization of the master is now possible and easy. De-serialization
is a little more complicated. We have to log in again to all the slave GDs
and send them the slave engines we stored locally. If an engine still resides
on the contacted server (remember? just one UNID for each engine), the
version already there is used. This is because, if the master falls but
the slaves are still alive and continue running, the newly brought-up master
only has to take over the slaves: the work they have done in the meantime is
not lost.
If a slave GD is not available when the master engine is loaded, that GD is
removed from the list of slaves and must be manually re-enslaved.
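
The decision taken for each stored slave when the master is reloaded can be
summarized with this C++ sketch. It only models the three outcomes described
above; StoredSlave, Cluster, RestoreReport and restore_slaves are hypothetical
names, and a server missing from the map stands for "GD not available".

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// What the master saved for each slave (hypothetical layout).
struct StoredSlave {
    std::string server;  // slave GD address
    std::string unid;    // UNID of the slave engine
};

// Toy view of the cluster: reachable server -> UNIDs currently loaded there.
typedef std::map<std::string, std::set<std::string> > Cluster;

struct RestoreReport {
    std::vector<std::string> adopted;   // engine still alive, taken over as is
    std::vector<std::string> uploaded;  // stored copy sent to the server
    std::vector<std::string> dropped;   // GD unreachable, removed from list
};

RestoreReport restore_slaves(const std::vector<StoredSlave>& saved,
                             Cluster& cluster) {
    RestoreReport r;
    for (const StoredSlave& s : saved) {
        Cluster::iterator srv = cluster.find(s.server);
        if (srv == cluster.end()) {
            r.dropped.push_back(s.unid);  // must be re-enslaved manually
        } else if (srv->second.count(s.unid) != 0) {
            r.adopted.push_back(s.unid);  // keep the work done meanwhile
        } else {
            srv->second.insert(s.unid);   // stands in for sending the engine
            r.uploaded.push_back(s.unid);
        }
    }
    assert(r.adopted.size() + r.uploaded.size() + r.dropped.size()
           == saved.size());
    return r;
}
```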

7) Agent loading is done by sending the agent to the first random engine that 
is capable of holding it.
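
One possible reading of "first random engine that is capable of holding it",
as a C++ sketch: visit the slaves in random order and stop at the first one
with room. The capacity/agents counters and the names SlaveSlot and
place_agent are assumptions for the example; the real capability check may be
something different.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical per-slave bookkeeping.
struct SlaveSlot {
    int capacity;  // max agents the engine can hold (assumed notion)
    int agents;    // agents currently held
    bool can_hold() const { return agents < capacity; }
};

// Visits the slaves in random order and gives the agent to the first one
// that can hold it. Returns the chosen index, or -1 if none has room.
template <class Rng>
int place_agent(std::vector<SlaveSlot>& slaves, Rng& rng) {
    std::vector<std::size_t> order(slaves.size());
    std::iota(order.begin(), order.end(), std::size_t(0));
    std::shuffle(order.begin(), order.end(), rng);
    for (std::size_t i : order) {
        if (slaves[i].can_hold()) {
            ++slaves[i].agents;
            assert(slaves[i].agents <= slaves[i].capacity);
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Directing an agent to one particular slave would simply skip the shuffle and
test that single slot.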

All this is working now in my local network, and BOY, it is amazing! I still
have to do some jobs manually (such as creating, from time to time, a burst
of new random agents (fresh blood!), or moving agents around from engine to
engine), but most of the work is done automatically by the GEMaster.

TODOs on these topics are: agent loading directed to a particular slave,
preservation and reallocation to different servers of slave engines stored
in the save file, and serialization size. I also have to make concurrent calls
to the slaves hash in the GEMaster class safe; but those can be considered far
less important changes than the other TODOs in the 0.2 version wish list. WE
ARE NEAR.

Other things I added, in no particular order:

- Many replies in corecommands have been moved from -5xx to -4xx, which I
will call "warnings".
- Empty() method for Vector class. Useful.
- Lockable::Notify is now public.
- SlaveDef is now lockable. all the calls to its 
- Removed the -O2 switch: somehow, optimization caused virtual table
corruption in objects stored on the stack. It could be a bug in my version
of gcc.
- Added "ping", "unid" and "reid" commands.
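
About the reply codes: a client could tell the classes apart with something
like the C++ sketch below. Only "+200 pong" and the -4xx "warnings" come from
this mail; treating every '+' reply as success and every -5xx reply as a hard
error is my assumption, and ReplyKind / classify_reply are names made up for
the example.

```cpp
#include <cassert>
#include <cctype>
#include <string>

enum ReplyKind { REPLY_OK, REPLY_WARNING, REPLY_ERROR, REPLY_UNKNOWN };

// Classifies a reply line of the form "<sign><3 digits> <text>".
ReplyKind classify_reply(const std::string& line) {
    if (line.size() < 4) return REPLY_UNKNOWN;
    int code = 0;
    for (int i = 1; i <= 3; ++i) {
        if (!std::isdigit(static_cast<unsigned char>(line[i])))
            return REPLY_UNKNOWN;
        code = code * 10 + (line[i] - '0');
    }
    assert(code >= 0 && code <= 999);
    if (line[0] == '+') return REPLY_OK;         // assumed: '+' means success
    if (line[0] != '-') return REPLY_UNKNOWN;
    if (code / 100 == 4) return REPLY_WARNING;   // -4xx: "warnings"
    if (code / 100 == 5) return REPLY_ERROR;     // -5xx: assumed hard errors
    return REPLY_UNKNOWN;
}
```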

I am thinking of somehow reducing the size of "corecommands" by moving its
contents into separate files. I would also like to move GeneticEngine and its
children out of the /genetic directory and library. ANY SUGGESTIONS?

The source has changed very much. I suggest a clean start, and/or a make
distclean, autoconf, ./configure.

Bye.
Jonnymind.