[Geneticd-devel] Ladies & Gentlemen, the GEMaster is here.
From: Jonny M. <jon...@ni...> - 2002-03-12 22:42:17
And it is working well on my local network, on 3 PCs! The last CVS commit is stable enough to run for hours without human intervention. The changes I've made are so deep and interesting that this letter will take a little while to write. I hope you will be patient enough to read it.

First of all, let me say that the GEMaster is completely operative and, having a lot of automated checking, it somehow replaces human intervention in managing a GD cluster... AMAZING. The only thing I still have to implement is the "serial_size" method of GEMaster, which is useless anyway until we have a master running another master, or a software client.

I had to solve a lot of nasty problems. I would like to have your opinion about the things I've done.

1) The GEMaster is a kind of "server-client" engine that remotely manages other engines and displays them and their results as if they were a single engine. This means that the GEMaster logs onto other GDs, creates a suitable GEInABox engine (of the same type as itself, and using the parameters it was given when born, that is, before the PREP stage), and runs it, making sure that everything goes fine.

2) The first problem was: how to create those remote "components" of the master? The same mechanism I described in the old mails is still valid. I just want to remind you that compiled learning sets can now be moved around with the lset command.

3) But now, when restoring the master engine from a saved file and creating new instances of the slaves, a reliable way to address a client engine is needed. I created the UNID (universal ID): a random string that identifies a certain engine in a server (or in a cluster). Any command using <eng-id> (e.g. "dump 1" -- dumps the 2nd engine you've created) can be written as "cmd *<eng-UNID> ...", with a star (*) in front of the UNID. This can also be done for route agents in the cluster (e.g. "dage *aasdfae234 *UNIDasdf3.my-agent-generation-0").
In this last case, since the numeric slave id cannot be changed easily, it is still safe to use the numeric address (e.g. "dage 4 2.0").

4) No two engines can exist with the same UNID on the same GD. The UNID is saved in the engine file and restored when the engine is loaded. If you want to load 2 copies of the same engine, you'll have to change the UNID of the first of them with the command "reid <eng-id>".

5) There can be TONS of reasons why a slave GD stops working well: errors in the genetic algorithm causing a segmentation fault or a division by zero (as ... emm ... in the gfunc :-( ), temporary network failures, server shutdowns, etc. The master engine HAS to keep working anyway. So I created some self-preserving mechanisms that can be useful:

a - GDClient ping-pong automation: every time you "print" or "write" something to a GDClient class (a descendant of GStream), a "ping" command is issued to the target GD. Either the server replies "+200 pong" and the conversation goes on, or there is an error. In the latter case, the print method tries to reconnect several times. This ping-pong handshaking can be turned off if you are somehow certain that the connection is still alive, or that, if it is not alive, you can't bring it back so easily. This can happen when you issue a burst of commands in a short time range. This workaround is specifically designed to "reanimate" the connection, in a transparent way, when it has dropped due to network inactivity.

b - resynchronization mechanism: often, the master engine assumes that the slaves are in a certain state. E.g. while the master is running, it *hopes* that all the slaves are running with it. If they are not, it doesn't matter as long as the engines respond as the master expects. When even this is not guaranteed, the slaves are marked "out of sync". They are not deleted from the list of slaves, but the master will try to bring them back in line.
This is done with a combination of activities, the core of which is the "resync" method of the SlaveDef class (the class that stores the master engine's representation of the slaves).

6) Serialization of the master is now possible and easy. The de-serialization is a little more complicated. We have to log in again to all the slave GDs and send them the slave engines we stored locally. If they still reside in the contacted server (remember? just one UNID for each engine), the copy already present there is used. This is because, if the master falls but the slaves are still alive and keep running, the newly brought-up master only has to take over the slaves: the work they have done in the meantime is not lost. If a slave GD is not available when the master engine is loaded, that GD is removed from the list of slaves and must be manually re-enslaved.

7) Agent loading is done by sending the agent to the first random engine that is capable of holding it.

All this is working now in my local network, and BOY, it is amazing! I still have to do some jobs manually (such as creating, every now and then, a burst of new random agents (fresh blood!), or moving agents around from engine to engine), but most of the work is done automatically by the GEMaster.

TODOs about these topics are: agent loading directed to a particular slave, preservation and reallocation to different servers of the slaved engines stored in the save file, and serialization size. I also have to make concurrent calls to the slaves hash in the GEMaster class safe, but that can be considered a far less important change than the other TODOs in the 0.2 version wish list. WE ARE NEAR.

Other things I added, in no particular order:
- Many replies in corecommands have been moved from -5xx to -4xx, which I will call "warnings".
- Empty() method for the Vector class. Useful.
- Lockable::Notify is now public.
- SlaveDef is now lockable.
all the calls to its
- Removed the -O2 switch: somehow, optimization caused virtual table corruption in objects stored on the stack. It could be a bug in my version of gcc.
- Added the "ping", "unid" and "reid" commands.

I am thinking of somehow reducing the size of "corecommands" by moving its contents into separate files. I would also like to move GeneticEngine and its children out of the /genetic directory and library. ANY SUGGESTIONS?

The source has changed very much. I suggest a clean start, and/or a make distclean, autoconf, ./configure.

Bye.
Jonnymind.
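
To illustrate the UNID addressing described in points 3) and 4), here is a minimal C++ sketch. It is not taken from the GeneticD sources: the EngineTable class, its method names, the -1 "not found" convention and the 12-character UNID length are all assumptions made for the example. It only shows the idea of resolving either a zero-based numeric id or a "*<UNID>" address, refusing duplicate UNIDs on one server, and handing out a fresh random UNID in the spirit of "reid <eng-id>".

```cpp
#include <cstddef>
#include <random>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the list of engines held by one GD server.
// Every name here is invented for illustration only.
class EngineTable {
public:
    // Registers an engine under the given UNID and returns its numeric id,
    // or -1 if that UNID is already in use on this server.
    int add(const std::string& unid) {
        if (findByUnid(unid) >= 0) return -1;
        unids_.push_back(unid);
        return static_cast<int>(unids_.size()) - 1;
    }

    // Resolves "<number>" (zero-based) or "*<UNID>" to a numeric id.
    // Returns -1 if the address does not match any engine.
    int resolve(const std::string& address) const {
        if (address.empty()) return -1;
        if (address[0] == '*') return findByUnid(address.substr(1));
        if (address.find_first_not_of("0123456789") != std::string::npos)
            return -1;
        try {
            const unsigned long id = std::stoul(address);
            if (id < unids_.size()) return static_cast<int>(id);
        } catch (const std::out_of_range&) {
            // Number too large to be a valid id: treated as not found.
        }
        return -1;
    }

    // In the spirit of "reid <eng-id>": gives the engine a fresh random
    // UNID that is not used by any other engine in this table.
    bool reid(std::size_t id) {
        if (id >= unids_.size()) return false;
        std::string fresh;
        do {
            fresh = randomUnid();
        } while (findByUnid(fresh) >= 0);
        unids_[id] = fresh;
        return true;
    }

    const std::string& unidOf(std::size_t id) const { return unids_.at(id); }

private:
    // Linear search; returns the numeric id or -1.
    int findByUnid(const std::string& unid) const {
        for (std::size_t i = 0; i < unids_.size(); ++i)
            if (unids_[i] == unid) return static_cast<int>(i);
        return -1;
    }

    // Builds a 12-character random string (the length is an assumption).
    static std::string randomUnid() {
        static const std::string alphabet =
            "abcdefghijklmnopqrstuvwxyz0123456789";
        static std::mt19937 rng{std::random_device{}()};
        std::uniform_int_distribution<std::size_t> pick(0, alphabet.size() - 1);
        std::string out;
        for (int i = 0; i < 12; ++i) out += alphabet[pick(rng)];
        return out;
    }

    std::vector<std::string> unids_;
};
```

With such a table, a command handler could call resolve() on its <eng-id> argument and accept "1" and "*UNIDasdf3" interchangeably when both name the same engine. Loading a second copy of a saved engine would fail in add() until reid() has been called on the first copy.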