[Geneticd-devel] Master engines - I DID IT
From: Giancarlo N. <gi...@ni...> - 2002-02-13 19:10:01
Gentlemen,

This is to announce that the basic framework for master engines is ready, along with a bunch of changes that I am going to explain in this mail. The master engine still has to be programmed, but that is a simple matter now: its basic functionality is working and will be committed to CVS by the time you receive this. I would like to have your feedback on the solutions I found.

I'll explain the changes in a bottom-up manner (as I usually program...).

The first addition, of lesser value but still important, is the Hash class. I used it to store a dictionary of slave engines in the master, but its applications are broader. The Hash class is in the new module utils/hash.cpp (hash.h). It is basically a keyed dictionary holding two vectors (implemented through the Vector class): keys and values. Values are of type void * (they can hold anything, but a cast back is needed to retrieve the data you stored). The interesting part is the key: it is implemented through a HashKey class that can hold (at the moment) both integers and strings (char *). Further development can add more key types. Key-based methods are transparent: they automatically convert an integer or a string into the corresponding HashKey object, and retrieve a void * from the corresponding value vector. In other words, we can have:

    Hash h;
    GeneticEngine *myeng = ...;
    h.add( 12, "a value in a string" );
    h.add( "code12", myeng );
    printf( "The value of the entry '12' was %s", (char *) h.get( 12 ) );
    GeneticEngine *anEng = (GeneticEngine *) h.get( "code12" );

Got it?

Why is this important? Because it can be applied to a whole lot of vectors that have been limited to integer indexing. First of all, the vector of engines in the daemon; but also the plugin vector, the command vector, the engine type vector and anything else that comes to mind. The "get" method of those classes was a little "heavy" and, in my opinion, too redundant. I will apply Hash to anything that could benefit from it.
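For readers who want to picture the Hash and HashKey pair described above, here is a minimal sketch. It is not the code in utils/hash.cpp: std::vector stands in for the project's Vector class, lookup is a plain linear scan, add() overwriting an existing key is an assumption, and Engine and demoLookup() are hypothetical names used only for the demonstration.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for HashKey: holds either an integer or a string
// and converts implicitly from both, so callers never build a key by hand.
class HashKey {
public:
    HashKey(int n) : isString_(false), number_(n) {}
    HashKey(const char *s) : isString_(true), number_(0), text_(s ? s : "") {}

    bool operator==(const HashKey &other) const {
        if (isString_ != other.isString_) return false;
        return isString_ ? text_ == other.text_ : number_ == other.number_;
    }

private:
    bool isString_;
    int number_;
    std::string text_;
};

// Keyed dictionary built on two parallel vectors: keys and values.
class Hash {
public:
    // Adds a pair; overwriting an existing key is an assumption of this sketch.
    void add(const HashKey &key, void *value) {
        for (std::size_t i = 0; i < keys_.size(); ++i) {
            if (keys_[i] == key) { values_[i] = value; return; }
        }
        keys_.push_back(key);
        values_.push_back(value);
    }

    // Returns the stored pointer, or a null pointer if the key is unknown.
    void *get(const HashKey &key) const {
        for (std::size_t i = 0; i < keys_.size(); ++i) {
            if (keys_[i] == key) return values_[i];
        }
        return nullptr;
    }

private:
    std::vector<HashKey> keys_;
    std::vector<void *> values_;
};

// Placeholder for GeneticEngine, only so the demonstration is self-contained.
struct Engine { int generation; };

// Mirrors the usage shown in the mail: one integer key, one string key.
int demoLookup() {
    static char text[] = "a value in a string";
    Engine eng = { 7 };
    Hash h;
    h.add(12, text);
    h.add("code12", &eng);
    std::printf("The value of the entry '12' was %s\n",
                static_cast<char *>(h.get(12)));
    Engine *found = static_cast<Engine *>(h.get("code12"));
    return found->generation;
}
```

Since values are untyped pointers, the caller has to remember what was stored under each key; the cast back is unchecked.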
Also, I will write a "synchash" class that resembles the SyncVector class, to allow different threads to access the dictionary safely.

Now, let's get to the serious things. GStream has been moved out of the "Session" module into a separate module in the new shared library "gd_client" (src/client). Most importantly, I created the GDClient class as a child of GStream. GDClient is responsible for client-oriented operations towards GD servers; it can be used by client programs, and is also heavily used by master engines to control their slaves. It already has methods to connect, log in and create an engine; maybe it will get more, but the basic "print" method from GStream, coupled with advanced reply retrieval (both reply code and text, multi-line replies and binary data retrieval), should be enough for most jobs.

And then the master engine. The new engine model is built upon a matrix of engine type combined with engine class. The base class for all engines is GeneticEngine; then the logic splits. We have the GEInABox class, which is the base class for all local engines, and the GEMaster class, which is the base for all master engines. The basic difference is that GEInABox has all the functionality to manage a genetic environment of its own. GEMaster does not have this capability: it relies on slave GEInABox engines. Both of these new classes have basic functionality; more specialized ones are expected to override some virtual methods to change their behaviour.

GEInABox is what you already know: it is the old GeneticEngine class mangled a little, and it behaves with no knowledge of its surrounding environment. It does not even know (at this moment) whether its creator is a user or a master engine or, in other words, whether it is a free or a slave engine. GEMaster has the responsibility to create slave GEInABox engines and coordinate them. I just had the time to write a basic "enslave" method to create slaves, and it works.
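The split between local and master engines might be sketched roughly as follows. Only the class names GeneticEngine, GEInABox and GEMaster and the existence of an "enslave" method come from the mail. The virtual methods, the constructor arguments, the string-keyed slave map and the choice to give slaves the master's type are assumptions. The real enslave presumably goes through GDClient to a GD server, whereas this sketch creates the slave in-process.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Common base of every engine. The virtual methods are hypothetical.
class GeneticEngine {
public:
    virtual ~GeneticEngine() {}
    virtual bool isMaster() const = 0;
    virtual std::string describe() const = 0;
};

// Local engine: manages a genetic environment of its own and knows
// nothing about who created it.
class GEInABox : public GeneticEngine {
public:
    explicit GEInABox(const std::string &type) : type_(type) {}
    bool isMaster() const override { return false; }
    std::string describe() const override {
        return "local engine of type " + type_;
    }

private:
    std::string type_;
};

// Master engine: has no environment of its own, it coordinates slaves.
class GEMaster : public GeneticEngine {
public:
    explicit GEMaster(const std::string &type) : type_(type) {}
    bool isMaster() const override { return true; }
    std::string describe() const override {
        return "master of " + std::to_string(slaves_.size()) +
               " slave(s) of type " + type_;
    }

    // Creates a slave and records it under 'name'. Returns false if the
    // name is already taken. Stand-in for the remote creation via GDClient.
    bool enslave(const std::string &name) {
        if (slaves_.count(name) != 0) return false;
        slaves_[name] = std::unique_ptr<GEInABox>(new GEInABox(type_));
        return true;
    }

    std::size_t slaveCount() const { return slaves_.size(); }

private:
    std::string type_;
    std::map<std::string, std::unique_ptr<GEInABox>> slaves_;
};
```

The engine type (the genetic algorithm) stays a plain member while the engine class (local or master) is expressed by inheritance, which is one way to read the type-class matrix.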
All the rest still has to be programmed, but now it's a rather simple matter.

Engines are still created with the "crea" command, which has been updated to "crea <type> [master]". Adding the "master" keyword creates a GEMaster engine; in the future this will change, and the second parameter will be the name of the class to be created. A static function (in the genetic/metaengine module) is responsible for creating the right engine class based upon its name:

    GeneticEngine *createEngineOfClass( char * )

The "load" command has also been updated: it starts reading the engine file, gets the class name and calls that function to create the engine. After that, the load() method of the newly created engine is called.

Some problems are still to be dealt with, but they are minor headaches: the connection with the GD holding the slaves can be lost (due e.g. to a timeout); slave engines can be faulty or be removed by a local user. It is a simple task to check for these conditions and transparently update the master status.

A more complex problem is this: suppose that the master creates an engine; it gets an engine ID from the slave GD; now a user with appropriate rights deletes that engine and creates a new one. With the algorithm that we are using now, GD creates an engine with THE SAME ID as the one just deleted. The master would be fooled. The solution is at hand (and this is why I badly needed a Hash class): engine IDs should no longer be numeric; they have to be created in a unique way, so that two engines can't have the same ID. This would also simplify human interaction when dealing with more than one GD at the same time. This can be done almost painlessly, so we'll deal with the problem after the master class is fully running, but before we release version 0.2 of GD. Also, the engine ID should be serialized, and not dynamically assigned by GD at creation.
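The ID reuse problem can be illustrated with a small sketch. The mail does not say how GD picks numeric IDs, only that a deleted ID is handed out again, so the "first free slot" allocator below is an assumption. The string scheme (a tag naming the GD instance plus a serial that only grows) is likewise just one possible way to make IDs unique, not a decided design. NumericIds and UniqueIds are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumed numeric scheme: an ID is the index of the first free slot,
// so the ID of a deleted engine is reissued by the next create().
class NumericIds {
public:
    int create() {
        for (std::size_t i = 0; i < used_.size(); ++i) {
            if (!used_[i]) { used_[i] = true; return static_cast<int>(i); }
        }
        used_.push_back(true);
        return static_cast<int>(used_.size() - 1);
    }

    void remove(int id) {
        assert(id >= 0 && static_cast<std::size_t>(id) < used_.size());
        used_[static_cast<std::size_t>(id)] = false;
    }

private:
    std::vector<bool> used_;
};

// Hypothetical alternative: string IDs made of a per-GD tag and a serial
// that never goes backwards, so no ID is issued twice by this object.
class UniqueIds {
public:
    explicit UniqueIds(const std::string &gdTag)
        : tag_(gdTag), nextSerial_(1) {}

    std::string create() {
        return tag_ + "-" + std::to_string(nextSerial_++);
    }

private:
    std::string tag_;
    unsigned long nextSerial_;
};
```

With NumericIds, a master holding the ID of a deleted engine would silently address its replacement. With UniqueIds the stale ID simply matches nothing. The serial would have to be saved across daemon restarts for the guarantee to hold, which this sketch ignores.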
We could also have two kinds of ID: one locally valid (the one we have now would do) and one absolute, in the way a domain name and a fully qualified domain name work. I would like to have your opinion on this point.

One last word on the type-class engine matrix. Now that this step has been made, I can look back and be satisfied with having chosen this model during development. Think about the mess we would have had if we had to build a class for each different genetic algorithm AND for each kind of slave engine AND each kind of master engine. We have just two now, but there will be more in the future (I am thinking of a slave class aware of network topology, communicating with its neighbours without master intervention...). It would have been a programming nightmare to recreate a whole set of classes whenever we add a genetic algorithm or a different kind of master-slave interaction... Multiple inheritance would have been even worse!

Implementing this model also means that we won't be able to put different master or slave classes in plugins, but this is a minor drawback. It's far more common for users to think about the precise algorithm that the daemon should run; the overall architecture of the parallel network is more of a concern for core developers. Moreover, more advanced master-slave structures can safely be shipped with newer official releases. The old structures will still work, but newer GDs will be capable of different behaviour.

Another topic concerning the subclassing model is the "command" limitation. At this moment, commands are overloaded on an engine type basis. Commands bound to master engines must take care of themselves... the command routines must check whether the command can be issued to the engine they are working on. This could be painful for future development and should be changed, but at this moment I do not see any drawback in taking care of it after 0.2 is released. It could be a "todo" for 0.3 or 0.4.
We must also consider that master-specific commands are far less numerous than ordinary commands (at this moment we have only two: slav - enslave a client, and rele - release a client). We can deal with this later.

One last concern about release timing. With the master-slave model already in CVS, I can forecast that the 0.2 release will be ready by the last ten days of March. 0.3 and 0.4 should be ready by July, and we could have 1.0 by October.

Waiting for your comments,
Giancarlo Niccolai.