[Geneticd-devel] First step toward parallel processing is done!
From: Jonny M. <jon...@ni...> - 2002-01-14 01:47:22
Tonight we completed the first step toward parallel processing in Geneticd.
After some deep thought, I realized that:
1) There is more than one strategy for implementing a parallel genetic
population, e.g.:
1.1) Coordinated parallel populations (genetic populations running
independently on different servers)
1.2) Global population, with different calculation points on different
genetic engines
1.3) Independent populations coordinating on the same learning set.
2) Some or all of these strategies must be implemented in a flexible manner.
The ongoing parallel process should be transparent to both the end user AND
the top-level program routines. All the burden of managing the parallel
process should be placed in one convenient part of the program, and this part
must have a common programming interface.
3) We can do all this if we abstract the genetic engine class. So far it has
been conceived as a "command simplifier" that holds a lot of functionality
the commands themselves should have: saving, restoring, and dumping
populations or agents were delegated to the genetic engine because it was a
convenient place to keep those routines.
4) The new genetic engine model must be a BLACK BOX that receives a bunch of
useless genetic code and, through the interaction of some internal pieces
(such as the evaluator, the environment and the learning set), is capable of
synthesizing a set of better agents as output.
5) This new black box will be self-contained; it will be "programmed" by
commands or other interfaces through a programmable API, and it will be the
only means by which the servers deal with the agents.
In this way, we can switch black boxes so that they run their populations on
just one machine, on more than one machine, or in any fashion they like.
Genetic engines will not be pluggable, since their architecture is deeply tied
to the daemon itself, but they will be easy to reprogram if we want to add
some new way to manage our populations. Engine types will still be a set of
"internal pieces" of the genetic engines, the tools by which each engine
synthesizes its agents.
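To make the black-box idea concrete, here is a minimal C++ sketch. It is not
Geneticd code: the names Genome, Evaluator, BlackBoxEngine and synthesize are
hypothetical, and the "engine" only performs selection (no mutation or
crossover), which is enough to show that callers see agents going in and
better agents coming out, while the evaluator stays an internal piece.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical types: the post does not show Geneticd's real classes.
using Genome = std::vector<int>;
using Evaluator = std::function<double(const Genome&)>;

// A "black box": callers hand in agents and get the better ones back.
// How (and where) the evaluation happens is hidden from them.
class BlackBoxEngine {
public:
    explicit BlackBoxEngine(Evaluator evaluator)
        : evaluator_(std::move(evaluator)) {}

    // Returns the `keep` highest-scoring agents, best first.
    // Selection only; a real engine would also mutate and recombine.
    std::vector<Genome> synthesize(std::vector<Genome> agents,
                                   std::size_t keep) const {
        assert(keep <= agents.size());
        std::sort(agents.begin(), agents.end(),
                  [this](const Genome& a, const Genome& b) {
                      return evaluator_(a) > evaluator_(b);
                  });
        agents.resize(keep);
        return agents;
    }

private:
    Evaluator evaluator_;  // internal piece, invisible to callers
};
```

A box that spreads the evaluation over several machines could expose the same
synthesize() call, so the caller would not need to change.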
I WOULD GREATLY APPRECIATE A COMMENT FROM THE READERS OF THIS MAILING LIST.
------------------------
I have been making some changes towards this model, and I think that we can
really start to develop a parallel, multiserver engine. I'll now describe the
changes I've made, so that you can keep up with what is going on...
1) I split up the BIG genetic.cpp file; engines now reside in a file called
gengine.cpp (with a gengine.h). Every source file that needs the genetic code
has to include gengine.h, which in turn includes genetic.h and learningset.h
(I have already changed the #include directives in the whole source tree).
2) I removed the methods getEnv and getEvaluator from the GeneticEngine
class, replacing them with some new methods. The daemon was directly
accessing the population or the environment for trivial reasons (e.g. for
counting how many agents had been alive or locking the environment). I
removed these references and replaced them with new GeneticEngine methods.
In this way our engines can create any kind of "population", without having
to worry about the daemon being able to understand it.
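As an illustration of this kind of change, the sketch below hides the
population and the environment behind engine methods. Everything in it is an
assumption made for the example: Agent, Environment, Engine, aliveCount,
lockEnvironment and unlockEnvironment are hypothetical names, not the actual
methods added to GeneticEngine.

```cpp
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real Geneticd classes.
struct Agent { bool alive; };
struct Environment { std::mutex lock; };

class Engine {
public:
    explicit Engine(std::vector<Agent> agents) : agents_(std::move(agents)) {}

    // The daemon asks the engine instead of walking the population itself.
    std::size_t aliveCount() const {
        std::size_t n = 0;
        for (const Agent& a : agents_) {
            if (a.alive) ++n;
        }
        return n;
    }

    // The daemon asks the engine to lock instead of fetching the environment.
    void lockEnvironment() { env_.lock.lock(); }
    void unlockEnvironment() { env_.lock.unlock(); }

private:
    std::vector<Agent> agents_;  // no longer reachable from outside
    Environment env_;            // no getEnv()-style accessor exposes it
};
```

Because the daemon never sees the containers, an engine is free to keep its
agents in any form, including on another machine.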
3) I abstracted class GeneticEngine, which is now an abstract class. Its pure
virtual (=0) methods describe the whole API that the rest of the daemon will
be able to use to manage genetic populations. The old GeneticEngine class
became the new GEInABox (Genetic Engine in a Box), a fully self-contained
genetic engine that is only capable of managing its population locally.
The API of the engines will still be modified, and maybe we'll have to change
something in the design of the daemon itself, but I think that this powerful
model is capable of solving the problem of massive, Internet-based broadcast
parallel computation in genetic algorithms (WOW!).
Looking forward to hearing from you,
Giancarlo Niccolai.