From: Ioan V. <i....@ya...> - 2006-09-08 16:52:52
Clusters have been hairy territory for reproducibility ever since they emerged, because each clustering solution is customized. This "cluster hairiness" shows up a little in Madagascar too. In /path/to/Madagascar/source/book/packages/LAUNCH, there is a call to a "pam" executable that sends the job to the cluster. A bit of googling shows that this is very likely part of the HP-MPI package ( http://tinyurl.com/gpo6k ). Anybody who uses something else has to undo the calls to it and plug in his own stuff... for each Madagascar version. This is quite incompatible with proper package management, and saying "Just use HP-MPI" won't work, because people use what works best with their cluster hardware, or what the company/university/sysadmin decided, etc.

I believe the solution is to leave cluster.py and its relatives out of Madagascar and to provide a well-documented parallelization API for distributed-memory jobs. Ideally, if a repro document with cluster calls is run on a system without a cluster add-on, it should print a big, big warning that the job will take forever, and then start running the code in single-node mode. The user could then implement the parallelization system suited to his machine. That system could use anything from totally independent PCs, to which jobs are sent to run as screensavers in the SETI@home fashion, up to supercomputers.

The API should include:
- input, command and output;
- the dimension(s) along which to slice the data in the input hypercube;
- slice "thickness" and overlap between neighboring slices;
- whether the output should be assembled by summing or by concatenating the slices.

The "parallelization" script shipped by default with Madagascar would just run the job in single-node mode and print the warning. The user would be told to replace that script with his own in order to send jobs to his cluster.

What do other people think? Is this a good idea?

The proposal above refers only to distributed-memory jobs. Shared-memory stuff does not need to be user-dependent.
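To make the distributed-memory proposal above a bit more concrete, here is a rough Python sketch of what such a job description and the default single-node fallback might look like. Nothing here exists in Madagascar: ParallelJob, slice_windows, default_dispatch and all field names are hypothetical, made up for illustration, and the actual single-node runner is passed in as a callable because the proposal does not specify how it would be invoked.

```python
import warnings
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ParallelJob:
    """Hypothetical job description; all field names are illustrative."""
    input_path: str        # file holding the input hypercube
    command: str           # command applied to each slice
    output_path: str       # file that receives the assembled result
    axis: int              # dimension along which the input is sliced
    thickness: int         # samples per slice along that axis
    overlap: int = 0       # samples shared by neighboring slices
    combine: str = "cat"   # "cat" concatenates slices, "add" sums them


def slice_windows(n: int, thickness: int, overlap: int = 0) -> List[Tuple[int, int]]:
    """Return half-open (start, stop) index pairs that cover range(n)."""
    if thickness <= 0 or not 0 <= overlap < thickness:
        raise ValueError("need thickness > 0 and 0 <= overlap < thickness")
    windows: List[Tuple[int, int]] = []
    if n <= 0:
        return windows
    step = thickness - overlap  # distance between consecutive slice starts
    start = 0
    while True:
        stop = min(start + thickness, n)  # last slice may be thinner
        windows.append((start, stop))
        if stop >= n:
            break
        start += step
    return windows


def default_dispatch(job: ParallelJob, run_single_node: Callable[[ParallelJob], object]):
    """Default 'parallelization' hook: warn loudly, then run on one node.

    A site-specific replacement would instead send one task per window
    from slice_windows() to whatever cluster software is installed.
    """
    warnings.warn(
        "No cluster add-on installed: running %r on a single node. "
        "This may take a very long time." % job.command,
        RuntimeWarning,
    )
    return run_single_node(job)
```

For example, slice_windows(10, 4, 1) gives the windows (0, 4), (3, 7) and (6, 10). A user with a real cluster would swap default_dispatch for a function with the same signature that submits one task per window and then sums or concatenates the partial outputs according to job.combine.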
GCC 4.2 has OpenMP (OMP) support, and the GCC 4.1 that comes with Fedora Core 5 already includes it ahead of that release. It is reasonable to assume that the user has GCC or another OpenMP-enabled compiler and will not roll out his own solution for shared-memory stuff.

OpenMP will become more and more important. The increase in clock speed (GHz) has stalled after decades of advancement, and Intel and AMD are now putting more cores on a single machine instead. By year end, 4-core consumer PCs will be available in stores (ok, maybe that's commercial hype, but next year they will surely be here).

Cheers,
Nick