On Thu, Apr 04, 2002 at 09:48:46AM +0530, amitm wrote:
> Considering the amount of interest in building UML based clusters, I was just thinking about using UML as part of a computational grid.
> Suppose we have a screensaver that brings up a pre-configured UML instance and also sets up communication channels with the outside world. The screensaver itself participates in a P2P/centralized (to be decided either way) network, announces the availability of a UML instance and provides a communication proxy to the UML. The UML runs off a standard rh7.2 or similar root_fs that provides all the usual computing environments - perl, python, java, etc. Standard well known userids are also defined. host_fs on the UML is not included. The UML instance is a secure sandbox for running unknown code. The screensaver makes a new copy of the root_fs before every run, therefore every run has the same initial environment. The screensaver can also ensure that the UML instance only communicates back to the machine from where the code for computation was loaded.
> In my view the advantages are
> 1) A complete UNIX environment is available. Scientists can program in any language of their choice.
> 2) Fully secure because of the sandbox model. Am I right in saying that the host environment cannot be corrupted / crashed ?
> 3) Once the Windows port of UML is available, we can build a UML-based computational grid on the millions of broadband-connected home PCs
> With the cost of disk going down, it should not be an issue to take up 0.5 - 0.75 GB for the two root_fs's (one for the clean start and the other, the currently executing copy).
> These are just my initial thoughts. Is anybody else thinking along these lines ? Is it doable ? What are the issues ?
We've looked at using UML in a Grid environment (which is not exactly what
you're describing - you're describing a SETI@... -style setup, which
is often called grid computing in the popular press, but real Grid
computing is a bit different - think cracker vs. hacker).
The biggest problem that we ran into is that the virtualization is not
complete - a UML is not a real i386, and the memory map seen by a user process
looks a bit different (parts of the UML kernel show up in your address space,
which confused some of the things we were working with).
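A quick way to see that incomplete virtualization for yourself (just a sketch, not anything UML-specific): dump the process's memory map from inside the guest and compare it with the same program on a native box - under UML, extra mappings belonging to the UML kernel show up.

```python
# Print this process's memory map from /proc/self/maps. Run the same
# program inside a UML guest and on native Linux: under UML you'll see
# extra mappings near the top of the address space that belong to the
# UML kernel itself, not to your program or its libraries.
with open("/proc/self/maps") as maps:
    for line in maps:
        print(line.rstrip())
```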
The other big problem was the setuid networking requirement - it'd be nice to
have an additional networking option that just piggy-backed on the host's
network, and bypassed most of the UML network stack. This means that some
stuff isn't going to work, of course - if the UML kernel is running as user
'nobody', then any process running inside the UML kernel is going to fail
when it tries to open a privileged port. That's fine for some applications
of UML. It's sort of a bear to implement, though - you need to jump out of
the UML's networking stack in the generic code, and not in an arch-specific place.
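The privileged-port failure mode above can be sketched like this (assuming the piggy-backed networking makes guest sockets carry the host-side UID of the UML kernel process, e.g. 'nobody'):

```python
import errno
import socket

# If the UML kernel runs as an unprivileged user and its guest
# processes piggy-back directly on the host's network stack, then a
# bind() to a port below 1024 fails exactly as it would for 'nobody'
# on the host, even if the process is root inside the guest.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s.bind(("127.0.0.1", 80))  # privileged port
    print("bind succeeded - running with root/CAP_NET_BIND_SERVICE")
except OSError as e:
    print("bind failed:", errno.errorcode.get(e.errno, e.errno))
finally:
    s.close()
```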
Other than that, the only big problem that you're going to run into is the
size of the root filesystem (yes, disk is cheap, but moving a 500 MB root
filesystem image is still not trivial for most folks, especially if you need to
move different root filesystems around depending on the job).
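One mitigating factor (a sketch, assuming a mostly-empty image): a root_fs whose free space is zero blocks is both sparse on disk and highly compressible, so the 500 MB figure overstates what you actually have to store and ship:

```python
import gzip
import os
import shutil

SIZE = 500 * 1024 * 1024  # 500 MB apparent size

# Create a sparse 500 MB image (all zeros, almost no disk blocks used).
with open("root_fs", "wb") as f:
    f.truncate(SIZE)

print("apparent size:", os.path.getsize("root_fs"))
print("disk usage:   ", os.stat("root_fs").st_blocks * 512)

# Compress it: half a gigabyte of zeros gzips down to well under 1 MB.
with open("root_fs", "rb") as src, gzip.open("root_fs.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)
print("compressed:   ", os.path.getsize("root_fs.gz"))
```

A real root_fs with actual files in it compresses far less dramatically, but the point stands: ship images compressed, and keep the pristine local copy sparse.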
And of course, there are all the standard massively-distributed-computing
problems of how you trust the clients, which UML can't help with (but it
doesn't make things worse, and it makes other things easier).