From: amitm <am...@in...> - 2002-04-04 04:19:39
|
Considering the amount of interest in building UML-based clusters, I was just thinking about using UML as part of a computational grid.

Suppose we have a screensaver that brings up a pre-configured UML instance and also sets up communication channels with the outside world. The screensaver itself participates in a P2P/centralized (to be decided either way) network, announces the availability of a UML instance, and provides a communication proxy to the UML. The UML runs off a standard rh7.2 or similar root_fs that provides all the possible computing environments - perl, python, java, etc. Standard well-known userids are also defined. host_fs on the UML is not included. The UML instance is a secure sandbox for running unknown code. The screensaver makes a new copy of the root_fs before every run, therefore every run has the same initial environment. The screensaver can also ensure that the UML instance only communicates back to the machine from where the code for the computation was loaded.

In my view the advantages are:

1) A complete UNIX environment is available. Scientists can program in any language of their choice.
2) Fully secure because of the sandbox model. Am I right in saying that the host environment cannot be corrupted / crashed?
3) Once the Windows port of UML is available, we can build a UML-based computational grid on the millions of broadband-connected home PCs.

With the cost of disk going down, it should not be an issue to take up 0.5 - 0.75 GB for the two root_fs's (one for the clean start and the other, the currently executing copy).

These are just my initial thoughts. Is anybody else thinking along these lines? Is it doable? What are the issues?

Amit |
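A minimal sketch of the per-run reset the screensaver would perform (the image names and the `./linux` binary path are illustrative assumptions, not part of any real deployment):

```shell
#!/bin/sh
# Hypothetical per-run launcher: every job starts from the same image.
# root_fs.pristine is the clean rh7.2 image; root_fs.run is disposable.
cp root_fs.pristine root_fs.run

# Boot the sandboxed UML instance off the fresh copy, with no host
# filesystem mounted and no console on the user's display.
./linux ubd0=root_fs.run mem=64M con=null
```

Discarding root_fs.run after each job is what guarantees that every run sees an identical initial environment.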
From: Erik P. <epa...@cs...> - 2002-04-04 05:00:58
|
On Thu, Apr 04, 2002 at 09:48:46AM +0530, amitm wrote:
> Considering the amount of interest in building UML based clusters, I was just thinking about using UML as part of a computational grid.
>
> Suppose we have a screensaver that brings up a pre-configured UML instance and also sets up communication channels with the outside world. [...] The screensaver makes a new copy of the root_fs before every run, therefore every run has the same initial environment. The screen saver can also ensure that the UML instance only communicates back to the machine from where the code for computation was loaded.
>
> In my view the advantages are
>
> 1) A complete UNIX environment is available. Scientists can program in any language of their choice.
> 2) Fully secure because of the sandbox model. Am I right in saying that the host environment cannot be corrupted / crashed ?
> 3) Once the Windows port of UML is available, we can build a UML based computational grid on the millions of broadband connected home PC's
>
> With the cost of disk going down, it should not be an issue to take up 0.5 - 0.75 GB for the two root_fs's (one for the clean start and the other, the currently executing copy).
>
> These are just my initial thoughts. Is anybody else thinking along these lines ? Is it doable ? What are the issues ?
We've looked at using UML in a grid environment (which is not exactly what you're describing - you're describing a SETI@Home/Distributed.net setup, which is often called grid computing in the popular press, but in reality Grid computing is a bit different (think cracker vs hacker)).

The biggest problem that we ran into is that the virtualization is not complete - a uml is not an i386, and the memory map to the user process looks a bit different (parts of the UML kernel show up in your address space, which confused some of the things we were working with).

The other big problem was the setuid networking requirement - it'd be nice to have an additional networking option that just piggy-backed on the host's network, and bypassed most of the UML network stack. This means that some stuff isn't going to work, of course - if the UML kernel is running as user 'nobody', then any process running inside the UML kernel is going to fail when it tries to open a privileged port. That's fine for some applications of UML. It's sort of a bear to implement, though - you need to jump out of the UML's networking stack in the generic code, and not in an arch-specific driver.

Other than that, the only big problem that you're going to run into is the size of the root file system (yes, disk is cheap, but moving a 500 meg root filesystem image is still not trivial for most folks, especially if you need to move different root filesystems around depending on the job).

And of course, there are all the standard massively-distributed-computing problems of how you trust the clients, which UML can't help with (but it doesn't make things worse, and makes other things easier).

-Erik |
From: Jeff D. <jd...@ka...> - 2002-04-04 14:31:41
|
epa...@cs... said:
> The biggest problem that we ran into is that the virtualization is not
> complete - a uml is not an i386, and the memory map to the user
> process looks a bit different (parts of the UML kernel show up in
> your address space, which confused some of the things we were working
> with)

Generally, anything which gets confused about the memory layout changing is broken. The only exception I can think of would be things that need a huge contiguous virtual area, and UML now loads itself at the top of memory, so those things should be happier now.

> The other big problem was the setuid networking requirement

There is a non-setuid way of doing networking (persistent TUN/TAP), although it requires some one-time root setup. I do have a slirp patch in my queue which would do the piggy-backing on the host networking, though.

am...@in... said:
> 2) Fully secure because of the sandbox model. Am I right in saying
> that the host environment cannot be corrupted / crashed ?

Yes, modulo bugs, of course. Without 'jail' enabled, you can break out of UML if you know what you are doing. With it enabled, you can't. The disadvantage of jail is a noticeable performance hit, because getting in and out of the kernel is a lot more expensive. However, I imagine that the things you're talking about are CPU-intensive and don't get into the kernel too often. If so, you might not notice the performance hit.

epa...@cs...
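The one-time root setup for the persistent TUN/TAP approach looks roughly like this (the device name, addresses, and user are assumptions; `tunctl` ships with the UML utilities):

```shell
# Run once as root: create a persistent tap device owned by the
# unprivileged user who will run UML, and bring it up on the host.
tunctl -u griduser -t tap0
ifconfig tap0 192.168.0.254 up

# From then on, griduser can attach UML to it with no privileges at all:
./linux ubd0=root_fs eth0=tuntap,tap0
```

Because the tap device persists across UML runs, the root step really is one-time; only the device ownership ties it to a particular user.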
said:
> Other than that, the only big problems that you're going to run into
> is the size of root file system (yes, disk is cheap, but moving a 500
> meg root filesystem image is still not trivial to most folks,
> especially if you need to move different root filesystems around,
> depending on the job)

It's not hard to implement a COW hostfs, where the bulk of the files used by UML are on the host, but some are transparently replaced by the hostfs mount, and any file modifications are stored separately, so they don't actually change the host files. This would let you boot off the host's / and ship a relatively small filesystem which contains the things that you want to add or modify.

Jeff |
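UML's block driver already offers something similar at the device level: a copy-on-write layer over a shared backing file, requested on the command line (the file names here are illustrative):

```shell
# run.cow stores only the blocks this instance modifies; root_fs stays
# pristine and can be shared read-only among many UML instances, so
# nothing like a full 500 MB image has to be copied per run.
./linux ubd0=run.cow,root_fs
```

The COW file starts out tiny and grows with the instance's writes, which is the same space-saving idea the proposed COW hostfs would apply at the file level rather than the block level.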
From: Erik P. <epa...@cs...> - 2002-04-04 15:12:45
|
On Thu, Apr 04, 2002 at 09:34:01AM -0500, Jeff Dike wrote:
> epa...@cs... said:
> > The biggest problem that we ran into is that the virtualization is
> > not complete [...]
>
> Generally, anything which gets confused about the memory layout changing
> is broken. The only exception I can think of would be things that need a
> huge contiguous virtual area, and UML now loads itself at the top of
> memory, so those things should be happier now.

I'd disagree - if it's not something my process can touch, then it shouldn't be visible in my address space. It should be perfectly acceptable for any process to map memory however it wants.

Specifically, what was killing us was the fact that there was this block of memory at 0x5000000 that our process shouldn't touch. One of the things that our codes could do is checkpoint themselves by writing out all of their memory segments and restoring them later. There was no way for us to know that 0x5000000 wasn't us and should have been skipped. When we'd restore, we'd overwrite the new 0x5000000 with the old 0x5000000, which caused bad things to happen. Just moving UML to the top of memory isn't going to help us any.

> > The other big problem was the setuid networking requirement
>
> There is a non-setuid way of doing networking (persistent TUN/TAP),
> although it requires some one-time root setup. I do have a slirp patch
> in my queue which would do the piggy-backing on the host networking
> though.

Both of those require setting up basically another machine, though, with its own IP (though maybe IP masq'ed). That's fine if you're looking to bring up another machine, but if you only want to provide a sandboxed process with a different view of an operating system, then that's not what you want.
For example, I want to run one of my codes on Redhat 6.2, but all I have is Redhat 7.2 hosts. I want to use UML to give this one process a Redhat 6.2 operating environment - if my host 7.2 machine is foo.bar.edu, then the Redhat 6.2 machine should think that its hostname is foo.bar.edu. And if I try to open a listen socket on port 1234 in the UML, then it should open a socket for me on the 7.2 host on port 1234, and totally bypass the UML network stack.

The current TUN/TAP stuff is really cool and really useful to a lot of people, but I think this other mode of operation would be useful for a lot of other people. And I suspect more people would be looking for this once UML is available as a user-space library, no?

-Erik |
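The checkpointing problem above comes down to what `/proc/self/maps` reports: a naive checkpointer dumps every region listed there, and nothing in the file marks which regions belong to UML rather than to the application. A quick way to see what such a checkpointer would capture:

```shell
# List the address ranges and backing objects mapped into the current
# process - the set a memory-segment checkpointer would write out.
awk '{ print $1, $6 }' /proc/self/maps
```

Under a pre-fix UML, the UML kernel's own regions (e.g. the block at 0x5000000) showed up in this list indistinguishably from the process's own, which is exactly what broke the restore.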
From: Jeff D. <jd...@ka...> - 2002-04-04 16:39:40
|
epa...@cs... said:
> I'd disagree - if it's not something my process can touch, then it
> shouldn't be visible in my address space. It should be perfectly
> acceptable for any process to map memory however it wants.

Wrong. Try mapping something into the 0xc0000000 - 0xffffffff range on the host and see how far you get.

> Specifically, what was killing us was the fact that there was this
> block of memory at 0x5000000 that our process shouldn't touch. [...]
> Just moving UML to the top of memory isn't going to help us any.

Yes it is. Try it now; it should work. UML mappings aren't visible any more.

> Both of those require setting up basically another machine though,
> with its own IP (though maybe IP masq'ed).

Slirp doesn't. That's kind of the whole point of it.

> And if I try to open a listen socket on port 1234 in the UML, then it
> should open a socket for me on the 7.2 host on port 1234, and totally
> bypass the UML network stack.

Which is exactly what slirp does. Except that it doesn't bypass the UML network stack. It sits at the bottom and munges packets going in and out.

> but I think this other mode of operation would be useful for a lot
> of other people. And I suspect more people would be looking for this
> once UML is available as a user-space library, no?

Yup.

Jeff |
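For reference, the slirp transport is driven entirely from the UML command line once a slirp binary exists on the host; the syntax below matches the transport as later documented, so take it as an assumption with respect to the patch still in Jeff's queue:

```shell
# Attach UML's eth0 to the host's network via slirp: no root setup,
# no tap device, and outgoing traffic appears to come from the host
# itself (the empty field is the MAC address, left for UML to pick).
./linux ubd0=root_fs eth0=slirp,,/usr/bin/slirp
```

Since slirp runs as the same unprivileged user as UML, it fits the screensaver-grid case where no one-time root setup is possible on volunteers' machines.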
From: Erik P. <epa...@cs...> - 2002-04-04 17:12:48
|
On Thu, Apr 04, 2002 at 11:42:05AM -0500, Jeff Dike wrote:
> epa...@cs... said:
> > I'd disagree - if it's not something my process can touch, then it
> > shouldn't be visible in my address space. [...]
>
> Wrong. Try mapping something into the 0xc0000000 - 0xffffffff range on
> the host and see how far you get.

But 0xc0000000 - 0xffffffff isn't in /proc/self/maps :)

> > Specifically, what was killing us was the fact that there was this
> > block of memory at 0x5000000 that our process shouldn't touch. [...]
>
> Yes it is. Try it now, it should work. UML mappings aren't visible any more.

I will - are there any other differences between i386 and UML-on-i386 address spaces that people have to watch out for?

> > Both of those require setting up basically another machine though,
> > with its own IP (though maybe IP masq'ed).
>
> Slirp doesn't. That's kind of the whole point of it.
>
> > And if I try to open a listen socket on port 1234 in the UML [...]
>
> Which is exactly what slirp does. Except that it doesn't bypass the UML
> network stack. It sits at the bottom and munges packets going in and out.

So is your patch queue visible anywhere? I didn't see the slirp patch in the patch browser at SourceForge.

-Erik |