Steve Lane wrote:
> Greetings. We currently have a bunch of machines (I hesitate to call
> them a cluster :) that we would like to bind together into an SSI cluster.
> We are considering using OpenSSI or OpenMosix to do this, and we have
> some questions about functionality. We intend to try a test install
> of OpenSSI first (hence this email), but if anyone has any answers that
> can address OpenMosix as well, that information will be much appreciated.
To summarize my understanding of the difference between OpenSSI and
OpenMosix:
If you build a ten-node OpenMosix cluster, you'll have ten independent
systems that share CPU cycles, but little else. If you build a
ten-node OpenSSI cluster, you'll have a single SSI system that shares
almost every resource in the cluster, with the exception of memory.
If all you want to do is share CPU cycles, OpenMosix might be better for
you. If you want full SSI clustering, then OpenSSI is your best option.
> Some packages are interactive and graphical
> (e.g. X-windowed; X sessions on the cluster will be forwarded back
> to the user's local terminal - the cluster itself will be headless),
OpenSSI's support for running an X server (the graphical display) on
each node is still a bit immature, although it's quickly getting better.
It sounds like what you want to do, however, is run X clients (the
applications themselves) on the cluster, and run the X servers on the
user's workstations. This shouldn't be a problem.
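From the user's point of view this is just ordinary SSH X11 forwarding.
A quick sketch, where the hostname "cluster" is a placeholder for
whatever name your SSI cluster answers to:

```shell
# Hypothetical hostname "cluster" stands in for the SSI cluster's name.
# -X enables X11 forwarding (use -Y for trusted forwarding if your
# client's security extension gets in the way).
ssh -X user@cluster

# Once logged in, X clients started on the cluster display back on the
# workstation's X server automatically (DISPLAY is set up by sshd):
xterm &
```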
> We want an environment in which our users can ssh into the "machine"
> (e.g. the SSI cluster), and just do their work, much as they would if
> the "machine" were a single-motherboard Linux box with 1 or 2 CPUs.
> We have a couple of SGI Origins with 4 CPUs each, and we're hoping the
> user experience with the SSI cluster will be more-or-less identical
> to that with the Origins, only a lot faster :) Is this how OpenSSI
> (and/or OpenMosix) works, or is intended to work?
I don't think anyone's tested OpenSSI on SGI Origins, but if they use
Pentium-compatible chips and can run Red Hat 9, then they should be able
to run OpenSSI.
What OpenSSI will do is make your two 4-CPU machines behave more or less
like a single 8-CPU machine, particularly if you enable automatic
process load-balancing between nodes. The main advantage of this is
manageability. You only have a single filesystem tree for installing and
managing your software, so you don't have to diagnose problems caused by
inconsistent data on different machines. You also have a single process
space, so you see all processes in the cluster with the tried-and-true
ps command, and you can signal any of them with the familiar kill and
killall commands, regardless of where the processes are running in the
cluster.
Another advantage is availability. If a CPU failed in a real 8-CPU
machine, the entire system would crash. If it fails in your OpenSSI
cluster, only one node would crash (assuming the cluster still has
access to the root filesystem).
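To illustrate the single process space mentioned above: the commands
involved are just the ordinary ones. Here's a minimal single-machine
sketch; on an OpenSSI cluster the very same tools act cluster-wide,
which is exactly the point, because there's nothing new to learn.

```shell
# Start a throwaway background process; on OpenSSI it could end up
# running on any node, and the commands below would work unchanged.
sleep 300 &
pid=$!
ps -p "$pid" -o pid=,comm=       # plain ps sees it
kill "$pid"                      # plain kill signals it
wait "$pid" 2>/dev/null || true  # reap it (wait reports the SIGTERM)
```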
If your processes don't communicate too intensively with each other,
then your OpenSSI cluster will run significantly faster than one of the
4-CPU machines by itself, without much additional administrative
overhead. I doubt it would be faster than a real 8-CPU machine, although
it could be: a real 8-CPU machine shares a single kernel among all 8
CPUs (contention that Linux doesn't handle very well), whereas an
OpenSSI cluster runs a separate kernel on every node.
Hopefully this explanation is clear.
> Is it reasonable (or even plausible) to assume that our software will
> therefore in general work "fine" under an OpenSSI cluster?
It should. One of the requirements we've strived to follow is to
preserve base semantics as much as possible. Your software should not
notice any difference between OpenSSI and base Red Hat 9, apart from
some unusual error conditions if a node crashes (e.g., a process on node
2 could lose a pipe it was using on node 3).
> Much of this
> software is installed (e.g. built) from source - are there any known
> major issues with building software under OpenSSI vs. vanilla RH9?
Not that I'm aware of. I've built the release RPMs on one node of an SSI
cluster (which includes doing a full kernel build), although I haven't
tried doing this with a parallel make, yet.
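For anyone curious, trying a parallel build is just a matter of passing
-j to make. A toy sketch, assuming GNU make is installed; the Makefile
and its targets are invented purely for illustration:

```shell
# Invented toy Makefile with two targets that can be built concurrently.
# printf is used so the recipe line gets its mandatory leading tab (\t).
cd "$(mktemp -d)"
printf 'all: a.out b.out\n%%.out:\n\techo built $@ > $@\n' > Makefile

# Run up to two recipes at once; on OpenSSI the spawned jobs are
# ordinary processes that the load balancer could migrate to other nodes.
make -j2
ls a.out b.out
```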
> We will be running a somewhat hardware heterogeneous cluster: at present,
> nine 1GHz PIII boxes (one of which will be the head node) and four 2.4GHz
> Xeon boxes (all with 2 CPUs/mobo).
Ah. I thought you were building the cluster from the 4-CPU Origins.
Hopefully my misunderstanding doesn't detract from my explanation above.
I'm not sure if anyone's run OpenSSI on Xeons. I'm not even familiar
with the chip. Is it Pentium compatible?
Apart from that issue, the heterogeneity should be okay.
> It is our intention not to have the
> head node used for computation. Does anyone see any potential problems
> with (or benefits to) this setup? Many of the clusters discussed in the
> list seem to be composed of much smaller numbers of machines (e.g. 2-3).
> Has anyone had experience using OpenSSI (or OpenMosix) with a somewhat
> larger cluster (e.g. ~10 boxes, 10-20 CPUs)?
One of our testers (Scott Hinchley) plays with an OpenSSI cluster of
blades that typically exceeds fifty nodes. Most of the SSI code is
architected to scale well beyond that, so your plan to build a ten-node
cluster should not be a big deal.
Before I close, one advantage of OpenSSI over OpenMosix is that you only
have to install RH9 on the first node of an OpenSSI cluster. The other
nodes are network booted into the cluster and share the single
installation with the first node. I think OpenMosix requires you to
install your distro on every node, although maybe you can simplify this
by installing once and replicating the disk image to the other nodes.
Hopefully this note helps with your decision making,