[SSI-users] White paper comparing OpenSSI and OpenMosix
From: Bruce W. <br...@ka...> - 2004-10-29 01:34:04
For those of you who might have more recent information on OpenMosix than I do,
feel free to correct me privately or publicly.
A Comparison of OpenSSI and OpenMosix
Bruce J. Walker
Oct. 27, 2004
While OpenMosix and OpenSSI have some commonality (process-level load
balancing via process migration), their goals and strategies are quite
different. I am no expert in OpenMosix; what I explain below is my current
understanding of what OpenMosix can and cannot do and how it works.
Both technologies claim Single System Image (SSI), but the SSI in each case
is quite different. OpenSSI strives to aggregate all the resources of
all the nodes into one big SMP-like environment (one big single system
image). OpenMosix does not. Instead, OpenMosix aims for "home-node" SSI:
processes can move from their home node to other nodes, but they continue
to see only their home node. One could argue this isn't SSI at all, but
simply CPU borrowing.
To accomplish the limited goal of CPU borrowing, OpenMosix leaves the
kernel portion of the process back at the home node and, for the most part,
redirects all system calls done by the migrated process back to the home
node. Over time the OpenMosix group determined that this strategy had
performance and availability limitations, and has tried to let some
system calls execute on the new node (e.g., via DFSA). However, given that
most calls still go back to the home node, loss of the home node means
all processes started there must die. The OpenSSI strategy
has always been that all system calls are executed on the node where the
process is running. This means the whole process moves in OpenSSI and not
just the "user" part of the process. There are several SSI ramifications
to the two approaches. In OpenSSI, a process has a single clusterwide unique
process id which can be seen and accessed from any process on any node.
In OpenMosix, a migrated process gets a new pid on its host node, and
visibility of processes is limited to those started on the home node
(processes that migrate to a node other than their home node are visible
on that node under a different name).
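To make the clusterwide-pid idea concrete, here is a small sketch. This is
illustrative only: the per-node range scheme and the constant below are
hypothetical, not OpenSSI's actual pid encoding. The point is that a single
pid can identify a process from any node, because the allocating node is
recoverable from the pid itself:

```python
# Illustrative sketch (NOT OpenSSI's actual implementation): one way to
# build clusterwide-unique process ids is to reserve a per-node pid range,
# so any node can tell from the pid alone which node allocated it.

PIDS_PER_NODE = 1 << 20  # hypothetical size of each node's pid range

def cluster_pid(node_num: int, local_pid: int) -> int:
    """Combine a node number and a node-local pid into a clusterwide pid."""
    assert 0 < local_pid < PIDS_PER_NODE
    return node_num * PIDS_PER_NODE + local_pid

def owning_node(pid: int) -> int:
    """Recover the node that allocated a clusterwide pid."""
    return pid // PIDS_PER_NODE

pid = cluster_pid(3, 1234)
print(owning_node(pid))  # any node can tell node 3 allocated this pid
```

Under a scheme like this, signalling or inspecting a process from another
node needs no translation table; under the OpenMosix home-node model, the
pid only has meaning on the home node.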
The situation is similar with Inter-process communication (IPC) objects
(pipes, fifos, semaphores, message queues, shared memory, unix-domain sockets,
etc.). In OpenSSI, all objects are clusterwide unique (SSI) and visible
and accessible from all nodes. Consequently an object is created on the
node where the process is currently running, not on the home node. OpenMosix
creates all objects on the home node, and processes started on the same home
node can share them (except shared-memory objects, which OpenMosix did not
support across nodes; this may have been enhanced recently). OpenSSI, by
contrast, allows completely coherent read/write sharing of memory across
nodes.
OpenMosix does not have a strong sense of cluster membership. OpenMosix
has no APIs for membership and no infrastructure for high availability.
OpenSSI ensures that all nodes always agree on the current membership and
through the APIs, cluster-aware applications can see a consistent history of
membership transition events on all nodes. There are APIs in OpenSSI
for membership information, membership history and membership
event notifications. There are also several high availability
facilities integrated and included as part of the base OpenSSI. First,
the cluster filesystem capability (CFS) is highly available;
filesystems will transparently failover from one node to another, with no
errors seen by processes on any node actively working in those filesystems
(more on the filesystem capabilities below). Second, OpenSSI comes with
HA-LVS, which provides a highly available IP address for the cluster as
well as load balancing of incoming TCP/IP connections (for services like
HTTP, SSH, etc.). Providing a highly available IP address with persistent
connections across failures is an important part of high availability in
any SSI cluster environment. Next, rc-type services can trivially be
restarted on another node after failure and OpenSSI includes a simple yet
flexible process monitoring and restart subsystem. OpenSSI can also be
used to provide an HA-NFS file service.
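To illustrate the membership-event pattern described above, here is a
sketch. The class and method names are hypothetical, not the actual OpenSSI
APIs; the property being demonstrated is the one OpenSSI guarantees: every
subscriber, whether it registers early or late, sees the same ordered
history of membership transitions:

```python
# Illustrative sketch only (hypothetical names, not the OpenSSI API):
# a single ordered event log gives every subscriber a consistent history
# of node-up/node-down transitions, replayed to late subscribers.

class Membership:
    def __init__(self):
        self.nodes = set()        # current membership
        self.history = []         # ordered transition events
        self.subscribers = []

    def register(self, callback):
        """Subscribe to membership events; replay the history first so a
        late subscriber still sees the full, consistent sequence."""
        for event in self.history:
            callback(event)
        self.subscribers.append(callback)

    def _transition(self, event):
        self.history.append(event)
        for cb in self.subscribers:
            cb(event)

    def node_up(self, node):
        self.nodes.add(node)
        self._transition(("up", node))

    def node_down(self, node):
        self.nodes.discard(node)
        self._transition(("down", node))

m = Membership()
m.node_up(1)
m.node_up(2)
seen = []
m.register(seen.append)   # late subscriber: history is replayed first
m.node_down(2)
print(seen)               # the same sequence every subscriber would see
```

Subsystems such as a cluster filesystem or HA-LVS can hang failover logic
off exactly this kind of node-down notification.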
The filesystem capabilities for OpenSSI and OpenMosix are quite different.
OpenMosix, through its MFS, provides some access to remote files. It uses a
superroot naming scheme (you can name any file on any node using the
convention //<nodename>/pathname). Such a strategy is clearly not
transparent (names are node-specific; a file has a different name locally
than it does from another node), and it does not do coherent caching (and
thus cannot support shared read/write mapped files). The OpenMosix MFS also
has no failover capability. MFS is perhaps on the bottom rung of cluster
filesystems.
OpenSSI is designed to support different cluster filesystem technologies.
It comes, however, with HA-CFS, which is a transparent client-server
stacked cluster filesystem (it transparently stacks on ext3, xfs, reiserfs,
JFS, etc.) that is fully coherent and yet caches aggressively, supports
shared read/write mapped files, and can transparently failover on node
failure. OpenSSI has also worked with GFS and OpenGFS, including using them as
a shared root. OpenSSI also works with Lustre and has used Lustre to support
a shared root. OpenSSI has also integrated OCFS (Oracle cluster filesystem).
OpenSSI enforces a clusterwide file namespace without the limitation of a
superroot naming scheme. OpenSSI has always worked with a shared root
(whether CFS, GFS or Lustre). In addition, any mount of any physical
filesystem (ext3, xfs, etc.) or NFS filesystem done on any node is
automatically and transparently visible by the same name on all nodes.
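A small sketch makes the naming difference concrete. The functions below
are illustrative only, not real MFS or OpenSSI code; they just model how a
file's name depends on the viewing node under a superroot scheme but not
under a single clusterwide namespace:

```python
# Illustrative sketch: superroot naming vs. a single clusterwide namespace.

def mfs_name(path: str, owner: str, viewer: str) -> str:
    """MFS-style superroot naming: remote files are reached via
    //<nodename>/path, local files via their plain path — so the same
    file has different names depending on where you look from."""
    return path if viewer == owner else f"//{owner}{path}"

def ssi_name(path: str, owner: str, viewer: str) -> str:
    """Single-namespace naming: owner and viewer are irrelevant by
    design; the same name works on every node."""
    return path

# The same file has two names under MFS, one name under a shared root:
print(mfs_name("/etc/passwd", owner="node1", viewer="node1"))
print(mfs_name("/etc/passwd", owner="node1", viewer="node2"))
print(ssi_name("/etc/passwd", owner="node1", viewer="node2"))
```

Node-dependent names are why scripts and applications cannot move between
nodes unchanged under a superroot scheme, while under a single namespace
they can.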
A key design goal for OpenSSI was to provide a platform in which other
open source cluster technologies could be integrated, thus building an
environment suitable for all clustering needs. Earlier, it was mentioned
that HA-LVS has been integrated, as well as GFS, OpenGFS, Lustre and OCFS.
In addition, OpenDLM and DRBD have been integrated. The kernel-based
membership capability of OpenSSI provides a set of APIs so these subsystems,
like those already in OpenSSI, can register for node membership events
and can co-ordinate node up and node down activities. OpenSSI also has a
kernel-to-kernel communication system that can be used by various
subsystems and has RDMA capabilities ready to leverage interconnects like
Infiniband.
Load balancing is, at some level, a point of commonality between OpenSSI
and OpenMosix and in fact the OpenMosix load calculation algorithm was adapted
into OpenSSI. However, OpenSSI has connection load balancing as well as
process load balancing. OpenSSI also supports migrating processes with
shared memory segments (this did not work in OpenMosix; it may work
now). OpenSSI also
supports migrating process groups as an atomic action and supports migrating
threads (which OpenMosix may have added recently). OpenSSI also has exec-time
load balancing as well as process migration. Exec-time balancing is much
less expensive
because there is no process data to migrate. OpenSSI leverages the HA
imalive messages to share load information between nodes on a frequent
basis so exec-time load balancing decisions can be made. OpenMosix has
a capability to do process load balancing based on memory pressure;
OpenSSI has not enabled that feature to date.
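To illustrate the exec-time decision, here is a sketch. The load metric,
threshold, and function are hypothetical, not OpenSSI's actual algorithm;
the point is that because load information arrives piggybacked on the
periodic imalive messages, the choice of target node needs no extra
network round trip at exec() time:

```python
# Illustrative sketch (hypothetical, not OpenSSI's actual algorithm):
# pick a target node at exec time from load figures already shared via
# the periodic keepalive ("imalive") traffic.

def pick_exec_node(loads: dict[int, float], local_node: int,
                   threshold: float = 0.1) -> int:
    """Choose the least-loaded node; stay local unless a remote node is
    better by at least `threshold` (avoids needless remote execs).
    Ties break toward the lowest node number."""
    best = min(loads, key=lambda n: (loads[n], n))
    if loads[local_node] - loads[best] >= threshold:
        return best
    return local_node

loads = {1: 0.9, 2: 0.2, 3: 0.5}
print(pick_exec_node(loads, local_node=1))              # node 2 is clearly less loaded
print(pick_exec_node({1: 0.25, 2: 0.2}, local_node=1))  # stays local: the gain is too small
```

Since the decision is made before the new program image is built, there is
no address space to copy — which is exactly why exec-time placement is so
much cheaper than migrating a running process.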
As is evident above, the goals of OpenSSI are much broader than just
the cpu sharing goal of OpenMosix. The chief goal of OpenSSI is to be a
complete cluster solution, which means addressing availability, scalability
(sharing of all resources), manageability and usability, as well as being
the platform on which other open source cluster technologies can be
integrated and/or layered. Manageability is a key cluster problem, and by
having such a high degree of SSI, OpenSSI largely reduces the management
problem from that of a cluster to that of a single machine. The shared
root is key to that, along
with visibility and access to all resources of all nodes from all nodes.
To summarize, I must reiterate that I am no OpenMosix expert. Nonetheless,
I have tried to capture the significant differences between the two offerings.
A summary of the differences includes:
- OpenSSI has a single management and administrative domain and OpenMosix
does not;
- OpenSSI has a single root filesystem enforced across the cluster (single
copy of binaries, admin files (like the password file), etc.) and
OpenMosix does not;
- OpenSSI has a single pid per process and a clusterwide process management
space, which OpenMosix does not;
- OpenSSI has a transparent, clusterwide namespace for all IPC objects and
OpenMosix does not;
- OpenSSI has clusterwide device access and a single pty namespace;
OpenMosix may have clusterwide device access;
- OpenSSI has a consistent "single site" file naming across all nodes and
OpenMosix has the superroot naming paradigm;
- OpenSSI has transparent and fully coherent file access across all nodes
while OpenMosix has a limited function ship file access model;
- OpenSSI has integrated with most cluster filesystem technologies, so
there is flexibility and choice in what to run on OpenSSI;
- OpenSSI has the kernel interfaces to allow integrating other open source
technologies and several technologies have been integrated;
- OpenSSI has a highly available cluster filesystem with transparent
failover; OpenMosix does not;
- OpenSSI provides a single name and address for the cluster and that
name/address is highly available, with persistent connections;
- OpenSSI and OpenMosix both do process migration but OpenSSI then executes
system calls on the new node and OpenMosix function ships most calls
back to the home node;
- OpenSSI has exec-time process load balancing while OpenMosix does not;
- OpenMosix has memory pressure based process load balancing and OpenSSI has
not enabled that;
- OpenSSI has a variety of high availability features which OpenMosix does
not, including process monitoring and restart, automatic service
failover, automatic filesystem failover, cluster IP address
and connection management failover, and the ability to lose a
home node without killing all the processes that started on it;
- OpenSSI has strong membership guarantees and APIs for membership while
OpenMosix does not;
- OpenSSI has APIs for rexec() and rfork() as well as migrate(), while
OpenMosix has only process migration.
Both OpenMosix and OpenSSI have roots back to the early 1980s. The OpenSSI
technology started at UCLA with a system called Locus. The OpenMosix code was
adapted to Linux several years before the OpenSSI code was, and when
the OpenSSI Linux project was started, the question was asked "Mosix is
already there; why do OpenSSI?" Hopefully this document has explained why
we believe OpenSSI is the technology base that will propel Linux to dominance
in the clustering arena.