[SSI-users] White paper comparing OpenSSI and OpenMosix
From: Bruce W. <br...@ka...> - 2004-10-29 01:34:04
For those of you who might have more recent information on OpenMosix than I do,
feel free to correct me privately or publicly.
A Comparison of OpenSSI and OpenMosix
Bruce J. Walker
Oct. 27, 2004
While OpenMosix and OpenSSI have some commonality (process-level load
balancing via process migration), their goals and strategies are quite
different. I am no expert in OpenMosix; what I explain below is my current
understanding of what OpenMosix can and cannot do and how it works.
Both technologies claim Single System Image (SSI), but the SSI in each case
is quite different. OpenSSI strives to aggregate all the resources of
all the nodes into one big SMP-like environment (one big single system
image). OpenMosix does not. Instead, OpenMosix aims for "home-node" SSI:
processes can move from their home node to other nodes, but they continue
to see only their home node. One could argue this isn't SSI at all, but
simply CPU borrowing.
To accomplish the limited goal of CPU borrowing, OpenMosix leaves the
kernel portion of the process back at the home node and, for the most part,
redirects all system calls done by the migrated process back to the home
node. Over time the OpenMosix group determined that this strategy had
performance and availability limitations, and has tried to let some
system calls execute on the new node (e.g., via DFSA). However, given that
most calls still go back to the home node, loss of the home node means
all processes started there must die. The OpenSSI strategy
has always been that all system calls are executed on the node where the
process is running. This means the whole process moves in OpenSSI and not
just the "user" part of the process. There are several SSI ramifications
to the two approaches. In OpenSSI, a process has a single clusterwide unique
process id which can be seen and accessed from any process on any node.
In OpenMosix, a migrated process gets a new pid on its host node, and
visibility of processes is limited to those started on the home node
(processes that migrate to a node other than their home node are visible
on that node under a different name).
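To make the clusterwide-pid idea concrete, here is a small sketch. This is
illustrative only: the per-node range scheme and the constant below are
hypothetical, not OpenSSI's actual pid encoding. The point is that a single
pid can identify a process from any node, because the allocating node is
recoverable from the pid itself:

```python
# Illustrative sketch (NOT OpenSSI's actual implementation): one way to
# build clusterwide-unique process ids is to reserve a per-node pid range,
# so any node can tell from the pid alone which node allocated it.

PIDS_PER_NODE = 1 << 20  # hypothetical size of each node's pid range

def cluster_pid(node_num: int, local_pid: int) -> int:
    """Combine a node number and a node-local pid into a clusterwide pid."""
    assert 0 < local_pid < PIDS_PER_NODE
    return node_num * PIDS_PER_NODE + local_pid

def owning_node(pid: int) -> int:
    """Recover the node that allocated a clusterwide pid."""
    return pid // PIDS_PER_NODE

pid = cluster_pid(3, 1234)
print(owning_node(pid))  # any node can tell node 3 allocated this pid
```

Under a scheme like this, signalling or inspecting a process from another
node needs no translation table; under the OpenMosix home-node model, the
pid only has meaning on the home node.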
The situation is similar with Inter-process communication (IPC) objects
(pipes, fifos, semaphores, message queues, shared memory, unix-domain sockets,
etc.). In OpenSSI, all objects are clusterwide unique (SSI) and visible
and accessible from all nodes. Consequently an object is created on the
node where the process is currently running, not on the home node. OpenMosix
creates all objects on the home node, and processes started on the same home
node can share them (except shared-memory objects, which OpenMosix did not
support across nodes; this may have been enhanced recently). OpenSSI, by
contrast, allows completely coherent read/write sharing of memory across
nodes.
OpenMosix does not have a strong sense of cluster membership. OpenMosix
has no APIs for membership and no infrastructure for high availability.
OpenSSI ensures that all nodes always agree on the current membership and
through the APIs, cluster-aware applications can see a consistent history of
membership transition events on all nodes. There are APIs in OpenSSI
for membership information, membership history and membership
event notifications. There are also several high availability
facilities integrated and included as part of the base OpenSSI. First,
the cluster filesystem capability (CFS) is highly available;
filesystems will transparently failover from one node to another, with no
errors seen by processes on any node actively working in those filesystems
(more on the filesystem capabilities below). Second, OpenSSI comes with
HA-LVS, which provides a highly available IP address for the cluster as
well as load balancing of incoming TCP/IP connections (for services like
HTTP, SSH, etc.). Providing a highly available IP address with persistent
connections across failures is an important part of high availability in
any SSI cluster environment. Next, rc-type services can trivially be
restarted on another node after failure and OpenSSI includes a simple yet
flexible process monitoring and restart subsystem. OpenSSI can also be
used to provide an HA-NFS file service.
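To illustrate the membership-event pattern described above, here is a
sketch. The class and method names are hypothetical, not the actual OpenSSI
APIs; the property being demonstrated is the one OpenSSI guarantees: every
subscriber, whether it registers early or late, sees the same ordered
history of membership transitions:

```python
# Illustrative sketch only (hypothetical names, not the OpenSSI API):
# a single ordered event log gives every subscriber a consistent history
# of node-up/node-down transitions, replayed to late subscribers.

class Membership:
    def __init__(self):
        self.nodes = set()        # current membership
        self.history = []         # ordered transition events
        self.subscribers = []

    def register(self, callback):
        """Subscribe to membership events; replay the history first so a
        late subscriber still sees the full, consistent sequence."""
        for event in self.history:
            callback(event)
        self.subscribers.append(callback)

    def _transition(self, event):
        self.history.append(event)
        for cb in self.subscribers:
            cb(event)

    def node_up(self, node):
        self.nodes.add(node)
        self._transition(("up", node))

    def node_down(self, node):
        self.nodes.discard(node)
        self._transition(("down", node))

m = Membership()
m.node_up(1)
m.node_up(2)
seen = []
m.register(seen.append)   # late subscriber: history is replayed first
m.node_down(2)
print(seen)               # the same sequence every subscriber would see
```

Subsystems such as a cluster filesystem or HA-LVS can hang failover logic
off exactly this kind of node-down notification.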
The filesystem capabilities for OpenSSI and OpenMosix are quite different.
OpenMosix, through its MFS, provides some access to remote files. It uses a
superroot naming scheme (you can name any file on any node using the
convention //<nodename>/pathname). Such a strategy is clearly not
transparent (names are node-specific; a file has a different name locally
than it does from another node), and it does not do coherent caching (and
thus cannot support shared read/write mapped files). The OpenMosix MFS also
has no failover capability. MFS is perhaps on the bottom rung of cluster
filesystems.
OpenSSI is designed to support different cluster filesystem technologies.
It comes, however, with HA-CFS, which is a transparent client-server
stacked cluster filesystem (it transparently stacks on ext3, xfs, reiserfs,
JFS, etc.) that is fully coherent and yet caches aggressively, supports
shared read/write mapped files, and can transparently failover on node
failure. OpenSSI has also worked with GFS and OpenGFS, including using them as
a shared root. OpenSSI also works with Lustre and has used Lustre to support
a shared root. OpenSSI has also integrated OCFS (Oracle cluster filesystem).
OpenSSI enforces a clusterwide file namespace without the limitation of a
superroot naming scheme. OpenSSI has always worked with a shared root
(whether CFS, GFS or Lustre). In addition, any mount of any physical
filesystem (ext3, xfs, etc.) or NFS filesystem done on any node is
automatically and transparently visible by the same name on all nodes.
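A small sketch makes the naming difference concrete. The functions below
are illustrative only, not real MFS or OpenSSI code; they just model how a
file's name depends on the viewing node under a superroot scheme but not
under a single clusterwide namespace:

```python
# Illustrative sketch: superroot naming vs. a single clusterwide namespace.

def mfs_name(path: str, owner: str, viewer: str) -> str:
    """MFS-style superroot naming: remote files are reached via
    //<nodename>/path, local files via their plain path — so the same
    file has different names depending on where you look from."""
    return path if viewer == owner else f"//{owner}{path}"

def ssi_name(path: str, owner: str, viewer: str) -> str:
    """Single-namespace naming: owner and viewer are irrelevant by
    design; the same name works on every node."""
    return path

# The same file has two names under MFS, one name under a shared root:
print(mfs_name("/etc/passwd", owner="node1", viewer="node1"))
print(mfs_name("/etc/passwd", owner="node1", viewer="node2"))
print(ssi_name("/etc/passwd", owner="node1", viewer="node2"))
```

Node-dependent names are why scripts and applications cannot move between
nodes unchanged under a superroot scheme, while under a single namespace
they can.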
A key design goal for OpenSSI was to provide a platform in which other
open source cluster technologies could be integrated, thus building an
environment suitable for all clustering needs. Earlier, it was mentioned
that HA-LVS has been integrated, as well as GFS, OpenGFS, Lustre and OCFS.
In addition, OpenDLM and DRBD have been integrated. The kernel-based
membership capability of OpenSSI provides a set of APIs so these subsystems,
like those already in OpenSSI, can register for node membership events
and can co-ordinate node up and node down activities. OpenSSI also has a
kernel-to-kernel communication system that can be used by various
subsystems and has RDMA capabilities ready to leverage interconnects like
Infiniband.
Load balancing is, at some level, a point of commonality between OpenSSI
and OpenMosix and in fact the OpenMosix load calculation algorithm was adapted
into OpenSSI. However, OpenSSI has connection load balancing as well as
process load balancing. OpenSSI also supports migrating processes with
shared memory segments (this did not work in OpenMosix; it may work
now). OpenSSI also
supports migrating process groups as an atomic action and supports migrating
threads (which OpenMosix may have added recently). OpenSSI also has exec-time
load balancing as well as process migration. Exec-time balancing is much
less expensive
because there is no process data to migrate. OpenSSI leverages the HA
imalive messages to share load information between nodes on a frequent
basis so exec-time load balancing decisions can be made. OpenMosix has
a capability to do process load balancing based on memory pressure;
OpenSSI has not enabled that feature to date.
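To illustrate the exec-time decision, here is a sketch. The load metric,
threshold, and function are hypothetical, not OpenSSI's actual algorithm;
the point is that because load information arrives piggybacked on the
periodic imalive messages, the choice of target node needs no extra
network round trip at exec() time:

```python
# Illustrative sketch (hypothetical, not OpenSSI's actual algorithm):
# pick a target node at exec time from load figures already shared via
# the periodic keepalive ("imalive") traffic.

def pick_exec_node(loads: dict[int, float], local_node: int,
                   threshold: float = 0.1) -> int:
    """Choose the least-loaded node; stay local unless a remote node is
    better by at least `threshold` (avoids needless remote execs).
    Ties break toward the lowest node number."""
    best = min(loads, key=lambda n: (loads[n], n))
    if loads[local_node] - loads[best] >= threshold:
        return best
    return local_node

loads = {1: 0.9, 2: 0.2, 3: 0.5}
print(pick_exec_node(loads, local_node=1))              # node 2 is clearly less loaded
print(pick_exec_node({1: 0.25, 2: 0.2}, local_node=1))  # stays local: the gain is too small
```

Since the decision is made before the new program image is built, there is
no address space to copy — which is exactly why exec-time placement is so
much cheaper than migrating a running process.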
As is evident above, the goals of OpenSSI are much broader than just
the cpu sharing goal of OpenMosix. The chief goal of OpenSSI is to be a
complete cluster solution, which means addressing availability, scalability
(sharing of all resources), manageability and usability, as well as being
the platform on which other open source cluster technologies can be
integrated and/or layered. Manageability is a key cluster problem, and by
having such a high degree of SSI, OpenSSI largely reduces the management
problem from that of a cluster to that of a single machine. The shared
root is key to that, along
with visibility and access to all resources of all nodes from all nodes.
To summarize, I must reiterate that I am no OpenMosix expert. Nonetheless,
I have tried to capture the significant differences between the two offerings.
A summary of the differences includes:
- OpenSSI has a single management and administrative domain and OpenMosix
does not;
- OpenSSI has a single root filesystem enforced across the cluster (single
copy of binaries, admin files (like the password file), etc.) and
OpenMosix does not;
- OpenSSI has a single pid per process and a clusterwide process management
space, which OpenMosix does not;
- OpenSSI has a transparent, clusterwide namespace for all IPC objects and
OpenMosix does not;
- OpenSSI has clusterwide device access and a single pty namespace;
OpenMosix may have clusterwide device access;
- OpenSSI has a consistent "single site" file naming across all nodes and
OpenMosix has the superroot naming paradigm;
- OpenSSI has transparent and fully coherent file access across all nodes
while OpenMosix has a limited function ship file access model;
- OpenSSI has integrated with most cluster filesystem technologies, so
there is flexibility and choice in what to run on OpenSSI;
- OpenSSI has the kernel interfaces to allow integrating other open source
technologies and several technologies have been integrated;
- OpenSSI has a highly available cluster filesystem with transparent
failover; OpenMosix does not;
- OpenSSI provides a single name and address for the cluster and that
name/address is highly available, with persistent connections;
- OpenSSI and OpenMosix both do process migration but OpenSSI then executes
system calls on the new node and OpenMosix function ships most calls
back to the home node;
- OpenSSI has exec-time process load balancing while OpenMosix does not;
- OpenMosix has memory pressure based process load balancing and OpenSSI has
not enabled that;
- OpenSSI has a variety of high availability features which OpenMosix does
not, including process monitoring and restart, automatic service
failover, automatic filesystem failover, cluster IP address
and connection management failover, and the ability to lose a
home node without killing all the processes that started on it;
- OpenSSI has strong membership guarantees and APIs for membership while
OpenMosix does not;
- OpenSSI has APIs for rexec() and rfork() as well as migrate(), while
OpenMosix has only process migration.
Both OpenMosix and OpenSSI have roots back to the early 1980s. The OpenSSI
technology started at UCLA with a system called Locus. The OpenMosix code was
adapted to Linux several years before the OpenSSI code was, and when
the OpenSSI Linux project was started, the question was asked "Mosix is
already there; why do OpenSSI?" Hopefully this document has explained why
we believe OpenSSI is the technology base that will propel Linux to dominance
in the clustering arena.