Menu

Tree [96ff88] master /
 History

HTTPS access


File Date Author Commit
 client 2012-06-07 Ryan McCabe Ryan McCabe [fb992c] Add a delay (-w) option.
 common 2012-06-01 Ryan McCabe Ryan McCabe [f61626] Add a TCP listener plugin for use with viosproxy
 config 2012-02-07 Lon Hohberger Lon Hohberger [af48d0] Add 'interface' directive to example.conf
 include 2012-06-07 Ryan McCabe Ryan McCabe [fb992c] Add a delay (-w) option.
 man 2012-07-25 Lon Hohberger Lon Hohberger [96ff88] Improve fence_virt.conf man page description of...
 server 2012-06-01 Ryan McCabe Ryan McCabe [f61626] Add a TCP listener plugin for use with viosproxy
 utils 2009-09-17 Lon Hohberger Lon Hohberger [088799] Add simple tarball / release script
 COPYING 2009-09-17 Lon Hohberger Lon Hohberger [f3e956] Add missing COPYING file; update TODO
 Makefile.in 2012-02-07 Lon Hohberger Lon Hohberger [9522be] Add systemd unit file and generation
 Makefile.top.in 2009-12-04 Lon Hohberger Lon Hohberger [a9e609] Work around broken nspr headers
 README 2011-09-19 Lon Hohberger Lon Hohberger [2118df] Update README
 TODO 2012-02-07 Lon Hohberger Lon Hohberger [9522be] Add systemd unit file and generation
 architecture.txt 2009-08-20 Lon Hohberger Lon Hohberger [8f1ade] Add architecture overview description
 autogen.sh 2009-09-17 Lon Hohberger Lon Hohberger [2b42e9] Ignore automake error
 build 2009-12-04 Lon Hohberger Lon Hohberger [57d2bc] Fix default build script
 configure.in 2012-06-01 Ryan McCabe Ryan McCabe [f61626] Add a TCP listener plugin for use with viosproxy
 fence_virt.txt 2009-09-17 Lon Hohberger Lon Hohberger [211dd9] Update TODO and requirements file
 fence_virtd.init.in 2010-01-12 Lon Hohberger Lon Hohberger [6656a7] Add Fedora init script
 fence_virtd.service.in 2012-02-07 Lon Hohberger Lon Hohberger [19674a] Fix startup in systemd environments

Read Me

TODO: update

I. Fence_xvm - Virtual machine fencing agent

Fence_xvm is an agent which establishes a communications link between
a cluster of virtual machines (VC) and a cluster of domain0/physical
nodes which are hosting the virtual cluster.  Its operations are
fairly simple.

  (a) Start a listener service.
  (b) Send a multicast packet requesting that a VM be fenced.
  (c) Authenticate client.
  (e) Read response.
  (f) Exit with success/failure, depending on the response received.

If any of the above steps fail, the fencing agent exits with a failure
code and fencing is retried by the virtual cluster at a later time.
Because of the simplicty of fence_xvm, it is not necessary that
fence_xvm be run from within a virtualized guest - all it needs is
libnspr and libnss and a shared private key (for authentication; we
would hate to receive a false positive response from a node not in the
cluster!).


II. Fence_virtd - Virtual machine fencing host

Fence_virtd is a daemon which runs on physical hosts (e.g. in domain0)
of the cluster hosting the virtual cluster.  It listens on a port
for multicast traffic from virtual cluster(s), and takes actions.
Multiple disjoint virtual clusters can coexist on a single physical
host cluster, but this requires multiple instances of fence_virtd.

NOTE: fence_virtd *MUST* be run on ALL nodes in a given cluster which
will be hosting virtual machines if fence_xvm is to be used for 
fencing!

There are a couple of ways the multicast packet is handled,
depending on the state of the host OS.  It might be hosting the VM,
or it might not.  Furthermore, the VM might "reside" on a host which
has failed.

In order to be able to guarantee safe fencing of a VM even if the
last- known host is down, we must store the last-known locations of
each virtual machine in some sort of cluster-wide way.  For this, we
use the AIS Checkpointing API, which is provided by OpenAIS.  Every
few seconds, fence_virtd queries the hypervisor via libvirt and
stores any local VM states in a checkpoint.  In the event of a
physical node failure (which consequently causes the failure of one
or more guests), we can then read the checkpoint section corresponding
to the guest we need to fence to find out the previous owner.  With
that information, we can then check with CMAN to see if the last-
known host node has been fenced.  If so, then the VM is clean as well.
The physical cluster must, therefore, have fencing in order for
fence_virtd to work.

Operation of a node hosting a VM which needs to be fenced:
  
  (a) Receive multicast packet
  (b) Authenticate multicast packet
  (c) Open connection to host contained within multicast
      packet.
  (d) Authenticate server.
  (e) Carry out fencing operation (e.g. call libvirt to destroy or
      reboot the VM; there is no "on" method at this point).
  (f) If operation succeeds, send success response.

Operation of high-node-ID:

  (a) Receive multicast packet
  (b) Authenticate multicast packet
  (c) Read VM state from checkpoint
  (d) Check liveliness of nodeID hosting VM (if alive, do nothing)
  (e) Open connection to host contained within multicast
      packet.
  (f) Check with CMAN to see if last-known host has been fenced.
  (g) If last-known host has been fenced, send success response.
  (h) Authenticate server & send response.

NOTE: There is always a possibility that a VM is started again
before the fencing operation and checkpoint update for that VM
occurs.  If the VM has booted and rejoined the cluster, fencing will
not be necessary.  If it is in the process of booting, but has not
yet joined the cluster, fencing will also not be necessary - because
it will not be using cluster resources yet.


III. Security considerations

While fencing is generally expected to run on a more or less trusted
network, there are cases where it may not be.

* The multicast packet is subject to replay attacks, but because no
fencing action is taken based solely on the information contained
within the packet, this should not allow an attacker to maliciously
fence a VM from outside the cluster, though it may be possible to
cause a DoS of fence_virtd if enough multicast packets are sent.

* The only currently supported authentication mechanisms are simple
challenge-response based on a shared private key and pseudorandom
number generation.

* An attacker with access to the shared key(s) can easily fence any
known VM, even if they are not on a cluster node.

* Different shared keys should be used for different virtual
clusters on the same subnet (whether in the same physical cluster
or not).  Additionally, multiple fence_virtd instances must be run
(each listening on a different multicast IP + port combination).

IV.  Configuration

Generate a random key file.  An example of how to generate it is:

    dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=4096 count=1

Distribute the generated key file to all domUs in a cluster as well
as all dom0s which will be hosting that particular cluster of domUs.
The key should not be placed on shared file systems (because shared
file systems require the cluster, which requires fencing...).

Start fence_virtd on all hosts

Configure fence_xvm on the domU cluster...

rest...tbd