[Planetlab-tools] Re: [Planetlab-users] RE: [Planetlab-support] Re: Planetlab related questions
From: Jay L. <le...@cs...> - 2003-10-08 15:09:32
> I'm sure Vivek also has a bunch of machinery to keep CoDeeN running
> all the time on PlanetLab. Vivek?

One question is which of these are general enough to be provided for
experimenters, at some layer. Feedback on any of this would be welcome.

Vivek said:

> We have the following:
> a) a monitoring process on each node that tries to make sure that all
>    of the CoDeeN processes are alive, and restarts them if they aren't
> b) a centrally-run sweep that checks every node every five minutes to
>    make sure that the monitoring process is alive, and restarts
>    everything if it is dead
> c) version numbers in the intra-CoDeeN communications protocol, such
>    that nodes with different versions ignore each other
> d) a daily sweep of all "important" files in CoDeeN - we checksum each
>    file on each node, and decide majorities, quorums, etc.

Emulab doesn't do any of these currently. (c) and (d) seem pretty
application-specific; (c) could be supported with some library help but
doesn't seem worth it -- although the mechanism and APIs would fit right
in with per-experiment (per-slice) port space allocation, which is
definitely worth it. However, (a)'s notion of a distinguished process on
each node and (b)'s check for it seem both general and easy to
generalize, and they would be easy to support in our structure, which
already has a notion of a startup program. (Rough sketches of the
watchdog in (a) and the checksum sweep in (d) are appended at the end of
this message.)

...

> We don't do any automatic "get the latest version" kind of checks,
> because we often will stage our rollout of new versions, or we test
> our alpha code on a few live nodes from time to time. Our update
> process consists of scp'ing a set of files to all of the nodes, and
> then doing (on each node)
>    1) stop all processes
>    2) copy the new files into place
>    3) restart all processes

The Emulab interface to Plab automatically provides the above for you,
if you ask for it (the bare three-step sequence is also sketched at the
end of this message). 1 and 3 are "reboot". 2 occurs as a side effect of
specifying initial node state with RPMs or tarballs; at reboot any
changed state is installed. These actions can be requested on one node,
on your whole experiment (a Plab "dynamic slice"), or on any set of
nodes in your experiment.

There are several easy dynamic or static ways to control state update,
e.g., to separate alpha and production nodes, or to split node state
into that which is essentially static and that which is more labile.
These include associating different sets of nodes with different sets of
RPMs/tarballs, or simply using a separate experiment for each node set.

IIRC, the whole thing takes about 12 seconds on a node with little state
to update. It's parallelized across all the nodes, with a little chunk
serialization added for reliability at large scale (also sketched
below). Reboot happens immediately. You can also trigger a state update
w/o reboot, which can take up to 3 seconds for nodes to learn of.

> The one thing we don't do right now is grab all of our log files and
> store them centrally, but this has not been much of an issue yet.
> We'll probably start doing that soon, though, just so we can free up
> disk space. Our compressed logs are approaching 500MB on some nodes.

We have SFS on our FreeBSD wide-area nodes (RON and others), but not yet
for Plab nodes. Used directly for log files, I assume it would have
scalability problems. It would be scalable as the target of a serialized
copy. We don't have anything for scalable parallel node state capture,
although we're going to be working on that for the local Emulab cluster,
so the high-level structure would be there.
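For concreteness, here is a minimal sketch in Python of the per-node
watchdog idea in (a). It is not CoDeeN's actual monitor; the process
names and restart script are hypothetical placeholders, and a real
monitor would want logging and restart backoff. The centrally-run sweep
in (b) is essentially the same loop run from one place over ssh, with
the watchdog itself as the watched process.

    #!/usr/bin/env python
    # Per-node watchdog sketch: restart the whole service group if any
    # watched process has died. All names below are hypothetical.
    import subprocess
    import time

    WATCHED = ["codeen-proxy", "codeen-dns"]          # hypothetical
    RESTART = ["/usr/local/bin/start-codeen.sh"]      # hypothetical

    def is_running(name):
        # pgrep -x exits 0 iff a process with exactly this name exists.
        return subprocess.call(["pgrep", "-x", name],
                               stdout=subprocess.DEVNULL) == 0

    while True:
        if any(not is_running(p) for p in WATCHED):
            subprocess.call(RESTART)   # restart everything on failure
        time.sleep(30)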
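And a sketch of the majority/quorum decision in (d), assuming each node
has already reported a checksum for a given file; the data layout here
is invented for illustration, not CoDeeN's actual format.

    import hashlib
    from collections import Counter

    def file_checksum(path):
        # What each node would compute locally for one file.
        with open(path, "rb") as f:
            return hashlib.md5(f.read()).hexdigest()

    def outliers(checksums_by_node):
        # checksums_by_node: {node: checksum} for a single file.
        # Returns (majority checksum, nodes that disagree with it).
        majority = Counter(checksums_by_node.values()).most_common(1)[0][0]
        bad = [n for n, c in checksums_by_node.items() if c != majority]
        return majority, bad

    # Example: node3 gets flagged for repair.
    print(outliers({"node1": "abc", "node2": "abc", "node3": "xyz"}))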
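The three-step per-node update Vivek describes is, in outline, just the
following; the paths and scripts are hypothetical, and the point is only
the stop/copy/restart ordering.

    import shutil
    import subprocess

    STAGED  = "/tmp/codeen-staged"    # where scp dropped the new files
    INSTALL = "/usr/local/codeen"     # hypothetical install directory

    subprocess.call(["/usr/local/bin/stop-codeen.sh"])    # 1) stop all
    shutil.rmtree(INSTALL, ignore_errors=True)            # 2) copy the
    shutil.copytree(STAGED, INSTALL)                      #    new files in
    subprocess.call(["/usr/local/bin/start-codeen.sh"])   # 3) restart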
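Finally, one plausible shape for "parallelized across all the nodes,
with a little chunk serialization": run the per-node action on at most
CHUNK nodes at a time, so one bad batch can't take out everything at
once. This is a guess at the structure, not Emulab's actual dispatcher,
and the ssh command is illustrative.

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    CHUNK = 20   # hypothetical batch size

    def update(node):
        # Any per-node action fits here; this command is made up.
        return subprocess.call(
            ["ssh", node, "sudo /usr/local/bin/update.sh"])

    def rolling_update(nodes):
        # Serialize across batches, parallelize within each batch.
        for i in range(0, len(nodes), CHUNK):
            batch = nodes[i:i + CHUNK]
            with ThreadPoolExecutor(max_workers=len(batch)) as pool:
                codes = list(pool.map(update, batch))
            if any(rc != 0 for rc in codes):
                raise RuntimeError("update failed in batch at " + batch[0])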