planetlab-tools Mailing List for PlanetLab
Archive: 2002: Dec (1); 2003: Jan (1), Oct (2), Nov (1)
From: Brent N. C. <bn...@in...> - 2003-11-13 09:55:50
Here are some tools that might be of interest to newbies on PlanetLab. While I'm sure many people already have comparable or more advanced versions of these, these tools do offer the advantage of being simple and easy to use. Anyhow, here they are:

http://www.theether.org/pssh

bnc
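For context, a pssh-style run amounts to fanning one command out over ssh to every node in a list and collecting the results. Below is a minimal Python sketch of that pattern, not the pssh implementation itself; the slice login, the nodes.txt file (one hostname per line), and the timeouts are placeholder assumptions.

#!/usr/bin/env python3
# Sketch only: run one command on many PlanetLab nodes over ssh in parallel.
# SLICE and nodes.txt are hypothetical; substitute your own slice and node list.
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

SLICE = "princeton_codeen"    # hypothetical slice login
CMD = sys.argv[1:] or ["uptime"]
NODES = [line.strip() for line in open("nodes.txt") if line.strip()]

def run(node):
    """Run CMD on one node; return (node, exit code, output)."""
    try:
        r = subprocess.run(
            ["ssh", "-o", "ConnectTimeout=10", f"{SLICE}@{node}"] + CMD,
            capture_output=True, text=True, timeout=60)
        return node, r.returncode, r.stdout.strip() or r.stderr.strip()
    except Exception as exc:              # unreachable node, timeout, etc.
        return node, -1, str(exc)

with ThreadPoolExecutor(max_workers=32) as pool:
    for node, rc, output in pool.map(run, NODES):
        print(f"[{node}] rc={rc} {output}")

Invoked as, say, "./prun.py uptime", this prints one result line per node; the real pssh tools add niceties such as per-host output files and parallel scp.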
From: Jay L. <le...@cs...> - 2003-10-08 15:09:32
> I'm sure Vivek also has a bunch of machinery to keep CoDeeN running
> all the time on PlanetLab. Vivek?

One question is which of these are general enough to be provided for experimenters, at some layer. Feedback on any of this would be welcome.

Vivek said:

> We have the following:
> a) a monitoring process on each node that tries to make sure that all of the CoDeeN processes are alive, and restarts them if they aren't
> b) a centrally-run sweep that checks every node every five minutes to make sure that the monitoring process is alive, and restarts everything if the process is dead
> c) version numbers in the intra-CoDeeN communications protocol, such that nodes with different versions ignore each other
> d) a daily sweep of all "important" files in CoDeeN - we checksum each file on each node, and decide majorities, quorums, etc.

Emulab doesn't do any of these currently. (c) and (d) seem pretty application-specific; (c) could be supported with some library help but doesn't seem worth it -- although the mechanism and APIs would fit right in with per-experiment (per-slice) port space allocation, which is definitely worth it. However, (a)'s notion of a distinguished process on each node and (b)'s check for it seem both general and easy to generalize, and they would be easy to support in our structure, which already has a notion of a startup program.

...

> We don't do any automatic "get the latest version" kind of checks, because we often will stage our rollout of new versions, or we test our alpha code on a few live nodes from time to time. Our update process consists of scp'ing a set of files to all of the nodes, and then doing (on each node)
> 1) stop all processes
> 2) copy the new files into place
> 3) restart all processes

The Emulab interface to Plab automatically provides the above for you, if you ask for it. (1) and (3) are "reboot". (2) occurs as a side effect of specifying initial node state with RPMs or tarballs; at reboot any changed state is installed. These actions can be requested on one node, your whole experiment (a Plab "dynamic slice"), or any set of nodes in your experiment.

There are several easy dynamic or static ways to control state update, e.g., to separate alpha and production nodes, or to split node state into that which is essentially static and that which is more labile. These include associating different sets of nodes with different sets of RPMs/tarballs, or simply using a separate experiment for each node set.

IIRC, the whole thing takes about 12 seconds on a node with little state to update. It's parallelized across all the nodes, with a little chunk serialization added for reliability at large scale. Reboot happens immediately. You can also trigger state update w/o reboot, which can take up to 3 seconds for nodes to learn.

> The one thing we don't do right now is grab all of our log files and store them centrally, but this has not been much of an issue yet. We'll probably start doing that soon, though, just so we can free up disk space. Our compressed logs are approaching 500MB on some nodes.

We have SFS on our FreeBSD wide-area nodes (RON and others), but not yet for Plab nodes. Used directly for log files, I assume it would have scalability problems. It would be scalable as the target of a serialized copy. We don't have anything for scalable parallel node state capture, although we're going to be working on that for the local Emulab cluster, so the high-level structure would be there.
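Of the mechanisms above, the centrally-run liveness sweep in (b) is probably the easiest to approximate generically. Here is a rough Python sketch of one sweep pass; the slice account, monitor process name, and restart script are illustrative assumptions, not CoDeeN's or Emulab's actual names.

#!/usr/bin/env python3
# Sketch of a (b)-style central sweep: check each node for the per-node
# monitor process and restart the service if the monitor is not running.
# SLICE, MONITOR, RESTART, and nodes.txt are hypothetical placeholders.
import subprocess

SLICE = "princeton_codeen"       # hypothetical slice account
MONITOR = "codeen_monitor"       # hypothetical name of the per-node monitor
RESTART = "./restart_all.sh"     # hypothetical restart script on each node
NODES = [line.strip() for line in open("nodes.txt") if line.strip()]

def ssh(node, cmd):
    """Run one command on a node and return the CompletedProcess."""
    return subprocess.run(
        ["ssh", "-o", "ConnectTimeout=10", f"{SLICE}@{node}", cmd],
        capture_output=True, text=True, timeout=60)

for node in NODES:
    try:
        check = ssh(node, f"pgrep -x {MONITOR}")
        if check.returncode != 0:            # monitor process not found
            print(f"{node}: monitor dead, restarting everything")
            ssh(node, RESTART)
    except Exception as exc:                 # unreachable node, timeout, ...
        print(f"{node}: sweep failed ({exc})")

Run from cron every five minutes, this gives roughly the behavior described in (b); the per-node monitor in (a) would be a similar loop running locally against the service's own processes.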
From: Adams, R. <rob...@in...> - 2003-10-06 21:46:24
Below, Vivek describes some of the mechanisms his team has created to keep the CoDeeN service running on all nodes. What are the rest of the PlanetLab users doing to keep their services running? If several people could describe what they're doing (and possibly provide some code), I'll pull together a HOWTO.

-- RA

-----Original Message-----
From: pla...@li... [mailto:pla...@li...] On Behalf Of Bowman, Mic
Sent: Monday, October 06, 2003 9:04 AM
To: Vivek Pai; Brent N. Chun
Cc: sk...@cs...; pla...@li...
Subject: [Planetlab-users] RE: [Planetlab-support] Re: Planetlab related questions

moving this discussion to planetlab-users...

--Mic

-----Original Message-----
From: pla...@li... [mailto:pla...@li...] On Behalf Of Vivek Pai
Sent: Saturday, October 04, 2003 08:20 PM
To: Brent N. Chun
Cc: sk...@cs...; pla...@li...
Subject: Re: [Planetlab-support] Re: Planetlab related questions

Brent N. Chun wrote:
> I'm sure Vivek also has a bunch of machinery to keep CoDeeN running
> all the time on PlanetLab. Vivek?

This may be more than what you bargained for :-) We have the following:

a) a monitoring process on each node that tries to make sure that all of the CoDeeN processes are alive, and restarts them if they aren't
b) a centrally-run sweep that checks every node every five minutes to make sure that the monitoring process is alive, and restarts everything if the process is dead
c) version numbers in the intra-CoDeeN communications protocol, such that nodes with different versions ignore each other
d) a daily sweep of all "important" files in CoDeeN - we checksum each file on each node, and decide majorities, quorums, etc.

The last two items make sure that if a node is unreachable for a while (especially while we do an upgrade), it won't cause too much damage when it comes back up. It'll generally be ignored by the other nodes for a day, and we'll catch it in our admin e-mail the next morning.

We don't do any automatic "get the latest version" kind of checks, because we often will stage our rollout of new versions, or we test our alpha code on a few live nodes from time to time. Our update process consists of scp'ing a set of files to all of the nodes, and then doing (on each node):
1) stop all processes
2) copy the new files into place
3) restart all processes

This lets us have downtimes of about 20 seconds per node when we do a rollout of new code, and it works pretty well. We did have a weird case where a node died in step 2, leaving only some files updated. When it came back up, it refused to be restarted by step (b) above, but when step (d) did the checks, we saw the problem right away.

The one thing we don't do right now is grab all of our log files and store them centrally, but this has not been much of an issue yet. We'll probably start doing that soon, though, just so we can free up disk space. Our compressed logs are approaching 500MB on some nodes.

-Vivek
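Mechanism (d) can be approximated with per-file checksums and a simple majority vote across nodes. The Python sketch below shows the idea under stated assumptions: the slice account, file list, and paths are hypothetical, not CoDeeN's actual layout, and an unreachable node is simply skipped.

#!/usr/bin/env python3
# Sketch of a (d)-style daily sweep: checksum "important" files on every node
# and flag any node whose copy disagrees with the majority.
# SLICE, FILES, and nodes.txt are hypothetical placeholders.
import subprocess
from collections import Counter

SLICE = "princeton_codeen"                          # hypothetical slice account
FILES = ["codeen/proxy.conf", "codeen/bin/proxy"]   # hypothetical file list
NODES = [line.strip() for line in open("nodes.txt") if line.strip()]

def checksums(node):
    """Return {path: md5} for FILES on one node, or None if the check fails."""
    try:
        r = subprocess.run(
            ["ssh", "-o", "ConnectTimeout=10", f"{SLICE}@{node}",
             "md5sum " + " ".join(FILES)],
            capture_output=True, text=True, timeout=60)
    except Exception:
        return None
    if r.returncode != 0:
        return None
    # md5sum prints "<digest>  <path>" per line
    return {path: digest for digest, path in
            (line.split(None, 1) for line in r.stdout.splitlines())}

results = {node: checksums(node) for node in NODES}
for path in FILES:
    votes = Counter(sums[path] for sums in results.values()
                    if sums and path in sums)
    if not votes:
        continue
    majority = votes.most_common(1)[0][0]
    for node, sums in results.items():
        if sums is not None and sums.get(path) != majority:
            print(f"{node}: {path} disagrees with the majority checksum")

A real deployment would also want quorum thresholds and a report e-mail, as described above, but the majority-vote core is this small.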
From: Adams, R. <rob...@in...> - 2003-01-06 23:58:33
By way of introduction, I am Robert Adams, a member of the Planetlab team (up in Oregon); I met most of you at the Planetlab meeting in Boston. My interest is the easy use and management of Planetlab. I have written a set of PERL scripts to deploy and control a service on Planetlab (see the contribution page). It was built for us to create an administration slice for Planetlab.

The 'tools' mailing list is for us to share and distribute tools people have developed to distribute, manage and control Planetlab applications and services. Additionally, this is the forum to discuss the types of tools that would be nice to have. We should see discussions on:

* tools for application/service distribution (push a service onto Planetlab, ...)
* libraries for development of services (distributed store discovery, ...)
* libraries for access to PlanetLab services and infrastructure (get list of nodes, ...)

and any other topics that make the creation of Planetlab applications easier. If there are tools you would like to see, make a suggestion here on this list.

We have two contributions in the Planetlab contribution list (http://www.planet-lab.org/php/contrib/contrib.php). If anyone has tools they are using, please contribute them using the contribution submission page.

Robert Adams
mailto:Rob...@in...
MS JF2-58, 2111 NE 25th Ave, Hillsboro, OR 97124 USA
phone: +1.503.264.2682; cell: +1.503.781.9914; FAX: +1.503.264.1578
From: Brent N. C. <bn...@in...> - 2002-12-18 14:22:41