A quick summary of the Anubis deployer (for details, see the documentation). The Anubis deployer has two aspects.
Firstly, it allows users to provide state information about a particular VM as a set of attributes. These fall into three categories: properties (e.g. x86 host, Linux OS, ...), quantities (e.g. memory available) and lists of enumerated values (e.g. free ports [80, 8080, 8088]).
Secondly, instead of providing attributes that specify host names as the locations to deploy components, attributes may be given that specify required properties (e.g. must be Linux), required quantities (e.g. needs 20MB memory) and required numbers of the enumerated values (e.g. one port number).
The deployer then matches the properties, ensures that there are sufficient quantities and sufficient enumerated values, and deploys on an appropriate host. On deployment, the advertised quantities are reduced, and the selected values are removed from the sets and put into the description being deployed, i.e. the deployed description uses up resources. On termination, those resources (quantities and enumerated values) are returned to the advertised available values and quantities.
It is intended for deploying into clusters of worker nodes where location isn't important but capacities and certain properties are.
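The matching and resource accounting described above can be sketched in plain Java. This is a toy model, not the Anubis deployer's code: the classes `Host`, `Requirement` and `Allocation`, and attribute names such as `os`, `memoryMB` and `ports`, are all hypothetical. A host matches if every required property is equal, every required quantity is available and enough enumerated values are free; deploying deducts the quantities and moves the chosen values into the allocation, and terminating puts them back.

```java
import java.util.*;

/** Toy model of attribute-based placement. All names here are hypothetical. */
public class AnubisMatchSketch {

    static final class Host {
        final String name;
        final Map<String, String> properties = new HashMap<>();      // e.g. "os" -> "linux"
        final Map<String, Long> quantities = new HashMap<>();        // e.g. "memoryMB" -> 64
        final Map<String, Deque<Integer>> values = new HashMap<>();  // e.g. "ports" -> [80, 8080, 8088]
        Host(String name) { this.name = name; }
    }

    static final class Requirement {
        final Map<String, String> properties = new HashMap<>();   // must match exactly
        final Map<String, Long> quantities = new HashMap<>();     // amount to consume
        final Map<String, Integer> valueCounts = new HashMap<>(); // how many values to take
    }

    /** Resources taken from one host by one deployed description. */
    static final class Allocation {
        final Host host;
        final Map<String, Long> quantities;
        final Map<String, List<Integer>> values;
        Allocation(Host host, Map<String, Long> quantities, Map<String, List<Integer>> values) {
            this.host = host; this.quantities = quantities; this.values = values;
        }
    }

    static boolean matches(Host h, Requirement r) {
        for (Map.Entry<String, String> e : r.properties.entrySet())
            if (!e.getValue().equals(h.properties.get(e.getKey()))) return false;
        for (Map.Entry<String, Long> e : r.quantities.entrySet())
            if (h.quantities.getOrDefault(e.getKey(), 0L) < e.getValue()) return false;
        for (Map.Entry<String, Integer> e : r.valueCounts.entrySet()) {
            Deque<Integer> free = h.values.get(e.getKey());
            if (free == null || free.size() < e.getValue()) return false;
        }
        return true;
    }

    /** Deploy on the first suitable host, consuming its advertised resources. */
    static Optional<Allocation> deploy(List<Host> hosts, Requirement r) {
        for (Host h : hosts) {
            if (!matches(h, r)) continue;
            // Reduce the advertised quantities.
            r.quantities.forEach((k, v) -> h.quantities.merge(k, -v, Long::sum));
            // Remove the selected values from the host's sets and record them.
            Map<String, List<Integer>> taken = new HashMap<>();
            r.valueCounts.forEach((k, n) -> {
                List<Integer> picked = new ArrayList<>();
                for (int i = 0; i < n; i++) picked.add(h.values.get(k).pollFirst());
                taken.put(k, picked);
            });
            return Optional.of(new Allocation(h, new HashMap<>(r.quantities), taken));
        }
        return Optional.empty();
    }

    /** On termination, hand quantities and values back to the host. */
    static void terminate(Allocation a) {
        a.quantities.forEach((k, v) -> a.host.quantities.merge(k, v, Long::sum));
        a.values.forEach((k, vs) -> a.host.values.get(k).addAll(vs));
    }

    public static void main(String[] args) {
        Host win = new Host("win1");
        win.properties.put("os", "windows");
        win.quantities.put("memoryMB", 4096L);
        win.values.put("ports", new ArrayDeque<>(List.of(80, 8080, 8088)));

        Host lin = new Host("linux1");
        lin.properties.put("os", "linux");
        lin.quantities.put("memoryMB", 64L);
        lin.values.put("ports", new ArrayDeque<>(List.of(80, 8080, 8088)));

        Requirement r = new Requirement();
        r.properties.put("os", "linux");   // must be Linux
        r.quantities.put("memoryMB", 20L); // needs 20MB memory
        r.valueCounts.put("ports", 1);     // needs one port

        Allocation a = deploy(List.of(win, lin), r).orElseThrow();
        System.out.println(a.host.name + " got " + a.values + ", memory left " + lin.quantities);
        terminate(a);
        System.out.println("after termination: " + lin.quantities + " " + lin.values.get("ports"));
    }
}
```

First-fit is used here only for simplicity; a real deployer could equally pick the least-loaded matching host.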
-----Original Message-----
From: smartfrog-developer-bounces@... [mailto:smartfrog-developer-bounces@...] On Behalf Of Murray, Paul (HP Labs, Bristol)
Sent: 05 December 2007 13:10
To: Steve Loughran; Zhang Qian; smartfrog-developer
Subject: Re: [Smartfrog-developer] Questions about SmartFrog
Just a couple of corrections:
> We also have an anubisdeployer component that exists to deploy anubis itself.
The AnubisDeployer is a SmartFrog deployer that uses Anubis to determine where to deploy things; it doesn't deploy Anubis.
> 3. It is currently dependent on the clocks being synchronised.
> If NTP is not working properly across all nodes, you have problems. This is something that
> could be fixed, as it is less fundamental to the design than multicasting.
This statement is not true (but once was). Anubis has two timing protocols:
- one times clock skew, comms delay and scheduling delay, so machines whose clocks drift become untimely relative to one another;
- one times comms delay and scheduling delay only, so clocks are allowed to drift.
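To make the difference between the two timing protocols concrete, here is an illustrative Java sketch. It is not Anubis's implementation; the method names, the 100 ms bound and the numbers are assumptions. The first check compares a sender's timestamp with the receiver's clock, so clock skew is added to the measured delay. The second check takes both timestamps from the receiver's own clock (probe sent, reply received), so skew cancels out and only comms and scheduling delay are measured.

```java
/** Illustrative contrast between two hypothetical timeliness checks (not Anubis code). */
public class TimingSketch {

    /**
     * Check 1: sender's timestamp vs receiver's clock. The measured value is
     * comms delay + scheduling delay + clock skew, so drifting clocks become untimely.
     */
    static boolean timelyIncludingSkew(long senderStampMillis, long receiverNowMillis, long boundMillis) {
        return receiverNowMillis - senderStampMillis <= boundMillis;
    }

    /**
     * Check 2: both timestamps come from the receiver's clock, so only comms and
     * scheduling delay (here, a round trip) are measured; skew does not appear.
     */
    static boolean timelyIgnoringSkew(long probeSentMillis, long replyReceivedMillis, long boundMillis) {
        return replyReceivedMillis - probeSentMillis <= boundMillis;
    }

    public static void main(String[] args) {
        long bound = 100;       // hypothetical timeliness bound (ms)
        long oneWayDelay = 20;  // comms + scheduling delay (ms)
        long skew = 500;        // receiver's clock runs 500 ms ahead of the sender's

        long receiverNow = 1_000_000;                         // receiver's clock at receipt
        long senderStamp = receiverNow - oneWayDelay - skew;  // sender's clock when it sent
        System.out.println("with skew: " + timelyIncludingSkew(senderStamp, receiverNow, bound));

        long probeSent = receiverNow;
        long replyReceived = probeSent + 2 * oneWayDelay;     // round trip on the receiver's clock
        System.out.println("skew-free: " + timelyIgnoringSkew(probeSent, replyReceived, bound));
    }
}
```

With 500 ms of skew the first check reports the peer as untimely even though the network is fast, while the second check still considers it timely.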
Paul.
Paul Murray
Hewlett-Packard Laboratories, Bristol
Hewlett-Packard Limited Registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England
The contents of this message and any attachments to it are confidential and may be legally privileged. If you have received this message in error, you should delete it from your system immediately and advise the sender.
To any recipient of this message within HP, unless otherwise stated you should consider this message and attachments as "HP CONFIDENTIAL".
-----Original Message-----
From: smartfrog-developer-bounces@... [mailto:smartfrog-developer-bounces@...] On Behalf Of Steve Loughran
Sent: 05 December 2007 12:36
To: Zhang Qian; smartfrog-developer
Subject: Re: [Smartfrog-developer] Questions about SmartFrog
Zhang Qian wrote:
> Hi All,
>
Hello!
> I have two questions about SmartFrog:
> 1. Can SmartFrog be used to synchronize configuration on many hosts? I
> have a cluster which contains hundreds of hosts, so it's very
> important to make config changes synchronously on these hosts. Is it
> possible to write my own SmartFrog component whose responsibility is to
> communicate with my own daemons on all the hosts, and deliver config
> changes to them synchronously?
This is one of those really interesting areas where automated deployment gets both challenging and fun. I'm actually preparing some slides for a talk on that topic, to be presented to undergraduates on Friday, though I won't be going into any details on how to get it to work.
Once you have that many hosts, you can't assume that any small set of them will remain functional; if you have one or two nodes that are declared managers, you can be sure that eventually they will fail and your entire farm will go offline. If you have a simple hierarchy of deployed components, you can easily create such a failure point.
What you have to do instead is make every node standalone, sharing awareness of their role amongst their peers.
The Anubis component is what we use for this kind of farm management; here are the papers:
http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/components/anubis/doc/
Anubis is a partition-aware version of a tuple space: you insert facts into the space, and the machines talk to each other by heartbeat. A tick after you insert a fact, it is shared amongst all peers; a tick after that, they know you know that fact, and so on. If there is a partition event (a host goes away, or the network gets split), you get notified, and the tuple space(s) that now exist have to re-evaluate who is in there and who is not.
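The behaviour of a partition-aware fact space can be illustrated with a small single-process simulation. This is a toy, not the Anubis API: `Node`, `Fact`, `insert`, `tick` and `partition` are hypothetical names. It models only the first stage (after one tick, every peer in the owner's partition holds the fact) plus the partition event, where each node recomputes its membership, drops facts from nodes it can no longer see and records a notification.

```java
import java.util.*;

/** Toy simulation of a partition-aware fact space (hypothetical, not the Anubis API). */
public class PartitionSpaceSketch {

    static final class Fact {
        final String owner, key, value;
        Fact(String owner, String key, String value) { this.owner = owner; this.key = key; this.value = value; }
    }

    /** One node's local copy of the shared space. */
    static final class Node {
        final String id;
        final Set<String> members = new TreeSet<>();       // who this node believes is in its partition
        final Map<String, Fact> facts = new HashMap<>();   // keyed by owner + "/" + key
        final List<String> events = new ArrayList<>();     // partition notifications received
        Node(String id) { this.id = id; }
    }

    final Map<String, Node> nodes = new TreeMap<>();
    final Map<String, Integer> partitionOf = new HashMap<>();
    final List<Fact> pending = new ArrayList<>();

    PartitionSpaceSketch(String... ids) {
        for (String id : ids) { nodes.put(id, new Node(id)); partitionOf.put(id, 0); }
        recomputeViews();
    }

    void insert(String owner, String key, String value) { pending.add(new Fact(owner, key, value)); }

    /** One heartbeat: pending facts reach every node in the owner's partition. */
    void tick() {
        for (Fact f : pending)
            for (Node n : nodes.values())
                if (partitionOf.get(n.id).equals(partitionOf.get(f.owner)))
                    n.facts.put(f.owner + "/" + f.key, f);
        pending.clear();
    }

    /** Split the network: the given nodes form one partition, the rest another. */
    void partition(Set<String> side) {
        for (String id : nodes.keySet()) partitionOf.put(id, side.contains(id) ? 1 : 2);
        recomputeViews();
    }

    private void recomputeViews() {
        for (Node n : nodes.values()) {
            Set<String> before = new TreeSet<>(n.members);
            n.members.clear();
            for (String other : nodes.keySet())
                if (partitionOf.get(other).equals(partitionOf.get(n.id))) n.members.add(other);
            // Facts published by nodes outside this partition are dropped.
            n.facts.values().removeIf(f -> !n.members.contains(f.owner));
            if (!before.isEmpty() && !before.equals(n.members))
                n.events.add("partition: now " + n.members);
        }
    }

    public static void main(String[] args) {
        PartitionSpaceSketch space = new PartitionSpaceSketch("a", "b", "c");
        space.insert("a", "role", "manager");
        space.tick();                     // after one heartbeat, all peers hold a's fact
        System.out.println(space.nodes.get("c").facts.keySet());
        space.partition(Set.of("a"));     // a is cut off from b and c
        Node c = space.nodes.get("c");
        System.out.println(c.members + " " + c.facts.keySet() + " " + c.events);
    }
}
```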
You bring up every node in the cluster 'unpurposed' and then let them decide, based on what else is live, what they are going to be. The first one could be the resource manager that allocates work to others; then you could bring up some as a database, a filestore, and finally your application itself.
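One way unpurposed nodes could choose a role without a central manager is for every node to apply the same deterministic rule to the same view of live members. The Java sketch below assumes such a rule: sort the live node IDs, give the first three the resource-manager, database and filestore roles (the order taken from the example above), and make everything else an application node. The function `roleFor` and the role names are hypothetical, not SmartFrog or Anubis APIs.

```java
import java.util.*;

/** Hypothetical deterministic role assignment from a shared view of live nodes. */
public class RoleSketch {
    // Assumed role order, taken from the example in the text.
    static final List<String> ROLES = List.of("resource-manager", "database", "filestore");

    static String roleFor(String self, Collection<String> liveNodes) {
        List<String> sorted = new ArrayList<>(new TreeSet<>(liveNodes)); // same order on every node
        int rank = sorted.indexOf(self);
        if (rank < 0) throw new IllegalArgumentException(self + " is not in the live set");
        return rank < ROLES.size() ? ROLES.get(rank) : "application";
    }

    public static void main(String[] args) {
        Set<String> live = new TreeSet<>(List.of("n3", "n1", "n4", "n2"));
        for (String n : live) System.out.println(n + " -> " + roleFor(n, live));
        live.remove("n1"); // n1 fails; survivors re-run the same rule on the new view
        for (String n : live) System.out.println(n + " -> " + roleFor(n, live));
    }
}
```

Because the rule is rerun whenever membership changes, losing the resource manager automatically promotes another node. The trade-off is that this naive rule can also reshuffle roles that did not need to move, so a real design would probably keep existing roles where possible.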
All the Anubis components are in the redistributables and the sf-anubis.rpm, from where they can be pushed out to servers. We also have an anubisdeployer component that exists to deploy anubis itself. There's also a nice visualizer application that lets you see what is going on, and test failure recovery by triggering partition events.
There are some limitations of the (current) design I should flag early, two of which are based on the use of multicast IP to share information.
1. It doesn't run on Amazon EC2, as their network doesn't support multicast.
2. It doesn't like farms where the machines are connected over long-haul networks. Single-site networks are OK, though the more complex the network, the bigger the TTL has to be and the slower you need to make the heartbeat.
3. It is currently dependent on the clocks being synchronised. If NTP is not working properly across all nodes, you have problems. This is something that could be fixed, as it is less fundamental to the design than multicasting.
I'd recommend you have a look at the papers and the examples; talk to us if you want more details or help bringing it up. We have used Anubis successfully for 500+ node deployments.
>
> 2. I am trying to write a sample SmartFrog component of my own, but I
> find my changes do not take effect until sfDaemon restarts. I have written a
> Java class which extends PrimImpl and implements Prim, and also written a
> related .sf file to describe the config. At first it works fine, but
> when I change some code in my Java class, build it with Ant and run it
> with sfStart, the code I added does not take effect until I restart
> sfDaemon. Did I miss something or make some mistake?
It comes down to whether you are using dynamic code downloading, and to JVM classloading quirks. If Java has loaded a class and there are still references to it around somewhere, it tends to keep the old class loaded.
If you don't use dynamic classloading then yes, you must restart the JVM to get the JARs reloaded.
If you do want to dynamically classload, then you must:
- create a list of URLs to the JAR files in the sfCodebase. There's some documentation on this: you can specify a codebase for parsing the deployment files as a JVM property in the sfDeploy operation, and a codebase for the actual process classloading. All Java URLs are supported: http:, https:, ftp:, file:
- for absolute reliability, start your deployments in a new JVM. This is as simple as using the sfProcessName attribute in a component/compound; it tells the runtime to put everything beneath there in a different process. A new process is dynamically created, which will pick up all the changed files. This is something to try if the JVM is somehow hanging on to old class definitions after the components are unloaded.
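The underlying JVM behaviour can be reproduced without SmartFrog. The sketch below uses only the JDK's `URLClassLoader` and `javax.tools` compiler (so it needs a JDK, not a JRE); it does not show SmartFrog's own sfCodebase or sfProcessName handling. It compiles a hypothetical `Greeter` class into a temporary codebase directory, loads it, "rebuilds" it with different code, and then loads it again through the old loader and through a fresh one. The old loader keeps returning its already-defined class, while the fresh loader reads the new `.class` file, which is the same reason a new process picks up changed files.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class ReloadDemo {

    /** Write and compile a hypothetical component class whose get() returns `message`. */
    static void buildComponent(Path dir, String message) throws Exception {
        Path src = dir.resolve("Greeter.java");
        Files.writeString(src,
                "public class Greeter implements java.util.function.Supplier<String> {\n"
              + "  public String get() { return \"" + message + "\"; }\n"
              + "}\n");
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) throw new IllegalStateException("a JDK (not a JRE) is required");
        if (javac.run(null, null, null, "-d", dir.toString(), src.toString()) != 0)
            throw new IllegalStateException("compilation failed");
    }

    /** Instantiate Greeter through the given loader and call it. */
    @SuppressWarnings("unchecked")
    static String callComponent(ClassLoader loader) throws Exception {
        Class<?> cls = loader.loadClass("Greeter");
        return ((Supplier<String>) cls.getDeclaredConstructor().newInstance()).get();
    }

    /** Returns {first load, old loader after rebuild, fresh loader after rebuild}. */
    static String[] demo() throws Exception {
        Path dir = Files.createTempDirectory("codebase");
        URL[] codebase = { dir.toUri().toURL() };   // directory URL, ends in "/"
        ClassLoader parent = ReloadDemo.class.getClassLoader();

        buildComponent(dir, "v1");
        try (URLClassLoader oldLoader = new URLClassLoader(codebase, parent)) {
            String first = callComponent(oldLoader);

            buildComponent(dir, "v2");                  // "change the code and rebuild"
            String stale = callComponent(oldLoader);    // class already defined in this loader

            try (URLClassLoader freshLoader = new URLClassLoader(codebase, parent)) {
                String fresh = callComponent(freshLoader); // new loader reads the new .class file
                return new String[] { first, stale, fresh };
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(String.join(" ", demo()));
    }
}
```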
We have Ant tasks to do the start/deploy, and tasks that can help set up the codebase. If you want some help setting up your build file, you can send us those bits of your build.xml and I'll take a look at them.
I just want to close by saying yes, big clusters are what SmartFrog can do; it's just that the changing nature of the hardware changes your deployment architecture in interesting ways.
-Steve
-------------------------------------------------------------------------
SF.Net email is sponsored by: The Future of Linux Business White Paper from Novell. From the desktop to the data center, Linux is going mainstream. Let it simplify your IT future.
http://altfarm.mediaplex.com/ad/ck/8857-50307-18918-4
_______________________________________________
Smartfrog-developer mailing list
Smartfrog-developer@...
https://lists.sourceforge.net/lists/listinfo/smartfrog-developer