From: Murray, P. (HP L. Bristol) <pm...@hp...> - 2007-12-05 13:13:23
Just a couple of corrections:

> We also have an anubisdeployer component that exists to deploy anubis
> itself.

The AnubisDeployer is a SmartFrog deployer that uses Anubis to determine where to deploy things - it doesn't deploy Anubis.

> 3. It is currently dependent on the clocks being synchronised. If NTP
> is not working properly across all nodes, you have problems. This is
> something that could be fixed, as it is less fundamental to the design
> than multicasting.

This statement is not true (but once was). Anubis has two timing protocols:

- one times clock skew, comms delay and scheduling delay, so machines with clocks that drift become untimely relative to one another
- one times comms delay and scheduling delay only, so clocks are allowed to drift.

Paul.

Paul Murray
Hewlett-Packard Laboratories, Bristol

-----Original Message-----
From: sma...@li... [mailto:smartfrog-deve...@li...] On Behalf Of Steve Loughran
Sent: 05 December 2007 12:36
To: Zhang Qian; smartfrog-developer
Subject: Re: [Smartfrog-developer] Questions about SmartFrog

Zhang Qian wrote:
> Hi All,

Hello!

> I have two questions about SmartFrog:
>
> 1. Can SmartFrog be used to synchronize configuration on many hosts? I
> have a cluster which contains hundreds of hosts, so it's very
> important to make config changes synchronously on these hosts. Is it
> possible to write my own SmartFrog component whose responsibility is
> to communicate with my own daemons on all the hosts, and deliver
> config changes to them synchronously?

This is one of those really interesting areas where automated deployment gets both challenging and fun. I'm actually preparing some slides for a talk on that topic for presentation to undergraduates on Friday - though I won't be going into any details on how to get it to work.

Once you have that many hosts you can't assume that any small set of them will remain functional; if you have one or two nodes that are declared managers, you can be sure that eventually they will fail and your entire farm will go offline. If you have a simple hierarchy of deployed components, you can easily create such a failure point.

What you have to do instead is make every node standalone, sharing awareness of their role amongst their peers.

The Anubis component is what we use for this kind of farm management; here are the papers:

http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/components/anubis/doc/

Anubis is a partition-aware version of a tuple space; you insert facts into the space, and the machines talk by a heartbeat. A tick after you insert a fact, it is shared amongst all peers; a tick after that, they know you know that fact, and so on. If there is a partition event - a host goes away, or the network gets split - you get notified, and the tuple space(s) that now exist have to re-evaluate who is in there and who is not.

You bring up every node in the cluster 'unpurposed' and then let them decide - based on what else is live - what they are going to be. The first one could be the resource manager that allocates work to others; then you could bring up some as a database, a filestore, and then finally your application itself.
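That election step is easy to picture in code. The sketch below is a toy simulation of the idea only - RoleElection and its shared map are hypothetical stand-ins, not the real Anubis API, which propagates facts by heartbeat as described above:

    import java.util.HashMap;
    import java.util.Map;

    /**
     * Toy sketch of "unpurposed" nodes choosing roles from shared facts.
     * The map is a hypothetical stand-in for the Anubis tuple space; in
     * the real system facts propagate via heartbeats, and partition
     * events force a re-evaluation of who is present.
     */
    public class RoleElection {
        private final String nodeId;
        private final Map<String, String> sharedFacts; // nodeId -> claimed role

        RoleElection(String nodeId, Map<String, String> sharedFacts) {
            this.nodeId = nodeId;
            this.sharedFacts = sharedFacts;
        }

        /** Claim resource manager if nobody live has it; else be a worker. */
        String chooseRole() {
            synchronized (sharedFacts) {
                String role = sharedFacts.containsValue("resource-manager")
                        ? "worker" : "resource-manager";
                sharedFacts.put(nodeId, role); // publish the claim as a fact
                return role;
            }
        }

        public static void main(String[] args) {
            Map<String, String> space = new HashMap<String, String>();
            for (String id : new String[]{"node1", "node2", "node3"}) {
                System.out.println(id + " -> "
                        + new RoleElection(id, space).chooseRole());
            }
        }
    }

On a partition event you would re-run chooseRole() against whatever facts survived on your side of the split.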
All the anubis components are in the redistributables, and the sf-anubis.rpm, where they can be pushed out to servers. We also have an anubisdeployer component that exists to deploy anubis itself. There's also a nice visualizer application that lets you see what is going on, and test failure recovery by triggering partition events.

There are some limitations of the (current) design I should flag early, two of which are based on the use of multicast IP to share information.

1. It doesn't run on Amazon EC2, as their network doesn't support multicasts.

2. It doesn't like farms where the machines are connected over long-haul networks. Single-site networks are OK, though the more complex the network, the bigger the TTL has to be and the slower you need to make the heartbeat.

3. It is currently dependent on the clocks being synchronised. If NTP is not working properly across all nodes, you have problems. This is something that could be fixed, as it is less fundamental to the design than multicasting.

I'd recommend you have a look at the papers and the examples; talk to us if you want more details or help bringing it up. We have used Anubis successfully for 500+ node deployments.

> 2. I am trying to write a sample SmartFrog component of my own, but I
> find it cannot take effect until sfDaemon restarts. I have written a
> java class which extends PrimImpl and implements Prim, and written a
> related .sf file to describe the config. At first, it works fine, but
> when I change some code in my java class, build it by ant, and run it
> by sfStart, the code I added will not take effect until I restart
> sfDaemon. Did I miss something or make some mistake?

It comes down to whether you are using dynamic code downloading, and JVM classloading quirks. If Java has loaded a class and there are still references to it around somewhere, it tends to keep the old classes loaded.

If you don't use dynamic classloading then yes, you must restart the JVM to get the JARs reloaded. If you do want to dynamically classload, then you must:

- create a list of URLs to the JAR files in the sfCodebase. There's some documentation on this, as you can specify a codebase for parsing the deployment files as a JVM property in the sfDeploy operation, and a codebase for the actual process classloading. All java URLs are supported: http:, https:, ftp:, file:

- for absolute reliability, start your deployments in a new JVM. This is as simple as using the sfProcessName attribute in a component/compound; it tells the runtime to put everything beneath there in a different process. A new process is dynamically created, which will pick up all the changed files. This is something to try if somehow the JVM is hanging on to old class definitions after the components are unloaded. (A descriptor sketch combining both options follows below.)
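As a rough sketch of how the two options above might look together in a deployment descriptor - hedged, since org.example.MyComponent, the JAR URL, the process name, and the exact attribute spellings (sfCodeBase in particular) are placeholders you should check against your SmartFrog version and its docs:

    #include "org/smartfrog/components.sf"

    sfConfig extends Compound {
        // sfProcessName asks the runtime to host everything below in a
        // separate JVM, so restarted deployments pick up fresh classes
        sfProcessName "myAppProcess";

        app extends Prim {
            // hypothetical component class and JAR location
            sfClass "org.example.MyComponent";
            sfCodeBase "http://buildhost/jars/mycomponent.jar";
        }
    }

Running sfStart against a description like this should deploy into a fresh process, loading the component classes from the codebase URL rather than from the daemon's original classpath.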
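Separately, if you want to confirm whether the daemon is still running a stale copy of your class, plain JDK calls will tell you which loader and which JAR a class came from; nothing here is SmartFrog-specific:

    import java.security.CodeSource;

    public class WhereLoaded {
        // Print which classloader and which JAR/directory a class came
        // from, to check whether an old definition is still live.
        public static void report(Class<?> c) {
            CodeSource src = c.getProtectionDomain().getCodeSource();
            System.out.println(c.getName() + " loaded by " + c.getClassLoader()
                    + " from " + (src == null ? "<bootstrap>" : src.getLocation()));
        }

        public static void main(String[] args) throws ClassNotFoundException {
            // pass your component's class name, e.g. org.example.MyComponent
            report(Class.forName(args[0]));
        }
    }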
We have ant tasks to do the start/deploy, tasks that can help set up the codebase. If you want some help setting up your build file, you can send us those bits of your build.xml and I'll take a look at them.

I just want to close by saying yes, big clusters are what SmartFrog can do; it's just that the changing nature of the hardware changes your deployment architecture in interesting ways.

-Steve