From: Guijarro, J. <jul...@hp...> - 2007-12-07 10:55:21
Hi Qian,

What kind of configuration data are you talking about? Is it lots of data or small sets of attribute-value pairs?

One way that we have used Anubis is to propagate changes in the configuration data so that all the "master nodes" can see those changes and cache them locally. This is simple to do with Anubis because of the guarantees and consistency of the Anubis notifications. You could probably do something similar using your notification mechanism.

Then we have, as you mentioned, components to operate with the file system and/or with ftp/ssh/... that could be extended to meet your needs. One interesting component could be a wrapper for rsync, but this won't help you that much in n+1 configurations.

Other possibilities are: use simple multicast to announce changes in your configuration data, or use RSS feeds to propagate those changes.

The right solution will depend on your exact architecture and on the type and amount of data to synchronize.

Regards,

Julio Guijarro

-----Original Message-----
From: sma...@li... [mailto:smartfrog-deve...@li...] On Behalf Of Zhang Qian
Sent: 07 December 2007 06:06
To: Steve Loughran
Cc: smartfrog-developer
Subject: Re: [Smartfrog-developer] Questions about SmartFrog

> I see. How does the management console deal with failure of the master?
> Does it discover it using some discovery protocol, or is the active
> master expected to update a dynamic DNS entry?

Yes, we deal with this issue via DNS.

Today I took a look at the Anubis documentation. As I understand it, Anubis is a notification service and provides a detection mechanism for distributed systems. But in our cluster we already have this kind of mechanism for detecting the status of our key daemons, dealing with master failure, and so on. We don't want to change that; we just want to remove the shared-file-system dependency. Anubis looks a little big for this request.
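The DNS-based master discovery Qian describes above can be sketched roughly as follows. This is an illustrative sketch only, not code from the cluster in question; the hostname `master.cluster.example` and the injectable `resolve` parameter are hypothetical, chosen to show the idea of repointing a well-known name at whichever node is currently master:

```python
import socket

def find_master(hostname="master.cluster.example",
                resolve=socket.gethostbyname):
    """Look up the current master via a well-known DNS alias.

    On fail-over, the record behind the alias is repointed at the
    new master, so a client simply re-resolves the same name to
    recover. Returns the master's address, or None if the name
    cannot currently be resolved.
    """
    try:
        return resolve(hostname)
    except socket.gaierror:
        return None
```

A console following this scheme would re-resolve the alias whenever a request to its cached master address fails, rather than hard-coding any node's address.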
As far as I know, SmartFrog ships some inbuilt services for file operations and downloading in its package. I am wondering whether it is possible to fulfil my request by writing a SmartFrog component which just extends these inbuilt services.

Thanks,
Qian

-------------------------------------------------------------------------
SF.Net email is sponsored by: Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source.
http://sourceforge.net/services/buy/index.php
_______________________________________________
Smartfrog-developer mailing list
Sma...@li...
https://lists.sourceforge.net/lists/listinfo/smartfrog-developer

-----Original Message-----
From: sma...@li... [mailto:smartfrog-deve...@li...] On Behalf Of Steve Loughran
Sent: 06 December 2007 13:15
Cc: smartfrog-developer
Subject: Re: [Smartfrog-developer] Questions about SmartFrog

Zhang Qian wrote:
> Hi All,
>
> Thanks for your replies about this topic.
>
> I'd like to share more details about my cluster with you.
> As you know, it's a cluster that includes hundreds of nodes. We divide
> these nodes into two categories: management nodes and computing nodes.

I see. We tend to prefer the tactic of letting any node become a master (with agreement), because it stops you having to decide which machines are in charge. Whatever boots up first can take over.

> For computing nodes, they just run the tasks assigned to them and do not
> have management roles, so we don't care about them in this case.

OK - the workers are expected to fail and are told what to do; if they go away then something else gets the job.

> For management nodes, we have a dozen of this kind of node in the
> cluster. Only one of them is the master node, whose responsibility is to
> manage the entire cluster; the others are just master candidates.
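Steve's "any node can become master, with agreement" tactic can be illustrated with a toy election. This sketch is not SmartFrog or Anubis code; it simply assumes every node sees the same membership list, which is exactly the guarantee a membership service such as Anubis is meant to provide:

```python
def elect_master(live_nodes):
    """Deterministic toy election: the smallest live node id wins.

    Because every node applies the same rule to the same membership
    view, all nodes agree on the winner without exchanging any
    further messages. Returns None if no candidates are alive.
    """
    return min(live_nodes) if live_nodes else None
```

With a rule like this, "whatever boots up first can take over" falls out naturally: the election re-runs whenever the membership view changes, and whichever node is present and smallest at that moment becomes master.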
> The reason we
> do it this way is to avoid a single point of failure: once the master
> node fails, a master candidate will take over its job and become the
> new master node. So we have a heartbeat mechanism to detect node
> status and realize fail-over.

OK. You're giving one machine charge of the resource management problem, but by sharing the data amongst candidates, if the master goes away you can have an election of some sort to decide who is the new master.

> Now there is a limitation: our cluster relies on a shared file system (such as NFS)
> which can be accessed by all the management nodes. That means all the config
> files are placed on the shared file system, and all the management nodes need
> these config files. It's the master node's responsibility to update these config
> files according to the user's requests; after a fail-over, the new master node
> will read these config files to learn the latest configuration.

ah, so
1. the NFS filestore is a failure point
2. you need to save the configuration to a filesystem that doesn't go out of its way to enable locking

> Now we want to remove the shared-file-system dependency: each management
> node has config files in its local file system. So obviously, we need
> a mechanism to synchronize these config files on all the management nodes.
> That's why I asked those questions.
> I don't know whether there is an inbuilt component or service that can
> provide this kind of mechanism in SmartFrog. Certainly I will
> investigate Anubis first; thanks for your sharing.

This is what Anubis is designed for: to make a cluster out of a set of machines on a LAN. The papers and Paul can provide more details.

> In addition, we have a management console for users which will
> communicate with our daemon in the master node, and deliver config
> changes to that daemon.
> After receiving a config change, this daemon will verify and activate
> the change first, then write it into the config file placed on the shared file system.
I see. How does the management console deal with failure of the master? Does it discover it using some discovery protocol, or is the active master expected to update a dynamic DNS entry?

-------------------------------------------------------------------------
SF.Net email is sponsored by: The Future of Linux Business White Paper from Novell. From the desktop to the data center, Linux is going mainstream. Let it simplify your IT future.
http://altfarm.mediaplex.com/ad/ck/8857-50307-18918-4

-----Original Message-----
From: sma...@li... [mailto:smartfrog-deve...@li...] On Behalf Of Zhang Qian
Sent: 06 December 2007 03:31
To: Steve Loughran
Cc: smartfrog-developer
Subject: Re: [Smartfrog-developer] Questions about SmartFrog

In addition, we have a management console for users which will communicate with our daemon in the master node, and deliver config changes to that daemon. After receiving a config change, this daemon will verify and activate the change first, then write it into the config file placed on the shared file system. This is what we are doing, but we want to remove the shared-file-system dependency.

Thanks and Regards,
Qian

-----Original Message-----
From: sma...@li... [mailto:smartfrog-deve...@li...]
On Behalf Of Zhang Qian
Sent: 06 December 2007 02:58
To: Steve Loughran
Cc: smartfrog-developer
Subject: Re: [Smartfrog-developer] Questions about SmartFrog

Hi All,

Thanks for your replies about this topic.

I'd like to share more details about my cluster with you. As you know, it's a cluster that includes hundreds of nodes. We divide these nodes into two categories: management nodes and computing nodes.

The computing nodes just run the tasks assigned to them and do not have management roles, so we don't care about them in this case.

For management nodes, we have a dozen of this kind of node in the cluster. Only one of them is the master node, whose responsibility is to manage the entire cluster; the others are just master candidates. The reason we do it this way is to avoid a single point of failure: once the master node fails, a master candidate will take over its job and become the new master node. So we have a heartbeat mechanism to detect node status and realize fail-over.

Now there is a limitation: our cluster relies on a shared file system (such as NFS) which can be accessed by all the management nodes. That means all the config files are placed on the shared file system, and all the management nodes need these config files. It's the master node's responsibility to update these config files according to the user's requests; after a fail-over, the new master node will read these config files to learn the latest configuration.

Now we want to remove the shared-file-system dependency: each management node has config files in its local file system. So obviously, we need a mechanism to synchronize these config files on all the management nodes. That's why I asked those questions. I don't know whether there is an inbuilt component or service that can provide this kind of mechanism in SmartFrog. Certainly I will investigate Anubis first; thanks for your sharing.
Regards,
Qian
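The scheme discussed throughout this thread - propagate config changes to every management node and cache them locally, instead of reading a shared file system - can be sketched as follows. This is an illustrative sketch, not SmartFrog or Anubis code, and the JSON message shape (a `version` number plus a `changes` map of attribute-value pairs) is an assumption made here for the example:

```python
import json

class ConfigCache:
    """Local cache of cluster configuration, kept up to date by
    change notifications rather than by reads from shared storage.

    Each notification carries a monotonically increasing version
    number, so late, duplicated, or re-ordered notifications can
    be detected and safely discarded.
    """

    def __init__(self):
        self.version = 0
        self.attrs = {}

    def apply(self, notification):
        """Apply one JSON-encoded change notification.

        Returns True if the change was new, False if it was stale
        or a duplicate and was therefore ignored.
        """
        msg = json.loads(notification)
        if msg["version"] <= self.version:
            return False  # already seen (or older): ignore
        self.attrs.update(msg["changes"])
        self.version = msg["version"]
        return True
```

The version check is what makes the transport flexible: whether notifications arrive via Anubis, multicast, or a homegrown protocol, every node converges on the same configuration as long as each change is eventually delivered at least once.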
From: Guijarro, J. <jul...@hp...> - 2007-12-10 15:57:52
Hi Qian,

I think your solution should work. My suggestion, or improvement, would be to use multicast to propagate the changes directly to all the management nodes; and instead of writing them to a file and then sending the file, why not send the .sf file directly to all the management nodes?

Another thing that you could do is to use the console to send the new .sf file to all the nodes (you could do this in text form or in ComponentDescription form if the recipients are sf components). The master node would act on the new configuration and the other nodes would just store the configuration locally. In this way each management node would have a cached copy of the cluster configuration.

Another thing that you will need to add is how to recover a management node from a failure: for example, should a recovered node get a full copy of the entire configuration from the master node or a peer, or should it try to re-sync its config data? The first option is probably easier.

This is quite easy to do with Anubis because its protocol guarantees that what you receive is the same as what the other nodes in your "partition" see, and it simplifies what you have to do to avoid errors when synchronizing your configuration data in the cluster. But as I said, it could work equally well with your own protocol, or with some other form of multicast and extra programming.

Regards,

Julio Guijarro

-----Original Message-----
From: Zhang Qian [mailto:zhq...@gm...]
Sent: 09 December 2007 02:15
To: Guijarro, Julio
Cc: Steve Loughran; smartfrog-developer; sma...@li...urceforge.net
Subject: Re: [Smartfrog-developer] Questions about SmartFrog

Hi Julio,

The configuration data of my cluster are small sets of attribute-value pairs, not lots of data. The data amount is not large, but we really need the reliability.

Usually, I make a config change in the management console of my cluster; this console then communicates with the daemon in the master node and sends the config change to it.
The daemon will activate the change and write it into the config file stored on NFS. Then the other management nodes will also see the change. But obviously, NFS could be a single point of failure in my cluster.

Now I am trying to change this flow. The config change I make in the management console will be saved as a .sf file; then I will run my own SmartFrog component which extends some SmartFrog inbuilt services. This component will get the config change by parsing the .sf file and send the change to the daemon in the master node. The daemon will activate this change; then the component I mentioned before will write the change to a local file and propagate this file to all the management nodes.

Any suggestions about this approach? :-)

Thanks!

Regards,
Qian
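The flow Qian proposes - activate the change on the master daemon first, and only then persist it locally and fan it out to the other management nodes - can be sketched with injected stand-ins for the daemon and the transport. All of the callables here are hypothetical placeholders, not real SmartFrog APIs:

```python
def push_config_change(changes, activate, write_local, propagate):
    """Sketch of the proposed config-update flow.

    Order matters: the master daemon verifies and activates the
    change first, and only if that succeeds is the change written
    to the local file and propagated, so a rejected change never
    reaches the other management nodes.

    activate, write_local, propagate are caller-supplied callables
    standing in for the daemon RPC, the local file write, and the
    fan-out transport. Returns True if the change was accepted.
    """
    if not activate(changes):
        return False  # daemon rejected the change: stop here
    write_local(changes)
    propagate(changes)
    return True
```

Structuring the component this way keeps the "verify first, distribute second" invariant in one place, regardless of whether propagation ends up using multicast, Anubis notifications, or point-to-point copies.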