Re: [Queue-developers] Feedback requested on detailed plans and code for contrib project
From: Tavis B. <ta...@ma...> - 2000-09-16 05:11:03
On Fri, 15 Sep 2000, Monica Lau wrote:
> Someone also brought up the idea of a back-up queue_manager. There
> would be a master queue_manager and a slave queue_manager. The slave
> queue_manager has all the information that the master queue_manager has,
> so if the slave ever detects that the master has failed, it will take
> over the master's role until the master starts back up again. Is this
> similar to what you have in mind?

I think it would do the trick. I had been thinking of just some system that would rank all machines in the cluster for taking over the queue_manager, but perhaps that's overkill.

A more general concern, not specific to your system but possibly relevant to the discussion, is how to avoid machines becoming specialized in a cluster. We have an imperfectly clustered system in my setup, and whenever I take down a machine I have to remember which services I need to fail over to another machine before I take it down (or after it crashes). License managers are of course a particular problem to fail over, since they often depend on hardware checks to make sure they're running on the designated machine, and I'm curious how you deal with that or would deal with it (i.e., please indulge my laziness for not reading the documentation carefully enough if it's in there). So it would be nice to have queue be part of the solution and not another such specialized service, but maybe this is only a pet peeve and not a big problem.

Also, on a cluster of, say, 50 or 100 machines, having one backup might not be enough. But then I have no idea how well queue works on clusters that large. (Is anyone out there running such a thing?) Perhaps things like sub-clusters are more basic to such a problem than worrying about having enough backups.

Anyway, sorry for thinking out loud on a large list. Great work.

Cheers,
Tavis
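
For what it's worth, the ranking idea mentioned above could be sketched roughly as follows. This is only an illustration and not code from queue or the contrib project: the function name acting_manager, the host names, and the assumption that every machine already knows which hosts are currently alive are all hypothetical. The highest-ranked live machine acts as queue_manager, so a single master with a single slave is just the two-entry case, and the master resumes its role as soon as it is seen alive again.

```python
from typing import Iterable, Optional, Sequence


def acting_manager(ranked_hosts: Sequence[str],
                   alive: Iterable[str]) -> Optional[str]:
    """Return the highest-ranked host that is currently alive.

    ranked_hosts is ordered from most to least preferred (hypothetical
    configuration). alive is whatever set of hosts the failure detector
    currently believes are up (how that is detected is out of scope here).
    Returns None if no ranked host is alive.
    """
    alive_set = set(alive)
    for host in ranked_hosts:
        if host in alive_set:
            return host
    return None


# Hypothetical three-machine ranking.
ranking = ["node1", "node2", "node3"]

# node1 (the master) is down, so node2 takes over.
print(acting_manager(ranking, {"node2", "node3"}))

# node1 is back, so it resumes the master role.
print(acting_manager(ranking, {"node1", "node2", "node3"}))
```

With more than two machines in the ranking, losing both the master and the first backup still leaves a manager, which is the case a single slave would not cover.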