Re: [Clockwork-developers] Implementation of a decentralized schedule
From: C. B. H. <br...@do...> - 2002-01-14 00:13:53
----- Original Message -----
From: "Joel Loudermilk" <jo...@lo...>
To: <clo...@li...>
Sent: Sunday, January 13, 2002 6:53 PM
Subject: Re: [Clockwork-developers] Implementation of a decentralized schedule

> I like Brian's ideas of how to run a decentralized schedule using multicast
> and unicast where appropriate. Some things to consider would be:
>
> When a job finishes, does the system always post a notification about the
> job, or does it try to figure out if there are dependencies at other systems
> and only send to those systems?

The way I'm thinking, it would always post a notification, in case there are clients running. In other words, if you (as a client or as an agent on a node) were listening to the multicast traffic, you'd receive all the "real-time" information about the scheduler as it happens.

> If job A runs on system X and job B runs on system Y when job A finishes,
> what happens when job A finishes, but system Y is down? If system X simply
> sent a multicast notification when job A was finished, we're out of luck.
> If we decide that the systems need to be smart enough to know who'll be
> starting jobs after theirs finish, then system X could resend the message
> until acknowledged by system Y.

I was thinking that, when job A finishes, a multicast notification would be sent out. However, if system Y is down, then when it comes back up, for any jobs that are in an activated state (to borrow an Autosys term), the agent would directly query the servers hosting the jobs that are the source of the dependency.

> We might want to think about breaking up systems in to management units, so
> that if a user has 500 systems, but only wants to monitor a schedule that
> affects 50 of them, we don't force him to poll all the systems to get
> the status of that schedule.

Also, I was thinking that jobs could be assigned into logical groups. How much easier would it be to manage CHRONOS' schedule if we could load up a GUI that would, for example, only show the invoicing cycle, or some other group of jobs?

> I'll have to do some reading about multicast, as I have no experience with
> it.

I really don't have any experience writing multicast software, but from the work I've done with Tibco I have a general understanding of how it works. The Tibco software can use either multicast or broadcast, but, in an enterprise situation, multicast gives it a lot of power (eliminates the need for the logical routing daemons we use in the CHRONOS implementation).

It seems like the issue of centralization / decentralization is a kind of fork in the road of our design process. I propose we try to come up with a list of possible advantages and disadvantages of each approach, in order to help us make our decision. In addition, if people were interested, we could prototype one or both systems using, say, Java, to give ourselves a feel for how the implementation might proceed, and to possibly help us uncover problems we hadn't yet thought of.

I'd be glad to compile the list of pros / cons for the group, so just send them to the mailing list, and I'll put them together.

-Brian
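
P.S. To make the restart case above a bit more concrete, here is a rough sketch in Java of what the agent on system Y might do when it comes back up: for each job in an activated state, ask the hosting server directly whether the upstream job has finished, and start only the jobs whose dependencies are all satisfied. Everything here is made up for illustration (the names RestartCatchUp, Dependency and jobsReadyAfterRestart, and the queryFinished callback standing in for the unicast query); nothing in it is existing Clockwork code, and a real prototype would need timeouts and error handling.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical sketch only: these names are assumptions, not Clockwork APIs.
class RestartCatchUp {

    /** One dependency of a local (activated) job on a job hosted elsewhere. */
    static final class Dependency {
        final String localJob;
        final String sourceHost;
        final String sourceJob;

        Dependency(String localJob, String sourceHost, String sourceJob) {
            this.localJob = localJob;
            this.sourceHost = sourceHost;
            this.sourceJob = sourceJob;
        }
    }

    /**
     * Called once when the agent starts. For every dependency of an activated
     * job, queryFinished.test(host, job) stands in for a direct (unicast)
     * query to the server hosting the upstream job. Returns the local jobs
     * whose dependencies have all finished, in first-seen order.
     */
    static List<String> jobsReadyAfterRestart(
            List<Dependency> activated,
            BiPredicate<String, String> queryFinished) {
        // A local job stays "true" only while every dependency seen so far is done.
        Map<String, Boolean> satisfied = new LinkedHashMap<>();
        for (Dependency d : activated) {
            boolean done = queryFinished.test(d.sourceHost, d.sourceJob);
            satisfied.merge(d.localJob, done, Boolean::logicalAnd);
        }
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : satisfied.entrySet()) {
            if (e.getValue()) {
                ready.add(e.getKey());
            }
        }
        return ready;
    }
}
```

The point of passing the query in as a callback is that the catch-up logic stays the same whether the answer comes from a unicast request, a cache of multicast notifications already heard, or a test stub. Jobs that are not ready after this pass would simply keep waiting for the normal multicast notification.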