Re: [Clockwork-developers] Implementation of a decentralized schedule
From: Joel L. <jo...@lo...> - 2002-01-13 23:53:53
I like Brian's ideas for running a decentralized schedule using multicast and unicast where appropriate. Some things to consider:

When a job finishes, does the system always post a notification about the job, or does it try to figure out whether other systems have jobs that depend on it and send only to those systems? Suppose job A runs on system X, and job B runs on system Y once job A finishes. What happens when job A finishes but system Y is down? If system X simply sent a multicast notification when job A finished, we're out of luck. If we decide that the systems need to be smart enough to know who will be starting jobs after theirs finish, then system X could resend the message until system Y acknowledges it.

Using a centralized monitoring system like Brian suggested, updates to the schedule could be distributed from there as well, since it would already know about all the nodes in the schedule. We could also implement something akin to AutoSys' global variables by distributing updates to those variables the same way we distribute updates to the schedule. (I think that's a particularly neat feature of AutoSys.)

We might want to think about breaking systems up into management units, so that a user who has 500 systems but only wants to monitor a schedule that affects 50 of them isn't forced to poll all of them to get the status of that schedule.

I'll have to do some reading about multicast, as I have no experience with it.

-- Joel
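
P.S. To make the "resend until acknowledged" idea a bit more concrete, here is a rough Python sketch. Nothing in it comes from Clockwork itself: notify_until_acked, FlakyReceiver, and the max_attempts/delay parameters are all made-up names for illustration, and the network is replaced by a plain function call that returns True when the receiver acknowledges. It only shows the retry loop, not how system X would discover that system Y depends on job A.

```python
import time
from typing import Callable


def notify_until_acked(send: Callable[[str], bool], message: str,
                       max_attempts: int = 5, delay: float = 1.0) -> int:
    """Resend `message` until `send` reports an acknowledgement.

    `send` is a hypothetical transport hook: it returns True when the
    receiver acknowledged the message and False otherwise.
    Returns the number of attempts used, or raises TimeoutError when
    `max_attempts` is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        if send(message):
            return attempt
        if attempt < max_attempts:
            time.sleep(delay)  # wait before trying again
    raise TimeoutError(
        f"no acknowledgement after {max_attempts} attempts: {message!r}")


class FlakyReceiver:
    """Stand-in for system Y: it is 'down' for the first few deliveries."""

    def __init__(self, down_for: int) -> None:
        self.down_for = down_for
        self.received = []

    def deliver(self, message: str) -> bool:
        if self.down_for > 0:
            self.down_for -= 1
            return False  # system Y is down, so no acknowledgement
        self.received.append(message)
        return True  # message accepted and acknowledged


if __name__ == "__main__":
    system_y = FlakyReceiver(down_for=2)
    attempts = notify_until_acked(system_y.deliver, "job A finished", delay=0.01)
    print(attempts, system_y.received)
```

A real version would probably need to keep the pending notification on disk so that it survives a restart of system X, and would need some policy for what to do once the attempts run out.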