Re: [Clockwork-developers] Implementation of a decentralized schedule
From: C. B. H. <br...@do...> - 2002-01-13 16:05:46
----- Original Message -----
From: "Joel Loudermilk" <jo...@lo...>
To: <clo...@li...>
Sent: Saturday, January 12, 2002 4:11 PM
Subject: [Clockwork-developers] Implementation of a decentralized schedule

> Don't get me wrong, I really like the idea of not needing a central server
> process and database (multiplied by two for redundancy), but I can't quite
> figure out how to make this work without them.

How about this: a "client" (used for monitoring) that connects to each server in an environment, based on either a predetermined configuration or some sort of autodiscovery method. Yes, the client would need to talk to each server, but I think some way to centralize monitoring is going to be a requirement, whether the data is centralized or not.

The way I see it, we have the following options if we want to decentralize things:

1) Allow the client to have a predetermined list of servers (possibly in a configuration file), and have it make TCP connections to each server as needed, sending all communications over those connections. This isn't difficult to implement, so even if we go with a more complex design, I'd suggest we implement something like this, if only for testing purposes.
2) Have the client broadcast on its local networks (UDP) and autodiscover servers. This might work, especially if we make each server know about the other servers in its environment. If you think about it, the servers will need to know which other servers are part of their environment anyway, just to execute the schedule and to know if servers are down. The client could broadcast on its local network, grab a server list from one server that responds, and then open TCP connections to communicate with the individual servers.
3) Implement two methods of communication: a protocol for the client to request information directly from the servers (over unicast TCP connections), and IP multicast for sending event-related information between the servers and the clients. This would keep us from having to use broadcasts (to start up the client, method 1 or 2 could be used), and make it easy for all the servers to know about events that occur.
4) Use broadcasts for communications which apply to more than one server, and use unicast TCP for point-to-point communication.

Of these, if we want to decentralize the application, I'm most in favor of the third option: using multicast to send event-related messages, and TCP unicast for point-to-point communications. Here's how it would work in some scenarios:

Scenario 1: Job failure. The server on which the job failed sends a multicast message to the other servers in its environment (and any clients which may be running) to inform them of the failure. From there, servers could respond to the failure as necessary: running other jobs, displaying an alert (on the client), automatically notifying administrators, whatever.

Scenario 2: Job dependency between two servers. On completion of the first job, its scheduling daemon sends a multicast message to the group informing them that the first job has completed, along with its status. The server on which the second job runs receives this information and begins the dependent job if the first job was successful.

Scenario 3: Force start of a job. While monitoring the distributed application, an administrator wants to start a job on demand. The client opens a unicast TCP connection to the server in question and issues the command to start the job. That server then sends a multicast message so that the other servers can act on the starting of that job if necessary.

I hope this is enough for everyone to see how the idea might work.
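
To make the scenarios above a little more concrete, here is a minimal Python sketch of the event side only. Everything specific in it is an assumption made for illustration and not part of any agreed Clockwork design: the group address 239.1.2.3, port 5007, the JSON message layout, and the names encode_event, decode_event, send_event, open_listener, and jobs_to_start are all hypothetical.

```python
import json
import socket

# Hypothetical values for illustration; not part of any Clockwork design.
GROUP = "239.1.2.3"   # an address in the administratively scoped range
PORT = 5007


def encode_event(server, job, kind, status=None):
    """Serialize one event as a JSON datagram payload (assumed layout)."""
    event = {"server": server, "job": job, "kind": kind, "status": status}
    return json.dumps(event).encode("utf-8")


def decode_event(payload):
    """Inverse of encode_event."""
    return json.loads(payload.decode("utf-8"))


def send_event(payload, group=GROUP, port=PORT, ttl=1):
    """Send one datagram to every listener that has joined the group."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # TTL 1 keeps the datagram on the local network segment.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(payload, (group, port))


def open_listener(group=GROUP, port=PORT):
    """Return a UDP socket that has joined the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # ip_mreq: group address followed by the local interface (any).
    membership = socket.inet_aton(group) + socket.inet_aton("0.0.0.0")
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership)
    return sock


def jobs_to_start(event, dependencies):
    """Scenario 2: local jobs to begin after a remote job succeeded.

    dependencies maps (server, job) to the local jobs that wait on it.
    """
    if event["kind"] != "completed" or event["status"] != "success":
        return []
    return list(dependencies.get((event["server"], event["job"]), []))
```

In this sketch, each scheduling daemon would loop on recvfrom() on the socket returned by open_listener(), pass each datagram through decode_event(), and start whatever jobs_to_start() returns. A failure (Scenario 1) would simply be an event with a different kind and status, and the force start in Scenario 3 would arrive over an ordinary TCP connection, after which the receiving server would call send_event() to tell the group.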

Using plain old TCP will work too, but it will require a lot of connections between a lot of servers. If we can design two protocols so that we use multicast and unicast together, each where it makes the most sense, I think we can accomplish the decentralized design. I don't think relying solely on point-to-point communications will be very scalable in a decentralized design. The reason is that, for an environment of n servers, we might need up to nC2 = n(n-1)/2 connections. Using multicast, each server listens to a single multicast group address, and a lot of the traffic would go over that group. When point-to-point connections are required (which should be relatively infrequent), they can be opened and then closed. Multicast might be a little more difficult to implement, but it will reward us in terms of scalability and resource usage.

So, what does everyone think?

-Brian
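
For a rough sense of the nC2 argument above, the following small Python sketch compares the worst-case number of point-to-point connections with the number of multicast group memberships. The function names are hypothetical, and the comparison assumes every pair of servers keeps its own TCP connection open, which is the upper bound rather than the typical case.

```python
def full_mesh_connections(n):
    """Worst case: one TCP connection per pair of servers, i.e. nC2."""
    return n * (n - 1) // 2


def multicast_memberships(n):
    """With multicast, each server joins a single group."""
    return n


for n in (5, 20, 100):
    print(n, full_mesh_connections(n), multicast_memberships(n))
# prints:
# 5 10 5
# 20 190 20
# 100 4950 100
```

The pairwise count grows quadratically while the group memberships grow linearly, which is the scalability point being made.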