clockwork-developers Mailing List for Clockwork
Status: Planning
Brought to you by:
jlouder
Archived messages per month:

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2001 | | | | | | | | | | | 1 | 6 |
| 2002 | 16 | | 10 | | | | | 1 | | | | 4 |
| 2003 | 9 | 4 | 1 | | | | | | | | | |
| 2012 | | | | | | | | 1 | | | | |

From: Shawn M. <syb...@gm...> - 2012-08-09 19:29:16

I'm gonna go out on a limb and say we're not actually gonna do this. Right?

From: Shawn M. <smc...@ei...> - 2003-03-29 20:21:02

I've remembered a pet peeve about AutoSys. If you have a job on hold and the flow has already come to that job, changing it to on ice starts the job instead of putting it on ice. That is not what most people expect when they try to put a job on ice. Let's make sure Clockwork doesn't do that. Instead, the job should go on ice, and the flow should proceed past it.

--
Shawn McMahon, Episode IV Consulting
System Administrator and all-around nice guy
"All US citizens should immediately start open-signing their email messages as a voluntary act of patriotic duty." - Dr. Michael L. Love, http://gnu-darwin.sourceforge.net/war.html
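A minimal sketch of the requested behavior, assuming each job carries a simple state field. `IceJobState`, `IceDemoJob`, and all method names are hypothetical and not taken from any Clockwork design document. The two rules it encodes are that putting a held job on ice never starts it, and that an iced job counts as satisfied for anything downstream:

```java
// Hypothetical job states, loosely borrowing AutoSys terminology.
enum IceJobState { WAITING, ON_HOLD, ON_ICE, RUNNING, SUCCESS }

// Hypothetical job record illustrating the hold -> ice rule.
final class IceDemoJob {
    private IceJobState state;
    private boolean conditionsMet; // true once the flow has reached this job

    IceDemoJob(IceJobState initial) {
        this.state = initial;
    }

    IceJobState state() {
        return state;
    }

    /** Called when all of this job's upstream dependencies are satisfied. */
    void conditionsSatisfied() {
        conditionsMet = true;
        if (state == IceJobState.WAITING) {
            state = IceJobState.RUNNING; // only a plain waiting job starts here
        }
    }

    /** Putting a job on ice never starts it, even if the flow has already reached it. */
    void putOnIce() {
        if (state == IceJobState.RUNNING) {
            throw new IllegalStateException("cannot put a running job on ice");
        }
        state = IceJobState.ON_ICE;
    }

    /** Releasing a hold is the only path that may start a job whose conditions were met while held. */
    void releaseHold() {
        if (state == IceJobState.ON_HOLD) {
            state = conditionsMet ? IceJobState.RUNNING : IceJobState.WAITING;
        }
    }

    /** An iced job counts as satisfied, so the flow proceeds past it. */
    boolean satisfiesDownstream() {
        return state == IceJobState.SUCCESS || state == IceJobState.ON_ICE;
    }
}
```

In this sketch the hold-to-ice change and the start decision are separate transitions, so there is no code path where the change itself can launch the job.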

From: Joel L. <jo...@lo...> - 2003-02-23 00:19:27

Shawn McMahon and I had a conversation weeks ago about logging. Depending on how our users do their monitoring, they may need our software to log to any number of different formats/destinations. Shawn came up with a pretty comprehensive list:

* file
* syslog
* SNMP trap
* Windows event log

That should satisfy just about everyone. I can't imagine anyone who's serious about monitoring their hosts and applications not being able to pick up error messages from at least one of those sources.

The user should be able to configure multiple logging destinations, and set a minimum severity for each. For example, it should be possible to configure logging like:

+ DEBUG or higher -> /var/log/clockwork.log
+ INFO or higher -> syslog
+ WARNING or higher -> send as SNMP trap

Having different places to log, each with its own semantics for severity, will require some abstraction so that Clockwork can log to a common logging API, and that layer can figure out how to translate each message to the API of a particular backend.

Luckily, the nice folks at Apache's Jakarta project have already done just this with the Commons Logging component. It's described as "an ultra-thin bridge between different logging libraries." They have created the common logging API, and make it easy for you to write a little code that plugs whatever backend you want into their logging framework. Incidentally, they've already got the "plug-in" for Log4J written, which handles file and syslog logging. So all that's left is SNMP traps and the Windows event log. I did a little digging, and I'm afraid there's no way to do the Windows event log in pure Java -- a .DLL must be involved.

If you're interested in doing some reading, here are a couple of references:

http://jakarta.apache.org/commons/logging.html
http://jakarta.apache.org/log4j/index.html

--
Joel
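The routing layer described above can be sketched in plain Java. This is not the Commons Logging API. `LogSeverity`, `LogBackend`, `RoutingLog`, and `MemoryBackend` are hypothetical names, and real file, syslog, SNMP, or event-log backends would each be assumed implementations of `LogBackend`:

```java
import java.util.ArrayList;
import java.util.List;

// Severity levels from the example above, ordered lowest to highest.
enum LogSeverity { DEBUG, INFO, WARNING, ERROR }

/** One destination: a file, syslog, an SNMP trap sender, the Windows event log, ... */
interface LogBackend {
    void write(LogSeverity severity, String message);
}

/** Common front end: each message goes to every route whose minimum severity it meets. */
final class RoutingLog {
    private static final class Route {
        final LogSeverity minimum;
        final LogBackend backend;

        Route(LogSeverity minimum, LogBackend backend) {
            this.minimum = minimum;
            this.backend = backend;
        }
    }

    private final List<Route> routes = new ArrayList<>();

    RoutingLog route(LogSeverity minimum, LogBackend backend) {
        routes.add(new Route(minimum, backend));
        return this;
    }

    void log(LogSeverity severity, String message) {
        for (Route r : routes) {
            if (severity.compareTo(r.minimum) >= 0) { // enum declaration order encodes severity
                r.backend.write(severity, message);
            }
        }
    }
}

/** Demo backend that just remembers what it was given. */
final class MemoryBackend implements LogBackend {
    final List<String> lines = new ArrayList<>();

    @Override
    public void write(LogSeverity severity, String message) {
        lines.add(severity + " " + message);
    }
}
```

The three example routes from the list above would be configured as `new RoutingLog().route(LogSeverity.DEBUG, fileBackend).route(LogSeverity.INFO, syslogBackend).route(LogSeverity.WARNING, snmpBackend)`, where the three backends are assumed implementations.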

From: Joel L. <jo...@lo...> - 2003-02-23 00:03:40

The other day I was thinking about how the Clockwork servers would store their configuration data, which would be replicated between all the servers. I made some notes on this, and will share them with everyone.

First off, when I talk about the configuration of the server, I'm not talking about the job schedule. I'm talking about settings like which mail server to use, lists of clients and port numbers, and which ciphers to use for SSL. Since this configuration applies to the entire schedule "instance", it needs to be replicated between all the servers. That makes it a natural fit for one of the Berkeley databases, which are already getting replicated.

So now all that needs to be determined is how to modify the configuration. For this, I'll borrow from some features of Veritas Cluster Server, which lets you modify just about any aspect of configuration while the server is running by using commands that really have no idea what you're tweaking.

Here's what I mean: we put all the configuration data into a separate Berkeley DB, called "config" for example. In the config database we store each item of the configuration, storing the item name and then its data. The data could be one of a few "types": scalar, list, or hash. All any program that manipulates the config database needs to know is how to add/modify/remove these configuration items.

Here's an example: when the scheduler needs to know what mail server to use, it pulls the configuration item called "SMTPServer" out of the config database. Its value is a scalar, and is the hostname of the SMTP server. When the scheduler needs to know who all the clients are, it pulls out the value of "Clients", which is a list of hostnames. So the interface that reads and writes the configuration can be very simple, and independent of the configuration data.
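A sketch of that type-agnostic interface, using an in-memory map as a stand-in for the replicated "config" Berkeley DB. `ConfigValue` and `ConfigStore` are hypothetical names, and how values would be serialized into database records is left out:

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** One configuration value: a scalar, a list, or a hash. */
final class ConfigValue {
    enum Kind { SCALAR, LIST, HASH }

    private final Kind kind;
    private final Object value;

    private ConfigValue(Kind kind, Object value) {
        this.kind = kind;
        this.value = value;
    }

    static ConfigValue scalar(String s) { return new ConfigValue(Kind.SCALAR, s); }
    static ConfigValue list(List<String> items) { return new ConfigValue(Kind.LIST, List.copyOf(items)); }
    static ConfigValue hash(Map<String, String> entries) { return new ConfigValue(Kind.HASH, Map.copyOf(entries)); }

    Kind kind() { return kind; }

    String asScalar() {
        require(Kind.SCALAR);
        return (String) value;
    }

    @SuppressWarnings("unchecked")
    List<String> asList() {
        require(Kind.LIST);
        return (List<String>) value;
    }

    @SuppressWarnings("unchecked")
    Map<String, String> asHash() {
        require(Kind.HASH);
        return (Map<String, String>) value;
    }

    private void require(Kind expected) {
        if (kind != expected) {
            throw new IllegalStateException("expected " + expected + " but item is " + kind);
        }
    }
}

/** Type-agnostic store: it knows how to add, modify, and remove items, not what they mean. */
final class ConfigStore {
    private final Map<String, ConfigValue> items = new ConcurrentHashMap<>(); // stand-in for the "config" DB

    void put(String name, ConfigValue value) { items.put(name, value); } // add or modify
    Optional<ConfigValue> get(String name) { return Optional.ofNullable(items.get(name)); }
    void remove(String name) { items.remove(name); }
}
```

Only the scheduler code that reads "SMTPServer" or "Clients" knows what those names mean; the store itself never does.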
I've glossed over exactly who it is that's updating the config database -- is it the running server, or is it a command-line program that's directly manipulating the database file? Most of the time, it should be the running server. The config database is replicated, so it's a bad idea to go mucking around with the database file behind the server's back. But there could be times when all servers are down, and you need to make a modification without starting the server. So we'll need that capability as well.

As a safeguard, the config database should include a flag to indicate whether the server is running. If a user tries to use the file-modifying configuration method and the flag is set, the utility can refuse to make the change. Of course, there will need to be a way for the user to explicitly ask to clear the flag, for situations where the server didn't stop gracefully and so didn't get a chance to clear the flag.

If necessary, we can also provide a way to use the file-modifying configuration method to load the configuration from a file, for users who are making lots of changes at once. Changing the SMTP server is no big deal, but a user who's configuring their server for the first time may have lots of settings to change, and may not like the idea of running the configuration command 30 or so times with different arguments.

This is all for the servers' configuration, of course. Clients will need some configuration, too, but since it's unique to each client, it's not worth the effort to do something like this. The clients can just use configuration files.

--
Joel
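The offline-edit safeguard described above could look roughly like this. The reserved item name `ServerRunning`, the `name=value` bulk-load format, and the `OfflineConfigEditor` class are all assumptions for illustration, and a `Map` again stands in for the opened database file:

```java
import java.util.Map;

/**
 * Hypothetical offline editor that works directly on the config database file.
 * The Map stands in for the opened Berkeley DB; "ServerRunning" is an assumed item name.
 */
final class OfflineConfigEditor {
    static final String RUNNING_FLAG = "ServerRunning";

    private final Map<String, String> db;

    OfflineConfigEditor(Map<String, String> db) {
        this.db = db;
    }

    /** Refuses to touch the file while a server claims to be running. */
    void set(String name, String value) {
        if ("true".equals(db.get(RUNNING_FLAG))) {
            throw new IllegalStateException(
                    "a server appears to be running; change the configuration through it, "
                    + "or clear the flag if it did not stop gracefully");
        }
        db.put(name, value);
    }

    /** Explicit override for a server that died without clearing its flag. */
    void clearRunningFlag() {
        db.remove(RUNNING_FLAG);
    }

    /** Bulk load for first-time setup: one "name=value" per line; blank lines and '#' comments are skipped. */
    void loadAll(String text) {
        for (String line : text.split("\n")) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("#")) {
                continue;
            }
            int eq = t.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("expected name=value: " + line);
            }
            set(t.substring(0, eq).trim(), t.substring(eq + 1).trim());
        }
    }
}
```

Because `loadAll` goes through `set`, the bulk-load path is guarded by the same running-server check as single edits.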

From: Joel L. <jo...@lo...> - 2003-02-22 23:41:35

It's been a while, and I've played a bit more with Java's SSL support, and I'm now convinced that SSL is the way to go for the client-to-server communication.

Interestingly enough, this week a vulnerability was found in SSL when used in certain scenarios. But it doesn't look like it would impact Clockwork, because it requires:

* The cipher must be used in CBC mode. The "best" cipher Java has available for SSL is RC4, which isn't vulnerable to this attack because it's a stream cipher.
* The protocol on top of SSL needs to have a fixed "password" at a certain spot. As far as I can tell, Clockwork's protocol won't have this, because there's no password to send. The attack works on things like IMAP, where the connection always opens with "LOGIN username password" or something like that.
* The SSL implementation must "leak" error information by treating padding errors and decryption errors differently.

I'm sure that by the time we're ready to release any files, Sun will have a patch for this (if their implementation contains this flaw).

At first glance, SSL seems painful for Clockwork users to use, since each node needs to have a private key, and must have the public key of all the entities that will communicate with it. But I think this could be done rather simply with just two key pairs:

(1) A server key pair
(2) A client key pair

Each server would have the server private key and the client public key. Likewise, each client would have the client private key and the server public key. Sharing them makes things operationally simpler for the user, but it increases the risk if a private key is compromised. So we'll build in a way for the administrator to easily add keys to the "keyring" of the client and the server, to make transitioning to a new client key pair (or server key pair) easy, without taking down the scheduler. This would also make it possible for the administrator to use multiple client keys or server keys if he chooses.
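A sketch of how each side might be wired up with the JSSE classes, under the two-key-pair scheme above: the node's own private key goes in a keystore used by a `KeyManagerFactory`, and the other side's public keys (as certificates) go in a trust store used by a `TrustManagerFactory`. Keeping both the old and the new client certificate in the servers' trust store is one way the key-rollover idea could work. `ClockworkTls` is a hypothetical name; cipher suites are left at the JDK defaults, and current JDKs no longer enable RC4 by default:

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.TrustManagerFactory;

final class ClockworkTls {
    private ClockworkTls() {}

    /** Loads a keystore; a null stream yields an empty one. */
    static KeyStore loadKeyStore(InputStream in, char[] password) {
        try {
            KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
            ks.load(in, password);
            return ks;
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException("could not load keystore", e);
        }
    }

    /**
     * ownIdentity holds this node's private key (the server key pair on a server,
     * the client key pair on a client); trustedPeers holds the other side's
     * certificates. Several trusted certificates may coexist during a key rollover.
     */
    static SSLContext context(KeyStore ownIdentity, char[] keyPassword, KeyStore trustedPeers) {
        try {
            KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
            kmf.init(ownIdentity, keyPassword);
            TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmf.init(trustedPeers);
            SSLContext ctx = SSLContext.getInstance("TLS");
            ctx.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null); // null: default SecureRandom
            return ctx;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("could not build SSL context", e);
        }
    }

    /** Server side: unlike most HTTPS sites, also require the client to prove its identity. */
    static SSLEngine serverEngine(SSLContext ctx) {
        SSLEngine engine = ctx.createSSLEngine();
        engine.setUseClientMode(false);
        engine.setNeedClientAuth(true);
        return engine;
    }
}
```

A client would build its context the same way, with the client key pair as its identity and the server certificate as its trusted peer, and use it in client mode (for example through `ctx.getSocketFactory()`).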
Now that I've beaten security to death, I've got some ideas on other topics. But I'll send those in a separate mail.

--
Joel

From: Joel L. <jo...@lo...> - 2003-02-10 22:48:23

I have a few more ideas about security and want to float them past everyone. I also have some more thoughts on a couple of other topics, but I'll leave them to separate emails.

Currently, the plan for security calls for all conversations to be encrypted. Shawn suggested the Rijndael cipher, which was selected to be the new AES standard. We'd use public key cryptography to let each end of the connection verify the other end and securely exchange a Rijndael key to use for the rest of the conversation (because public key ciphers are too slow for bulk encryption).

After doing some more reading, I've learned that this is essentially SSL. The SSL specification also includes cipher negotiation, so the two ends can list the ciphers they support and choose the best mutually available one. Java has SSL support built in (as of Java 1.4, that is).

In most HTTP-over-SSL applications (like secure web sites), the client checks the server's identity, but the server doesn't check the client's (at the SSL layer). But SSL does support both ends verifying each other -- that's just not used as much. We'd need to do that.

The upside of using SSL is that we don't have to roll our own code for exchanging keys, checking identities, and negotiating ciphers. Also, the problem of changing the client/server keys, which I touched on a while back (without giving an easy solution), gets easier, since Java has the concept of a keystore and a certificate store that we could easily add more keys to, and later (when the old keys should no longer be accepted) remove keys from.

The downside? It looks like the SSL support is restricted to certain ciphers: RC4, DES, and Triple-DES (these are the symmetric algorithms; I didn't list the public key algorithms). At least that's what the built-in SSL provider from Sun supports. I'll have to dig around some more and see if another provider is available, or if you can just plug in a cipher.

--
Joel
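For comparison, the hand-rolled scheme from the start of this message (public key cryptography to exchange a Rijndael/AES session key, then a symmetric cipher for the rest of the conversation) can be sketched with standard JCE classes. RSA-OAEP for the key exchange and AES-GCM for the bulk data are assumptions chosen for the sketch, and `SessionKeyDemo` is a hypothetical name. Unlike SSL, it does none of the identity checking or cipher negotiation, which is a large part of why using SSL is attractive:

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Sketch of "public key to exchange a session key, symmetric cipher for the rest". */
final class SessionKeyDemo {
    private static final String WRAP = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    private static final String BULK = "AES/GCM/NoPadding";
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    private SessionKeyDemo() {}

    @FunctionalInterface
    private interface CryptoOp<T> {
        T run() throws GeneralSecurityException;
    }

    // Keeps the sketch's method signatures free of checked exceptions.
    private static <T> T unchecked(CryptoOp<T> op) {
        try {
            return op.run();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** The long-lived key pair (for example, the server's). */
    static KeyPair newRsaKeyPair() {
        return unchecked(() -> {
            KeyPairGenerator g = KeyPairGenerator.getInstance("RSA");
            g.initialize(2048);
            return g.generateKeyPair();
        });
    }

    /** A fresh random AES key for one conversation. */
    static SecretKey newSessionKey() {
        return unchecked(() -> {
            KeyGenerator g = KeyGenerator.getInstance("AES");
            g.init(256);
            return g.generateKey();
        });
    }

    /** Client side: only the holder of the matching private key can recover the session key. */
    static byte[] wrap(SecretKey session, PublicKey peer) {
        return unchecked(() -> {
            Cipher c = Cipher.getInstance(WRAP);
            c.init(Cipher.WRAP_MODE, peer);
            return c.wrap(session);
        });
    }

    /** Server side: recover the session key with the private key. */
    static SecretKey unwrap(byte[] wrapped, PrivateKey own) {
        return unchecked(() -> {
            Cipher c = Cipher.getInstance(WRAP);
            c.init(Cipher.UNWRAP_MODE, own);
            return (SecretKey) c.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
        });
    }

    /** Bulk encryption with the session key; output is the IV followed by ciphertext and tag. */
    static byte[] encrypt(SecretKey session, byte[] plaintext) {
        return unchecked(() -> {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv); // a GCM IV must never repeat under the same key
            Cipher c = Cipher.getInstance(BULK);
            c.init(Cipher.ENCRYPT_MODE, session, new GCMParameterSpec(TAG_BITS, iv));
            byte[] body = c.doFinal(plaintext);
            byte[] out = new byte[IV_BYTES + body.length];
            System.arraycopy(iv, 0, out, 0, IV_BYTES);
            System.arraycopy(body, 0, out, IV_BYTES, body.length);
            return out;
        });
    }

    static byte[] decrypt(SecretKey session, byte[] message) {
        return unchecked(() -> {
            Cipher c = Cipher.getInstance(BULK);
            c.init(Cipher.DECRYPT_MODE, session, new GCMParameterSpec(TAG_BITS, message, 0, IV_BYTES));
            return c.doFinal(message, IV_BYTES, message.length - IV_BYTES);
        });
    }
}
```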

From: Joel L. <jo...@lo...> - 2003-01-28 02:55:07

While the JCE (Java Cryptography Extension) provider built into JDK 1.4 (the SunJCE provider) doesn't support a tremendous number of ciphers, I've found another provider that does. Check out Cryptix JCE at:

http://cryptix.org/products/jce/index.html

This is a pure Java implementation of a JCE provider that supports lots more ciphers, including AES, all under a free license. The catch: it's "early access" quality.

If you're interested in seeing how JCE is used, I've attached a tiny example program I used to encrypt a short string with AES. I had to beat my head against the wall for quite a while to make this work before I figured out that instead of asking for the "AES" algorithm I should ask for the "Rijndael" algorithm.

--
Joel
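The attached program isn't preserved in the archive, so here is a separate minimal sketch of the same kind of JCE usage. Current JDKs ship AES in the built-in SunJCE provider, so asking for "AES" by name works there. `aesProviders()` lists which installed providers offer it (a third-party provider such as Cryptix would appear there if installed), and `roundTrip` encrypts and decrypts a short string with a chosen provider. `AesProviderCheck` is a hypothetical name, and GCM mode is an assumption:

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.Provider;
import java.security.SecureRandom;
import java.security.Security;
import java.util.ArrayList;
import java.util.List;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class AesProviderCheck {
    private AesProviderCheck() {}

    /** Names of installed JCE providers that offer an "AES" cipher, in preference order. */
    static List<String> aesProviders() {
        Provider[] providers = Security.getProviders("Cipher.AES"); // null if none match
        List<String> names = new ArrayList<>();
        if (providers != null) {
            for (Provider p : providers) {
                names.add(p.getName());
            }
        }
        return names;
    }

    /** Encrypts and decrypts a short string with a fresh AES key from the named provider. */
    static String roundTrip(String text, String providerName) {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES", providerName);
            kg.init(128);
            SecretKey key = kg.generateKey();

            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);
            GCMParameterSpec spec = new GCMParameterSpec(128, iv);

            Cipher enc = Cipher.getInstance("AES/GCM/NoPadding", providerName);
            enc.init(Cipher.ENCRYPT_MODE, key, spec);
            byte[] ciphertext = enc.doFinal(text.getBytes(StandardCharsets.UTF_8));

            Cipher dec = Cipher.getInstance("AES/GCM/NoPadding", providerName);
            dec.init(Cipher.DECRYPT_MODE, key, spec);
            return new String(dec.doFinal(ciphertext), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES round trip failed with provider " + providerName, e);
        }
    }
}
```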

From: Shawn M. <smc...@ei...> - 2003-01-26 21:39:17

On Sun, Jan 26, 2003 at 03:36:43PM -0500, Joel Loudermilk said:
>
> key. Shawn explained to me that the keys are symmetric, meaning that you
> can either encrypt a message with the public key and decrypt it with the
> private key, or you can encrypt with the private key and decrypt with the
> public key.

Actually, that makes them asymmetric, but you have the technical gist correct.

> message. I suppose we could make this communication not encrypted, but just
> with a digital signature in it, but it seems awful easy to make the whole
> conversation encrypted, and then there are no issues with someone snooping
> secret data (although it's probably not very important data).

Since we have to set up a secure channel anyway, it may not be much of a performance hit to use it for commands. AES, DES, et al. were designed to be used on machines as stupid as smart cards, so we should get acceptable performance.

> (2) How multiple servers work with authentication. My examples above talk
> about multiple agents and one server, but we know there will be multiple
> servers. My first thought is for all servers to have a copy of the
> private key. I think this would still be secure, and would allow everything
> to continue to work as I described.

I agree.

> that level of security could be optional. Also, the administrator needs to
> be able to change the schedule's key pair if it's compromised. It would be
> nice if this could be done without stopping the servers (outages are bad,
> right?). If we make it difficult to change the key pair, the administrator
> will be less likely to do it.

We could make a mechanism for distribution of new keys over the existing channel, but you wouldn't want it to be the only way, in case your reason for sending new keys was because the old ones were compromised. But as a prophylactic measure it'd be useful.
--
Shawn McMahon, FedEx Services, DSS-MCO Security Lead
"Every time you walk out of the house with clothes on, you give up freedom for temporary safety."
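The quoted alternative, sending agent commands with a digital signature rather than relying only on encryption, can be sketched with `java.security.Signature`. "Encrypting with the private key so that anyone with the public key can check it" is essentially what signing does. `SignedCommand` and the choice of SHA256withRSA are assumptions. A signature proves where a command came from but doesn't hide it, and doesn't by itself stop a captured command from being replayed:

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.security.SignatureException;

final class SignedCommand {
    private static final String ALGORITHM = "SHA256withRSA";

    private SignedCommand() {}

    /** The schedule's key pair, created when the administrator sets up the instance. */
    static KeyPair newScheduleKeyPair() {
        try {
            KeyPairGenerator g = KeyPairGenerator.getInstance("RSA");
            g.initialize(2048);
            return g.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Server side: only the holder of the schedule's private key can produce this. */
    static byte[] sign(String command, PrivateKey schedulePrivate) {
        try {
            Signature s = Signature.getInstance(ALGORITHM);
            s.initSign(schedulePrivate);
            s.update(command.getBytes(StandardCharsets.UTF_8));
            return s.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Agent side: any holder of the public key can check, but not forge, a signature. */
    static boolean verify(String command, byte[] signature, PublicKey schedulePublic) {
        try {
            Signature s = Signature.getInstance(ALGORITHM);
            s.initVerify(schedulePublic);
            s.update(command.getBytes(StandardCharsets.UTF_8));
            return s.verify(signature);
        } catch (SignatureException e) {
            return false; // malformed signature: treat the command as not genuine
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```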

From: Joel L. <jo...@lo...> - 2003-01-26 20:36:57

Shawn McMahon and I had a detailed discussion on security on Thursday. While I don't have a handle on everything involved yet, I do think we made some headway in the area of security. Here are the details of what we discussed. (Shawn may have to correct me if I misstate anything.)

When an administrator sets up a schedule (or an "instance," to borrow from AutoSys's terminology), one of the things that will be done is the creation of a public/private key pair for the schedule. The scheduling server will keep the private key (secured by appropriate filesystem permissions), and all the clients (both GUIs and the agent software that runs on the managed nodes to make them valid targets of jobs) will get the public key. Shawn explained to me that the keys are symmetric, meaning that you can either encrypt a message with the public key and decrypt it with the private key, or you can encrypt with the private key and decrypt with the public key.

First, how to make GUI clients (and command-line client programs used by users) secure. As I mentioned before, I don't want to have to store user and password lists inside Clockwork, and would like to at least give the administrator the option of authenticating against something else. But this means that the password from the GUI (or client program, rather, since it might not be a GUI) will have to be sent to the server. We can't do a challenge-response kind of thing, since if we're going to pass the password off to some other backend we need to actually know what it is. So the communications channel needs to be encrypted so we can safely pass the user's password along. (I suppose the administrator will need to trust that we're not secretly recording all the passwords. If he has doubts, he can just read the source.)

When the client program and the server talk, they use public key encryption to secretly exchange a "session key" that will be used to encrypt the rest of their conversation.
I suppose they could just use their existing key pair to encrypt the entire conversation, but Shawn said that's way slower than "secret key" encryption methods, so most people use the slow public key encryption to securely exchange a secret key, which they then use to encrypt the rest of their conversation. Now that the client and server have an encrypted communications channel, the client can prompt the user for his username/password and send it to the server without fear that it will be snooped or replayed.

Some client programs (for example, a command-line program to send an event or to display job history) may be more useful if they don't ask for a password interactively, so that the administrator can, for instance, set up a web page that runs those commands. In that case, we can make it so that the client programs can take the username and password from the command line or from a file. This will make things flexible enough so that the administrator can do whatever he wants with the client programs.

While we don't want to force the administrator to create a list of Clockwork-only usernames and passwords, some people might want to do just that. To make things easy for them, we could make one of our authentication backends check against an internal database. Berkeley DB provides an easy way to encrypt a database, too. Other backends I have in mind are PAM (for the UNIX folks) and calling an external program (for those who want to use neither PAM nor the internal list). That should satisfy just about everyone.

The other part of security is the communications between the server and the agents (the piece of Clockwork that runs on every machine that is able to run jobs). Obviously, the agent must be able to tell that commands are coming from the real scheduling server, or this would be a huge root exploit. So when the scheduling server sends commands to the agents, it will encrypt them using the schedule's private key.
If the agent can decrypt the message using the schedule's public key, then it knows it's a valid message. I suppose we could make this communication not encrypted, but just with a digital signature in it, but it seems awful easy to make the whole conversation encrypted, and then there are no issues with someone snooping secret data (although it's probably not very important data).

That's it for the server-to-agent connections, but there will also be agent-to-server connections. Just like in AutoSys, if a job is going to run for hours, we probably don't want to hold the TCP connection open that long just so that the agent can say at the end what the job's exit code was. So the agent will have to initiate a connection back to the server (or, a server, since there may be multiple servers) to provide the exit code. In this scenario, the server needs to be able to make sure the system that just connected to it is a valid agent. So the agent will encrypt the message with the schedule's public key (which all agents have) and the server will decrypt it with the schedule's private key.

For added security, we can allow the administrator to set up a list of agent IP addresses/networks. The incoming connection can be checked against this list, too. For even more security, the incoming connection can be checked against the IP address of the system where the job ran. There's little reason that anyone other than "hosta" should be reporting on the final status of a job that ran on "hosta." But this level of security might be difficult to manage, since most hosts have several IP addresses, and the one the server connected to in order to get the job started might not be the one all the traffic out of the system goes across.

What I haven't addressed here are:

(1) Authorization. We've checked to make sure the users are who they say they are, but how do we know who should be allowed to do what?
I was thinking of a three-level access model: administrators can change things, operators can only stop/start/hold jobs, and guests can look but not modify anything. We may need to apply these to something more granular than the entire schedule, since it may be useful to have a set of users who can only start/stop the backup jobs, or can only modify a certain set of jobs.

(2) How multiple servers work with authentication. My examples above talk about multiple agents and one server, but we know there will be multiple servers. My first thought is for all servers to have a copy of the private key. I think this would still be secure, and would allow everything to continue to work as I described.

(3) Ease of maintenance. It needs to be easy for an administrator to change a system's IP address. It's okay if we have this stored in our databases somewhere, as long as we provide an easy way to change it when a host's IP changes. Honestly, though, I'd prefer that we not be on the list of things an administrator needs to do when he changes IP addresses. Maybe that level of security could be optional. Also, the administrator needs to be able to change the schedule's key pair if it's compromised. It would be nice if this could be done without stopping the servers (outages are bad, right?). If we make it difficult to change the key pair, the administrator will be less likely to do it.

Comments, anyone?

--
Joel
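The optional agent allowlist proposed in this message could be checked with a small CIDR matcher like the one below. `AgentAllowlist` is a hypothetical name. Entries and peers are expected to be IP literals (a hostname would trigger a DNS lookup), and IPv4 and IPv6 networks never match each other:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Allowlist of agent networks, e.g. "10.1.0.0/16" or a single address like "192.168.5.17". */
final class AgentAllowlist {
    private static final class Network {
        final byte[] prefix;
        final int bits;

        Network(byte[] prefix, int bits) {
            this.prefix = prefix;
            this.bits = bits;
        }
    }

    private final List<Network> networks = new ArrayList<>();

    AgentAllowlist allow(String cidr) {
        int slash = cidr.indexOf('/');
        byte[] addr = parseLiteral(slash < 0 ? cidr : cidr.substring(0, slash));
        int maxBits = addr.length * 8;
        int bits = slash < 0 ? maxBits : Integer.parseInt(cidr.substring(slash + 1));
        if (bits < 0 || bits > maxBits) {
            throw new IllegalArgumentException("bad prefix length in " + cidr);
        }
        networks.add(new Network(addr, bits));
        return this;
    }

    boolean permits(String peerLiteral) {
        return permits(parseLiteral(peerLiteral));
    }

    /** For an accepted connection, e.g. socket.getInetAddress(). */
    boolean permits(InetAddress peer) {
        return permits(peer.getAddress());
    }

    private boolean permits(byte[] addr) {
        for (Network n : networks) {
            // IPv4 (4 bytes) and IPv6 (16 bytes) entries never match each other.
            if (n.prefix.length == addr.length && samePrefix(n.prefix, addr, n.bits)) {
                return true;
            }
        }
        return false;
    }

    private static boolean samePrefix(byte[] a, byte[] b, int bits) {
        int fullBytes = bits / 8;
        for (int i = 0; i < fullBytes; i++) {
            if (a[i] != b[i]) {
                return false;
            }
        }
        int rest = bits % 8;
        if (rest == 0) {
            return true;
        }
        int mask = (0xFF << (8 - rest)) & 0xFF; // top 'rest' bits of the next byte
        return (a[fullBytes] & mask) == (b[fullBytes] & mask);
    }

    private static byte[] parseLiteral(String literal) {
        try {
            return InetAddress.getByName(literal).getAddress(); // no DNS lookup for an IP literal
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP address: " + literal, e);
        }
    }
}
```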

From: Joel L. <jo...@lo...> - 2003-01-23 03:06:34

Security is a big area to tackle in the scheduler, and it's important that it
be designed in correctly from the start. I see security used in two different
ways:
(1) Making sure the scheduler processes on different systems communicating
with each other are genuine and haven't been replaced with something
that acts like the scheduler but really isn't.
(2) Making sure that users running commands on the scheduling servers and
through the GUI are who they say they are, so we can enforce restrictions.
I haven't devoted much thought to area #1, but I think the tougher part will
be area #2.
For the moment, let's assume that permissions restrictions are already set up,
meaning that there are rules in the scheduler saying things like "user
jlouder can do this." What I'm attempting to address is how we determine who
the user is -- without requiring the scheduler administrator to set up a
Clockwork-only user/password database.
For GUI users, we prompt them for a username and password. On the server
side, we let the administrator configure what we'll do to check usernames
and passwords. We could provide a couple of backends -- for example, one that
takes the username and password and feeds it to PAM (using a service name
of "clockwork"), and perhaps also one that feeds the username/password to
an external program, which the administrator could use to check the password
against just about anything (LDAP, NT domain, homegrown database, etc.).
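A sketch of the pluggable backends described above. `AuthBackend`, `InternalListBackend`, and `ExternalProgramBackend` are hypothetical names. The convention that the external program reads the username and password on stdin (one per line) and exits with status 0 on success is an assumption, chosen so that the password never appears on a command line where `ps` could show it. A PAM backend would need native code and is left out:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

/** Administrator-selected check of a username/password pair. */
interface AuthBackend {
    boolean authenticate(String user, String password);
}

/** Optional internal list for sites that want one (plaintext only to keep the sketch short). */
final class InternalListBackend implements AuthBackend {
    private final Map<String, String> passwords;

    InternalListBackend(Map<String, String> passwords) {
        this.passwords = Map.copyOf(passwords);
    }

    @Override
    public boolean authenticate(String user, String password) {
        return password.equals(passwords.get(user));
    }
}

/**
 * Runs an administrator-supplied checker (LDAP, NT domain, homegrown database, ...).
 * Assumed convention: username and password arrive on stdin, one per line; exit status 0 means OK.
 */
final class ExternalProgramBackend implements AuthBackend {
    private final List<String> command;

    ExternalProgramBackend(List<String> command) {
        this.command = List.copyOf(command);
    }

    @Override
    public boolean authenticate(String user, String password) {
        if (user.contains("\n") || password.contains("\n")) {
            return false; // would break the one-value-per-line convention
        }
        try {
            Process p = new ProcessBuilder(command)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .redirectError(ProcessBuilder.Redirect.DISCARD)
                    .start();
            try (OutputStream stdin = p.getOutputStream()) {
                stdin.write((user + "\n" + password + "\n").getBytes(StandardCharsets.UTF_8));
            } catch (IOException ignored) {
                // The checker may exit before reading its input; its exit status still decides.
            }
            return p.waitFor() == 0;
        } catch (IOException e) {
            return false; // checker could not be started: fail closed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The server would pick one backend from its configuration and call `authenticate` with whatever the GUI sent, without caring which kind it is.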
For command-line users, we don't prompt for a password, but simply take the
identity of the user who's running the program. For example, if user 'jlouder'
is running the process, then we'll do all authorization checks with the
username 'jlouder'.
Of course, for a command-line program to do much of anything (say, put a
start-job event in the queue), it probably will need to talk to a scheduler
daemon running with elevated privileges. So there will need to be a check of
security credentials in that conversation. This might mean we can't skip
asking local users for passwords, since we'll have to supply something to the
daemon to get checked. It's not safe to have the client tell the daemon
"I'm user XYZ, just trust me!" So obviously I need to do some more thinking
on this.
But my main goal is to *not* have to house a username/password database. I
hate applications that do that.
If you have any thoughts or comments on security, I'd love to hear them. As
you can see, I have about 20% of an idea here.
--
Joel

From: Joel L. <jo...@lo...> - 2003-01-05 23:56:49

I've made a change to the "Database Design" document on the SourceForge site. The start_times database, which had time-of-day as keys and job names as data, has been replaced by two new databases: next_start_by_time and next_start_by_job. The purpose of these databases (and the one they replaced) is to handle getting a job started when its start time rolls around.

The old design would have required the scheduler to look up "9:21 PM" in the database, start any jobs for that time, and check again in one minute. This didn't support jobs that run at odd intervals (e.g., every 13 minutes) well, and it also meant that if the scheduler didn't check for "9:22 PM" jobs for some reason (maybe it had a lot of jobs to start at 9:21 PM), they'd get missed. Also, if the scheduler was down, it would be hard to figure out which starts got missed when it came back up.

The replacement databases are Btrees, which are automatically sorted by key. If we track the next start time of every job (like AutoSys does), and store it in next_start_by_time with the key being the UNIX-style time, then the database will be sorted in chronological order. Now scheduled starts are easy. Using a cursor on the database, it's simple to say "give me the first record." That will be the first job to be started. If it's not time to start that job, do nothing. If it is, start it and try the next one. A side benefit of this technique is that if the scheduler is down and misses a bunch of job starts, they'll all get started as soon as it comes back up, without any special coding for that scenario.

The next_start_by_job database exists for the cases where we want to pull up the next start of a particular job. I imagine the GUI will want to display this to the user. This will be a secondary index on the first database, so Berkeley DB will take care of keeping it in sync for us.
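An in-memory sketch of the next_start_by_time / next_start_by_job pair, with a `TreeSet` standing in for the sorted Btree and a `HashMap` standing in for the secondary index (which Berkeley DB would maintain automatically; here it is kept in sync by hand). Ordering entries by (time, job name) so that two jobs can share a start time is an assumption, and `NextStartIndex` is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class NextStartIndex {
    /** One next_start_by_time record; ordered by time, then job name, so times may repeat. */
    private static final class Entry implements Comparable<Entry> {
        final long time;
        final String job;

        Entry(long time, String job) {
            this.time = time;
            this.job = job;
        }

        @Override
        public int compareTo(Entry other) {
            int c = Long.compare(time, other.time);
            return c != 0 ? c : job.compareTo(other.job);
        }
    }

    private final TreeSet<Entry> byTime = new TreeSet<>();   // next_start_by_time (sorted like a Btree)
    private final Map<String, Long> byJob = new HashMap<>(); // next_start_by_job (secondary index)

    /** Sets (or moves) a job's next start, keeping both indexes in sync. */
    void schedule(String job, long unixTime) {
        Long previous = byJob.put(job, unixTime);
        if (previous != null) {
            byTime.remove(new Entry(previous, job));
        }
        byTime.add(new Entry(unixTime, job));
    }

    /** What the GUI would show for one job; null if it has no pending start. */
    Long nextStartOf(String job) {
        return byJob.get(job);
    }

    /**
     * Looks at the first (earliest) record and removes it while it is due.
     * Starts missed while the scheduler was down all come back on the first call.
     */
    List<String> takeDue(long now) {
        List<String> due = new ArrayList<>();
        while (!byTime.isEmpty() && byTime.first().time <= now) {
            Entry first = byTime.pollFirst();
            byJob.remove(first.job);
            due.add(first.job);
        }
        return due;
    }
}
```

After starting an every-13-minutes job, the caller would reschedule it with `schedule(job, startTime + 13 * 60)`; there is no per-minute lookup, so odd intervals need no special handling.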
In other news, I've constructed a simple-minded Java example that uses the Berkeley DB Queue format with multiple readers and one transaction-protected writer, simulating the adding of events onto event_queue and the processing of the events by the worker threads. After beating my head against the wall several times, I have it working just fine.

I'm still unable to get gcj to compile a Java program that uses Berkeley DB to native code. I'm not sure what's wrong, but I'm not too worried about that.

I believe the next thing to work on is to get a more detailed understanding of how high availability will work, synchronizing databases and such. In a previous mail, I mentioned that I wanted to use the Berkeley DB feature of making commits not return until the data had been pushed to all replica databases. That simplifies the programming somewhat, but the more I think about it, the more it scares me. I suppose we have to deal with the same scenario all database replication environments have -- what do you do when the primary database fails, but some of the latest updates haven't been applied to the replica database? I'll do some more thinking on this and get back to everyone.

--
Joel
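The shape of the Queue experiment described in the message above (one writer appending to event_queue, several worker threads consuming) can be sketched with `java.util.concurrent`. This uses an in-memory `BlockingQueue` as a stand-in for the Berkeley DB Queue, so there is no persistence and no transaction protection; `EventQueueDemo` and the event strings are hypothetical:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class EventQueueDemo {
    private static final String STOP = "__stop__"; // sentinel telling one worker to exit

    private EventQueueDemo() {}

    /** One writer appends events; several workers take them. Returns how many were processed. */
    static int run(int events, int workers) {
        BlockingQueue<String> eventQueue = new LinkedBlockingQueue<>();
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int w = 0; w < workers; w++) {
            pool.submit(() -> {
                try {
                    while (true) {
                        String event = eventQueue.take(); // blocks until an event is available
                        if (STOP.equals(event)) {
                            return;
                        }
                        processed.incrementAndGet(); // real code would start a job, etc.
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Single writer. With Berkeley DB, each append would be transaction-protected.
        for (int i = 0; i < events; i++) {
            eventQueue.add("start-job:" + i);
        }
        for (int w = 0; w < workers; w++) {
            eventQueue.add(STOP); // FIFO order: every real event is taken before any sentinel
        }

        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }
}
```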

From: Joel L. <jo...@lo...> - 2003-01-02 16:35:06

There's an article in Linux Journal, posted just yesterday, about using GCJ to compile Java code to either bytecode or native code. The guy who wrote it is one of the original GCJ developers:

http://www.linuxjournal.com/article.php?sid=4860

I was playing around with this earlier, and got the obligatory "Hello, World!" example to compile into native code and run. I'm having a little difficulty getting an example built that uses Berkeley DB, probably because JNI (the Java Native Interface) is involved. But the article says JNI works under GCJ, so I probably just need to fiddle around with it.

When a program is compiled to native code with GCJ, it's linked against libgcj, which provides (almost) all the classes that Sun's JRE provides (the most notable exception is the AWT). There's even a garbage collector. I need to do some more research to make sure that libgcj provides all the classes/features we would want. I also hope that it's no more bug-ridden than Sun's classes.

--
Joel

From: Joel L. <jo...@lo...> - 2003-01-02 15:09:37

+- On Thursday (1/2/2003 9:20) Shawn McMahon <smc...@ei...> Wrote-
| I don't like the idea of packaging a JRE, they're huge, but if you don't
| you find that Java is not write-once run-anywhere, not even nearly so.

You're correct that we shouldn't expect to compile some Java bytecode and let it loose on every platform, expecting it to work with whatever JRE is installed there. What I'm referring to, though, is the way that using Java would free the programmer from needing to think much about the platform while writing the code. It would be really nice not to have to look up every function to make sure that POSIX guarantees it's there. And since we said we want to support Windows (in some capacity), it would be _really_ nice not to have to wrap code in "#ifdef WINDOWS". And it's just nicer to be able to open a socket connection in 2 lines instead of 20.

| However, with the GNU Compiler, we could produce compiled native Java
| applications, on every target platform. This would actually be easier
| than finding a JRE for all the target platforms and keeping it
| synchronized when we need new features.

If this works, it sounds great. I'd rather deliver native code for each platform anyway, especially since using Berkeley DB would require that.

| I vote for whatever language you and Brian are best at coding.

Good point. I was working on the assumption (based on responses to emails on this list and previous information I was told) that both Brian and Shawn D. don't have very much free time to contribute, so I'd be doing the vast majority of the coding. While I'm better at C than Java, I'd rather not do a project this large in C, because I think I'd spend too much time coding things that are already there for you in Java and debugging memory leaks. And I just plain don't like C++. Brian and Shawn D. can correct me if I'm wrong about their planned involvement in coding.

--
Joel

From: Shawn M. <smc...@ei...> - 2003-01-02 14:20:54

On Wed, Jan 01, 2003 at 05:41:08PM -0500, Joel Loudermilk said:
>
> One of the drawbacks of Java is that you can't really depend on a version
> of the JRE to be installed on a target system. But we could simply package
> the software with an acceptable JRE for each platform (which is what
> commercial vendors do with Java and with Perl). It would even be possible
> to make a version of the software that comes without a JRE for folks who
> don't want another one.

I don't like the idea of packaging a JRE, they're huge, but if you don't you find that Java is not write-once run-anywhere, not even nearly so. It's beaten by a country mile by perl, python, and probably half a dozen others I don't know about.

However, with the GNU Compiler, we could produce compiled native Java applications, on every target platform. This would actually be easier than finding a JRE for all the target platforms and keeping it synchronized when we need new features.

Of course, if you're going to go there, why do Java in the first place, unless all your coders are best at it? I vote for whatever language you and Brian are best at coding. The whole point to the cross-platform ability of Java is lost until they define a standard and stick to it, and they are years away from that presently, if they even get there at all without turning it over to a standards body.

--
Shawn McMahon
AIM work: spmcmahonfedex
AIM home: smcmahoneiv
"Emacs: It's a nice OS, but to compete with Linux or Windows it needs a better text editor." - Alexander Duscheleit

From: Joel L. <jo...@lo...> - 2003-01-01 22:41:23

I've been kicking around some ideas about the language to use to write Clockwork, and while I was resistant at first, I think Java might work well. Java would take care of much of the portability for us (remember that we stated in the requirements that Clockwork needs to run on Windows, too). It also makes memory management, data structures, network communication, threading, and even logging easy (Jakarta's log4j is quite good). It really seems like Java would let us spend more time writing code that actually runs the scheduler, without having to stop and first create a message-passing system, a generalized logging system, and other low-level stuff.

One of the drawbacks of Java is that you can't really depend on a version of the JRE to be installed on a target system. But we could simply package the software with an acceptable JRE for each platform (which is what commercial vendors do with Java and with Perl). It would even be possible to make a version of the software that comes without a JRE for folks who don't want another one.

And there's the foolishness with $CLASSPATH and the fact that Java programs don't look quite like other programs in 'ps' when they run, but we could easily wrap them with shell scripts to make startup and shutdown easier.

I really think that if a user is able to pkgadd/apt-get/rpm/swinstall our software package, edit a couple of configuration files, run '/etc/init.d/clockwork start' and get the software running, then it doesn't matter if it's Java and we had to bundle it with a JRE. If install and setup are easy, users will use it. LimeWire is a great example of this. It can install its own JRE if you want, and it's almost transparent. You just run an installer and it works.

The Berkeley DB code has a Java interface, but it uses native code as the backend. So we'd need to have separate software packages for each platform for that reason. But I don't think this is a really big deal.
Rather than doing like RedHat's RPM and assume that db >= version _x_ is installed and in your $LD_LIBRARY_PATH, which forever ties together your db and rpm software, I'd rather just stick the db libraries for whatever version we've tested with in a lib/ directory along with the rest of our software. -- Joel |
|
From: Joel L. <jo...@lo...> - 2002-12-23 02:53:32
|
I put down an idea for the number, type, and contents of the required Berkeley DB databases for Clockwork this weekend, as well as my idea for possible job states. Rather than send everything through email, I posted them in the DocManager on the SourceForge site: https://sourceforge.net/docman/index.php?group_id=40038 These are by no means final, and I would encourage anyone who's interested to review them, find problems, and poke fun at me. -- Joel |
|
From: Joel L. <jo...@lo...> - 2002-12-14 23:48:07
|
I'd like to propose a more detailed design for the job scheduler. This is based on my ideas in the last email and some more research into Berkeley DB. In this mail, I'll attempt to explain how the scheduler could be implemented as a multiple-master system using Berkeley DB.
In every collection of systems running a job schedule, there must always be at least two "master" nodes. As I mentioned previously, these nodes don't have to be dedicated systems. Being a master node just means that the system has some additional responsibilities in the schedule.
The databases representing jobs, events, and such would be Berkeley DB databases. Exactly one of the master nodes would be responsible for handling database writes (Berkeley DB requires that only one system make updates). This system could be called the "write master." The other master nodes have copies of the databases, which they update when they receive replication messages from the write master. They can make read-only queries against their data, but they can't update it -- updates must be made on the write master.
Let me pause for a moment to explain how replication and failover work using the Berkeley DB API. We are responsible for setting up and maintaining a communications infrastructure between the write master and the other masters. We provide a function to the Berkeley DB API that it can call on the write master to send data to another system. We are in control of who is the write master, although the API will help us figure it out by supporting elections. There is a well-defined procedure for adding a new subscriber system, which gets that system caught up on the current state of the database(s) and starts feeding it updates. We're also responsible for making sure writes only happen on the write master. If we want, the API will guarantee that an update is committed to all replica databases before returning from the commit. This may be useful, but it may make writes too slow.
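A minimal Java sketch of the write-master rule described above, assuming a simple in-memory map per master. The class and method names (`MasterNode`, `put`, `addReplica`) are hypothetical, and none of this is the Berkeley DB replication API; it only models "writes on one node, replication messages to the rest," using the synchronous option the message mentions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a master node. NOT the Berkeley DB API.
class MasterNode {
    final String name;
    private final boolean writeMaster;
    private final Map<String, String> data = new HashMap<>();
    private final List<MasterNode> replicas = new ArrayList<>();

    MasterNode(String name, boolean writeMaster) {
        this.name = name;
        this.writeMaster = writeMaster;
    }

    // Adding a subscriber: first catch it up on the current state,
    // then include it in every later update.
    void addReplica(MasterNode replica) {
        replica.data.clear();
        replica.data.putAll(data);
        replicas.add(replica);
    }

    // Read-only queries may be answered by any master from its local copy.
    String get(String key) {
        return data.get(key);
    }

    // Updates are only legal on the write master.
    void put(String key, String value) {
        if (!writeMaster) {
            throw new IllegalStateException(name + " is not the write master");
        }
        data.put(key, value);
        // Synchronous replication: return only after every replica applied it.
        for (MasterNode replica : replicas) {
            replica.applyReplicationMessage(key, value);
        }
    }

    private void applyReplicationMessage(String key, String value) {
        data.put(key, value);
    }
}
```

A replica that tries to write gets an exception, which is the "we're responsible for making sure writes only happen on the write master" rule made explicit.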
Having at least two master nodes -- a write master and one or more additional masters -- meets our HA requirement. If one goes down, we can promote another system in the environment to master and, if necessary, hold an election to determine the write master.
In addition to managing the databases of jobs, the masters are responsible for running jobs and deciding when it's time to run them. Much of this can be distributed among all the masters, letting them share the burden of this work.
Suppose that the job schedule says that at 5:00 PM, 100 jobs are supposed to run on various systems in the environment. The write master would be responsible for checking the time and determining that it's time to start the 100 jobs (I haven't figured out how to distribute that part). He's occasionally checking the clock to see if time-based jobs should start, and when he sees that these 100 jobs should start, he sticks 100 events in the "jobs-to-be-started" queue. This queue is one of the databases, which means that all the masters have access to it. (One of the database types that Berkeley DB supports is a queue, with an atomic "eat-from-the-head" operation.)
The jobs-to-be-started queue is seen by all the masters, which periodically check it for work. When one sees some work, it grabs some number of jobs off the head of the queue (it must talk to the write master to do this, since this is not a read-only query). Each master takes a portion of the jobs to be started, tries to start them by talking to the client systems, and updates the write master again with the status of each job (either "started" or "failed to start").
When the write master learns the final status of jobs as they complete (I haven't covered exactly how that happens, but assume that word eventually gets back to the write master), it puts the name of each newly-finished job on the "newly-finished" queue.
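The clock check and the shared queue could be sketched in Java roughly as follows. This assumes plain `TreeMap` and `ArrayDeque` stand-ins for a time-keyed Berkeley DB database and the queue database, and the names `TimeIndex`, `StartQueue`, `releaseDue`, and `takeBatch` are hypothetical. The `synchronized` batch take plays the role of the atomic "eat-from-the-head" operation, so two masters never receive the same job.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the shared "jobs-to-be-started" queue.
class StartQueue {
    private final ArrayDeque<String> queue = new ArrayDeque<>();

    // Only the write master enqueues.
    synchronized void enqueue(String job) {
        queue.addLast(job);
    }

    // Atomically removes up to max jobs from the head of the queue.
    synchronized List<String> takeBatch(int max) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < max && !queue.isEmpty()) {
            batch.add(queue.pollFirst());
        }
        return batch;
    }
}

// Hypothetical time-keyed index: start time (epoch millis) -> job names.
class TimeIndex {
    private final TreeMap<Long, List<String>> byTime = new TreeMap<>();

    void schedule(long startMillis, String job) {
        byTime.computeIfAbsent(startMillis, t -> new ArrayList<>()).add(job);
    }

    // Called periodically by the write master: moves every job whose start
    // time is at or before "now" onto the shared queue.
    int releaseDue(long nowMillis, StartQueue queue) {
        int released = 0;
        Map<Long, List<String>> due = byTime.headMap(nowMillis, true);
        for (List<String> jobs : due.values()) {
            for (String job : jobs) {
                queue.enqueue(job);
                released++;
            }
        }
        due.clear(); // headMap is a view, so this also removes them from byTime
        return released;
    }
}
```

Keying by time means the clock check only touches jobs that are actually due, rather than scanning every job definition.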
The purpose of this queue is to distribute the work of determining whether the completion of a job means that any other jobs should now be started. All the masters check this queue periodically too, and will grab a chunk of jobs to process from it. Determining the successors of jobs efficiently will require that we keep a database of successors, keyed by completing job. For example, if jobs B and C should start when job A finishes, looking up job A in the successors database will return the names of jobs B and C. So in order to process work from the newly-finished queue, all a master must do is look up the finished job in the successors database, determine whether each dependency is met (e.g., if job A failed but the dependency is success-only, it isn't), and if it is, put more job-start events on the to-be-started queue (by talking to the write master again). These job-start events will in turn get distributed among multiple masters for processing.
It might sound like the write master carries a far greater share of the work, but it doesn't necessarily have to be that way. If we configure database replication to be synchronous, we can guarantee that all the other masters are as up-to-date as the write master. Then those masters can query their own local copies of the databases to find job definitions and successor information. They'll only need to involve the write master when an update needs to be made.
Sure, there's a performance penalty for making replication synchronous, but if all the masters need to work from guaranteed-latest data, we have to pay that penalty somehow. Either the database layer can do it for us, or we can do it ourselves by making all the other masters talk to the write master even for read-only queries, to be sure the data is current. The overhead is probably the same, so I'd rather let the database do the work.
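A sketch of processing one entry from the newly-finished queue, assuming a successors database keyed by completing job. The names and the three dependency conditions are hypothetical (only "success-only" appears in the message), and the sketch ignores successors with more than one predecessor, which a real implementation would also have to check.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

enum Outcome { SUCCESS, FAILURE }

// Hypothetical dependency conditions.
enum DependencyCondition {
    SUCCESS_ONLY, FAILURE_ONLY, ANY;

    boolean isMetBy(Outcome outcome) {
        switch (this) {
            case SUCCESS_ONLY: return outcome == Outcome.SUCCESS;
            case FAILURE_ONLY: return outcome == Outcome.FAILURE;
            default:           return true; // ANY
        }
    }
}

// Stand-in for the successors database: completing job -> dependent jobs.
class SuccessorsDb {
    private static final class Edge {
        final String successor;
        final DependencyCondition condition;

        Edge(String successor, DependencyCondition condition) {
            this.successor = successor;
            this.condition = condition;
        }
    }

    private final Map<String, List<Edge>> byCompletingJob = new HashMap<>();

    void add(String completingJob, String successor, DependencyCondition condition) {
        byCompletingJob.computeIfAbsent(completingJob, k -> new ArrayList<>())
                       .add(new Edge(successor, condition));
    }

    // Returns the successors whose dependency on finishedJob is satisfied;
    // these would go onto the to-be-started queue via the write master.
    List<String> jobsToStart(String finishedJob, Outcome outcome) {
        List<String> toStart = new ArrayList<>();
        for (Edge e : byCompletingJob.getOrDefault(finishedJob, Collections.emptyList())) {
            if (e.condition.isMetBy(outcome)) {
                toStart.add(e.successor);
            }
        }
        return toStart;
    }
}
```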
I haven't fully fleshed out how client updates will get back to the write master, but I was thinking of something like having the clients get a list of all the masters when a job is started. When the job finishes, the client will try to contact one of the systems in that list to report status. If that system isn't the write master, it will take care of getting the update to the write master. If that system is down (or no longer a master), the client will move on to the next system in the list. As long as the set of masters doesn't change completely while a job runs, this will work.
And assuming all masters have current job status data, a JobScape-like GUI could attach to any of them. Being able to distribute this work will really help, especially if there are a lot of GUI users.
So the only things the write master does that the other masters can't share are making all the database updates (and we can't split that up) and checking the clock periodically to see whether any time-based jobs need to start. If we keep another database keyed by time, that task shouldn't be much work. That's why I consider this design somewhere in between the single-master and distributed-across-all-nodes approaches. I really think this will scale quite well.
Do any of you have any thoughts on this design? Please try to shoot holes in it so I can see whether I spent enough time thinking it through.
--
Joel |
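The client-side fallback described in this message could look something like the following hypothetical Java sketch. `MasterConnection` and `StatusReporter` are illustrative names, not a defined Clockwork interface, and the network call is abstracted away so that only the "try each master in turn" logic shows.

```java
import java.util.List;

// Hypothetical connection to one master. Throws if that system is down
// or is no longer a master.
interface MasterConnection {
    void reportStatus(String job, String status) throws Exception;
}

class StatusReporter {
    // The list of masters the client was handed when the job started.
    private final List<MasterConnection> masters;

    StatusReporter(List<MasterConnection> masters) {
        this.masters = masters;
    }

    // Returns the index of the master that accepted the report, or -1 if
    // every master on the list failed (the set of masters changed
    // completely while the job ran).
    int report(String job, String status) {
        for (int i = 0; i < masters.size(); i++) {
            try {
                // A master that isn't the write master is assumed to
                // forward the update to the write master itself.
                masters.get(i).reportStatus(job, status);
                return i;
            } catch (Exception e) {
                // Down or demoted: move on to the next master in the list.
            }
        }
        return -1;
    }
}
```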
|
From: Joel L. <jo...@lo...> - 2002-12-06 03:58:47
|
I don't know how familiar everyone is with Berkeley DB (I'm really not), but
I was just doing some reading this evening about its capabilities and API
and thought I'd share some thoughts with the group.
Apparently, Berkeley DB will support not only transactions, but also
failover and replication. When you combine that with the fact that it's
embedded in your application so there's no need to mess with your
Oracle or MySQL installation, it starts to sound attractive.
Unfortunately, you can't make queries even approaching the complexity of
a SQL query. Every database is just a collection of {key, value} pairs
(although both can be of arbitrary length, and of any format), so your
queries are all either "give me the next record" or "give me the record
whose key is X."
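To make the two query shapes concrete, here is a Java sketch that uses a `TreeMap` as a stand-in for one ordered (B-tree style) database. It is illustrative only and does not use the Berkeley DB API; `KeyValueTable` and its methods are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a single database of {key, value} pairs.
class KeyValueTable {
    private final TreeMap<String, String> records = new TreeMap<>();

    void put(String key, String value) {
        records.put(key, value);
    }

    // "Give me the record whose key is X."
    String get(String key) {
        return records.get(key);
    }

    // "Give me the next record": the first record after the given key,
    // or the very first record when key is null (like a fresh cursor).
    Map.Entry<String, String> next(String key) {
        return key == null ? records.firstEntry() : records.higherEntry(key);
    }
}
```

Anything resembling a SQL join or WHERE clause has to be built in application code on top of these two operations.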
And if you want more than one set of {key, value} pairs, you need to create
another database. I suppose this is why there are about 20 of these in
/var/lib/rpm -- one for each "table." Fortunately, your transactions can
be across databases, and it does log replay on a crash ... the whole bit.
Being able to put _any_ type of data in the database could be very cool.
Imagine if there was a Java object (or a C structure) representing a job
definition, then the database of these job definitions could be accessed
by simply reading and writing the serialized objects (or the C structures).
That would certainly make things easy, and there are already APIs for
C, C++, Java, and Perl.
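A sketch of the serialized-object idea, assuming standard Java serialization and a `HashMap<String, byte[]>` standing in for the database (keys are job names, values are serialized bytes). `JobDefinition`, its fields, and `JobDefinitionStore` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical job definition; field names are illustrative only.
class JobDefinition implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final String command;
    final String machine;

    JobDefinition(String name, String command, String machine) {
        this.name = name;
        this.command = command;
        this.machine = machine;
    }
}

class JobDefinitionStore {
    // Stand-in for a database of {job name, serialized JobDefinition}.
    private final Map<String, byte[]> db = new HashMap<>();

    void put(JobDefinition job) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(job);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        db.put(job.name, bytes.toByteArray());
    }

    JobDefinition get(String name) {
        byte[] value = db.get(name);
        if (value == null) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(value))) {
            return (JobDefinition) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

One design caveat: Java serialization ties the stored bytes to the class definition, so changing `JobDefinition` later would mean managing `serialVersionUID` and keeping field changes compatible.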
A specific advantage Berkeley DB has over a SQL database for our application
is the replication and failover. It's built into the product, whereas
if we went with a SQL database, to make it highly available we'd either
have to assume the user is using a database replication product from his
vendor or roll our own replication (like AutoSys, and I don't think anyone
is excited about the AutoSys homegrown replication).
According to the web site (http://www.sleepycat.com/), the HA Berkeley DB
package will log updates, which are all made to a single master, and
distribute them to other systems. In the event of a failure, one of the
other systems is promoted to a master. I assume this is with minimal hassle
to the application, since it claims this is transparent to the end-user,
but I haven't read any code that uses those features yet.
I'll read some more, but at this point it sounds to me like for all it could
buy us (no external database required, and built-in database failover), I'd
be willing to give up querying the database with SQL.
--
Joel
|
|
From: Joel L. <jo...@lo...> - 2002-12-02 02:56:28
|
First off, Happy Thanksgiving to everyone on the list! Hopefully you won't mind some actual traffic on this mailing list (in addition to the monthly Mailman announcement).
I was doing some thinking the past couple of weeks about the Clockwork project, believe it or not, and Shawn McMahon and I had a few minutes to talk about design the other day. As you may recall, things pretty much stalled out while we were trying to decide whether to make the scheduler centralized (like AutoSys) or decentralized (like something else we've not really worked with, but think would be better).
Everyone agrees that AutoSys has some pretty bad bottlenecks, and I think that's what has made many of us (myself included) want to steer clear of a centralized design. But as Shawn pointed out to me last week, there's a good chance that some well-applied multithreading could make AutoSys scale a whole lot better. I've heard that an event processor can handle only some maximum number of events per second, regardless of how much horsepower you give it. To me, this sounds like some important part of AutoSys isn't multithreaded.
The appeal to me of the single-master/centralized design is its simplicity. The distributed design sounds great, but it also sounds very complex, possibly requiring us to do multicast notifications and implement a tiny publish/subscribe system. A single-master design would be simpler to implement.
What are the things we hate about AutoSys' single-master design?
(1) It won't scale past 5,000 jobs. As I said before, I think we can fix that with multithreading.
(2) It requires a dedicated pair of scheduling machines. We can eliminate this requirement for small schedules if the event processor is fast enough and we make configuration easy enough. An administrator could elect to "promote" a couple of the managed systems to run the event processor.
We could even design the multithreading so that some of the event-processing work could be done not just by another thread on the scheduling server, but by another scheduling machine altogether. For instance, when you look at a job in AutoSys that's about to be started, the state of the event is briefly "PG" for "ProcessinG" while the EP dispatches it and talks to the client. Imagine if there were multiple machines processing events, and the status were set to "Processing by machine A." Something as simple as that could off-load part of the event-processing burden to multiple systems. And if the system were flexible enough to let the scheduling servers be set up easily (unlike AutoSys), they could be moved around either by the administrator or perhaps automatically, based on load averages. Now we've got a system that behaves like the distributed model, but isn't much more complicated than the plain single-master model.
There's also the issue of databases. Our AutoSys administrators don't like its requirement of a SQL database because then they have to get DBA support. But a SQL database sure makes some things easier for the programmers. I looked briefly at SQLite [1], an embedded SQL database engine. It's kind of neat -- you get SQL queries and even transaction support fully contained within your application; the database lives in a file on the filesystem. But it doesn't enforce column types (any column can hold anything), and it's unclear how well it holds up under a load of concurrent users (all its benchmarks are single-user).
There's also Berkeley DB (which I know Shawn McMahon despises), which claims to support transactions and failover. If it's robust enough and easy enough to work with, then it might be the answer, giving the programmers something that does the work of a database while staying invisible to the end user.
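The "Processing by machine A" marker amounts to an atomic claim on each event. A hypothetical Java sketch, assuming a `ConcurrentHashMap` stands in for the shared event status; across real machines the claim would have to be a transactional update in the shared database instead. `EventBoard` and its methods are illustrative names.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical shared board of pending events. Each event can be claimed
// by exactly one event-processing machine (or thread).
class EventBoard {
    private static final String PENDING = "PENDING";
    private final ConcurrentHashMap<String, String> status = new ConcurrentHashMap<>();

    void post(String eventId) {
        status.put(eventId, PENDING);
    }

    // Atomic compare-and-set from PENDING to "PROCESSING by <machine>".
    // Only one caller can win; everyone else gets false and moves on.
    boolean claim(String eventId, String machine) {
        return status.replace(eventId, PENDING, "PROCESSING by " + machine);
    }

    void finish(String eventId) {
        status.remove(eventId);
    }

    String statusOf(String eventId) {
        return status.get(eventId);
    }
}
```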
Of course, most people have a SQL database running *somewhere*, and is it really a big deal if we tell them they need to host another database on it, particularly if we didn't require a specific vendor's database?
I'll spend some more time thinking about exactly what the responsibilities of the event processor are and trying to find ways to easily distribute them over a few machines. In the meantime, if you have any thoughts, please send them to the list. The bottom line is that I really think if we take the AutoSys EP model and apply some well-placed multithreading and distributed computing, we'd have a system with a simple design and the scalability we want.
[1] http://www.hwaci.com/sw/sqlite
--
Joel |
|
From: Shawn M. <smc...@ei...> - 2002-08-01 19:27:15
|
Happy Mailman Day, everybody!
--
Shawn McMahon | Help spread accurate information
AIM: spmcmahonfedex, smcmahoneiv |about Xenu and the Church of Scientology.
<a href="http://xenu.net/">Scientology</a> on your web site.
|
|
From: Shawn M. <smc...@ei...> - 2002-03-29 01:46:09
|
begin quoting what Joel Loudermilk said on Thu, Mar 28, 2002 at 07:08:16PM -0500:
>
> It doesn't seem too difficult to take the AutoSys multi-instance design
> and make the management tool smart enough to talk to all the instances
> behind the scenes, and present the user with one view of *all* the jobs.
> And it should also be possible to devise a way to have dependencies across
> instances without the user needing to specify the instance.
At first glance it seems possible, and it certainly would rule if we can make it work. |
|
From: Joel L. <jo...@lo...> - 2002-03-29 00:08:22
|
+- On Wednesday (3/13/2002 8:47) Shawn McMahon <smc...@ei...> Wrote-
| Decentralized would have "N" equal to the number of clients, I'd think.
|
| I'm not opposed to multiple centers, I just think it only works if at
| any given moment there's unambiguous "ownership" of each job.
It occurred to me today that the "multiple centers" design is essentially what AutoSys is in a multi-instance configuration. But the drawback of AutoSys' multiple centers is that each has to be managed separately. Another (less severe, perhaps) limitation is that cross-instance dependencies need to explicitly specify the instance where the remote job lives. These two things combined mean you need to not only manage the schedulers separately, but design them as separate entities as well.
It doesn't seem too difficult to take the AutoSys multi-instance design and make the management tool smart enough to talk to all the instances behind the scenes, and present the user with one view of *all* the jobs. And it should also be possible to devise a way to have dependencies across instances without the user needing to specify the instance. Wouldn't that be the best of both worlds?
This question is really aimed at Shawn Dvorak, since he was shooting for the decentralized design.
--
Joel |
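A hypothetical Java sketch of the management tool described above, assuming each instance can report its jobs and that job names are unique across instances (the unambiguous "ownership" Shawn McMahon asked for). `SchedulerInstance` and `FederatedView` are illustrative names; a real tool would query each instance over the network.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical view of one scheduler instance: job name -> status.
class SchedulerInstance {
    final String name;
    final Map<String, String> jobStatus = new TreeMap<>();

    SchedulerInstance(String name) {
        this.name = name;
    }
}

class FederatedView {
    private final List<SchedulerInstance> instances;

    FederatedView(List<SchedulerInstance> instances) {
        this.instances = instances;
    }

    // One view of *all* the jobs, regardless of which instance owns them.
    Map<String, String> allJobs() {
        Map<String, String> view = new TreeMap<>();
        for (SchedulerInstance inst : instances) {
            view.putAll(inst.jobStatus);
        }
        return view;
    }

    // Resolves which instance owns a job, so a cross-instance dependency
    // can be written as just "after jobX" without naming the instance.
    SchedulerInstance ownerOf(String job) {
        for (SchedulerInstance inst : instances) {
            if (inst.jobStatus.containsKey(job)) {
                return inst;
            }
        }
        return null;
    }
}
```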
|
From: Shawn M. <smc...@ei...> - 2002-03-13 13:48:01
|
This one time, at band camp, Joel Loudermilk wrote:
>
> But it sure would be nice if you could distribute the schedule-processing
> load just by, say, activating a few more nodes as schedulers. Sort of like
> Legato Cluster -- you can promote as many nodes as you want to primary, and
> they'll just start replicating the configuration among themselves.
>
> I'm not opposed to this design, but I'd want to flesh out a plan for how
> the GUI would manage the jobs and how the jobs would be dispatched before
> committing to this approach.
That's still centralized, it's just got N centers. Decentralized would have "N" equal to the number of clients, I'd think.
I'm not opposed to multiple centers, I just think it only works if at any given moment there's unambiguous "ownership" of each job. |
|
From: Joel L. <jo...@lo...> - 2002-03-13 03:14:02
|
+- On Tuesday (3/12/2002 20:40) "Shawn Dvorak" <sd...@cf...> Wrote-
| Perhaps we could make a compromise, with
| some number of distributed servers responsible for collecting scheduling
| statuses from a subsets of servers. These collector servers wouldn't do any
| dispatching; they'd only collect job status broadcasts/unicasts from the
If multiple systems do the dispatching, then how would we determine which system dispatches a given job? I agree that it would be great not to have the single-dispatcher bottleneck, but all the solutions I can think of involve a lot of constant communication between the scheduler nodes, and it seems like there would be a fair amount of risk that something would get out of sync.
Then again, even the centralized design would require some sort of failover replication, so maybe it wouldn't be _that_ much more work to design a system that has N schedulers instead of 2. But it sure would be nice if you could distribute the schedule-processing load just by, say, activating a few more nodes as schedulers. Sort of like Legato Cluster -- you can promote as many nodes as you want to primary, and they'll just start replicating the configuration among themselves.
I'm not opposed to this design, but I'd want to flesh out a plan for how the GUI would manage the jobs and how the jobs would be dispatched before committing to this approach. Have you got any more detailed ideas for how to approach this?
--
Joel |
|
From: Shawn D. <sd...@cf...> - 2002-03-13 01:40:44
|
My preference is for a decentralized scheduler. I think that the only real downside is the load involved in getting a complete real-time view of the entire schedule. The advantages of removing the bottlenecks caused by a dedicated server outweigh this.
Perhaps we could make a compromise, with some number of distributed servers responsible for collecting scheduling statuses from subsets of servers. These collector servers wouldn't do any dispatching; they'd only collect job status broadcasts/unicasts from the nodes in the subset. Then the GUI management tool would only have to poll these few collector servers to get the complete view.
Shawn
----- Original Message -----
From: "Joel Loudermilk" <jo...@lo...>
To: <clo...@li...>
Sent: Monday, March 11, 2002 8:47 PM
Subject: [Clockwork-developers] Decision time
> Shawn D. or Brian,
>
> Do you guys have an opinion on the centralized/decentralized question?
>
> I share Shawn M's feeling that we ought to choose a path and get on with
> the project.
>
> --
> Joel
>
> _______________________________________________
> Clockwork-developers mailing list
> Clo...@li...
> https://lists.sourceforge.net/lists/listinfo/clockwork-developers |
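Shawn Dvorak's collector compromise can be sketched in Java as follows, assuming each collector is configured with a fixed set of node names and keeps only the latest status per job. `Collector` and `ManagementGui` are hypothetical names, and the transport (broadcast or unicast) is left out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical collector: accepts status reports only from its own subset
// of nodes and never dispatches anything.
class Collector {
    private final Set<String> nodes;
    private final Map<String, String> latestStatus = new HashMap<>(); // job -> status

    Collector(Set<String> nodes) {
        this.nodes = nodes;
    }

    // Called when a node broadcasts/unicasts a job status.
    synchronized boolean receive(String node, String job, String status) {
        if (!nodes.contains(node)) {
            return false; // not in this collector's subset
        }
        latestStatus.put(job, status);
        return true;
    }

    synchronized Map<String, String> snapshot() {
        return new HashMap<>(latestStatus);
    }
}

class ManagementGui {
    // The GUI polls only the few collectors, not every node.
    static Map<String, String> completeView(List<Collector> collectors) {
        Map<String, String> view = new HashMap<>();
        for (Collector c : collectors) {
            view.putAll(c.snapshot());
        }
        return view;
    }
}
```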