[Clockwork-developers] Update to database design
From: Joel L. <jo...@lo...> - 2003-01-05 23:56:49
I've made a change to the "Database Design" document on the SourceForge site. The start_times database, which had time-of-day as keys and job names as data, has been replaced by two new databases: next_start_by_time and next_start_by_job.

The purpose of these databases (and the one they replaced) is to get a job started when its start time rolls around. The old design would have required the scheduler to look up "9:21 PM" in the database, start any jobs for that time, and check again in one minute. This didn't handle jobs that run at odd intervals well (e.g., every 13 minutes), and it also meant that if the scheduler didn't check for "9:22 PM" jobs for some reason (maybe it had a lot of jobs to start at 9:21 PM), they'd get missed. Also, if the scheduler was down, it would be hard to figure out which starts got missed when it came back up.

The replacement databases are Btrees, which are automatically sorted by key. If we track the next start time of every job (like AutoSys does) and store it in next_start_by_time with the UNIX-style time as the key, the database stays sorted in chronological order. Now scheduled starts are easy. Using a cursor on the database, it's simple to say "give me the first record." That will be the next job due to start. If it's not time to start that job, do nothing. If it is, start it and try the next record. A side benefit of this technique is that if the scheduler is down and misses a bunch of job starts, they'll all get started as soon as it comes back up, without any special coding for that scenario.

The next_start_by_job database exists for the cases where we want to pull up the next start of a particular job. I imagine the GUI will want to display this to the user. It will be a secondary index on the first database, so Berkeley DB will take care of keeping it in sync for us.
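To make the scan concrete, here is a minimal sketch of the idea in plain Java. It is not Berkeley DB code: a java.util.TreeMap stands in for the sorted Btree, a HashMap stands in for the secondary index and is updated by hand, and the class and method names (NextStartIndex, schedule, startDueJobs, nextStart) are made up for illustration. It also assumes several jobs may share one start time, which the Btree would have to allow as well. TreeMap compares the keys numerically; a real Btree key encoding would need to sort in the same order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative stand-in only: TreeMap plays the role of the Btree, and the
// class and method names are hypothetical, not part of Clockwork.
final class NextStartIndex {
    // next_start_by_time: UNIX time (seconds) -> jobs due at that time.
    private final TreeMap<Long, TreeSet<String>> byTime = new TreeMap<>();
    // next_start_by_job: job name -> its next start time. Maintained by hand
    // here; the email expects Berkeley DB to do this via a secondary index.
    private final Map<String, Long> byJob = new HashMap<>();

    // Record (or replace) the next start time of a job.
    void schedule(String job, long startTime) {
        unschedule(job);
        byTime.computeIfAbsent(startTime, t -> new TreeSet<>()).add(job);
        byJob.put(job, startTime);
    }

    // Drop a job's pending start from both structures.
    void unschedule(String job) {
        Long old = byJob.remove(job);
        if (old == null) return;
        TreeSet<String> jobs = byTime.get(old);
        jobs.remove(job);
        if (jobs.isEmpty()) byTime.remove(old);
    }

    // Lookup by job name, e.g. for a GUI. Returns null if nothing is scheduled.
    Long nextStart(String job) {
        return byJob.get(job);
    }

    // Walk records from the earliest key; stop at the first one not yet due.
    // Returns the jobs that should be started, in chronological order.
    List<String> startDueJobs(long now) {
        List<String> started = new ArrayList<>();
        while (!byTime.isEmpty() && byTime.firstKey() <= now) {
            TreeSet<String> jobs = byTime.pollFirstEntry().getValue();
            for (String job : jobs) {
                byJob.remove(job);
                started.add(job);
            }
        }
        return started;
    }

    public static void main(String[] args) {
        NextStartIndex index = new NextStartIndex();
        index.schedule("backup", 1000L);
        index.schedule("report", 1780L);
        index.schedule("cleanup", 5000L);
        // Scheduler was "down" until t=2000: both overdue jobs come back at once.
        System.out.println(index.startDueJobs(2000L)); // [backup, report]
        System.out.println(index.nextStart("cleanup")); // 5000
    }
}
```

In a real scheduler, each started job would presumably have its following start time computed and passed back to schedule(), so the job reappears later in the sorted order.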
In other news, I've constructed a simple-minded Java example that uses the Berkeley DB Queue format with multiple readers and one transaction-protected writer, simulating events being added to event_queue and processed by the worker threads. After beating my head against the wall several times, it's now working just fine. I'm still unable to get gcj to compile a Java program that uses Berkeley DB into native code. I'm not sure what's wrong, but I'm not too worried about that.

I believe the next thing to work on is a more detailed understanding of how high availability will work, synchronizing databases and such. In a previous mail, I mentioned that I wanted to use the Berkeley DB feature of making commits not return until the data has been pushed to all replica databases. That simplifies the programming somewhat, but the more I think about it, the more it scares me. I suppose we have to deal with the same scenario all database replication environments have -- what do you do when the primary database fails, but some of the latest updates haven't been applied to the replica database? I'll do some more thinking on this and get back to everyone.

-- Joel