[Clockwork-developers] Storing configuration data

The other day I was thinking about how the Clockwork servers would store
their configuration data, which would be replicated between all the servers.
I made some notes on this, and will share them with everyone.

First off, when I talk about the configuration of the server, I'm not talking
about the job schedule. I'm talking about settings like: which mail server
to use, lists of clients and port numbers, which ciphers to use for SSL.

Since this configuration applies to the entire schedule "instance", it needs
to be replicated between all the servers. That makes it a natural fit for
one of the Berkeley databases, which are already getting replicated.

So now all that needs to be determined is how to modify the configuration.
For this, I'll borrow from some features of Veritas Cluster Server, which lets
you modify just about any aspect of configuration while the server is running
by using commands that really have no idea what you're tweaking. Here's what
I mean:

We put all the configuration data into a separate Berkeley DB, called "config"
for example. Into the config database we store each item of the configuration,
storing the item name and then its data. The data could be one of a few
"types": scalar, list, or hash. All any program that manipulates the config
database needs to know is how to add/modify/remove these configuration items.
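
As a rough sketch of that layout (Python purely for illustration): each item
is stored under its name, with a type tag next to its data. The ConfigStore
class, the JSON encoding, and the use of a plain dict in place of the
replicated Berkeley DB are all assumptions for this example, not a settled
design.

```python
import json

# Hypothetical type tags for the three kinds of item described above.
SCALAR, LIST, HASH = "scalar", "list", "hash"
_PYTHON_TYPES = {SCALAR: str, LIST: list, HASH: dict}


class ConfigStore:
    """Typed configuration items kept in any dict-like key/value database."""

    def __init__(self, db):
        # db stands in for the replicated "config" database; a plain dict
        # is enough for illustration.
        self.db = db

    def put(self, name, kind, data):
        """Add or modify an item, checking the data against its type tag."""
        if not isinstance(data, _PYTHON_TYPES[kind]):
            raise TypeError(f"{name}: expected {kind} data")
        self.db[name] = json.dumps({"type": kind, "data": data})

    def get(self, name):
        """Return (kind, data) for an item; raises KeyError if it is missing."""
        record = json.loads(self.db[name])
        return record["type"], record["data"]

    def remove(self, name):
        """Delete an item; raises KeyError if it is missing."""
        del self.db[name]


store = ConfigStore({})
store.put("SMTPServer", SCALAR, "mail.example.com")
store.put("Clients", LIST, ["alpha.example.com", "beta.example.com"])
```

Nothing in the class knows what "SMTPServer" or "Clients" mean; it only knows
the three types.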

Here's an example: when the scheduler needs to know what mail server to use,
it pulls the configuration item called "SMTPServer" out of the config
database. Its value is a scalar, and is the hostname of the SMTP server.
When the scheduler needs to know who all the clients are, it pulls out the
value of "Clients", which is a list of hostnames.

So the interface that reads and writes the configuration can be very simple,
and independent of the configuration data.
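
To make "independent of the configuration data" concrete, here is a minimal
sketch of a generic command handler. The verbs (set, list-add, hash-set,
remove) and the apply_command function are made up for this example, and the
dict again stands in for the config database.

```python
def apply_command(config, argv):
    """Apply one generic command to a dict of configuration items."""
    verb, name, *args = argv
    if verb == "set":            # scalar: replace the value
        config[name] = args[0]
    elif verb == "list-add":     # list: append one element
        config.setdefault(name, []).append(args[0])
    elif verb == "hash-set":     # hash: set one key within the item
        config.setdefault(name, {})[args[0]] = args[1]
    elif verb == "remove":       # any type: drop the whole item
        config.pop(name, None)
    else:
        raise ValueError(f"unknown command: {verb}")
    return config


config = {}
apply_command(config, ["set", "SMTPServer", "mail.example.com"])
apply_command(config, ["list-add", "Clients", "alpha.example.com"])
apply_command(config, ["list-add", "Clients", "beta.example.com"])
```

Adding a new setting to the scheduler later would not require touching this
code at all; only the scheduler needs to know the item's name and type.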

I've glossed over exactly who it is that's updating the config database --
is it the running server, or is it a command-line program that's directly
manipulating the database file?

Most of the time, it should be the running server. The config database is
replicated, so it's a bad idea to go mucking around with the database file
behind the server's back. But there could be times when all servers are down,
and you need to make a modification without starting the server. So we'll
need that capability as well.

As a safeguard, the config database should include a flag to indicate whether
the server is running. If a user tries to use the file-modifying configuration
method and the flag is set, the utility can refuse to make the change. Of
course, there will need to be a way for the user to explicitly ask to clear
the flag, for situations where the server didn't stop gracefully, so it
didn't get a chance to clear the flag.
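
The safeguard could look something like the sketch below. The flag's item
name (_ServerRunning), the offline_set function, and the clear_flag option are
all hypothetical, and a dict stands in for the database file.

```python
RUNNING_FLAG = "_ServerRunning"  # hypothetical name for the flag item


class ServerRunningError(Exception):
    """Raised when an offline change is attempted while the flag is set."""


def offline_set(db, name, value, clear_flag=False):
    """Modify the database directly, but only if no server claims it."""
    if db.get(RUNNING_FLAG):
        if not clear_flag:
            raise ServerRunningError(
                "server appears to be running; change refused")
        # The user explicitly asked to clear a stale flag, for example
        # after a server that did not stop gracefully.
        db[RUNNING_FLAG] = False
    db[name] = value


db = {RUNNING_FLAG: True, "SMTPServer": "old.example.com"}
try:
    offline_set(db, "SMTPServer", "mail.example.com")
    refused = False
except ServerRunningError:
    refused = True

# Explicit override for a flag left behind by an unclean shutdown.
offline_set(db, "SMTPServer", "mail.example.com", clear_flag=True)
```

The flag only protects against mistakes, not against a server that is really
still running, so the override should be something the user has to ask for
deliberately.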

If necessary, we can also provide a way to use the file-modifying configuration
method to load the configuration from a file, for users who are making lots
of changes at once. Changing the SMTP server is no big deal, but a user
who's configuring their server for the first time may have lots of
settings to change, and may not like the idea of running the configuration
command 30 or so times with different arguments.
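
A bulk load might be sketched like this. The file format ("Name = value" lines,
commas separating list elements, "#" starting a comment) is invented for the
example, as is the load_config function.

```python
import io


def load_config(lines):
    """Parse 'Name = value' lines; comma-separated values become lists."""
    items = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments, skip blanks
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"line {lineno}: expected 'Name = value'")
        parts = [p.strip() for p in value.split(",")]
        items[name.strip()] = parts if len(parts) > 1 else parts[0]
    return items


sample = io.StringIO("""\
# initial setup
SMTPServer = mail.example.com
Clients = alpha.example.com, beta.example.com
""")
settings = load_config(sample)
```

Parsing the whole file before writing anything means a typo near the end does
not leave the database half updated. This toy format cannot tell a one-element
list from a scalar, so a real format would need explicit type markers.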

This is all for the servers' configuration, of course. Clients will need
some configuration, too, but since it's unique to each client it's not
worth the effort to do something like this. The clients can just use
configuration files.

-- 
Joel