Re: [Clockwork-developers] Architecture of job scheduler

----- Original Message -----
From: "Shawn McMahon" <smc...@ei...>
To: <clo...@li...>
Sent: Friday, January 11, 2002 10:03 AM
Subject: Re: [Clockwork-developers] Architecture of job scheduler

> Another concern would be clocks; it's easier to keep one machine synced
> than dozens.

A valid question may be whether our tool should be responsible for time
synchronization.  The users could always use NTP to keep the clocks on their
systems synchronized (to a certain degree).  NTP can generally keep system
clocks within a second of each other, so the question is whether it would be
valuable to keep system clocks more closely synchronized than is possible
with the standard tools like NTP.  As we're talking mostly about batch
processing, it seems relatively unlikely to me that I would run into a
situation where I need to start jobs on multiple systems with that much
accuracy.  Even supposing that were the case, the developer would likely
want to be using real-time programming techniques, which would make the use
of a scheduler like we're discussing out of the question.  In a
decentralized configuration like Joel suggested, I'm thinking it would
probably be enough to have the servers start jobs according to their own
system clocks, and let the system administrators worry about keeping the
system clocks as closely synchronized as they need.  Many routers
participate in NTP time synchronization, and I'd guess that most large
server network installations are configured for NTP as well.  (I even run a
server at my house to keep all of my PCs' clocks in sync.)
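
To make the "each server trusts its own clock" idea concrete, here is a rough sketch of how a server might decide which jobs are due. The Job class, its fields, and the due_jobs function are all hypothetical names made up for illustration; nothing here is an agreed design.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Job:
    # Hypothetical job record; field names are illustrative only.
    name: str
    run_at: datetime      # scheduled start, per this server's own clock
    started: bool = False


def due_jobs(jobs, now):
    """Return the jobs that have not started and whose start time has passed.

    `now` is whatever the local system clock reports; keeping that clock
    accurate (e.g. via NTP) is left to the system administrator.
    """
    return [j for j in jobs if not j.started and j.run_at <= now]
```

In practice each server would call something like due_jobs(jobs, datetime.now()) in its polling loop, with no clock coordination done by the scheduler itself.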

> > event processor (to borrow a term from AutoSys). And a SQL-based database
> > would make things easier to work with from a development perspective, but
> > not until now did I realize that it might make the system less attractive
> > for a user, since they would have to manage another database.
>
> No reason we can't make the database a part of the server program, and
> not use a full-blown SQL, is there?

That's certainly an option, but if we could achieve the same, or even
better, performance (one of AutoSys' disadvantages) without having a
centralized server to depend on and without having a full-fledged SQL
database that humans have to manage, I'd be all for it.  If we choose to use
a full SQL database, we have two possible routes:  either choose a database
that everyone will have to use, or decide to support multiple database
platforms.

If I wanted to use a full-fledged database, I would want to take advantage
of some of the more advanced features of the database engine (let's say we
had one that supports transactions, replication, triggers, etc).  To use
those features we'd need to go with a single database platform.  Trying to
use, for example, JDBC, and let the user choose the database platform would
mean that we wouldn't be able to use features that aren't commonly
supported.

If we go centralized, I think we might as well pick a full-fledged SQL
database platform (hopefully a free one) and standardize on that.  A
centralized system means we're dependent on a single server, and if that's
the case, that server is liable to be very, very busy.  If we wanted to have
redundant servers be an option, we'd really need some database replication
to do it right, so there's really no alternative to a real database.
However, if we decide to decentralize the application, perhaps we should
investigate using something smaller, like maybe Berkeley DB, that may not be
SQL, but might have enough functionality to manage the jobs for a single
machine, and do a good job at it.  Alternatively, we could choose to store
the data in XML, and load it into the data structures we're using in
whatever language(s) we choose.  Or, if we could find an SQL engine that
doesn't require human management and is small enough to include with builds
of our application, that would be cool too.
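
As a rough illustration of that last option, here is a sketch of a per-machine job store built on an embedded SQL engine. It assumes Python and its bundled sqlite3 module purely as a stand-in for "a small engine that ships with the application"; the table layout and the function names (open_job_store, add_job, jobs_due) are hypothetical, not a proposed schema.

```python
import sqlite3


def open_job_store(path=":memory:"):
    """Open (or create) the local job store; no separate server to manage."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " name TEXT PRIMARY KEY,"
        " command TEXT NOT NULL,"
        " run_at TEXT NOT NULL)"   # ISO-8601 text, so string order = time order
    )
    return conn


def add_job(conn, name, command, run_at):
    """Insert one job; the connection context manager commits or rolls back."""
    with conn:
        conn.execute(
            "INSERT INTO jobs (name, command, run_at) VALUES (?, ?, ?)",
            (name, command, run_at),
        )


def jobs_due(conn, now):
    """Return (name, command) for every job scheduled at or before `now`."""
    rows = conn.execute(
        "SELECT name, command FROM jobs WHERE run_at <= ? ORDER BY run_at",
        (now,),
    )
    return rows.fetchall()
```

The same three functions could just as well sit on top of Berkeley DB or an XML file; the point is only that the storage lives inside the scheduler process rather than in a database someone has to administer.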

Just my $0.02

-Brian