[Clockwork-developers] Berkeley DB

I don't know how familiar everyone is with Berkeley DB (I'm really not), but
I was just doing some reading this evening about its capabilities and API
and thought I'd share some thoughts with the group.

Apparently, Berkeley DB supports not only transactions, but also
failover and replication. When you combine that with the fact that it's
embedded in your application, so there's no separate Oracle or MySQL
installation to manage, it starts to sound attractive.

Unfortunately, you can't make queries even approaching the complexity of
a SQL query. Every database is just a collection of {key, value} pairs
(although both can be of arbitrary length, and of any format), so your
queries are all either "give me the next record" or "give me the record
whose key is X."
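
To make those two access patterns concrete, here is a rough sketch in Python.
It is not the Berkeley DB API; KeyValueStore and its methods are hypothetical
names for an in-memory stand-in that only mimics the exact-key lookup and the
"next record" cursor walk described above.

```python
import bisect


class KeyValueStore:
    """Hypothetical in-memory stand-in for one key/value database."""

    def __init__(self):
        self._keys = []   # keys kept in sorted order for cursor walks
        self._data = {}   # key -> value

    def put(self, key: bytes, value: bytes) -> None:
        # Only insert into the sorted key list the first time a key is seen.
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key: bytes):
        """'Give me the record whose key is X' (None if there is none)."""
        return self._data.get(key)

    def cursor(self):
        """'Give me the next record': yield (key, value) in key order."""
        for key in list(self._keys):
            yield key, self._data[key]


db = KeyValueStore()
db.put(b"job:backup", b"run nightly at 02:00")
db.put(b"job:audit", b"run weekly on Sunday")

exact = db.get(b"job:backup")   # lookup by exact key
walk = db.cursor()
first = next(walk)              # first record in key order
second = next(walk)             # the record after it
```

Anything fancier (say, "all jobs that run on Sunday") would have to be done by
walking the cursor and filtering in the application.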

And if you want more than one set of {key, value} pairs, you need to create
another database. I suppose this is why there are about 20 of these in
/var/lib/rpm -- one for each "table." Fortunately, your transactions can
span databases, and it does log replay after a crash ... the whole bit.
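
As a toy illustration of one transaction touching several databases, here is a
Python sketch. Environment and Transaction are hypothetical names, not
Berkeley DB calls, and the sketch only shows the all-or-nothing behavior in
memory; it does no logging and no crash recovery.

```python
class Transaction:
    """Buffers writes to several databases until commit or abort."""

    def __init__(self, env):
        self._env = env
        self._pending = []  # buffered (db_name, key, value) writes

    def put(self, db_name, key, value):
        if db_name not in self._env.databases:
            raise KeyError(db_name)
        self._pending.append((db_name, key, value))

    def commit(self):
        # Apply every buffered write, across all databases involved.
        for db_name, key, value in self._pending:
            self._env.databases[db_name][key] = value
        self._pending.clear()

    def abort(self):
        # Discard the buffered writes; no database is touched.
        self._pending.clear()


class Environment:
    """Hypothetical stand-in for a group of named key/value databases."""

    def __init__(self, *names):
        self.databases = {name: {} for name in names}

    def begin(self):
        return Transaction(self)


env = Environment("jobs", "schedules")

txn = env.begin()
txn.put("jobs", b"backup", b"/usr/local/bin/backup.sh")
txn.put("schedules", b"backup", b"02:00")
txn.abort()
after_abort = {name: dict(db) for name, db in env.databases.items()}

txn = env.begin()
txn.put("jobs", b"backup", b"/usr/local/bin/backup.sh")
txn.put("schedules", b"backup", b"02:00")
txn.commit()
```

After the abort neither database has changed; after the commit both have the
new record.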

Being able to put _any_ type of data in the database could be very cool.
Imagine if there were a Java object (or a C structure) representing a job
definition; the database of these job definitions could then be accessed
by simply reading and writing the serialized objects (or the C structures).
That would certainly make things easy, and there are already APIs for
C, C++, Java, and Perl.
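
Here is roughly what I have in mind, sketched in Python rather than Java or C.
The JobDefinition fields, the save_job/load_job helpers, and the use of JSON
as the serialization format are all assumptions for illustration, and a plain
dict stands in for the database of job definitions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class JobDefinition:
    """Hypothetical job definition; the fields are made up for this sketch."""
    name: str
    command: str
    schedule: str


def save_job(db, job):
    # Key is the job name; value is the whole object serialized to bytes.
    db[job.name.encode("utf-8")] = json.dumps(asdict(job)).encode("utf-8")


def load_job(db, name):
    raw = db.get(name.encode("utf-8"))
    if raw is None:
        return None
    return JobDefinition(**json.loads(raw.decode("utf-8")))


jobs_db = {}  # stand-in for a key/value database: bytes -> bytes

original = JobDefinition("nightly-backup", "/usr/local/bin/backup.sh", "0 2 * * *")
save_job(jobs_db, original)
restored = load_job(jobs_db, "nightly-backup")
```

The database never needs to know what a job definition looks like; it only
sees opaque bytes on both sides.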

A specific advantage Berkeley DB has over a SQL database for our application
is the replication and failover. It's built into the product, whereas
if we went with a SQL database, to make it highly available we'd either
have to assume the user is using a database replication product from his
vendor or roll our own replication (like AutoSys, and I don't think anyone
is excited about the AutoSys homegrown replication).

According to the web site (http://www.sleepycat.com/), the HA Berkeley DB
package will log updates, which are all made to a single master, and
distribute them to other systems. In the event of a failure, one of the
other systems is promoted to master. I assume this involves minimal hassle
for the application, since the site claims it is transparent to the end user,
but I haven't read any code that uses those features yet.
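
Since I haven't seen the real interface, the following Python sketch is only
my mental model of the scheme described on the site: a single master logs
each update and ships it to the other systems, and one of them is promoted
when the master fails. Node, ReplicationGroup, and the rule of promoting the
replica with the most applied records are all assumptions, not Berkeley DB
behavior.

```python
class Node:
    """One system holding a copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0  # number of log records applied so far

    def apply(self, record):
        key, value = record
        self.data[key] = value
        self.applied += 1


class ReplicationGroup:
    """Hypothetical single-master group that ships every update to replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)
        self.log = []  # ordered list of (key, value) updates

    def write(self, key, value):
        # All updates go to the master, are logged, then distributed.
        record = (key, value)
        self.log.append(record)
        self.master.apply(record)
        for replica in self.replicas:
            replica.apply(record)

    def fail_master(self):
        # Assumed policy: promote the replica that has applied the most records.
        if not self.replicas:
            raise RuntimeError("no replica available to promote")
        new_master = max(self.replicas, key=lambda node: node.applied)
        self.replicas.remove(new_master)
        self.master = new_master
        return new_master


group = ReplicationGroup(Node("a"), [Node("b"), Node("c")])
group.write("job:backup", "02:00")
promoted = group.fail_master()      # node "a" is gone; a replica takes over
group.write("job:audit", "sunday")  # writes continue against the new master
```

The application keeps calling write() the same way before and after the
failure, which is how I read the "transparent" claim.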

I'll read some more, but at this point it sounds to me like, given all it could
buy us (no external database required, and built-in database failover), I'd
be willing to give up querying the database with SQL.

-- 
Joel