[srvx-devel] next generation database approach
From: Entrope <en...@us...> - 2002-01-10 03:22:23
There was some discussion today about what kind of database srvx should use in its reincarnation. There are three questions as I see them: whether to use multiple databases (versus just one), what kind of database to use, and how to give other processes access to the data.

The question of multiple databases is mostly an issue if we have in-memory, write-all-at-once databases (which we do in srvx-1.x). Reading or writing all the databases for GamesNET at once can take a few seconds, even on a 1 GHz Athlon. If separate databases are used, it is easier for them to get out of sync. So this question mostly depends on the next one.

I've seen three suggestions for the kind of database to use:

- The current recdb code (or a faster version of it, such as saxdb in the CVS HEAD). This is a hierarchical, string-keyed, human-readable database with several types of data. The limitations we have seen with it are: long read and write times; all the data being in memory at once (which will not be much of a problem until we grow a lot); and the lack of any way to do IPC operations on the data. (Any other process would have to talk to it through IRC, which is ugly and unreliable.)

- An out-of-process relational database, such as an SQL server. This eliminates the read and write time issues, moves the problem of what data to keep in memory to another process, and easily allows others to access the data. However, it has drawbacks: any read or write becomes much more expensive, due to the context switch (or even network access); we need to add locking logic in many places; and our data model is constrained by the database's schema (or by how we talk to it). To be more precise, updating the database schema is hard.

- A flat database with a relational-information and serialization layer on top. This is approximately how my proposed "oodb" is structured; Berkeley DB (or something similar) could provide the flat database operations.
It solves the long read and write times (by writing back only some of the dirty data at a time, it amortizes storing data over time) and allows us to control how much data we keep in memory. Direct access to the database from other processes is probably harder than with SQL.

The third issue is how to give other processes access to the data. If we use a standard format (or protocol) to store the data, other processes can access it directly. If we use a proprietary format (or protocol), we would have to provide some sort of IPC bridge. This might be through IRC (eww), or through some other direct TCP connection. The drawbacks of giving other processes direct access to the data are that we need rigid (and correct) rules about how to keep the database consistent (this probably requires transactions) and about how to lock parts of the database (or the whole thing) while working on it, and that it is hard to extend the format. If we provide an IPC bridge, we can provide atomic operations and an extensible format through that interface. (SOAPy RPC over XML over HTTP! Zoom zooooom!!!</winer>)

If we look at what's practical, I see these choices:

1) Multi-file saxdb with IPC bridge - Status quo
2) One-file saxdb with IPC bridge - Probably too slow
3) SQL server with direct access - Buzzword compliant!
4) SQL server with IPC bridge - Almost no point in having the DB out of process
5) oodb on Berkeley DB with IPC bridge - Possible code reuse advantages; freedom of storage format
6) oodb on Berkeley DB with direct access - Possible code reuse advantages
7) oodb with a proprietary DB with IPC bridge - Best control over caching and writing back data, but much code

Right now we use (1), but I think we want to get rid of it, and we probably don't want (2) either. Did I leave anything out?

-- Entrope
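For concreteness, the recdb/saxdb option stores nested string keys in a human-readable text file. The fragment below is only a hypothetical illustration of that style; the keys and values are made up, not taken from a real srvx database:

```text
"ChanServ" {
    "#example" {
        "founder" "entrope";
        "registered" "1010632943";
    };
};
```

The whole tree has to be parsed into memory on startup and rewritten in full on save, which is where the long read and write times come from.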