From: Jason S. <js...@in...> - 2010-08-31 22:55:20
Quick OS Survey: What operating systems is everyone using? This is for the purpose of finding a common whiteboarding app. I'm good with Windows and/or Linux. Is anyone planning to attend Linux-only?

-----Original Message-----
From: Wolfgang Meier [mailto:wol...@ex...]
Sent: Tuesday, August 31, 2010 4:04 PM
To: Jason Smith
Cc: Adam Retter; Paul Ryan; eXist development; Michael J. Pelikan; Todd Gochenour
Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

> For example, if you have a single dom.dbx file, can you write to that from multiple threads at the same time safely? If you can't, then it doesn't make much sense to write-lock at the collection level, right?

The bigger picture is much more complex. You have to talk about transactions, recovery, caching and more. You do not write to dom.dbx directly. You write to the page cache. And there's more than just dom.dbx. Writing the actual data takes just a small part of the overall indexing time. More time is spent on indexing, maintaining the transaction log, etc.

From my point of view, the next step in any redesign effort should be to remove the collection locks entirely. We have discussed this before. It will greatly simplify the locking and the transaction log. My roadmap roughly looks like this:

1) remove collection dependency from core indexes:
   1a) structural index, DONE
   1b) range index, IN PROGRESS
   1c) remove document metadata from the collection store and keep it separately. A collection is just a sequence of (arbitrary) document ids. A document can be linked to more than one collection.
2) drop all collection locks, except for the case where the collection metadata itself is modified

Those steps have to be completed before we address other things. We need to simplify the architecture first, then try to do further redesigns. Any help will be welcome. As an added benefit, 1a and 1b will improve update/write performance in general.
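
[Illustration of step 1c, not eXist code: a minimal Python sketch of the idea that a collection is just a sequence of document ids, with document metadata kept in a separate store. All names here (Store, DocumentMetadata, link, documents_in) are hypothetical and chosen only for this example.]

```python
from dataclasses import dataclass, field


@dataclass
class DocumentMetadata:
    # Metadata lives with the document, not inside any collection.
    doc_id: int
    name: str


@dataclass
class Store:
    # doc_id -> metadata, kept separately from the collections
    documents: dict = field(default_factory=dict)
    # collection path -> ordered list of document ids
    collections: dict = field(default_factory=dict)

    def add_document(self, meta):
        self.documents[meta.doc_id] = meta

    def link(self, collection_path, doc_id):
        """Link an existing document into a collection (idempotent)."""
        if doc_id not in self.documents:
            raise KeyError(f"unknown document id: {doc_id}")
        ids = self.collections.setdefault(collection_path, [])
        if doc_id not in ids:
            ids.append(doc_id)

    def documents_in(self, collection_path):
        """Resolve a collection's ids against the separate metadata store."""
        return [self.documents[i] for i in self.collections.get(collection_path, [])]


store = Store()
store.add_document(DocumentMetadata(1, "report.xml"))
store.link("/db/a", 1)
store.link("/db/b", 1)  # the same document is linked to a second collection
```

[In this sketch, linking a document touches only a list of ids, so the document's metadata can change without modifying any collection.]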

> Still, it seems like it would be nice to write to one document while querying against another document in the same collection...

Normally, eXist will acquire a lock on the collection, acquire one on the document, release the collection lock, and continue parsing the document. In some cases (node updates), the transaction handling has forced us to keep the lock on the collection longer than desired. But this can be changed (see above).

> And this all has to be done in a system that currently does not support true transactional rollback (I think).

eXist maintains a transaction log and does redo/undo on recovery. The only limitation is that the transaction log is incomplete, i.e. it does not cover any secondary indexes. Still, transactional integrity has to be preserved, and this puts further requirements on the locking (this is the reason why collection locks are often not released early).

Well, I will stop writing emails now and better explain everything on Thursday.

Wolfgang
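
[Illustration, not eXist code: the normal lock sequence described above (lock the collection, lock the document, release the collection lock, continue parsing) can be sketched in Python roughly as follows. The classes, the store_document function and the trace list are hypothetical; eXist's real lock classes and transaction handling are not shown.]

```python
import threading


class Document:
    def __init__(self, doc_id):
        self.doc_id = doc_id
        self.lock = threading.Lock()
        self.content = None


class Collection:
    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.documents = {}


def store_document(collection, doc_id, data):
    """Store data under doc_id, holding the collection lock only briefly."""
    trace = []
    collection.lock.acquire()
    trace.append("collection locked")
    try:
        doc = collection.documents.setdefault(doc_id, Document(doc_id))
        # Take the document lock while the collection lock is still held.
        doc.lock.acquire()
        trace.append("document locked")
    finally:
        # Release the collection early so other documents stay reachable.
        collection.lock.release()
        trace.append("collection released")
    try:
        doc.content = data  # stand-in for the long-running parse
        trace.append("document parsed")
    finally:
        doc.lock.release()
        trace.append("document released")
    return trace


coll = Collection("/db/test")
trace = store_document(coll, "a.xml", "<a/>")
```

[The point of the early release is that the long-running step holds only the document lock, so work on other documents in the same collection is not blocked during the parse.]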