From: Wolfgang M. <wol...@ex...> - 2010-08-30 16:37:54
> I understand. The solution we have in place right now is similar to the solution you
> mentioned, but we put it in place a while ago. Augmenting the locking with a singleton
> lock does, indeed, work.

Internally, eXist needs the singleton lock only in rare cases, mainly when reading or storing the collection configuration document for a collection, or when locking documents for an XQuery update expression. Otherwise, eXist just avoids locking multiple collections at once wherever possible, as it is known to be an expensive operation and limits concurrency.

> The second replacement I've come up with allows two read queries to run
> simultaneously, even when they target the same collection, and when multiple collections
> are used simultaneously.

As I said before, I welcome any exploration in this area. As James just suggested, we may want to have a skype telecon on this to discuss the possibilities and dangers.

Finally, just as a note to other users who code against the internal API: Dannes' WebDAV reimplementation shows some clean examples of how to use internals:

http://exist.svn.sourceforge.net/viewvc/exist/branches/dizzzz/trunk-webdav-upgrade/extensions/webdav/src/org/exist/webdav/

Wolfgang
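A minimal Java sketch of the "per-collection locks plus a singleton lock" pattern discussed above. All class and method names are hypothetical and do not correspond to eXist's internal API. The sketch assumes that code holding a single collection lock never requests a second one; under that assumption, the only thread that can hold one lock while waiting for another is the current singleton holder, so multi-collection operations cannot deadlock against each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical sketch (not eXist's API): one read/write lock per collection,
 * plus a singleton lock that must be held while locking several collections.
 * Assumption: code holding a single collection lock never requests another one.
 */
public class CollectionLocks {
    private final Map<String, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();
    private final ReentrantLock singleton = new ReentrantLock();

    private ReentrantReadWriteLock lockFor(String path) {
        return locks.computeIfAbsent(path, p -> new ReentrantReadWriteLock());
    }

    /** Common case: one collection, no singleton lock involved. */
    public void withWriteLock(String path, Runnable action) {
        Lock w = lockFor(path).writeLock();
        w.lock();
        try {
            action.run();
        } finally {
            w.unlock();
        }
    }

    /** Rare case: several collections; the singleton lock serializes these calls. */
    public void withWriteLocks(List<String> paths, Runnable action) {
        List<Lock> held = new ArrayList<>();
        singleton.lock();
        try {
            for (String path : paths) {
                Lock w = lockFor(path).writeLock();
                w.lock();
                held.add(w);
            }
            action.run();
        } finally {
            // Release collection locks in reverse acquisition order, then the singleton.
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
            singleton.unlock();
        }
    }

    /** Diagnostic helper. */
    public boolean isWriteLocked(String path) {
        return lockFor(path).isWriteLocked();
    }
}
```

The cost is the one Wolfgang points out: every multi-collection operation is serialized behind a single lock, which is why eXist tries to avoid needing it.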
From: Adam R. <ad...@ex...> - 2010-08-31 13:39:09
> Otherwise, eXist just avoids locking multiple collections at once
> wherever possible as it is known to be an expensive operation and
> limits concurrency.

We have discussed several times replacing eXist-db's current collection mechanism with a virtualised implementation where Collections are just another number in the system. This was discussed for the purposes of performance when large collections are involved.

Would this simplify the overall problem domain? If so, perhaps this work should be undertaken before a redesign of the locking system?

>> The second replacement I've come up with allows two read queries to run
>> simultaneously, even when they target the same collection, and when multiple collections
>> are used simultaneously.
>
> As I said before, I welcome any exploration in this area. As James
> just suggested, we may want to have a skype telecon on this to discuss
> the possibilities and dangers.

A Skype teleconference would be good :-)

--
Adam Retter

eXist Developer
{ United Kingdom }
ad...@ex...
irc://irc.freenode.net/existdb
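The idea of collections as "just another number" (and Wolfgang's roadmap item 1c later in the thread) can be illustrated with a small in-memory model: a collection is only an id mapped to a set of document ids, document metadata lives in a separate store, and one document can be linked into several collections. This is a hypothetical Java sketch with made-up names; it is not eXist's storage format.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical in-memory model (not eXist's storage format): a collection is
 * only an id mapped to a set of document ids, and document metadata is kept
 * in a separate store keyed by document id.
 */
public class VirtualCollections {
    private final Map<Long, String> documentMeta = new HashMap<>();  // doc id -> metadata (a name, here)
    private final Map<Long, Set<Long>> members = new HashMap<>();    // collection id -> doc ids

    public void storeDocument(long docId, String name) {
        documentMeta.put(docId, name);
    }

    /** Links a document into a collection; a document may be linked into many. */
    public void link(long collectionId, long docId) {
        members.computeIfAbsent(collectionId, k -> new LinkedHashSet<>()).add(docId);
    }

    public Set<Long> documentsOf(long collectionId) {
        return Collections.unmodifiableSet(members.getOrDefault(collectionId, Collections.emptySet()));
    }

    /** Linear scan; a real implementation would likely keep a reverse index. */
    public Set<Long> collectionsContaining(long docId) {
        Set<Long> result = new TreeSet<>();
        for (Map.Entry<Long, Set<Long>> e : members.entrySet()) {
            if (e.getValue().contains(docId)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public String metadataOf(long docId) {
        return documentMeta.get(docId);
    }
}
```

In this model, reading a document's metadata never goes through a collection, and changing membership touches only the id sets. Whether that actually lets a real system drop collection locks depends on the transaction log and index design discussed in this thread.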
From: Jason S. <js...@in...> - 2010-08-31 21:23:32
I think we need to talk about how much granularity is actually valuable. For example, if you have a single dom.dbx file, can you write to that from multiple threads at the same time safely? If you can't, then it doesn't make much sense to write-lock at the collection level, right?

Also, if you have a maximally granular locking mechanism, allowing locks on collections and resources, both deep locks and shallow locks, with multiple readers and a single writer allowed on each, the deadlock detection gets really complex. Performance on deadlock detection can blow up.

Still, it seems like it would be nice to write to one document while querying against another document in the same collection...

How the collections are implemented underneath isn't that important to the locking mechanism, other than that it affects how much concurrency you can actually take advantage of. I think, though, that you **need to have the locking mechanism in place pretty early on.** If you design a wonderfully concurrent back end, but you control access through a global mutex, well, what have you got? :-)

Plus, in any non-trivial locking mechanism, there will be deadlocks and deadlock detection-and-recovery. And the software, as a whole, has to follow the standards for detection-and-recovery if you want to be able to take advantage of the more concurrent locking. And this all has to be done in a system that currently does not support true transactional rollback (I think). Which means there are some additional rules when it comes to write locking...

Too much information. I'll write something up for Thursday, and hopefully all this stuff will become more clear. This is not an easy topic for anyone, including myself!

-----Original Message-----
From: Adam Retter [mailto:ad...@ex...]
Sent: Tuesday, August 31, 2010 6:42 AM
To: Wolfgang Meier
Cc: Jason Smith; Paul Ryan; eXist development; Michael J. Pelikan; Todd Gochenour
Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.
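Jason's point about deadlock detection can be made concrete with a wait-for graph, the textbook structure for it: an edge from A to B means "A is waiting for a lock that B holds", and a cycle means deadlock. The Java sketch below is a generic illustration with hypothetical names; it is not how eXist detects deadlocks. With multiple-reader locks, a waiting writer gets one edge per current reader, which is part of why the graph, and the cost of searching it, grows as locking becomes more granular.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical wait-for graph: an edge waiter -> holder means "waiter is
 * blocked on a lock held by holder". A deadlock exists iff there is a cycle.
 */
public class WaitForGraph {
    private final Map<String, Set<String>> edges = new HashMap<>();

    public void addWait(String waiter, String holder) {
        edges.computeIfAbsent(waiter, k -> new HashSet<>()).add(holder);
    }

    /** Called when the waiter is granted its lock or gives up. */
    public void removeWaits(String waiter) {
        edges.remove(waiter);
    }

    /** Depth-first search over all nodes, looking for a back edge. */
    public boolean hasDeadlock() {
        Set<String> done = new HashSet<>();
        Set<String> onStack = new HashSet<>();
        for (String node : edges.keySet()) {
            if (dfs(node, done, onStack)) {
                return true;
            }
        }
        return false;
    }

    private boolean dfs(String node, Set<String> done, Set<String> onStack) {
        if (onStack.contains(node)) return true;   // back edge: cycle found
        if (done.contains(node)) return false;     // already fully explored
        onStack.add(node);
        for (String next : edges.getOrDefault(node, Collections.emptySet())) {
            if (dfs(next, done, onStack)) return true;
        }
        onStack.remove(node);
        done.add(node);
        return false;
    }
}
```

Detection is only half the job: once a cycle is found, one participant has to be chosen as the victim and rolled back, which is where the lack of full transactional rollback mentioned above becomes a constraint.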
From: Wolfgang M. <wol...@ex...> - 2010-08-31 22:04:28
> For example, if you have a single dom.dbx file, can you write to that from multiple threads at the same time safely? If you can't, then it doesn't make much sense to write-lock at the collection level, right?

The bigger picture is much more complex. You have to talk about transactions, recovery, caching and more. You do not write to dom.dbx directly; you write to the page cache. And there's more than just dom.dbx. Writing the actual data takes only a small part of the overall indexing time. More time is spent on indexing, maintaining the transaction log, etc.

From my point of view, the next step in any redesign effort should be to remove the collection locks entirely. We have discussed this before. It will greatly simplify the locking and the transaction log. My roadmap roughly looks like this:

1) remove collection dependency from core indexes:
   1a) structural index, DONE
   1b) range index, IN PROGRESS
   1c) remove document metadata from collection store and keep it separately. A collection is just a sequence of (arbitrary) document ids. A document can be linked to more than one collection.
2) drop all collection locks, except for the case where the collection metadata itself is modified

Those steps have to be completed before we address other things. We need to simplify the architecture first, then try to do further redesigns. Any help will be welcome. As an added value, 1a and 1b will improve update/write performance in general.

> Still, it seems like it would be nice to write to one document while querying against another document in the same collection...

Normally, eXist will acquire a lock on the collection, acquire one on the document, release the collection lock, and continue parsing the document. In some cases (node updates), the transaction handling has forced us to keep the lock on the collection longer than desired. But this can be changed (see above).

> And this all has to be done in a system that currently does not support true transactional rollback (I think).
eXist maintains a transaction log and does redo/undo on recovery. The only limitation is that the transaction log is incomplete, i.e. it does not cover any secondary indexes. Still, transactional integrity has to be preserved, and this puts further requirements on the locking (this is the reason why collection locks are often not released early).

Well, I will stop writing emails now and better explain everything on Thursday.

Wolfgang
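The "normal" sequence Wolfgang describes (lock the collection, lock the document, release the collection, keep working on the document) is a form of lock coupling. Here is a hypothetical Java sketch of that ordering with made-up class names; it ignores transactions entirely, so it does not model the node-update case where the collection lock has to be held longer. The collection lock is taken in write mode here only as an assumption, on the grounds that storing a document may change collection membership.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical sketch of lock coupling: lock the collection, lock the document,
 * release the collection, then do the long-running work while holding only the
 * document lock. Class names are illustrative, not eXist's.
 */
public class LockCoupling {
    public static final class CollectionHandle {
        public final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    }

    public static final class DocumentHandle {
        public final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    }

    public static void withDocument(CollectionHandle col, DocumentHandle doc, Runnable work) {
        col.lock.writeLock().lock();           // 1. lock the collection
        try {
            doc.lock.writeLock().lock();       // 2. lock the document
        } finally {
            col.lock.writeLock().unlock();     // 3. release the collection early
        }
        try {
            work.run();                        // 4. parsing/indexing under the document lock only
        } finally {
            doc.lock.writeLock().unlock();
        }
    }
}
```

Because step 3 happens before the expensive work, other threads can lock the same collection (and other documents in it) while this document is still being processed, which is the behaviour Jason asked about.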
From: Jason S. <js...@in...> - 2010-08-31 22:55:20
Quick OS survey: what operating systems is everyone using? I'm asking so we can find a common whiteboarding app. I'm good with Windows and/or Linux. Is anyone planning to attend from Linux only?

-----Original Message-----
From: Wolfgang Meier [mailto:wol...@ex...]
Sent: Tuesday, August 31, 2010 4:04 PM
To: Jason Smith
Cc: Adam Retter; Paul Ryan; eXist development; Michael J. Pelikan; Todd Gochenour
Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.
From: Jason S. <js...@in...> - 2010-08-30 16:09:39
If you are referring to org.exist.storage.lock.ReentrantReadWriteLock, which serves both as the collection locking mechanism and as the lock used by "dom.dbx", "collections.dbx", etc., the problem is that this lock is a mutex. Despite its name, the implementation uses a single mutex for both reads and writes.

The ideal would be to allow multiple readers and a single writer on any resource at any time. The standard locking mechanism, when used with "dom.dbx", allows only one reader or writer at any time. For long, unoptimized read queries, this results in a choke point on dom.dbx that, as far as I can tell, slows down even optimized queries.

I hope I answered the right question... :-)

-----Original Message-----
From: Dmitriy Shabanov [mailto:sha...@gm...]
Sent: Sunday, August 29, 2010 12:56 AM
To: Wolfgang Meier
Cc: Jason Smith; eXist development
Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

On Sat, 2010-08-28 at 19:06 +0200, Wolfgang Meier wrote:
> > Instead, eXist has artificially limited access to dom.dbx to a single thread (at a time).
>
> The assumption is that - during a query - dom.dbx is only read at
> serialization time and only to read out a sequence of pages to display
> the final query result to the user.
>
> It's a complex interplay between cache manager, transaction log and
> other components. I agree there could be ways to allow concurrent read
> access to dom.dbx at the same time, but we would need to carefully
> discuss the implications.

Would a 'normal' lock mechanism be suitable here? Or are there restrictions that prevent using it?

--
Cheers,
Dmitriy Shabanov
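To make the "multiple readers, single writer" ideal concrete, here is a small Java demo using the JDK's java.util.concurrent.locks.ReentrantReadWriteLock. This is a different class from eXist's org.exist.storage.lock.ReentrantReadWriteLock, despite the shared simple name. The demo class and method names are hypothetical. It only shows the lock semantics; it says nothing about whether dom.dbx and the page cache could safely serve concurrent readers, which is the open question in Wolfgang's quoted reply.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical demo of multiple-readers/single-writer semantics with the JDK lock. */
public class ReadersOverlapDemo {
    /** Returns true if two threads held the read lock at the same time. */
    public static boolean readersOverlap() throws InterruptedException {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        CountDownLatch bothInside = new CountDownLatch(2);
        AtomicBoolean overlapped = new AtomicBoolean(true);
        Runnable reader = () -> {
            rw.readLock().lock();
            try {
                bothInside.countDown();
                // Succeeds only if the other reader is inside its read section too.
                if (!bothInside.await(2, TimeUnit.SECONDS)) {
                    overlapped.set(false);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                overlapped.set(false);
            } finally {
                rw.readLock().unlock();
            }
        };
        Thread t1 = new Thread(reader);
        Thread t2 = new Thread(reader);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return overlapped.get();
    }

    /** Returns true if another thread's writer could NOT get the lock while a read lock was held. */
    public static boolean writerExcludedWhileReading() throws InterruptedException {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        AtomicBoolean writerGotLock = new AtomicBoolean();
        rw.readLock().lock();
        try {
            Thread writer = new Thread(() -> writerGotLock.set(rw.writeLock().tryLock()));
            writer.start();
            writer.join();
        } finally {
            rw.readLock().unlock();
        }
        return !writerGotLock.get();
    }
}
```

If the read lock were replaced by a plain mutex, readersOverlap() would return false after the two-second timeout, which mirrors the one-reader-at-a-time choke point described above. The JDK class also offers a ReentrantReadWriteLock(boolean fair) constructor; whether fairness is worth its throughput cost for long-running queries would need measuring.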