From: Wolfgang M. <wol...@ex...> - 2010-08-28 16:44:07
|
> Java is capable of reading data from the same file from multiple threads simultaneously. In
> smaller data sets (hundreds of MB), the entire database will be cached in memory by the
> OS, so multiple concurrent access is fast.

You forgot one thing: eXist does not access the dbx files directly, but always through the cache manager, which caches the btree as well as data pages. Only the cache manager reads from or writes to the files, and you need to make sure that all threads see the same pages at any time. I agree that access to dom.dbx could be more fine-grained (e.g. on the page level), but I think you should rather look into other aspects (see below).

> And please let me know if I have reached a wrong conclusion here. It's a complex subject,
> and it's easy to miss things.

You ignored most of what I wrote in my previous emails, which I find a bit unfriendly: if your query is formulated in the right way and you have the proper indexes in place, the query engine SHOULD NOT access dom.dbx AT ALL!!!!!!!!!!!!!!! I don't think dom.dbx is the bottleneck - it's the QUERY. Even for testing, please make sure your query is properly optimized or your test won't be realistic.

I'd like to move this discussion over to the development list as it will get too technical.

Wolfgang |
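Editor's illustration: the point about the cache manager can be pictured with a minimal Java sketch. This is not eXist's cache manager; `PageCacheSketch`, `getPage`, `diskReads` and the `diskReader` callback are hypothetical names introduced here. The sketch only shows the principle that every thread fetches pages through one shared cache, so a page is loaded once and all threads see the same page object.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

// Hypothetical sketch, not eXist code: a single shared cache sits between
// all threads and the data file.
class PageCacheSketch {
    private final Map<Integer, byte[]> pages = new ConcurrentHashMap<>();
    private final IntFunction<byte[]> diskReader; // stand-in for the real file access
    final AtomicInteger diskReads = new AtomicInteger();

    PageCacheSketch(IntFunction<byte[]> diskReader) {
        this.diskReader = diskReader;
    }

    // computeIfAbsent runs the loader at most once per page number, so
    // concurrent callers all receive the same cached page instance.
    byte[] getPage(int pageNum) {
        return pages.computeIfAbsent(pageNum, n -> {
            diskReads.incrementAndGet();
            return diskReader.apply(n);
        });
    }

    public static void main(String[] args) {
        PageCacheSketch cache = new PageCacheSketch(n -> new byte[4096]);
        byte[] first = cache.getPage(3);
        byte[] second = cache.getPage(3);
        System.out.println("same page object: " + (first == second));
        System.out.println("disk reads: " + cache.diskReads.get());
    }
}
```

A real page cache also has to handle eviction, dirty pages and journalling, none of which is modelled here.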
From: Jason S. <js...@in...> - 2010-08-30 15:40:02
|
Okay, I'll try to keep this short. :-) I have a tendency to go long on emails...

So in my company, we are using eXist in a way that tends to query against multiple collections simultaneously. In doing so, we ran into deadlocks in org.exist.storage.lock.ReentrantReadWrite lock. I looked deeply into that locking mechanism and realized it is deeply flawed.

* The deadlock detection mechanism does not work, and has no chance of working. It can't be made to work at all.
* Deadlocks using this locking mechanism are easy to create.
* The read locks are mutexes.
* The locks don't recognize collection hierarchy, so you aren't locking what you think you are.
* There is a separate mechanism for locking resources (to work, it needs to be a combined mechanism).

So I have now rewritten org.exist.storage.lock.ReentrantReadWrite, twice. :-) Once to prove that I could completely serialize all access to eXist safely (it worked), and the second time to allow concurrent reads. This also works, but concurrent performance is limited. That is, a short query will not be blocked by a long query, which is a very positive result.

The second rewrite lets me selectively either have mutex-read (legacy) or concurrent-read locking, depending on the thread. It is, well, complicated. But it's doable.

This isn't something we could put directly into eXist. Well, we could, but it would likely slow down some operations - you can't take advantage of the concurrent-read locking without deadlock detection code at every entry point - where Sessions are created. And the legacy locking mode is a global mutex, to prevent deadlocks. Hey, it works.

We'd have to talk about the implications. My lock can deadlock!!! eXist code would have to be modified at multiple points to handle this. However, I have worked out a reasonable scheme for recovering from deadlocks. It would require some recoding to take advantage of, and if you don't do the recoding, it defaults to original behavior, which is mutex.

If you guys have time, I can go deeper into the theory (maintaining legacy compatibility makes it kind of heady stuff). And show you some code.

I have failed, once again, to keep it short. :-/

-----Original Message-----
From: Wolfgang Meier [mailto:wol...@ex...]
Sent: Sunday, August 29, 2010 3:34 AM
To: Dmitriy Shabanov
Cc: Jason Smith; eXist development
Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

> Would 'normal' lock mechanism be suitable here? Or any restrictions that
> do not allow to use it?

I'm not yet sure of the consequences. I do believe without further exploration that we could switch to a multi-read/exclusive write lock mechanism in some places, though this would require some changes to the cache management (which could - in return - result in new locks being introduced ;-).

The goal, let me repeat, would be to speed up non index-assisted, non-optimized access to the DOM. Index-assisted access itself is pretty fast and does allow for good concurrency.

But we have to be very careful here since the architecture is complex: you have to consider transactional integrity, journalling, caching and other aspects. If we change anything, we have to proceed carefully and in very small steps. Stability is always my top priority.

Wolfgang |
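Editor's illustration: the difference between "read locks are mutexes" and concurrent-read locking can be shown with the standard `java.util.concurrent.locks` classes. This is a sketch under assumptions: it does not use or reproduce eXist's `org.exist.storage.lock` classes or Jason's rewrite, and `ReadLockDemo` and `readersOverlap` are hypothetical names. Two reader threads take the given lock and each waits briefly for the other to arrive inside the critical section.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ReadLockDemo {

    // Returns true only if both readers were inside the locked section at
    // the same time. With a mutex the first reader times out waiting for
    // the second, so the result is false.
    static boolean readersOverlap(Lock readLock) {
        CountDownLatch bothInside = new CountDownLatch(2);
        boolean[] sawOther = new boolean[2];
        Thread[] readers = new Thread[2];
        for (int i = 0; i < readers.length; i++) {
            final int id = i;
            readers[i] = new Thread(() -> {
                readLock.lock();
                try {
                    bothInside.countDown();
                    sawOther[id] = bothInside.await(500, TimeUnit.MILLISECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    readLock.unlock();
                }
            });
            readers[i].start();
        }
        for (Thread reader : readers) {
            try {
                reader.join(); // join also makes the readers' writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sawOther[0] && sawOther[1];
    }

    public static void main(String[] args) {
        System.out.println("shared read lock overlaps: "
                + readersOverlap(new ReentrantReadWriteLock().readLock()));
        System.out.println("mutex overlaps: "
                + readersOverlap(new ReentrantLock()));
    }
}
```

With a shared read lock the two readers are expected to overlap on any normally loaded machine; with a plain mutex they cannot, which is the behaviour the bullet list above complains about. Deadlock detection and recovery, the hard part of the proposal, are not modelled.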
From: Wolfgang M. <wol...@ex...> - 2010-08-30 16:06:50
|
> * Deadlocks using this locking mechanism are easy to create.
> * The read locks are mutexes.
> * The locks don't recognize collection hierarchy, so you aren't locking what you think you are.
> * There is a separate mechanism for locking resources (to work, it needs to be a combined mechanism).

Wait a second, I think we have to go back to the start of our discussion months back. Parts of your application are written against eXist's internal API, not against a public API like XML:DB, REST or the XQuery interface. The internal API was never really meant to be used in end-user applications (I recognize the fact that more and more users work with it, so we'll need to document it and clean it up).

Anyway, as I tried to explain a few weeks back, the locking is not fail-safe. You can produce deadlocks if you do not follow certain conventions (which are hard to know since they are not documented). In particular, acquiring a lock across multiple collections can cause a deadlock. In this case, eXist-internal code acquires a lock on the global collection cache, which is a singleton, before acquiring the lock on more than one collection. This is safe, though it puts the db into single-task mode (but it happens only in one or two special cases anyway).

If I remember correctly, I sent you a junit test to demonstrate this. Did you check whether your code tries to work across multiple collections? I think it does. If so, please try my fix.

All public APIs are tested in environments with many concurrent users. I'm not aware of any major deadlock situations, except for those caused by WebDAV locking (the current WebDAV implementation will be replaced soon).

Using the internal API is risky. I do recognize it has to be improved to allow developers to write code against it more easily. I welcome any attempt to make the locking code easier to use and more fail-safe. However, you have to be fair and give my suggestions a try before stating that everything is flawed, while you are programming against an API which has not been intended for general use.

Wolfgang |
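Editor's illustration: the convention described above (take one global lock before locking more than one collection) can be sketched in plain Java. This is an assumption-laden sketch, not eXist's internal API: `MultiCollectionLocking`, `withCollection`, `withCollections` and `runOpposingOrders` are hypothetical names, and `globalLock` merely stands in for the lock on the singleton collection cache.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

class MultiCollectionLocking {
    // Stand-in for the lock on the global collection cache (assumption).
    private final ReentrantLock globalLock = new ReentrantLock();
    private final Map<String, ReentrantLock> collectionLocks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String path) {
        return collectionLocks.computeIfAbsent(path, p -> new ReentrantLock());
    }

    // Single-collection access takes only that collection's lock. The task
    // must not try to lock a second collection while holding it.
    void withCollection(String path, Runnable task) {
        ReentrantLock lock = lockFor(path);
        lock.lock();
        try {
            task.run();
        } finally {
            lock.unlock();
        }
    }

    // Multi-collection access: global lock first, then each collection lock.
    // Only one thread at a time can be collecting several locks, so two
    // threads can never wait on each other's collections.
    void withCollections(List<String> paths, Runnable task) {
        globalLock.lock();
        try {
            int acquired = 0;
            try {
                for (String path : paths) {
                    lockFor(path).lock();
                    acquired++;
                }
                task.run();
            } finally {
                for (int i = acquired - 1; i >= 0; i--) {
                    lockFor(paths.get(i)).unlock();
                }
            }
        } finally {
            globalLock.unlock();
        }
    }

    // Two threads lock the same two collections in opposite order, the
    // classic deadlock recipe when no global lock is used.
    static int runOpposingOrders(int iterations) {
        MultiCollectionLocking locks = new MultiCollectionLocking();
        int[] counter = new int[1]; // only touched while the locks are held
        Thread forward = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                locks.withCollections(Arrays.asList("/db/a", "/db/b"), () -> counter[0]++);
            }
        });
        Thread backward = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                locks.withCollections(Arrays.asList("/db/b", "/db/a"), () -> counter[0]++);
            }
        });
        forward.start();
        backward.start();
        try {
            forward.join();
            backward.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter[0];
    }
}
```

The cost is the one named in the message: while a thread holds the global lock, every other multi-collection operation waits, which is the "single-task mode" trade-off.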
From: Jason S. <js...@in...> - 2010-08-30 16:59:33
|
Wolfgang, if you are interested, I could merge my code into a branch of eXist (not intended for human consumption), so that we could talk using concrete examples. In particular, about improving the locking mechanism so that it actually works (mine is more of a proof of concept, not a final solution). It looks to me that a number of the concurrency limitations in eXist originate from having to work around the limitations of the current locking mechanism.

Again, I would not expect a final solution to be implemented the way I have done it. I just want to be able to show some possibilities.

I am not expecting this conversation to take place quickly. This is one of those "long-haul" things - perhaps over years, and several major releases. I'm not in a hurry to do the wrong thing quickly!!!

-----Original Message-----
From: Wolfgang Meier [mailto:wol...@ex...]
Sent: Monday, August 30, 2010 10:38 AM
To: Jason Smith
Cc: Dmitriy Shabanov; eXist development; Paul Ryan; Michael J. Pelikan; Todd Gochenour
Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

> I understand. The solution we have in place right now is similar to the solution you
> mentioned, but we put it in place a while ago. Augmenting the locking with a singleton
> lock does, indeed, work.

Internally, eXist does need the singleton lock in rare cases only, mainly when reading or storing the collection configuration document for a collection, or when locking documents for an XQuery update expression.

Otherwise, eXist just avoids locking multiple collections at once wherever possible as it is known to be an expensive operation and limits concurrency.

> The second replacement I've come up with allows two read queries to run
> simultaneously, even when they target the same collection, and when multiple collections
> are used simultaneously.

As I said before, I welcome any exploration in this area. As James just suggested, we may want to have a skype telecon on this to discuss the possibilities and dangers.

Finally, just as a note to other users who code against the internal API: Dannes' WebDAV reimplementation shows some clean examples of how to use internals:

http://exist.svn.sourceforge.net/viewvc/exist/branches/dizzzz/trunk-webdav-upgrade/extensions/webdav/src/org/exist/webdav/

Wolfgang |
From: Adam R. <ad...@ex...> - 2010-08-31 12:47:23
|
> Again, I would not expect a final solution to be implemented the way I have done it. I just want to be able to show some possibilities.

I am really excited to see such enthusiasm and knowledge of the eXist-db internals and suggestions for improvements that could be made.

> I am not expecting this conversation to take place quickly. This is one of those "long-haul" things - perhaps over years, and several major releases. I'm not in a hurry to do the wrong thing quickly!!!

Let's not lose this though, I think that any performance and scalability enhancements are great. And actually there is a flip side here that no one has mentioned: Whilst everyone has said that you should have optimised queries and appropriate indexes, which is invariably true for production systems, what about new users and developers who come to eXist-db? As a new user of eXist-db you don't necessarily understand about optimising your queries or even how to create the correct indexes. Not all eXist-db users are software developers! It would be great if these new users saw great performance from the start, even if they haven't set up indexes and their queries are doing full scans of the dbx files :-)

--
Adam Retter

eXist Developer
{ United Kingdom }
ad...@ex...
irc://irc.freenode.net/existdb |
From: Jason S. <js...@in...> - 2010-08-31 20:55:30
|
The reason I have been sort of a pain in the tuckus about this is not to get a quick fix, but to get this on the roadmap. :-) I definitely don't want to lose it. |
From: Dmitriy S. <sha...@gm...> - 2010-08-31 15:39:31
Attachments:
smime.p7s
|
On Tue, 2010-08-31 at 13:42 +0100, Adam Retter wrote: > > As I said before, I welcome any exploration in this area. As James > > just suggested, we may want to have a skype telecon on this to > discuss > > the possibilities and dangers. > > Skype teleconference would be good :-) When? (ready to join) -- Cheers, Dmitriy Shabanov |
From: Wolfgang M. <wol...@ex...> - 2010-08-31 15:50:44
|
>> Skype teleconference would be good :-) > > When? (ready to join) Ok, I think a teleconference would make it easier for me to explain my roadmap for eXist and for Jason to describe his ideas in more detail. I would have time for a talk tomorrow, but would prefer Thursday if possible. Anytime between 10am and 9pm CEST. Wolfgang |
From: Adam R. <ad...@ex...> - 2010-08-31 16:24:36
|
On 31 August 2010 16:43, Wolfgang Meier <wol...@ex...> wrote: >>> Skype teleconference would be good :-) >> >> When? (ready to join) > > Ok, I think a teleconference would make it easier for me to explain my > roadmap for eXist and for Jason to describe his ideas in more detail. > I would have time for a talk tomorrow, but would prefer Thursday if > possible. Anytime between 10am and 9pm CEST. Thursday is fine for me also, but will have to be before 5pm BST. > Wolfgang > -- Adam Retter eXist Developer { United Kingdom } ad...@ex... irc://irc.freenode.net/existdb |
From: Jason S. <js...@in...> - 2010-08-31 16:19:44
|
I guess I will have to install Skype. :-) Thursday is fine. No hurries. |
From: Dmitriy S. <sha...@gm...> - 2010-08-31 18:50:01
Attachments:
smime.p7s
|
On Tue, 2010-08-31 at 09:19 -0700, Jason Smith wrote:
> I guess I will have to install Skype. :-) Thursday is fine. No hurries.

Thursday, almost any time.

--
Cheers, Dmitriy Shabanov |
From: Jason S. <js...@in...> - 2010-08-31 19:57:45
|
I'm in the US, MST (Denver, CO). Be kind when picking times, please. :-) |
From: Adam R. <ad...@ex...> - 2010-08-31 20:17:05
|
How about 2pm UTC, if I have this correct (???) -

3pm my time (BST)
4pm Wolfgang's time (CEST)
9am your time???

--
Adam Retter

eXist Developer
{ United Kingdom }
ad...@ex...
irc://irc.freenode.net/existdb |
From: Jason S. <js...@in...> - 2010-08-31 20:25:24
|
I believe that is 8am my time. I think I can swing that. |
From: Adam R. <ad...@ex...> - 2010-08-31 20:27:05
|
Wow, steady on there! That's a bit early! How about 3pm UTC?

--
Adam Retter

eXist Developer
{ United Kingdom }
ad...@ex...
irc://irc.freenode.net/existdb |
From: Jason S. <js...@in...> - 2010-08-31 20:29:58
|
That will work too. Kind of late for Wolfgang though. |
From: Wolfgang M. <wol...@ex...> - 2010-08-31 20:32:30
|
> That will work too. Kind of late for Wolfgang though. No, that's fine. Wolfgang |
From: Jason S. <js...@in...> - 2010-08-31 20:37:20
|
I'd like to prepare some materials. What is your preferred format? MS Office, Open Office, Maven XDoc... ? I will try to avoid PowerPoint. |
From: Adam R. <ad...@ex...> - 2010-08-31 21:03:41
|
Anything that can be opened by either OpenOffice or a Web Browser should be fine - that includes the de-facto standard MS office formats...

--
Adam Retter

eXist Developer
{ United Kingdom }
ad...@ex...
irc://irc.freenode.net/existdb |
From: Dannes W. <da...@ex...> - 2010-08-31 20:41:36
|
We should probably use http://www.doodle.com/ for these kind of international meetings :-) On 31 Aug 2010, at 22:36 , Jason Smith wrote: >> Kind of late for Wolfgang though. Kind regards Dannes -- eXist-db Native XML Database - http://exist-db.org Join us on linked-in: http://www.linkedin.com/groups?gid=35624 |
From: Jason S. <js...@in...> - 2010-08-31 22:37:46
|
No, write some more on this. I have heard you talk about removing the collection locks, but I am not sure how you plan to replace their purpose. That would be good to understand before Thursday.

If you remove collection locks, does that mean you are planning to lock at the resource level? Wouldn't that potentially lead to an awful lot of locks being taken? Am I misreading the idea?

-----Original Message-----
From: Wolfgang Meier [mailto:wol...@ex...]
Sent: Tuesday, August 31, 2010 4:04 PM
To: Jason Smith
Cc: Adam Retter; Paul Ryan; eXist development; Michael J. Pelikan; Todd Gochenour
Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

> For example, if you have a single dom.dbx file, can you write to that from multiple threads at the same time safely? If you can't, then it doesn't make much sense to write-lock at the collection level, right?

The bigger picture is much more complex. You have to talk about transactions, recovery, caching and more. You do not write to dom.dbx directly. You write to the page cache. And there's more than just dom.dbx. Writing the actual data just takes a small part of the overall indexing time. More time is spent with indexing, maintaining the transaction log etc.

From my point of view, the next step in any redesign effort should be to remove the collection locks entirely. We have discussed this before. It will greatly simplify the locking and transaction log. My roadmap roughly looks like this:

1) remove collection dependency from core indexes:
   1a) structural index, DONE
   1b) range index, IN PROGRESS
   1c) remove document metadata from collection store and keep it separately. A collection is just a sequence of (arbitrary) document ids. A document can be linked to more than one collection.
2) drop all collection locks, except for the case where the collection metadata itself is modified

Those steps have to be completed before we address other things. We need to simplify the architecture first, then try to do further redesigns. Any help will be welcome. As an added value, 1a and 1b will improve update/write performance in general.

> Still, it seems like it would be nice to write to one document while querying against another document in the same collection...

Normally, eXist will acquire a lock on the collection, acquire one on the document, release the collection lock, continue parsing the document. In some cases (node updates), the transaction handling has forced us to keep the lock on the collection longer than desired. But this can be changed (see above).

> And this all has to be done in a system that currently does not support true transactional rollback (I think).

eXist maintains a transaction log and does redo/undo on recovery. The only limitation is that the transaction log is incomplete, i.e. it does not cover any secondary indexes. Still, transactional integrity has to be preserved, and this puts further requirements on the locking (this is the reason why collection locks are often not released early).

Well, I will stop writing emails now and better explain everything on Thursday.

Wolfgang |
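Editor's illustration: the sequence "acquire a lock on the collection, acquire one on the document, release the collection lock, continue parsing the document" is a form of lock coupling. The sketch below shows only that ordering with standard `ReentrantLock` objects. `LockCouplingSketch`, `storeDocument` and the two lock fields are hypothetical names; eXist's real code paths, transactions and journalling are not modelled.

```java
import java.util.concurrent.locks.ReentrantLock;

class LockCouplingSketch {
    final ReentrantLock collectionLock = new ReentrantLock();
    final ReentrantLock documentLock = new ReentrantLock();

    // Holds the collection lock only long enough to obtain the document
    // lock, then does the long-running work under the document lock alone.
    void storeDocument(Runnable parseAndStore) {
        collectionLock.lock();
        try {
            documentLock.lock();
        } finally {
            // Released early so other documents in the collection stay reachable.
            collectionLock.unlock();
        }
        try {
            parseAndStore.run();
        } finally {
            documentLock.unlock();
        }
    }

    public static void main(String[] args) {
        LockCouplingSketch sketch = new LockCouplingSketch();
        sketch.storeDocument(() -> System.out.println(
                "collection held: " + sketch.collectionLock.isHeldByCurrentThread()
                + ", document held: " + sketch.documentLock.isHeldByCurrentThread()));
    }
}
```

The node-update case mentioned above corresponds to moving `collectionLock.unlock()` to after the work, which is what limits concurrency within one collection.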
From: Leif-Jöran O. <lj...@ex...> - 2010-09-01 06:37:00
|
Den 2010-09-01 00:55, Jason Smith skrev: > Quick OS Survey: > > What operating systems is everyone using? For purposes of finding some common white-boarding app. I'm good with Windows and/or Linux. Is anyone planning to attend Linux-only? Yes, if I will attend it will be GNU/Linux only. |
From: James F. <jam...@ex...> - 2010-09-01 06:50:45
|
2010/9/1 Leif-Jöran Olsson <lj...@ex...>: > Den 2010-09-01 00:55, Jason Smith skrev: >> Quick OS Survey: >> >> What operating systems is everyone using? For purposes of finding some common white-boarding app. I'm good with Windows and/or Linux. Is anyone planning to attend Linux-only? > > Yes, if I will attend it will be GNU/Linux only. me too. J |
From: Jason S. <js...@in...> - 2010-09-01 14:07:05
|
I had kind of assumed as much, but wanted to verify. Thanks! |
From: Wolfgang M. <wol...@ex...> - 2010-09-01 08:53:15
|
> If you remove collection locks, does that mean you are planning to lock at the resource level?

Yes. In the current design, a resource does physically belong to a collection. To read or write a resource, you have to access the collection object first, then acquire a lock on the resource. Removing this direct dependency between collection and resource will have a number of benefits:

1) resource updates will become faster and scale better: indexes are currently organized by collection, which introduces a dependency between collection size and update speed (this has already been dropped for the structural index in trunk). The larger the collection, the slower your updates. Removing this dependency will increase scalability. You will be able to index or reindex a single resource without touching or locking the rest of the collection.

2) queries will consume less memory: right now, a query needs to load all required collections plus the internal metadata (name, permissions, owner...) for all resources at the start. This is slow and takes a lot of space. If resources are decoupled from collections, a query will just need the document id plus the lock for every resource. We no longer have to retrieve the actual document object. Instead, a lock manager maintains a simple map of documentId -> lock and the query uses the documentId only. A stored node in eXist is currently represented by a pair <DocumentImpl doc, long nodeId>. This will change to <int docId, long nodeId>.

3) the transaction log will become much easier to maintain. Right now we have to make sure that transactional integrity is preserved for both dom.dbx and collections.dbx at the same time, which introduces a number of problems. Decoupling them simplifies the transaction log since both indexes become independent and can be maintained independently.

4) the main deadlock issue, which is caused by the hierarchy of collection and resource locks, will disappear. If the collection is just a virtual entity, you only need to lock it if you modify its metadata (name, owner, permissions). Writing or reading a resource will not require a lock on the collection anymore since the resources are just loosely assigned to a collection.

5) if collections are entirely virtual, you can assign a resource to more than one collection. On the other hand, it may become possible for a resource to not be a member of any collection at all (in which case it is handled as a direct child of the root collection?).

> Wouldn't that potentially lead to an awful lot of locks being taken? Am I misreading the idea?

You won't need to take more locks than you do right now (rather less, since the collection locks disappear).

Related to this discussion is the question of dirty reads: currently the query engine does allow dirty reads to some extent. I'm not yet sure if we should keep it like that (and just handle them more transparently) or disallow them completely.

I'll try to come up with some graphic or mind map to explain the overall picture.

Wolfgang |
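Editor's illustration: point 2 above describes a lock manager holding a simple map of documentId -> lock, with nodes addressed as <int docId, long nodeId>. A minimal Java sketch of that idea follows. It is a guess at the shape, not the planned eXist design: `NodeRef`, `DocumentLockManager`, `lockFor`, `read` and `write` are hypothetical names, and the choice of a read/write lock per document is an assumption.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// A stored node addressed by ids only, i.e. <int docId, long nodeId>.
final class NodeRef {
    final int docId;
    final long nodeId;

    NodeRef(int docId, long nodeId) {
        this.docId = docId;
        this.nodeId = nodeId;
    }
}

class DocumentLockManager {
    // documentId -> lock; no collection or document object is needed.
    private final ConcurrentMap<Integer, ReentrantReadWriteLock> locks =
            new ConcurrentHashMap<>();

    // Always returns the same lock object for the same document id.
    ReentrantReadWriteLock lockFor(int docId) {
        return locks.computeIfAbsent(docId, id -> new ReentrantReadWriteLock());
    }

    // Runs a query against one node under that document's read lock.
    <T> T read(NodeRef node, Supplier<T> query) {
        ReentrantReadWriteLock lock = lockFor(node.docId);
        lock.readLock().lock();
        try {
            return query.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    // Runs an update under the document's write lock; other documents,
    // including ones in the same collection, are not blocked.
    void write(int docId, Runnable update) {
        ReentrantReadWriteLock lock = lockFor(docId);
        lock.writeLock().lock();
        try {
            update.run();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

A usage example would be `manager.read(new NodeRef(7, 42L), () -> "node value")`. The sketch never removes entries from the map, so a real lock manager would also need a policy for dropping locks of deleted documents.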