Thread: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

eXist-db is a feature rich Open Source native XML database

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Wolfgang M. <wol...@ex...> - 2010-08-28 16:44:07

> Java is capable of reading data from the same file from multiple threads simultaneously.  In
> smaller data sets (hundreds of MB), the entire database will be cached in memory by the
> OS, so multiple concurrent access is fast.

You forgot one thing: eXist does not access the dbx files directly,
but always through the cache manager, which caches the btree as well
as data pages. Only the cache manager reads from or writes to the files, and you
need to make sure that all threads see the same pages at all times.
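To make the cache-manager point concrete, here is a minimal Java sketch, not eXist's actual cache code: the `Page` and `PageCache` names and the loader callback are hypothetical, and eviction and write-back are left out. It shows why threads go through one shared cache instead of reading the file independently: every thread that asks for a given page number gets the same object.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

// Hypothetical page: a page number plus its raw bytes.
final class Page {
    final long number;
    final byte[] data;

    Page(long number, byte[] data) {
        this.number = number;
        this.data = data;
    }
}

// Minimal sketch of a shared page cache (no eviction, no write-back).
// Every thread reads through get(), so all threads see one Page object
// per page number instead of private copies of the same file region.
final class PageCache {
    private final ConcurrentHashMap<Long, Page> pages = new ConcurrentHashMap<>();
    private final LongFunction<Page> loader; // stands in for the actual file read

    PageCache(LongFunction<Page> loader) {
        this.loader = loader;
    }

    Page get(long pageNumber) {
        // computeIfAbsent applies the loader at most once per missing key,
        // even when several threads miss on the same page concurrently.
        return pages.computeIfAbsent(pageNumber, n -> loader.apply(n));
    }

    int size() {
        return pages.size();
    }
}
```

Once pages are shared like this, the interesting question is no longer file access but who may modify a cached page while others read it, which is where the locking discussed below comes in.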

I agree that access to dom.dbx could be more fine-grained (e.g. on the
page level), but I think you should rather look into other aspects
(see below).

> And please let me know if I have reached a wrong conclusion here.  It's a complex subject,
> and it's easy to miss things.

You ignored most of what I wrote in my previous emails, which I find a
bit unfriendly: if your query is formulated in the right way and you
have the proper indexes in place, the query engine SHOULD NOT access
dom.dbx AT ALL!!!!!!!!!!!!!!! I don't think dom.dbx is the bottleneck
- it's the QUERY.
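The reasoning can be illustrated with a toy model (hypothetical `ToyStore` class; the list stands in for dom.dbx and the map for a value index, a heavy simplification of eXist's real index structures). An index-assisted lookup answers without a single read of the node store, while the unoptimized scan touches every node:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, not eXist's API: domStore plays dom.dbx, valueIndex plays an index.
final class ToyStore {
    private final List<String> domStore = new ArrayList<>();
    private final Map<String, List<Integer>> valueIndex = new HashMap<>();
    int domReads = 0; // number of node-store accesses so far

    void add(String value) {
        int id = domStore.size();
        domStore.add(value);
        valueIndex.computeIfAbsent(value, v -> new ArrayList<>()).add(id);
    }

    // Unoptimized: visits every stored node to find matches.
    List<Integer> scan(String value) {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < domStore.size(); id++) {
            domReads++;
            if (domStore.get(id).equals(value)) {
                hits.add(id);
            }
        }
        return hits;
    }

    // Index-assisted: answered from the index alone; the node store is untouched.
    List<Integer> lookup(String value) {
        return valueIndex.getOrDefault(value, Collections.emptyList());
    }
}
```

In this model only the scan path ever competes for the node store, which is the path a lock on dom.dbx would slow down under concurrency.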

Even for testing, please make sure your query is properly optimized or
your test won't be realistic.

I'd like to move this discussion over to the development list as it
will get too technical.

Wolfgang

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Jason S. <js...@in...> - 2010-08-30 15:40:02

Okay, I'll try to keep this short.  :-)  I have a tendency to go long on emails...

So in my company, we are using eXist in a way that tends to query against multiple collections simultaneously.  In doing so, we ran into deadlocks in org.exist.storage.lock.ReentrantReadWriteLock.  I studied that locking mechanism closely and concluded that it is deeply flawed.
* The deadlock detection mechanism does not work, and has no chance of working.  It can't be made to work at all.
* Deadlocks using this locking mechanism are easy to create.
* The read locks are mutexes.
* The locks don't recognize collection hierarchy, so you aren't locking what you think you are.
* There is a separate mechanism for locking resources (to work, it needs to be a combined mechanism).

So I have now rewritten org.exist.storage.lock.ReentrantReadWriteLock, twice. :-) Once to prove that I could completely serialize all access to eXist safely (it worked), and a second time to allow concurrent reads.  This also works, although concurrent performance is still limited.  Even so, a short query is no longer blocked by a long query, which is a very positive result.

The second rewrite lets me selectively either have mutex-read (legacy) or concurrent-read locking, depending on the thread.  It is, well, complicated.  But it's doable.
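One way such per-thread selection could look, as a hypothetical sketch built on the JDK's `java.util.concurrent.locks.ReentrantReadWriteLock` rather than on Jason's actual code (which is not shown in the thread). Unlike the global mutex Jason describes for legacy mode, this sketch only models a single resource:

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: each thread chooses legacy (mutex) reads or shared reads.
final class ModalResourceLock {
    // Threads that never opt in keep the legacy mutex behaviour.
    private static final ThreadLocal<Boolean> CONCURRENT_READS =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();

    static void setConcurrentReads(boolean enabled) {
        CONCURRENT_READS.set(enabled);
    }

    // Legacy threads read under the exclusive write lock, so they still exclude
    // every other reader and writer; opted-in threads share the read lock.
    // The acquired lock is returned so the caller releases exactly that one.
    Lock lockForRead() {
        Lock l = CONCURRENT_READS.get() ? rw.readLock() : rw.writeLock();
        l.lock();
        return l;
    }

    Lock lockForWrite() {
        Lock l = rw.writeLock();
        l.lock();
        return l;
    }

    boolean heldExclusivelyByCurrentThread() {
        return rw.isWriteLockedByCurrentThread();
    }
}
```

One caveat of building on the JDK lock: it cannot upgrade a read lock to a write lock, so an opted-in thread that holds the read lock and then calls `lockForWrite()` blocks forever. That is a concrete example of the kind of new deadlock a concurrent-read mode can introduce.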

This isn't something we could put directly into eXist.  Well, we could, but it would likely slow down some operations - you can't take advantage of the concurrent-read locking without deadlock detection code at every entry point - where Sessions are created.  And the legacy locking mode is a global mutex, to prevent deadlocks.  Hey, it works.

We'd have to talk about the implications.  My lock can deadlock!!!  eXist code would have to be modified at multiple points to handle this.  However, I have worked out a reasonable scheme for recovering from deadlocks.  It would require some recoding to take advantage of, and if you don't do the recoding, it defaults to original behavior, which is mutex.
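Jason does not describe his recovery scheme here, so the following is only a generic technique for the same problem, in hypothetical code: take locks with a timeout, and on timeout release everything already held so that any wait cycle is broken, then back off and retry (or give up and let the caller restart its work).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;

// Generic timeout-and-retry deadlock recovery; not the scheme from the thread.
final class TimedMultiLock {
    // Tries to take every lock in order. If one cannot be taken within the
    // timeout, releases all locks already held and reports failure.
    static boolean tryLockAll(List<Lock> locks, long timeoutMs) throws InterruptedException {
        Deque<Lock> held = new ArrayDeque<>();
        for (Lock l : locks) {
            if (l.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                held.push(l);
            } else {
                while (!held.isEmpty()) {
                    held.pop().unlock(); // release in reverse acquisition order
                }
                return false;
            }
        }
        return true;
    }

    static void lockAllWithRetry(List<Lock> locks, long timeoutMs, int maxAttempts)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryLockAll(locks, timeoutMs)) {
                return;
            }
            // Randomized back-off so competing threads do not retry in lock-step.
            Thread.sleep(ThreadLocalRandom.current().nextLong(1, 10L * attempt + 1));
        }
        throw new IllegalStateException("locks not acquired; caller should abort and restart");
    }

    static void unlockAll(List<Lock> locks) {
        for (int i = locks.size() - 1; i >= 0; i--) {
            locks.get(i).unlock();
        }
    }
}
```

A timeout cannot tell a real deadlock from a lock that is simply held for a long time; it only bounds the wait. That is why callers have to be prepared to abort and redo their work, which matches the recoding Jason says would be needed.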

If you guys have time, I can go deeper into the theory (maintaining legacy compatibility makes it kind of heady stuff).  And show you some code.

I have failed, once again, to keep it short. :-/

-----Original Message-----
From: Wolfgang Meier [mailto:wol...@ex...] 
Sent: Sunday, August 29, 2010 3:34 AM
To: Dmitriy Shabanov
Cc: Jason Smith; eXist development
Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

> Would 'normal' lock mechanism be suitable here? Or any restrictions that
> do not allow to use it?

I'm not yet sure of the consequences. I do believe without further
exploration that we could switch to a multi-read/exclusive write lock
mechanism in some places, though this would require some changes to
the cache management (which could - in return - result in new locks
being introduced ;-). The goal, let me repeat, would be to speed up
non index-assisted, non-optimized access to the DOM. Index-assisted
access itself is pretty fast and does allow for good concurrency.
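The multi-read/exclusive-write semantics mentioned above are what the JDK's `ReentrantReadWriteLock` provides. The small demonstration below (class name hypothetical, not eXist code) lets several reader threads hold the read lock at the same time and checks that a writer is kept out meanwhile; with a mutex-style read lock the readers could not all be inside together, and the latch wait would time out.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class MultiReadDemo {
    // Returns true if all `readers` threads held the read lock simultaneously
    // and a writer could not get in while they did.
    static boolean readersShareLock(int readers) throws InterruptedException {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        CountDownLatch allInside = new CountDownLatch(readers);
        CountDownLatch release = new CountDownLatch(1);
        Thread[] threads = new Thread[readers];
        for (int i = 0; i < readers; i++) {
            threads[i] = new Thread(() -> {
                rw.readLock().lock();
                try {
                    allInside.countDown(); // this reader is now inside
                    release.await();       // stay inside until the check is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    rw.readLock().unlock();
                }
            });
            threads[i].start();
        }
        // Succeeds only if every reader got in without waiting for the others to leave.
        boolean overlapped = allInside.await(5, TimeUnit.SECONDS);
        boolean writerGotIn = rw.writeLock().tryLock();
        if (writerGotIn) {
            rw.writeLock().unlock();
        }
        release.countDown();
        for (Thread t : threads) {
            t.join();
        }
        return overlapped && !writerGotIn;
    }
}
```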

But we have to be very careful here since the architecture is complex:
you have to consider transactional integrity, journalling, caching and
other aspects. If we change anything, we have to proceed carefully and
in very small steps. Stability is always my top priority.

Wolfgang

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Wolfgang M. <wol...@ex...> - 2010-08-30 16:06:50

> * Deadlocks using this locking mechanism are easy to create.
> * The read locks are mutexes.
> * The locks don't recognize collection hierarchy, so you aren't locking what you think you are.
> * There is a separate mechanism for locking resources (to work, it needs to be a combined mechanism).

Wait a second, I think we have to go back to the start of our
discussion months back. Parts of your application are written against
eXist's internal API, not against a public API like XML:DB, REST or
the XQuery interface. The internal API was never really meant to be
used in end-user applications (I recognize the fact that more and more
users work with it, so we'll need to document it and clean it up).

Anyway, as I tried to explain a few weeks back, the locking is not
fail-safe. You can produce deadlocks if you do not follow certain
conventions (which are hard to know since they are not documented). In
particular, acquiring a lock across multiple collections can cause a
deadlock.

In this case, eXist-internal code acquires a lock on the global
collection cache, which is a singleton, before acquiring the lock on
more than one collection. This is safe, though it puts the db into
single-task mode (but it happens only in one or two special cases
anyway). If I remember correctly, I sent you a JUnit test to demonstrate
this. Did you check your code if it does try to work across multiple
collections? I think it does. If so, please try my fix.
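The convention described here can be sketched as follows. The names are hypothetical and plain `ReentrantLock`s stand in for eXist's collection locks: any operation that needs more than one collection lock first takes a single global lock, so no two multi-collection operations can acquire the same collections in opposite orders. This only holds if operations that take a single collection lock never reach for a second one while holding it.

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: a global lock serializes multi-collection operations.
final class CollectionLocks {
    // Stands in for the lock on the global collection cache.
    private static final ReentrantLock GLOBAL = new ReentrantLock();

    static void withCollections(List<ReentrantLock> collectionLocks, Runnable action) {
        boolean multi = collectionLocks.size() > 1;
        if (multi) {
            GLOBAL.lock(); // single-task mode, only for the multi-collection case
        }
        int acquired = 0;
        try {
            for (ReentrantLock l : collectionLocks) {
                l.lock();
                acquired++;
            }
            action.run();
        } finally {
            // Release collection locks in reverse order, then the global lock.
            for (int i = acquired - 1; i >= 0; i--) {
                collectionLocks.get(i).unlock();
            }
            if (multi) {
                GLOBAL.unlock();
            }
        }
    }
}
```

The cost is visible in the sketch: while one multi-collection operation runs, every other multi-collection operation waits, which is why it should stay limited to the rare cases mentioned.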

All public APIs are tested in environments with many concurrent users.
I'm not aware of any major deadlock situations, except for those
caused by WebDAV locking (the current WebDAV implementation will be
replaced soon). Using the internal API is risky. I do recognize it has
to be improved to allow developers to write code against it more
easily.

I welcome any attempt to improve the locking code to become easier and
more fail-safe. However, you have to be fair and give my suggestions a
try before stating that everything is flawed, while you are
programming against an API which has not been intended for general
use.

Wolfgang

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Jason S. <js...@in...> - 2010-08-30 16:59:33

Wolfgang, if you are interested, I could merge my code into a branch of eXist (not intended for human consumption), so that we could talk using concrete examples. In particular, about improving the locking mechanism so that it actually works (mine is more of a proof of concept, not a final solution).  It looks to me that a number of the concurrency limitations in eXist originate from having to work around the limitations of the current locking mechanism.

Again, I would not expect a final solution to be implemented the way I have done it.  I just want to be able to show some possibilities.

I am not expecting this conversation to take place quickly.  This is one of those "long-haul" things - perhaps over years, and several major releases.  I'm not in a hurry to do the wrong thing quickly!!!


-----Original Message-----
From: Wolfgang Meier [mailto:wol...@ex...] 
Sent: Monday, August 30, 2010 10:38 AM
To: Jason Smith
Cc: Dmitriy Shabanov; eXist development; Paul Ryan; Michael J. Pelikan; Todd Gochenour
Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

> I understand.  The solution we have in place right now is similar to the solution you
> mentioned, but we put it in place a while ago.  Augmenting the locking with a singleton
> lock does, indeed, work.

Internally, eXist needs the singleton lock only in rare cases,
mainly when reading or storing the collection configuration document
for a collection, or when locking documents for an XQuery update
expression.

Otherwise, eXist just avoids locking multiple collections at once
wherever possible as it is known to be an expensive operation and
limits concurrency.

> The second replacement I've come up with allows two read queries to run
> simultaneously, even when they target the same collection, and when multiple collections
> are used simultaneously.

As I said before, I welcome any exploration in this area. As James
just suggested, we may want to have a skype telecon on this to discuss
the possibilities and dangers.

Finally, just as a note to other users who code against the internal
API: Dannes' WebDAV reimplementation shows some clean examples of how
to use internals:

http://exist.svn.sourceforge.net/viewvc/exist/branches/dizzzz/trunk-webdav-upgrade/extensions/webdav/src/org/exist/webdav/

Wolfgang

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Adam R. <ad...@ex...> - 2010-08-31 12:47:23

> Again, I would not expect a final solution to be implemented the way I have done it.  I just want to be able to show some possibilities.

I am really excited to see such enthusiasm and knowledge of the
eXist-db internals and suggestions for improvements that could be
made.

> I am not expecting this conversation to take place quickly.  This is one of those "long-haul" things - perhaps over years, and several major releases.  I'm not in a hurry to do the wrong thing quickly!!!
>

Let's not lose this, though; I think that any performance and
scalability enhancements are great.

And actually there is a flip side here that no one has mentioned:
Whilst everyone has said that you should have optimised queries and
appropriate indexes, which is invariably true for production systems,
what about new users and developers who come to eXist-db?
As a new user of eXist-db you don't necessarily know about
optimising your queries or even how to create the correct indexes. Not
all eXist-db users are software developers! It would be great if these
new users saw great performance from the start, even if they haven't
set up indexes and their queries are doing full scans of the dbx files
:-)


-- 
Adam Retter

eXist Developer
{ United Kingdom }
ad...@ex...
irc://irc.freenode.net/existdb

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Jason S. <js...@in...> - 2010-08-31 20:55:30

The reason I have been sort of a pain in the tuckus about this is not to get a quick fix, but to get this on the roadmap. :-) I definitely don't want to lose it.

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Dmitriy S. <sha...@gm...> - 2010-08-31 15:39:31

Attachments: smime.p7s

On Tue, 2010-08-31 at 13:42 +0100, Adam Retter wrote:
> > As I said before, I welcome any exploration in this area. As James
> > just suggested, we may want to have a skype telecon on this to
> > discuss the possibilities and dangers.
> 
> Skype teleconference would be good :-)

When? (ready to join)

-- 
Cheers,

Dmitriy Shabanov

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Wolfgang M. <wol...@ex...> - 2010-08-31 15:50:44

>> Skype teleconference would be good :-)
>
> When? (ready to join)

Ok, I think a teleconference would make it easier for me to explain my
roadmap for eXist and for Jason to describe his ideas in more detail.
I would have time for a talk tomorrow, but would prefer Thursday if
possible. Anytime between 10am and 9pm CEST.

Wolfgang

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Adam R. <ad...@ex...> - 2010-08-31 16:24:36

On 31 August 2010 16:43, Wolfgang Meier <wol...@ex...> wrote:
>>> Skype teleconference would be good :-)
>>
>> When? (ready to join)
>
> Ok, I think a teleconference would make it easier for me to explain my
> roadmap for eXist and for Jason to describe his ideas in more detail.
> I would have time for a talk tomorrow, but would prefer Thursday if
> possible. Anytime between 10am and 9pm CEST.

Thursday is fine for me also, but will have to be before 5pm BST.

> Wolfgang
>




Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Jason S. <js...@in...> - 2010-08-31 16:19:44

I guess I will have to install Skype.  :-)  Thursday is fine.  No hurry.

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Dmitriy S. <sha...@gm...> - 2010-08-31 18:50:01

Attachments: smime.p7s

On Tue, 2010-08-31 at 09:19 -0700, Jason Smith wrote:
> I guess I will have to install Skype.  :-)  Thursday is fine.  No hurries.

Thursday, almost any time.

-- 
Cheers,

Dmitriy Shabanov

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Jason S. <js...@in...> - 2010-08-31 19:57:45

I'm in the US, MST (Denver, CO).  Be kind when picking times, please.  :-)

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Adam R. <ad...@ex...> - 2010-08-31 20:17:05

How about 2pm UTC, if I have this correct (???) -

3pm my time (BST)
4pm Wolfgang's time (CEST)
9am your time???


On 31 August 2010 20:57, Jason Smith <js...@in...> wrote:
> I'm in the US, MST (Denver, CO).  Be kind when picking times, please.  :-)

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Jason S. <js...@in...> - 2010-08-31 20:25:24

I believe that is 8am my time.  I think I can swing that.
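Jason's 8am can be checked with `java.time`. A minimal sketch, assuming the zone IDs Europe/London, Europe/Berlin and America/Denver for the participants and Thursday 2 September 2010 as the meeting date (the class name is hypothetical). In September, Denver is on daylight time (MDT, UTC-6), which is why 14:00 UTC is 8am there rather than the 7am that MST (UTC-7) would give.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

final class MeetingTimes {
    private static final DateTimeFormatter HH_MM = DateTimeFormatter.ofPattern("HH:mm");

    // Local wall-clock time in `zone` for the given UTC hour on Thursday 2 Sep 2010.
    static String localTime(int utcHour, String zone) {
        ZonedDateTime utc = ZonedDateTime.of(2010, 9, 2, utcHour, 0, 0, 0, ZoneId.of("UTC"));
        return utc.withZoneSameInstant(ZoneId.of(zone)).format(HH_MM);
    }
}
```

For example, `localTime(15, "America/Denver")` gives the Denver time for the 3pm UTC alternative.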

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Adam R. <ad...@ex...> - 2010-08-31 20:27:05

Wow, steady on there! That's a bit early! How about 3pm UTC?


On 31 August 2010 21:25, Jason Smith <js...@in...> wrote:
> I believe that is 8am my time.  I think I can swing that.

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Jason S. <js...@in...> - 2010-08-31 20:29:58

That will work too.  Kind of late for Wolfgang though.

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Wolfgang M. <wol...@ex...> - 2010-08-31 20:32:30

> That will work too.  Kind of late for Wolfgang though.

No, that's fine.

Wolfgang

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Jason S. <js...@in...> - 2010-08-31 20:37:20

I'd like to prepare some materials.  What is your preferred format?  MS Office, Open Office, Maven XDoc... ?  I will try to avoid PowerPoint.

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Adam R. <ad...@ex...> - 2010-08-31 21:03:41

Anything that can be opened by either OpenOffice or a Web Browser
should be fine - that includes the de-facto standard MS office
formats...

On 31 August 2010 21:36, Jason Smith <js...@in...> wrote:
> I'd like to prepare some materials.  What is your preferred format?  MS Office, Open Office, Maven XDoc... ?  I will try to avoid PowerPoint.

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Dannes W. <da...@ex...> - 2010-08-31 20:41:36

We should probably use http://www.doodle.com/ for these kinds of international meetings :-)


On 31 Aug 2010, at 22:36, Jason Smith wrote:

>> Kind of late for Wolfgang though.

Kind regards

Dannes

--
eXist-db Native XML Database - http://exist-db.org
Join us on linked-in: http://www.linkedin.com/groups?gid=35624

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Jason S. <js...@in...> - 2010-08-31 22:37:46

No, write some more on this.  I have heard you talk about removing the collection locks, but I am not sure how you plan to replace their purpose.  That would be good to understand before Thursday.

If you remove collection locks, does that mean you are planning to lock at the resource level?

Wouldn't that potentially lead to an awful lot of locks being taken?  Am I misreading the idea?


-----Original Message-----
From: Wolfgang Meier [mailto:wol...@ex...] 
Sent: Tuesday, August 31, 2010 4:04 PM
To: Jason Smith
Cc: Adam Retter; Paul Ryan; eXist development; Michael J. Pelikan; Todd Gochenour
Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

> For example, if you have a single dom.dbx file, can you write to that from multiple threads at the same time safely?  If you can't, then it doesn't make much sense to write-lock at the collection level, right?

The bigger picture is much more complex. You have to talk about
transactions, recovery, caching and more. You do not write to dom.dbx
directly. You write to the page cache. And there's more than just
dom.dbx. Writing the actual data just takes a small part of the
overall indexing time. More time is spent with indexing, maintaining
the transaction log etc.

From my point of view, the next step in any redesign effort should be
to remove the collection locks entirely. We have discussed this
before. It will greatly simplify the locking and transaction log. My
roadmap roughly looks like this:

1) remove collection dependency from core indexes:
1a) structural index, DONE
1b) range index, IN PROGRESS
1c) remove document metadata from collection store and keep it
separately. A collection is just a sequence of (arbitrary) document
ids. A document can be linked to more than one collection.
2) drop all collection locks, except for the case where the collection
metadata itself is modified

Those steps have to be completed before we address other things. We
need to simplify the architecture first, then try to do further
redesigns. Any help will be welcome. As an added value, 1a and b will
improve update/write performance in general.

> Still, it seems like it would be nice to write to one document while querying against another document in the same collection...

Normally, eXist will acquire a lock on the collection, acquire one on
the document, release the collection lock, continue parsing the
document. In some cases (node updates), the transaction handling has
forced us to keep the lock on the collection longer than desired. But
this can be changed (see above).
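The sequence described above is a form of lock coupling (hand-over-hand locking). A minimal sketch with the JDK's `ReentrantReadWriteLock` standing in for eXist's collection and document locks; the class name and the `Runnable` for the parsing work are hypothetical, and taking the collection's write lock (rather than its read lock) is an assumption:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class StoreDocument {
    static void store(ReentrantReadWriteLock collectionLock,
                      ReentrantReadWriteLock documentLock,
                      Runnable parseAndStore) {
        collectionLock.writeLock().lock();
        try {
            // Pin the document before letting go of the collection.
            documentLock.writeLock().lock();
        } finally {
            // Other threads may use the collection from here on.
            collectionLock.writeLock().unlock();
        }
        try {
            // The long-running work holds only the document lock.
            parseAndStore.run();
        } finally {
            documentLock.writeLock().unlock();
        }
    }
}
```

Keeping the collection lock until the end of `parseAndStore`, as the node-update case currently requires, would serialize all writers on that collection for the whole parse.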

> And this all has to be done in a system that currently does not support true transactional rollback (I think).

eXist maintains a transaction log and does redo/undo on recovery. The
only limitation is that the transaction log is incomplete, i.e. it
does not cover any secondary indexes. Still transactional integrity
has to be preserved and puts further requirements on the locking (this
is the reason why collection locks are often not released early).
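For readers unfamiliar with redo/undo recovery, here is a textbook-style sketch over an in-memory log. It is a simplification, not eXist's journal format: there are no checkpoints, so recovery starts from an empty store, and each record carries the before- and after-image of a single key. The redo pass repeats history; the undo pass then rolls back every transaction that never logged a commit. The undo step is only safe if no other transaction could have overwritten a key that an uncommitted transaction wrote, which is one reason locks are held until the transaction ends.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.ListIterator;
import java.util.Map;
import java.util.Set;

final class RedoUndoLog {
    // One log record: either an update (before/after images) or a commit marker.
    static final class Rec {
        final int txn;
        final boolean commit;
        final String key;
        final String before; // null means "key was absent"
        final String after;  // null means "key removed"

        private Rec(int txn, boolean commit, String key, String before, String after) {
            this.txn = txn;
            this.commit = commit;
            this.key = key;
            this.before = before;
            this.after = after;
        }

        static Rec update(int txn, String key, String before, String after) {
            return new Rec(txn, false, key, before, after);
        }

        static Rec commit(int txn) {
            return new Rec(txn, true, null, null, null);
        }
    }

    static Map<String, String> recover(List<Rec> log) {
        Map<String, String> state = new HashMap<>();
        Set<Integer> committed = new HashSet<>();
        // Redo pass: reapply every logged update in log order.
        for (Rec r : log) {
            if (r.commit) {
                committed.add(r.txn);
            } else {
                apply(state, r.key, r.after);
            }
        }
        // Undo pass: walk backwards, restoring before-images of uncommitted transactions.
        ListIterator<Rec> it = log.listIterator(log.size());
        while (it.hasPrevious()) {
            Rec r = it.previous();
            if (!r.commit && !committed.contains(r.txn)) {
                apply(state, r.key, r.before);
            }
        }
        return state;
    }

    private static void apply(Map<String, String> state, String key, String value) {
        if (value == null) {
            state.remove(key);
        } else {
            state.put(key, value);
        }
    }
}
```

Because secondary indexes are not covered by eXist's log, as noted above, a real recovery has to rebuild or otherwise reconcile them separately; this sketch models only the logged data.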

Well, I will stop writing emails now and explain everything properly on Thursday.

Wolfgang

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Leif-Jöran O. <lj...@ex...> - 2010-09-01 06:37:00

On 2010-09-01 00:55, Jason Smith wrote:
> Quick OS Survey:
>
> What operating systems is everyone using?  For purposes of finding some common white-boarding app.  I'm good with Windows and/or Linux.  Is anyone planning to attend Linux-only?

Yes, if I attend, it will be GNU/Linux only.

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: James F. <jam...@ex...> - 2010-09-01 06:50:45

2010/9/1 Leif-Jöran Olsson <lj...@ex...>:
> On 2010-09-01 00:55, Jason Smith wrote:
>> Quick OS Survey:
>>
>> What operating systems is everyone using?  For purposes of finding some common white-boarding app.  I'm good with Windows and/or Linux.  Is anyone planning to attend Linux-only?
>
> Yes, if I attend, it will be GNU/Linux only.

me too.

J

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Jason S. <js...@in...> - 2010-09-01 14:07:05

I had kind of assumed as much, but wanted to verify.  Thanks!

-----Original Message-----
From: James Fuller [mailto:jam...@ex...] 
Sent: Wednesday, September 01, 2010 12:51 AM
To: Leif-Jöran Olsson
Cc: Jason Smith; Paul Ryan; eXist development; Michael J. Pelikan; Todd Gochenour
Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Wolfgang M. <wol...@ex...> - 2010-09-01 08:53:15

> If you remove collection locks, does that mean you are planning to lock at the resource level?

Yes. In the current design, a resource does physically belong to a
collection. To read or write a resource, you have to access the
collection object first, then acquire a lock on the resource. Removing
this direct dependency between collection and resource will have a
number of benefits:

1) resource updates will become faster and scale better: indexes are
currently organized by collection, which introduces a dependency
between collection size and update speed (this has already been
dropped for the structural index in trunk). The larger the collection,
the slower your updates. Removing this dependency will increase
scalability. You will be able to index or reindex a single resource
without touching or locking the rest of the collection.

2) queries will consume less memory: right now, a query needs to load
all required collections plus the internal metadata (name,
permissions, owner...) for all resources at the start. This is slow
and takes a lot of space. If resources are decoupled from collections,
a query will just need the document id plus the lock for every
resource. We no longer have to retrieve the actual document object.
Instead, a lock manager maintains a simple map of documentId -> lock
and the query uses the documentId only. A stored node in eXist is
currently represented by a pair <DocumentImpl doc, long nodeId>. This
will change to <int docId, long nodeId>.

3) the transaction log will become much easier to maintain. Right now
we have to make sure that transactional integrity is preserved for
both dom.dbx and collections.dbx at the same time, which introduces
a number of problems. Decoupling them simplifies the transaction log
since both indexes become independent and can be maintained
independently.

4) the main deadlock issue, which is caused by the hierarchy of
collection and resource locks, will disappear. If the collection is
just a virtual entity, you only need to lock it if you modify its
metadata (name, owner, permissions). Writing or reading a resource
will not require a lock on the collection anymore since the resources
are just loosely assigned to a collection.

5) if collections are entirely virtual, you can assign a resource to
more than one collection. On the other hand, it may become possible
for a resource to not be a member of any collection at all (in which
case it is handled as a direct child of the root collection?).
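Point 1 in the list above can be illustrated with a generic document-keyed index. This is an assumption for illustration, not eXist's structural or range index; `DocKeyedIndex` is a hypothetical name. Because each document's entries can be found directly, re-indexing one resource touches only that resource's entries, regardless of how many other documents share its collection.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical value index keyed by document id rather than by collection.
class DocKeyedIndex {
    // indexed value -> ids of documents containing that value
    private final Map<String, Set<Integer>> byValue = new HashMap<>();
    // document id -> values indexed for that document (lets us drop only its entries)
    private final Map<Integer, Set<String>> byDoc = new HashMap<>();

    // Re-index a single document; work is proportional to that document's
    // old and new entries, not to the size of any collection.
    void reindex(int docId, Set<String> values) {
        Set<String> old = byDoc.remove(docId);
        if (old != null) {
            for (String v : old) {
                Set<Integer> docs = byValue.get(v);
                docs.remove(docId);
                if (docs.isEmpty()) byValue.remove(v);
            }
        }
        for (String v : values) {
            byValue.computeIfAbsent(v, k -> new HashSet<>()).add(docId);
        }
        byDoc.put(docId, new HashSet<>(values));
    }

    Set<Integer> lookup(String value) {
        return Set.copyOf(byValue.getOrDefault(value, Set.of()));
    }
}
```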

> Wouldn't that potentially lead to an awful lot of locks being taken? Am I misreading the idea?

You won't need to take more locks than you do right now (rather less,
since the collection locks disappear).
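A sketch of the lock manager from point 2, which also shows why the lock count does not grow: a query takes one lock per distinct document it touches and no collection lock on top. `NodeRef` mirrors the proposed `<int docId, long nodeId>` pair; `DocLockManager` is a hypothetical name, and acquiring locks in ascending docId order is a standard deadlock-avoidance rule assumed here, not something stated in the thread.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stored-node reference: <int docId, long nodeId>
// in place of <DocumentImpl doc, long nodeId>.
record NodeRef(int docId, long nodeId) {}

// Hypothetical lock manager: a simple map of documentId -> lock.
class DocLockManager {
    private final ConcurrentMap<Integer, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

    // Locks are created lazily; no document object or collection is loaded.
    ReentrantReadWriteLock lockFor(int docId) {
        return locks.computeIfAbsent(docId, id -> new ReentrantReadWriteLock());
    }

    // Read-locks each distinct document referenced by the nodes, in ascending
    // docId order (assumed ordering rule), and returns the ids that were locked.
    TreeSet<Integer> readLockAll(List<NodeRef> nodes) {
        TreeSet<Integer> ids = new TreeSet<>();
        for (NodeRef n : nodes) ids.add(n.docId());
        for (int id : ids) lockFor(id).readLock().lock();
        return ids;
    }

    // Releases the locks in reverse acquisition order.
    void readUnlockAll(TreeSet<Integer> ids) {
        for (int id : ids.descendingSet()) lockFor(id).readLock().unlock();
    }
}
```

Three nodes from two documents result in two locks. A real lock manager would also need to discard locks for documents that are no longer in use; this map only grows.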

Related to this discussion is the question of dirty reads: currently
the query engine does allow dirty reads to some extent. I'm not yet
sure if we should keep it like that (and just handle them more
transparently) or disallow them completely.

I'll try to come up with some graphic or mind map to explain the
overall picture.

Wolfgang
