Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

If used from the front-end APIs, eXist does not deadlock on typical XQueries.  I'm not sure what you mean by "collections are always ordered the same" though.  I can reference fn:collection("...") in multiple places in a single XPath, and these are not guaranteed to execute in the same order from one XQuery to the next.  Fn:collection("...") must acquire a collection lock at some point, correct?

Also, keep in mind that deadlock detection in eXist is currently broken, and I don't think it can be fixed.  I'm working on a write-up.

Other databases do have high concurrency.  I don't know about XML databases though.  I've worked with one in particular that is commercial.  It has a collection-based locking model, and it throws deadlock errors occasionally.  When this happens, you rollback and start over.  And this other database appears to have some concurrency limitations as well, just from some of the testing we've done (very preliminary).

So this is what is driving the conversation - how do we get to granular, concurrent access without killing ourselves with deadlocking?  A small number of intention locks is manageable.  Too few results in limited concurrency.  Too many results in a really difficult deadlock problem (detection is expensive).  Any solution that does not need locking at all should get CAREFUL consideration!!!  :-) IMHO

-----Original Message-----
From: Wolfgang Meier [mailto:wol...@ex...] 
Sent: Wednesday, September 01, 2010 9:24 AM
To: Jason Smith
Cc: Adam Retter; Paul Ryan; eXist development; Michael J. Pelikan; Todd Gochenour
Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

> The problem with deadlocks (the one I ran into) doesn't actually have anything to do with
> the collection hierarchy.  They are simply caused by taking two locks out of order.

Seems we need a bunch of low-level test cases first (there are some
higher level tests I used to debug similar issues). Sure, thread T1
can take a lock on doc A, then doc B, while thread T2 locks B then A
(see XQuery below). This case should indeed be handled by the deadlock
detection. However, in most cases eXist does take locks in order (due
to the fact that collections are always ordered the same).

> If I understand this correctly, the new design will need to manage, potentially, hundreds
> of thousands of locks that can be taken in arbitrary order.

Most traditional databases use an even finer granularity (single
pages, records, tuples). The new design would in no way be different
from the current situation. The query optimizer can effectively limit
the number of resources to be locked. I'm convinced that investing
time into improving the optimizer is the best way to achieve quick
improvements.

By redesigning the storage backend, you can increase performance by
50% or maybe 70% if things go well. Improvements to the query
optimizer often result in performance wins up to 500% and more. I just
did it again for some full text queries (not committed yet).

> 2) If you have more than one lock, guarantee that all locks are taken in the same order every time.

Given the possibilities of XQuery, this is going to be difficult to
guarantee,e.g.:

let $doc := request:get-parameter("doc", ())
let $a := doc($doc)/root
let $b := doc($doc//link/@href/string())
return
   (: do something, even read-only, with $a and $b :)

Since $b is determined by querying $a, you don't know in advance where
it will point to. If there's a circular link between $a and $b, you
deadlock. Allowing dirty reads helps a bit here.

> 3) Don't use locking at all.  Use a scheme that avoids the need for locking.

There are (relational) databases which avoid locking, e.g. by using a
shadow page concept on low-level storage and make the transaction fail
if a conflict occurs when it is committed. This requires a different
design on all levels though - up to the user who has to live with the
fact that transactions can fail and need to be redone.

> Option 2 is actually feasible.  This is the Clojure approach to the world - all data
> structures are read only, and you "mutate" something by reconstructing a new copy with
> the changes (sharing the old data wherever possible).

Yes, it is feasible, see above. But very difficult to implement.

> I am assuming, going forward, that the structures in the database are intended to be
> mutable, and they need to be protected by locks that can be taken in arbitrary order.  If
> that is correct, then deadlocks are going to occur.

Correct. See above.

Wolfgang

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

eXist-db is a feature rich Open Source native XML database

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.