Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

> The problem with deadlocks (the one I ran into) doesn't actually have anything to do with
> the collection hierarchy.  They are simply caused by taking two locks out of order.

Seems we need a bunch of low-level test cases first (there are some
higher level tests I used to debug similar issues). Sure, thread T1
can take a lock on doc A, then doc B, while thread T2 locks B then A
(see XQuery below). This case should indeed be handled by the deadlock
detection. However, in most cases eXist does take locks in order (due
to the fact that collections are always ordered the same).

> If I understand this correctly, the new design will need to manage, potentially, hundreds
> of thousands of locks that can be taken in arbitrary order.

Most traditional databases use an even finer granularity (single
pages, records, tuples). The new design would in no way be different
from the current situation. The query optimizer can effectively limit
the number of resources to be locked. I'm convinced that investing
time into improving the optimizer is the best way to achieve quick
improvements.

By redesigning the storage backend, you can increase performance by
50% or maybe 70% if things go well. Improvements to the query
optimizer often result in performance wins up to 500% and more. I just
did it again for some full text queries (not committed yet).

> 2) If you have more than one lock, guarantee that all locks are taken in the same order every time.

Given the possibilities of XQuery, this is going to be difficult to
guarantee,e.g.:

let $doc := request:get-parameter("doc", ())
let $a := doc($doc)/root
let $b := doc($doc//link/@href/string())
return
   (: do something, even read-only, with $a and $b :)

Since $b is determined by querying $a, you don't know in advance where
it will point to. If there's a circular link between $a and $b, you
deadlock. Allowing dirty reads helps a bit here.

> 3) Don't use locking at all.  Use a scheme that avoids the need for locking.

There are (relational) databases which avoid locking, e.g. by using a
shadow page concept on low-level storage and make the transaction fail
if a conflict occurs when it is committed. This requires a different
design on all levels though - up to the user who has to live with the
fact that transactions can fail and need to be redone.

> Option 2 is actually feasible.  This is the Clojure approach to the world - all data
> structures are read only, and you "mutate" something by reconstructing a new copy with
> the changes (sharing the old data wherever possible).

Yes, it is feasible, see above. But very difficult to implement.

> I am assuming, going forward, that the structures in the database are intended to be
> mutable, and they need to be protected by locks that can be taken in arbitrary order.  If
> that is correct, then deadlocks are going to occur.

Correct. See above.

Wolfgang

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

eXist-db is a feature rich Open Source native XML database

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.