exist-development Mailing List for eXist-db (Page 85)

eXist-db is a feature rich Open Source native XML database

Brought to you by: deliriumsky, dizzzz, windauer, wolfgang_m

exist-development — eXist Developer's List


Re: [Exist-development] Function Documentation is terminally slow!

From: Andrzej J. T. <an...@ch...> - 2010-09-01 16:29:15

Wolfgang:

>>> documentation, the page hits sourceforge.net?!!?!?!
> 
> Yes, sourceforge.net has been quite slow during the past days.
> 
>>> This doesn't seem right...
>>
>> Logos?
> 
> Indeed. Could we just remove them everywhere and just keep them for index.xml?

It's not the logos...they are being sourced from the local instance.

I think it's something in the javascript that is being used to do fancy formatting of the function doc entries.  I'll
bet it's trying to grab other .js libraries from external sources, when it should be sourcing them locally.  Could be
the YUI stuff perhaps?

-- 
Andrzej Taramina
Chaeron Corporation: Enterprise System Solutions
http://www.chaeron.com

Re: [Exist-development] Function Documentation is terminally slow!

From: Andrzej J. T. <an...@ch...> - 2010-09-01 16:13:29

Wolfgang:

>>> documentation, the page hits sourceforge.net?!!?!?!
> 
> Yes, sourceforge.net has been quite slow during the past days.
> 
>>> This doesn't seem right...
>>
>> Logos?
> 
> Indeed. Could we just remove them everywhere and just keep them for index.xml?

Why isn't the page sourcing the logos (or anything else, for that matter) from localhost?  It would seem that using a
fully qualified URL for this stuff doesn't make sense.

-- 
Andrzej Taramina
Chaeron Corporation: Enterprise System Solutions
http://www.chaeron.com

Re: [Exist-development] [Exist-commits] SF.net SVN: exist:[12622] trunk/eXist/src/org/exist/xquery/functions/util/ Eval.java

From: Adam R. <ad...@ex...> - 2010-09-01 15:53:01

This code does not look correct to me. Unless I am mistaken, the
database broker of the original XQuery is reused in the listener, which
is actually invoked in a separate thread, so there is no guarantee that
this is thread safe.

A new broker should be allocated and released appropriately for the listener!
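
The allocate-and-release pattern asked for above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: `Pool`, `Broker`, `acquire` and `release` are hypothetical stand-ins defined in the sketch itself, not eXist's real broker classes and not the code from the commit under discussion.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins for illustration; NOT eXist's actual API.
class BrokerPerThreadSketch {

    /** A broker remembers the thread that allocated it. */
    static final class Broker {
        final Thread owner = Thread.currentThread();
    }

    /** A trivial pool that only counts brokers currently handed out. */
    static final class Pool {
        final AtomicInteger active = new AtomicInteger();

        Broker acquire() {
            active.incrementAndGet();
            return new Broker();
        }

        void release(Broker broker) {
            active.decrementAndGet();
        }
    }

    /**
     * Runs a listener on a separate thread. The listener allocates its own
     * broker instead of reusing the caller's, and always releases it.
     * Returns true if the broker was allocated by the listener thread.
     */
    static boolean runListener(Pool pool) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> result = executor.submit(() -> {
                Broker broker = pool.acquire();      // allocated on the listener thread
                try {
                    return broker.owner == Thread.currentThread();
                } finally {
                    pool.release(broker);            // released even if the listener fails
                }
            });
            return result.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }
}
```

The try/finally inside the listener is the point of the sketch: the broker never leaves the thread that allocated it, and it is handed back whatever the listener does.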


On 1 September 2010 16:44, Evgeny Gazdovsky <gaz...@gm...> wrote:
> Can you look at the source code of XMMPChatFunction.java?
> It uses a callback function when creating a chat.
> After a message is received, this callback will be called.
> --
> Evgeny
>



-- 
Adam Retter

eXist Developer
{ United Kingdom }
ad...@ex...
irc://irc.freenode.net/existdb

[Exist-development] [Exist-commits] SF.net SVN: exist:[12622] trunk/eXist/src/org/exist/xquery/functions/util/ Eval.java

From: Evgeny G. <gaz...@gm...> - 2010-09-01 15:44:08

Can you look at the source code of XMMPChatFunction.java?
It uses a callback function when creating a chat.
After a message is received, this callback will be called.
--
Evgeny

Re: [Exist-development] [Exist-commits] SF.net SVN: exist:[12622] trunk/eXist/src/org/exist/xquery/functions/util/ Eval.java

From: Joe W. <jo...@gm...> - 2010-09-01 15:41:59

Hi Adam and Dmitriy,

A similar function exists in MarkLogic, xdmp:spawn().  For your
reference/discussion:

http://developer.marklogic.com/pubs/4.1/apidocs/Ext-6.html#xdmp:spawn

Joe


On Wed, Sep 1, 2010 at 11:33 AM, Adam Retter <ad...@ex...> wrote:
> No, it can't, because the callback function is in the original compiled
> XQuery and not the XQuery executed by the new thread. By the time the new
> thread finishes executing, there is no guarantee that the original
> XQuery has not finished executing and been discarded (the callback
> function along with it)!
>
> On 1 September 2010 16:32, Dmitriy Shabanov <sha...@gm...> wrote:
>> It can be same thread:
>>
>> run() {
>>
>>   fire-start-event() //through call-back function
>>   //function body
>>   //it can have functions to send event through call-back function
>>
>>   fire-end-event() //through call-back function
>> }
>>
>> On Wed, 2010-09-01 at 16:23 +0100, Adam Retter wrote:
>>> The problem with a callback is that it would itself have to be another
>>> thread, as by this point the main XQuery may have finished executing
>>> and been cleaned up. So I don't really see that it gives you anything
>>> else, as you get the thread id anyway, and if we add further functions
>>> in future you could use these to monitor or cancel thread execution.
>>>
>>> On 1 September 2010 16:21, Evgeny Gazdovsky <gaz...@gm...> wrote:
>>> > To run something after the thread has finished, and possibly pass the
>>> > thread's results into the callback.
>>> >
>>> > 2010/9/1 Adam Retter <ada...@go...>
>>> >>
>>> >> What is the purpose of the callback?
>>> >>
>>> >> On 1 September 2010 14:51, Evgeny Gazdovsky <gaz...@gm...> wrote:
>>> >> > What about a more general async function, like
>>> >> > util:async($any-code, $callback-function)?
>>> >> > We could then use it like this:
>>> >> > declare function local:callback($thread-id) {
>>> >> > .....
>>> >> > };
>>> >> > util:async(
>>> >> >      (any code here, like at system:run-as())  ,
>>> >> >       util:function("local:callback",1)
>>> >> > )
>>> >> >
>>
>> --
>> Cheers,
>>
>> Dmitriy Shabanov
>>
>
>
>
> --
> Adam Retter
>
> eXist Developer
> { United Kingdom }
> ad...@ex...
> irc://irc.freenode.net/existdb
>

Re: [Exist-development] Function Documentation is terminally slow!

From: Wolfgang M. <wol...@ex...> - 2010-09-01 15:35:25

>> documentation, the page hits sourceforge.net?!!?!?!

Yes, sourceforge.net has been quite slow during the past days.

>> This doesn't seem right...
>
> Logos?

Indeed. Could we just remove them everywhere and just keep them for index.xml?

Wolfgang

Re: [Exist-development] Function Documentation is terminally slow!

From: Dmitriy S. <sha...@gm...> - 2010-09-01 15:29:00

On Wed, 2010-09-01 at 10:15 -0400, Andrzej Jan Taramina wrote:
> More on this problem....I just noticed this: even though I'm running on a local instance, when I access the function
> documentation, the page hits sourceforge.net?!!?!?!
> 
> That's probably what is causing the delay.....why in the world is a local instance of eXist hitting an external web site
> to display the function docs?
> 
> This doesn't seem right...

Logos?

-- 
Cheers,

Dmitriy Shabanov

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Wolfgang M. <wol...@ex...> - 2010-09-01 15:23:40

> The problem with deadlocks (the one I ran into) doesn't actually have anything to do with
> the collection hierarchy.  They are simply caused by taking two locks out of order.

Seems we need a bunch of low-level test cases first (there are some
higher level tests I used to debug similar issues). Sure, thread T1
can take a lock on doc A, then doc B, while thread T2 locks B then A
(see XQuery below). This case should indeed be handled by the deadlock
detection. However, in most cases eXist does take locks in order (due
to the fact that collections are always ordered the same).
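
The T1/T2 case above is the textbook cycle in a wait-for graph. A minimal Java sketch of how a detector could spot it is given below; it assumes each thread waits for at most one other thread at a time, and the class and method names are invented for illustration (this is not eXist's deadlock detection code).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative wait-for graph; names are hypothetical, not eXist's.
class WaitForGraphSketch {

    // waiter -> the thread currently holding the lock it is waiting for
    private final Map<String, String> waitsFor = new HashMap<>();

    /** Records that `waiter` is blocked on a lock held by `holder`. */
    void addWait(String waiter, String holder) {
        waitsFor.put(waiter, holder);
    }

    /**
     * Follows the chain of waits starting at `start` and returns true
     * if the chain runs into a cycle, i.e. `start` can never proceed.
     */
    boolean hasDeadlock(String start) {
        Set<String> seen = new HashSet<>();
        String current = start;
        while (current != null) {
            if (!seen.add(current)) {
                return true;                 // visited twice: cycle found
            }
            current = waitsFor.get(current); // null when the holder is not waiting
        }
        return false;
    }
}
```

With T1 holding A and waiting for B (held by T2), the check is negative until T2 in turn waits for A; at that point the chain T1 -> T2 -> T1 closes and the check turns positive.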

> If I understand this correctly, the new design will need to manage, potentially, hundreds
> of thousands of locks that can be taken in arbitrary order.

Most traditional databases use an even finer granularity (single
pages, records, tuples). The new design would in no way be different
from the current situation. The query optimizer can effectively limit
the number of resources to be locked. I'm convinced that investing
time into improving the optimizer is the best way to achieve quick
improvements.

By redesigning the storage backend, you can increase performance by
50% or maybe 70% if things go well. Improvements to the query
optimizer often result in performance wins up to 500% and more. I just
did it again for some full text queries (not committed yet).

> 2) If you have more than one lock, guarantee that all locks are taken in the same order every time.

Given the possibilities of XQuery, this is going to be difficult to
guarantee, e.g.:

let $doc := request:get-parameter("doc", ())
let $a := doc($doc)/root
let $b := doc($a//link/@href/string())
return
   (: do something, even read-only, with $a and $b :)
   ($a, $b)

Since $b is determined by querying $a, you don't know in advance where
it will point to. If there's a circular link between $a and $b, you
deadlock. Allowing dirty reads helps a bit here.

> 3) Don't use locking at all.  Use a scheme that avoids the need for locking.

There are (relational) databases which avoid locking, e.g. by using a
shadow page concept on low-level storage and make the transaction fail
if a conflict occurs when it is committed. This requires a different
design on all levels though - up to the user who has to live with the
fact that transactions can fail and need to be redone.
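
The "fail on conflict at commit, then redo" behaviour described above can be illustrated with a small Java sketch. It is an assumption-laden toy: a single in-memory value with a version counter stands in for a shadow page, and `Versioned`, `tryCommit` and `update` are names invented here, not taken from any real database.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Toy optimistic-commit scheme; all names are hypothetical.
class OptimisticCommitSketch {

    /** An immutable snapshot of the stored content. */
    static final class Versioned {
        final int version;
        final String content;

        Versioned(int version, String content) {
            this.version = version;
            this.content = content;
        }
    }

    private final AtomicReference<Versioned> current =
            new AtomicReference<>(new Versioned(0, ""));

    /** Reads without taking any lock. */
    Versioned read() {
        return current.get();
    }

    /**
     * Commits only if nobody else committed since `snapshot` was read.
     * Returns false on conflict; the caller has to redo its work.
     */
    boolean tryCommit(Versioned snapshot, String newContent) {
        Versioned next = new Versioned(snapshot.version + 1, newContent);
        return current.compareAndSet(snapshot, next);
    }

    /** Redoes the change until a commit succeeds; returns the attempts used. */
    int update(UnaryOperator<String> change) {
        int attempts = 0;
        while (true) {
            attempts++;
            Versioned snapshot = read();
            if (tryCommit(snapshot, change.apply(snapshot.content))) {
                return attempts;
            }
        }
    }
}
```

A writer holding a stale snapshot simply sees `tryCommit` return false, which is the user-visible cost mentioned above: the transaction has to be repeated.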

> Option 3 is actually feasible.  This is the Clojure approach to the world - all data
> structures are read only, and you "mutate" something by reconstructing a new copy with
> the changes (sharing the old data wherever possible).

Yes, it is feasible, see above. But very difficult to implement.

> I am assuming, going forward, that the structures in the database are intended to be
> mutable, and they need to be protected by locks that can be taken in arbitrary order.  If
> that is correct, then deadlocks are going to occur.

Correct. See above.

Wolfgang

[Exist-development] Function Documentation is terminally slow!

From: Andrzej J. T. <an...@ch...> - 2010-09-01 14:15:24

More on this problem....I just noticed this: even though I'm running on a local instance, when I access the function
documentation, the page hits sourceforge.net?!!?!?!

That's probably what is causing the delay.....why in the world is a local instance of eXist hitting an external web site
to display the function docs?

This doesn't seem right...

-- 
Andrzej Taramina
Chaeron Corporation: Enterprise System Solutions
http://www.chaeron.com

[Exist-development] Function Documentation is terminally slow!

From: Andrzej J. T. <an...@ch...> - 2010-09-01 14:11:26

I've noticed that on the Function Documentation web page, when I search or browse for a specific set of functions in a
namespace (e.g. Util), it takes forever to display the page.  And it seems to lock up my whole machine, not just
Firefox, while it's rendering the page.

It never used to do this....

Any ideas what is causing this?  I'm running the latest Firefox (3.6.8) on a 64-bit Ubuntu, fast quad processor box with
gobs of memory.....it should fly like the wind...but instead my machine will become non-responsive for 20-30 seconds.
Running an SVN build from late July (since the security stuff broke everything, and I haven't had a chance to test the
latest stuff of a few days ago).

Anyone have any idea what might be causing this massive slowdown?

Thanks.

-- 
Andrzej Taramina
Chaeron Corporation: Enterprise System Solutions
http://www.chaeron.com

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Jason S. <js...@in...> - 2010-09-01 14:07:05

I had kind of assumed as much, but wanted to verify.  Thanks!

-----Original Message-----
From: James Fuller [mailto:jam...@ex...] 
Sent: Wednesday, September 01, 2010 12:51 AM
To: Leif-Jöran Olsson
Cc: Jason Smith; Paul Ryan; eXist development; Michael J. Pelikan; Todd Gochenour
Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

2010/9/1 Leif-Jöran Olsson <lj...@ex...>:
> On 2010-09-01 00:55, Jason Smith wrote:
>> Quick OS Survey:
>>
>> What operating systems is everyone using?  For purposes of finding some common white-boarding app.  I'm good with Windows and/or Linux.  Is anyone planning to attend Linux-only?
>
> Yes, if I will attend it will be GNU/Linux only.

me too.

J

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Jason S. <js...@in...> - 2010-09-01 14:06:01

The problem with deadlocks (the one I ran into) doesn't actually have anything to do with the collection hierarchy.  They are simply caused by taking two locks out of order. The fact that ReentrantReadWriteLock appears to be using a hierarchy (it actually isn't) is a red herring.

If I understand this correctly, the new design will need to manage, potentially, hundreds of thousands of locks that can be taken in arbitrary order.  If that's true, there are still going to be deadlocks.  And since this is such fantastically granular locking, finding the deadlocks will be very slow (I think it's an n^2 algorithm - at least Java's deadlock detection appears to be n^2).

There are only three ways to avoid deadlocks.

THE THREE WAYS TO PREVENT DEADLOCKS

1) Use a single global mutex to lock.  A single mutex cannot deadlock.
2) If you have more than one lock, guarantee that all locks are taken in the same order every time.
3) Don't use locking at all.  Use a scheme that avoids the need for locking.

Otherwise, if you are using mutex locks, you will have deadlocks.  They are, unfortunately, unavoidable if the design doesn't fall into one of these 3 categories.
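
Way 2 above (always take locks in the same order) can be sketched in Java as below. The sketch assumes documents are identified by integer ids and that the full set of documents is known before the first lock is taken; the class and its methods are hypothetical and only illustrate the ordering rule.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only; ids and class names are assumptions.
class OrderedLockingSketch {

    private final Map<Integer, ReentrantLock> locks = new ConcurrentHashMap<>();

    /**
     * Locks every requested document in ascending id order, whatever order
     * the caller listed them in. Returns the order actually used.
     */
    List<Integer> lockAll(Collection<Integer> docIds) {
        List<Integer> order = new ArrayList<>(new TreeSet<>(docIds)); // sorted, no duplicates
        for (int id : order) {
            locks.computeIfAbsent(id, k -> new ReentrantLock()).lock();
        }
        return order;
    }

    /** Releases the locks in reverse order of acquisition. */
    void unlockAll(List<Integer> order) {
        for (int i = order.size() - 1; i >= 0; i--) {
            locks.get(order.get(i)).unlock();
        }
    }

    /** True if some thread currently holds the lock for this document. */
    boolean isLocked(int docId) {
        ReentrantLock lock = locks.get(docId);
        return lock != null && lock.isLocked();
    }
}
```

Because two threads asking for {A, B} and {B, A} both end up locking in the same sorted order, neither can hold one lock while waiting for the other in a cycle. The rule breaks down as soon as a thread discovers it needs a further lock after it already holds some.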

Option 3 is actually feasible.  This is the Clojure approach to the world - all data structures are read only, and you "mutate" something by reconstructing a new copy with the changes (sharing the old data wherever possible).  

I am assuming, going forward, that the structures in the database are intended to be mutable, and they need to be protected by locks that can be taken in arbitrary order.  If that is correct, then deadlocks are going to occur.  

Does this make sense, or am I out in left field? :-)




Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Wolfgang M. <wol...@ex...> - 2010-09-01 09:00:04

> A stored node in eXist is
> currently represented by a pair <DocumentImpl doc, long nodeId>. This
> will change to <int docId, long nodeId>.

Sorry, it is <DocumentImpl doc, NodeId nodeId> and will become <int
docId, NodeId nodeId> where NodeId is essentially a binary encoded
hierarchical identifier.

Wolfgang

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Wolfgang M. <wol...@ex...> - 2010-09-01 08:53:15

> If you remove collection locks, does that mean you are planning to lock at the resource level?

Yes. In the current design, a resource does physically belong to a
collection. To read or write a resource, you have to access the
collection object first, then acquire a lock on the resource. Removing
this direct dependency between collection and resource will have a
number of benefits:

1) resource updates will become faster and scale better: indexes are
currently organized by collection, which introduces a dependency
between collection size and update speed (this has already been
dropped for the structural index in trunk). The larger the collection,
the slower your updates. Removing this dependency will increase
scalability. You will be able to index or reindex a single resource
without touching or locking the rest of the collection.

2) queries will consume less memory: right now, a query needs to load
all required collections plus the internal metadata (name,
permissions, owner...) for all resources at the start. This is slow
and takes a lot of space. If resources are decoupled from collections,
a query will just need the document id plus the lock for every
resource. We no longer have to retrieve the actual document object.
Instead, a lock manager maintains a simple map of documentId -> lock
and the query uses the documentId only. A stored node in eXist is
currently represented by a pair <DocumentImpl doc, long nodeId>. This
will change to <int docId, long nodeId>.

3) the transaction log will become much easier to maintain. Right now
we have to make sure that transactional integrity is preserved for
both: dom.dbx and collections.dbx, at the same time, which introduces
a number of problems. Decoupling them simplifies the transaction log
since both indexes become independent and can be maintained
independently.

4) the main deadlock issue, which is caused by the hierarchy of
collection and resource locks, will disappear. If the collection is
just a virtual entity, you only need to lock it if you modify its
metadata (name, owner, permissions). Writing or reading a resource
will not require a lock on the collection anymore since the resources
are just loosely assigned to a collection.

5) if collections are entirely virtual, you can assign a resource to
more than one collection. On the other hand, it may become possible
for a resource to not be a member of any collection at all (in which
case it is handled as a direct child of the root collection?).
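
The "simple map of documentId -> lock" from point 2 could look roughly like the Java sketch below. It is a guess at the shape of the idea, not the planned implementation: the class names are invented, and the node id is a plain String placeholder rather than eXist's real node identifier type.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical sketch of a per-document lock manager.
class DocumentLockManagerSketch {

    /** A stored node referenced by document id only, with no document object. */
    static final class NodeRef {
        final int docId;
        final String nodeId; // placeholder for a hierarchical node identifier

        NodeRef(int docId, String nodeId) {
            this.docId = docId;
            this.nodeId = nodeId;
        }
    }

    private final Map<Integer, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

    /** Returns the one lock for this document, creating it on first use. */
    ReentrantReadWriteLock lockFor(int docId) {
        return locks.computeIfAbsent(docId, id -> new ReentrantReadWriteLock());
    }

    /** Runs a read action on a node while holding its document's read lock. */
    <T> T read(NodeRef node, Supplier<T> action) {
        ReentrantReadWriteLock lock = lockFor(node.docId);
        lock.readLock().lock();
        try {
            return action.get();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Note that nothing in `read` touches a collection: the document id alone is enough to find the lock, which is what makes the collection lock unnecessary for plain reads and writes in this design.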

> Wouldn't that potentially lead to an awful lot of locks being taken? Am I misreading the idea?

You won't need to take more locks than you do right now (rather less,
since the collection locks disappear).

Related to this discussion is the question of dirty reads: currently
the query engine does allow dirty reads to some extent. I'm not yet
sure if we should keep it like that (and just handle them more
transparently) or disallow them completely.

I'll try to come up with some graphic or mind map to explain the
overall picture.

Wolfgang

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: James F. <jam...@ex...> - 2010-09-01 06:50:45

2010/9/1 Leif-Jöran Olsson <lj...@ex...>:
> On 2010-09-01 00:55, Jason Smith wrote:
>> Quick OS Survey:
>>
>> What operating systems is everyone using?  For purposes of finding some common white-boarding app.  I'm good with Windows and/or Linux.  Is anyone planning to attend Linux-only?
>
> Yes, if I will attend it will be GNU/Linux only.

me too.

J

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Leif-Jöran O. <lj...@ex...> - 2010-09-01 06:37:00

On 2010-09-01 00:55, Jason Smith wrote:
> Quick OS Survey:
>
> What operating systems is everyone using?  For purposes of finding some common white-boarding app.  I'm good with Windows and/or Linux.  Is anyone planning to attend Linux-only?

Yes, if I will attend it will be GNU/Linux only.

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Jason S. <js...@in...> - 2010-08-31 22:55:20

Quick OS Survey:

What operating systems is everyone using?  For purposes of finding some common white-boarding app.  I'm good with Windows and/or Linux.  Is anyone planning to attend Linux-only?


Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Jason S. <js...@in...> - 2010-08-31 22:37:46

No, write some more on this.  I have heard you talk about removing the collection locks, but I am not sure how you plan to replace their purpose.  That would be good to understand before Thursday.

If you remove collection locks, does that mean you are planning to lock at the resource level?

Wouldn't that potentially lead to an awful lot of locks being taken?  Am I misreading the idea?



Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Wolfgang M. <wol...@ex...> - 2010-08-31 22:04:28

> For example, if you have a single dom.dbx file, can you write to that from multiple threads at the same time safely?  If you can't, then it doesn't make much sense to write-lock at the collection level, right?

The bigger picture is much more complex. You have to talk about
transactions, recovery, caching and more. You do not write to dom.dbx
directly. You write to the page cache. And there's more than just
dom.dbx. Writing the actual data just takes a small part of the
overall indexing time. More time is spent with indexing, maintaining
the transaction log etc.

From my point of view, the next step in any redesign effort should be
to remove the collection locks entirely. We have discussed this
before. It will greatly simplify the locking and transaction log. My
roadmap roughly looks like this:

1) remove collection dependency from core indexes:
1a) structural index, DONE
1b) range index, IN PROGRESS
1c) remove document metadata from collection store and keep it
separately. a collection is just a sequence of (arbitrary) document
ids. a document can be linked to more than one collection.
2) drop all collection locks, except for the case where the collection
metadata itself is modified

Those steps have to be completed before we address other things. We
need to simplify the architecture first, then try to do further
redesigns. Any help will be welcome. As an added value, 1a and b will
improve update/write performance in general.

> Still, it seems like it would be nice to write to one document while querying against another document in the same collection...

Normally, eXist will acquire a lock on the collection, acquire one on
the document, release the collection lock, continue parsing the
document. In some cases (node updates), the transaction handling has
forced us to keep the lock on the collection longer than desired. But
this can be changed (see above).
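
The sequence described above (lock the collection, lock the document, release the collection, carry on with the document) is a form of lock coupling. A minimal Java sketch of that ordering follows; the two locks and the `storeDocument` method are illustrative assumptions, not eXist's actual storage code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative lock-coupling sequence; names are hypothetical.
class LockCouplingSketch {

    final ReentrantLock collectionLock = new ReentrantLock();
    final ReentrantLock documentLock = new ReentrantLock();
    final List<String> trace = new ArrayList<>();

    /**
     * Holds the collection lock only long enough to obtain the document
     * lock, then does the long-running work under the document lock alone.
     */
    void storeDocument(Runnable parse) {
        collectionLock.lock();
        trace.add("collection locked");
        try {
            documentLock.lock();
            trace.add("document locked");
        } finally {
            collectionLock.unlock();         // released before parsing starts
            trace.add("collection released");
        }
        try {
            parse.run();                     // other documents in the collection stay usable
        } finally {
            documentLock.unlock();
            trace.add("document released");
        }
    }
}
```

The node-update case mentioned above corresponds to moving the `collectionLock.unlock()` call to the very end, which is exactly what blocks concurrent access to sibling documents.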

> And this all has to be done in a system that currently does not support true transactional rollback (I think).

eXist maintains a transaction log and does redo/undo on recovery. The
only limitation is that the transaction log is incomplete, i.e. it
does not cover any secondary indexes. Still, transactional integrity
has to be preserved, which puts further requirements on the locking (this
is the reason why collection locks are often not released early).
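
As a rough picture of redo/undo recovery from a transaction log, here is a small Python sketch. The record layout ("write", txn, key, old, new) and ("commit", txn), and the recover function, are assumptions made for the example and do not reflect eXist's actual log format.

```python
def recover(pages, log):
    """Replay a log onto `pages` (a dict), then roll back uncommitted work."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Redo pass: replay every logged write in log order.
    for rec in log:
        if rec[0] == "write":
            _, txn, key, old, new = rec
            pages[key] = new

    # Undo pass: revert writes of transactions that never committed,
    # newest first, using the logged old value.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, txn, key, old, new = rec
            if old is None:
                pages.pop(key, None)   # the key did not exist before
            else:
                pages[key] = old
    return pages
```

For example, with the log [("write", 1, "a", None, "A1"), ("write", 2, "b", None, "B1"), ("commit", 1), ("write", 2, "a", "A1", "A2")], transaction 2 never commits, so its writes are undone. Anything that is never written to the log, as is the case for the secondary indexes mentioned above, cannot be restored by such a pass.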

Well, I will stop writing emails now and instead explain everything on Thursday.

Wolfgang

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Jason S. <js...@in...> - 2010-08-31 21:30:55

There is also dimdim.  I've used that one before.  Preferences?  It would be nice to have a whiteboard app.

From: Loren Cahlander [mailto:lor...@gm...]
Sent: Tuesday, August 31, 2010 2:49 PM
To: Dannes Wessels
Cc: Loren Cahlander; Jason Smith; Wolfgang Meier; eXist development; Michael J. Pelikan; Todd Gochenour; Paul Ryan
Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

How about Google Docs?  http://docs.google.com

On Aug 31, 2010, at 03:41 PM, Dannes Wessels wrote:

We should probably use http://www.doodle.com/ for these kinds of international meetings :-)

On 31 Aug 2010, at 22:36 , Jason Smith wrote:

Kind of late for Wolfgang though.

Kind regards

Dannes

--
eXist-db Native XML Database - http://exist-db.org
Join us on linked-in: http://www.linkedin.com/groups?gid=35624

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Jason S. <js...@in...> - 2010-08-31 21:23:32

I think we need to talk about how much granularity is actually valuable.  For example, if you have a single dom.dbx file, can you write to that from multiple threads at the same time safely?  If you can't, then it doesn't make much sense to write-lock at the collection level, right?

Also, if you have a maximally granular locking mechanism, allowing locks on collections and resources, both deep locks and shallow locks, with multiple readers and a single writer allowed on each, the deadlock detection gets really complex.  Performance on deadlock detection can blow up.
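
One common way to do deadlock detection is to look for a cycle in a wait-for graph. The sketch below is a generic Python illustration of that technique, not anything taken from eXist; the function name find_deadlock and the dict-of-lists graph representation are assumptions for the example.

```python
def find_deadlock(waits_for):
    """waits_for maps a transaction to the transactions it is waiting on.

    Returns a list of transactions forming a cycle, or None.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    state = {}

    def visit(node, path):
        state[node] = GREY
        path.append(node)
        for nxt in waits_for.get(node, ()):
            s = state.get(nxt, WHITE)
            if s == GREY:
                # nxt is already on the current path: that is a cycle.
                return path[path.index(nxt):]
            if s == WHITE:
                cycle = visit(nxt, path)
                if cycle:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for node in list(waits_for):
        if state.get(node, WHITE) == WHITE:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None
```

With multiple readers per lock, one waiting writer gets an edge to every current reader, and deep locks add edges for every resource underneath, so the graph that has to be searched grows quickly.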

Still, it seems like it would be nice to write to one document while querying against another document in the same collection...

How the collections are implemented underneath isn't that important to the locking mechanism, other than that it affects how much concurrency you can actually take advantage of.

I think, though, that you **need to have the locking mechanism in place pretty early on.**  If you design a wonderfully concurrent back end, but you control access through a global mutex, well, what have you got?  :-)

Plus, in any non-trivial locking mechanism, there will be deadlocks and deadlock detection-and-recovery.  And the software, as a whole, has to use the standards for detection-and-recovery if you want to be able to take advantage of the more concurrent locking. 

And this all has to be done in a system that currently does not support true transactional rollback (I think).  Which means there are some additional rules when it comes to write locking...

Too much information.  I'll write something up for Thursday, and hopefully all this stuff will become more clear.  This is not an easy topic for anyone, including myself!


-----Original Message-----
From: Adam Retter [mailto:ad...@ex...] 
Sent: Tuesday, August 31, 2010 6:42 AM
To: Wolfgang Meier
Cc: Jason Smith; Paul Ryan; eXist development; Michael J. Pelikan; Todd Gochenour
Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

> Otherwise, eXist just avoids locking multiple collections at once
> wherever possible as it is known to be an expensive operation and
> limits concurrency.

We have discussed several times replacing eXist-db's current
collection mechanism with a virtualised implementation where
Collections are just another number in the system. This was discussed
for the purposes of performance when large collections are involved.

Would this simplify the overall problem domain? If so, perhaps this
work should be undertaken before a redesign of the locking system?

>> The second replacement I've come up with allows two read queries to run
>> simultaneously, even when they target the same collection, and when multiple collections
>> are used simultaneously.
>
> As I said before, I welcome any exploration in this area. As James
> just suggested, we may want to have a skype telecon on this to discuss
> the possibilities and dangers.

Skype teleconference would be good :-)

-- 
Adam Retter

eXist Developer
{ United Kingdom }
ad...@ex...
irc://irc.freenode.net/existdb

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Adam R. <ad...@ex...> - 2010-08-31 21:03:41

Anything that can be opened by either OpenOffice or a Web Browser
should be fine - that includes the de-facto standard MS office
formats...

On 31 August 2010 21:36, Jason Smith <js...@in...> wrote:
> I'd like to prepare some materials.  What is your preferred format?  MS Office, Open Office, Maven XDoc... ?  I will try to avoid PowerPoint.
>
> -----Original Message-----
> From: Wolfgang Meier [mailto:wol...@ex...]
> Sent: Tuesday, August 31, 2010 2:32 PM
> To: Jason Smith
> Cc: Adam Retter; Dmitriy Shabanov; eXist development; Paul Ryan; Michael J. Pelikan; Todd Gochenour; loren
> Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.
>
>> That will work too.  Kind of late for Wolfgang though.
>
> No, that's fine.
>
> Wolfgang
>



-- 
Adam Retter

eXist Developer
{ United Kingdom }
ad...@ex...
irc://irc.freenode.net/existdb

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Jason S. <js...@in...> - 2010-08-31 20:55:30

The reason I have been sort of a pain in the tuckus about this is not to get a quick fix, but to get this on the roadmap. :-) I definitely don't want to lose it.

-----Original Message-----
From: Adam Retter [mailto:ad...@ex...] 
Sent: Tuesday, August 31, 2010 6:47 AM
To: Jason Smith
Cc: Wolfgang Meier; Paul Ryan; eXist development; Michael J. Pelikan; Todd Gochenour
Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

> Again, I would not expect a final solution to be implemented the way I have done it.  I just want to be able to show some possibilities.

I am really excited to see such enthusiasm and knowledge of the
eXist-db internals and suggestions for improvements that could be
made.

> I am not expecting this conversation to take place quickly.  This is one of those "long-haul" things - perhaps over years, and several major releases.  I'm not in a hurry to do the wrong thing quickly!!!
>

Let's not lose this though, I think that any performance and
scalability enhancements are great.

And actually there is a flip side here that no one has mentioned:
Whilst everyone has said that you should have optimised queries and
appropriate indexes, which is invariably true for production systems,
what about new users and developers who come to eXist-db?
As a new user of eXist-db you don't necessarily understand about
optimising your queries or even how to create the correct indexes. Not
all eXist-db users are software developers! It would be great if these
new users saw great performance from the start, even if they haven't
set up indexes and their queries are doing full scans of the dbx files
:-)

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Dannes W. <da...@ex...> - 2010-08-31 20:41:36

We should probably use http://www.doodle.com/ for these kinds of international meetings :-)


On 31 Aug 2010, at 22:36 , Jason Smith wrote:

>> Kind of late for Wolfgang though.

Kind regards

Dannes

--
eXist-db Native XML Database - http://exist-db.org
Join us on linked-in: http://www.linkedin.com/groups?gid=35624

Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

From: Jason S. <js...@in...> - 2010-08-31 20:37:20

I'd like to prepare some materials.  What is your preferred format?  MS Office, Open Office, Maven XDoc... ?  I will try to avoid PowerPoint.

-----Original Message-----
From: Wolfgang Meier [mailto:wol...@ex...] 
Sent: Tuesday, August 31, 2010 2:32 PM
To: Jason Smith
Cc: Adam Retter; Dmitriy Shabanov; eXist development; Paul Ryan; Michael J. Pelikan; Todd Gochenour; loren
Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

> That will work too.  Kind of late for Wolfgang though.

No, that's fine.

Wolfgang
