From: Andrzej J. T. <an...@ch...> - 2010-09-01 16:29:15
|
Wolfgang:

>>> documentation, the page hits sourceforge.net?!!?!?!
>
> Yes, sourceforge.net has been quite slow during the past days.
>
>>> This doesn't seem right...
>>
>> Logos?
>
> Indeed. Could we just remove them everywhere and just keep them for index.xml?

It's not the logos...they are being sourced from the local instance. I think it's something in the JavaScript that is being used to do fancy formatting of the function doc entries. I'll bet it's trying to grab other .js libraries from external sources, when it should be sourcing them locally. Could be the YUI stuff perhaps?

--
Andrzej Taramina
Chaeron Corporation: Enterprise System Solutions
http://www.chaeron.com |
From: Andrzej J. T. <an...@ch...> - 2010-09-01 16:13:29
|
Wolfgang:

>>> documentation, the page hits sourceforge.net?!!?!?!
>
> Yes, sourceforge.net has been quite slow during the past days.
>
>>> This doesn't seem right...
>>
>> Logos?
>
> Indeed. Could we just remove them everywhere and just keep them for index.xml?

Why isn't the page sourcing the logos (or anything else for that matter) from localhost? It would seem that using a fully qualified URL for this stuff doesn't make sense.

--
Andrzej Taramina
Chaeron Corporation: Enterprise System Solutions
http://www.chaeron.com |
From: Adam R. <ad...@ex...> - 2010-09-01 15:53:01
|
This code does not look correct to me. Unless I am mistaken, the database broker of the original XQuery is reused in the listener, which is actually invoked in a separate thread; there is no guarantee that this is thread safe. A new broker should be allocated and released appropriately for the listener!

On 1 September 2010 16:44, Evgeny Gazdovsky <gaz...@gm...> wrote:
> Can you look at the source code of XMMPChatFunction.java?
> It uses the callback function when creating a chat.
> After receiving a message, this callback will be called.
> --
> Evgeny

--
Adam Retter

eXist Developer
{ United Kingdom }
ad...@ex...
irc://irc.freenode.net/existdb |
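The allocate-and-release pattern asked for above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SketchBroker` and `SketchBrokerPool` are hypothetical stand-ins invented for the example and are not eXist's broker API. The point is simply that the listener thread takes a broker of its own and returns it in a `finally` block instead of borrowing the one held by the original query.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a broker that must not be shared between threads.
final class SketchBroker {
    private final int id;
    SketchBroker(int id) { this.id = id; }
    int id() { return id; }
}

// Hypothetical pool; the real eXist pool has its own API, not reproduced here.
final class SketchBrokerPool {
    private final BlockingQueue<SketchBroker> idle;

    SketchBrokerPool(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(new SketchBroker(i));
        }
    }

    SketchBroker acquire() throws InterruptedException { return idle.take(); }
    void release(SketchBroker broker) { idle.add(broker); }
    int idleCount() { return idle.size(); }
}

final class ListenerBrokerSketch {
    // Runs the listener on a separate thread with a broker of its own and
    // returns the id of the broker the listener used.
    static int handleMessageOnListenerThread(SketchBrokerPool pool) throws Exception {
        ExecutorService listenerThread = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> used = listenerThread.submit(() -> {
                SketchBroker own = pool.acquire();   // never the caller's broker
                try {
                    return own.id();                 // database work would go here
                } finally {
                    pool.release(own);               // always hand it back
                }
            });
            return used.get();
        } finally {
            listenerThread.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        SketchBrokerPool pool = new SketchBrokerPool(2);
        SketchBroker queryBroker = pool.acquire();   // held by the original query
        int listenerBroker = handleMessageOnListenerThread(pool);
        System.out.println("distinct brokers: " + (queryBroker.id() != listenerBroker));
        pool.release(queryBroker);
    }
}
```

With this shape the listener never touches state owned by the query thread, so it does not matter whether the original query is still running when a message arrives.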
From: Evgeny G. <gaz...@gm...> - 2010-09-01 15:44:08
|
Can you look at the source code of XMMPChatFunction.java? It uses the callback function when creating a chat. After receiving a message, this callback will be called. -- Evgeny |
From: Joe W. <jo...@gm...> - 2010-09-01 15:41:59
|
Hi Adam and Dmitriy,

A similar function exists in MarkLogic, xdmp:spawn(). For your reference/discussion:

http://developer.marklogic.com/pubs/4.1/apidocs/Ext-6.html#xdmp:spawn

Joe

On Wed, Sep 1, 2010 at 11:33 AM, Adam Retter <ad...@ex...> wrote:
> No, it can't, because the callback function is in the original compiled
> XQuery and not the XQuery executed by the new thread. By the time the new
> thread finishes executing there is no guarantee that the original
> XQuery has not finished executing and been discarded (the callback
> function along with it)!
>
> On 1 September 2010 16:32, Dmitriy Shabanov <sha...@gm...> wrote:
>> It can be same thread:
>>
>> run() {
>>
>>     fire-start-event() //through call-back function
>>     //function body
>>     //it can have functions to send event through call-back function
>>
>>     fire-end-event() //through call-back function
>> }
>>
>> On Wed, 2010-09-01 at 16:23 +0100, Adam Retter wrote:
>>> Problem with a callback is that it would itself have to be another
>>> thread, as by this point the main XQuery may have finished executing
>>> and been cleaned up. So I don't really see that it gives you anything
>>> else, as you get the thread id anyway, and if we add further functions
>>> in future you could use these to monitor, cancel thread execution.
>>>
>>> On 1 September 2010 16:21, Evgeny Gazdovsky <gaz...@gm...> wrote:
>>> > run anything after thread will be finished, possible pass into callback the
>>> > results of thread
>>> >
>>> > 2010/9/1 Adam Retter <ada...@go...>
>>> >>
>>> >> What is the purpose of the callback?
>>> >>
>>> >> On 1 September 2010 14:51, Evgeny Gazdovsky <gaz...@gm...> wrote:
>>> >> > what about more common async function, like
>>> >> > util:async($any-code, $callback-function)?
>>> >> > So we can use one like:
>>> >> > declare function local:callback($thread-id) {
>>> >> >     .....
>>> >> > };
>>> >> > util:async(
>>> >> >     (any code here, like at system:run-as()),
>>> >> >     util:function("local:callback", 1)
>>> >> > )
>>> >> >
>>
>> --
>> Cheers,
>>
>> Dmitriy Shabanov
>
> --
> Adam Retter
>
> eXist Developer
> { United Kingdom }
> ad...@ex...
> irc://irc.freenode.net/existdb |
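To make the "you get the thread id anyway" argument concrete, here is a small Java sketch of an id-based alternative to a callback. It is an assumption-laden illustration, not the design of the proposed `util:async` or of MarkLogic's `xdmp:spawn`: the class `AsyncTaskRegistrySketch` and its methods are hypothetical. The spawned job is registered under an id, and whoever holds the id can poll or fetch the result later, so nothing depends on the spawning query still being alive.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical registry: spawn a job, get an id back, ask about it later.
final class AsyncTaskRegistrySketch {
    private final ExecutorService workers = Executors.newCachedThreadPool();
    private final Map<Long, Future<String>> tasks = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    // Starts the job on a worker thread and returns its id immediately.
    long spawn(Callable<String> job) {
        long id = nextId.getAndIncrement();
        tasks.put(id, workers.submit(job));
        return id;
    }

    // True once the job has finished; false for running or unknown ids.
    boolean isDone(long id) {
        Future<String> task = tasks.get(id);
        return task != null && task.isDone();
    }

    // Blocks until the job finishes and returns its result.
    String result(long id) throws Exception {
        Future<String> task = tasks.get(id);
        if (task == null) {
            throw new IllegalArgumentException("unknown task " + id);
        }
        return task.get();
    }

    void shutdown() { workers.shutdown(); }

    public static void main(String[] args) throws Exception {
        AsyncTaskRegistrySketch registry = new AsyncTaskRegistrySketch();
        long id = registry.spawn(() -> "finished");
        System.out.println(id + " -> " + registry.result(id));
        registry.shutdown();
    }
}
```

A cancel operation could be added in the same way by calling `Future.cancel` on the stored handle.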
From: Wolfgang M. <wol...@ex...> - 2010-09-01 15:35:25
|
>> documentation, the page hits sourceforge.net?!!?!?!

Yes, sourceforge.net has been quite slow during the past days.

>> This doesn't seem right...
>
> Logos?

Indeed. Could we just remove them everywhere and just keep them for index.xml?

Wolfgang |
From: Dmitriy S. <sha...@gm...> - 2010-09-01 15:29:00
|
On Wed, 2010-09-01 at 10:15 -0400, Andrzej Jan Taramina wrote:
> More on this problem....I just noticed this: even though I'm running on a local instance, when I access the function
> documentation, the page hits sourceforge.net?!!?!?!
>
> That's probably what is causing the delay.....why in the world is a local instance of eXist hitting an external web site
> to display the function docs?
>
> This doesn't seem right...

Logos?

--
Cheers,

Dmitriy Shabanov |
From: Wolfgang M. <wol...@ex...> - 2010-09-01 15:23:40
|
> The problem with deadlocks (the one I ran into) doesn't actually have anything to do with
> the collection hierarchy. They are simply caused by taking two locks out of order.

Seems we need a bunch of low-level test cases first (there are some higher-level tests I used to debug similar issues).

Sure, thread T1 can take a lock on doc A, then doc B, while thread T2 locks B then A (see XQuery below). This case should indeed be handled by the deadlock detection. However, in most cases eXist does take locks in order (due to the fact that collections are always ordered the same).

> If I understand this correctly, the new design will need to manage, potentially, hundreds
> of thousands of locks that can be taken in arbitrary order.

Most traditional databases use an even finer granularity (single pages, records, tuples). The new design would in no way be different from the current situation. The query optimizer can effectively limit the number of resources to be locked.

I'm convinced that investing time into improving the optimizer is the best way to achieve quick improvements. By redesigning the storage backend, you can increase performance by 50% or maybe 70% if things go well. Improvements to the query optimizer often result in performance wins up to 500% and more. I just did it again for some full text queries (not committed yet).

> 2) If you have more than one lock, guarantee that all locks are taken in the same order every time.

Given the possibilities of XQuery, this is going to be difficult to guarantee, e.g.:

    let $doc := request:get-parameter("doc", ())
    let $a := doc($doc)/root
    let $b := doc($a//link/@href/string())
    return
        (: do something, even read-only, with $a and $b :)

Since $b is determined by querying $a, you don't know in advance where it will point to. If there's a circular link between $a and $b, you deadlock. Allowing dirty reads helps a bit here.

> 3) Don't use locking at all. Use a scheme that avoids the need for locking.

There are (relational) databases which avoid locking, e.g. by using a shadow page concept on low-level storage and making the transaction fail if a conflict occurs when it is committed. This requires a different design on all levels though, up to the user who has to live with the fact that transactions can fail and need to be redone.

> Option 2 is actually feasible. This is the Clojure approach to the world - all data
> structures are read only, and you "mutate" something by reconstructing a new copy with
> the changes (sharing the old data wherever possible).

Yes, it is feasible, see above. But very difficult to implement.

> I am assuming, going forward, that the structures in the database are intended to be
> mutable, and they need to be protected by locks that can be taken in arbitrary order. If
> that is correct, then deadlocks are going to occur.

Correct. See above.

Wolfgang |
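The lock-free route discussed above (work on a private copy, fail at commit time if someone else got there first, then redo) can be illustrated with a deliberately tiny Java sketch. It is not how eXist or any shadow-paging database is implemented; `OptimisticStoreSketch` is a hypothetical class, and it copies the whole map on every write where a real system would share unchanged structure.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical store: readers see an immutable snapshot, writers commit with
// a compare-and-set and must redo their work when the commit fails.
final class OptimisticStoreSketch {
    private final AtomicReference<Map<String, String>> current =
            new AtomicReference<>(Collections.emptyMap());

    // Lock-free read of the latest committed version.
    Map<String, String> snapshot() {
        return current.get();
    }

    // One commit attempt against the snapshot the change was based on.
    // Returns false if another writer committed in the meantime.
    boolean tryCommit(Map<String, String> basedOn, String key, String value) {
        Map<String, String> copy = new HashMap<>(basedOn);   // the "shadow" copy
        copy.put(key, value);
        return current.compareAndSet(basedOn, Collections.unmodifiableMap(copy));
    }

    // Redo loop: retries until a commit succeeds, returns the attempt count.
    int put(String key, String value) {
        int attempts = 0;
        while (true) {
            attempts++;
            if (tryCommit(snapshot(), key, value)) {
                return attempts;
            }
        }
    }

    public static void main(String[] args) {
        OptimisticStoreSketch store = new OptimisticStoreSketch();
        Map<String, String> stale = store.snapshot();
        store.put("a", "1");
        // A writer still holding the old snapshot is rejected and has to redo.
        System.out.println("stale commit accepted: " + store.tryCommit(stale, "b", "2"));
    }
}
```

The failed `tryCommit` is exactly the cost mentioned in the message: no deadlocks, but the caller has to live with transactions that fail and must be repeated.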
From: Andrzej J. T. <an...@ch...> - 2010-09-01 14:15:24
|
More on this problem....I just noticed this: even though I'm running on a local instance, when I access the function documentation, the page hits sourceforge.net?!!?!?! That's probably what is causing the delay.....why in the world is a local instance of eXist hitting an external web site to display the function docs? This doesn't seem right... -- Andrzej Taramina Chaeron Corporation: Enterprise System Solutions http://www.chaeron.com |
From: Andrzej J. T. <an...@ch...> - 2010-09-01 14:11:26
|
I've noticed that when I try to display the Function Documentation web page, when you do a search or browse for a specific set of functions in a namespace (eg. Util), it takes forever to display the page. And it seems to lock up my whole machine, not just Firefox when it's rendering the page. It never used to do this.... Any ideas what is causing this? I'm running the latest Firefox (3.6.8) on a 64-bit Ubuntu, fast quad processor box with gobs of memory.....it should fly like the wind...but instead my machine will become non-responsive for 20-30 seconds. Running an SVN build from late July (since the security stuff broke everything, and I haven't had a chance to test the latest stuff of a few days ago). Anyone have any idea what might be causing this massive slowdown? Thanks. -- Andrzej Taramina Chaeron Corporation: Enterprise System Solutions http://www.chaeron.com |
From: Jason S. <js...@in...> - 2010-09-01 14:07:05
|
I had kind of assumed as much, but wanted to verify. Thanks!

-----Original Message-----
From: James Fuller [mailto:jam...@ex...]
Sent: Wednesday, September 01, 2010 12:51 AM
To: Leif-Jöran Olsson
Cc: Jason Smith; Paul Ryan; eXist development; Michael J. Pelikan; Todd Gochenour
Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

2010/9/1 Leif-Jöran Olsson <lj...@ex...>:
> On 2010-09-01 00:55, Jason Smith wrote:
>> Quick OS Survey:
>>
>> What operating systems is everyone using? For purposes of finding some common white-boarding app. I'm good with Windows and/or Linux. Is anyone planning to attend Linux-only?
>
> Yes, if I will attend it will be GNU/Linux only.

me too.

J |
From: Jason S. <js...@in...> - 2010-09-01 14:06:01
|
The problem with deadlocks (the one I ran into) doesn't actually have anything to do with the collection hierarchy. They are simply caused by taking two locks out of order. The fact that ReentrantReadWriteLock appears to be using a hierarchy (it actually isn't) is a red herring.

If I understand this correctly, the new design will need to manage, potentially, hundreds of thousands of locks that can be taken in arbitrary order. If that's true, there are still going to be deadlocks. And since this is such fantastically granular locking, finding the deadlocks will be very slow (I think it's an n^2 algorithm - at least Java's deadlock detection appears to be n^2).

There are only three ways to avoid deadlocks.

THE THREE WAYS TO PREVENT DEADLOCKS

1) Use a single global mutex to lock. A single mutex cannot deadlock.
2) If you have more than one lock, guarantee that all locks are taken in the same order every time.
3) Don't use locking at all. Use a scheme that avoids the need for locking.

Otherwise, if you are using mutex locks, you will have deadlocks. They are, unfortunately, unavoidable if the design doesn't fall into one of these 3 categories.

Option 2 is actually feasible. This is the Clojure approach to the world - all data structures are read only, and you "mutate" something by reconstructing a new copy with the changes (sharing the old data wherever possible).

I am assuming, going forward, that the structures in the database are intended to be mutable, and they need to be protected by locks that can be taken in arbitrary order. If that is correct, then deadlocks are going to occur.

Does this make sense, or am I out in left field? :-) |
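The rule "take all locks in the same order every time" can be sketched in Java as below. This is a minimal illustration that assumes the full set of document ids is known before any lock is taken, which is precisely the assumption that arbitrary queries may break. `OrderedLockingSketch` is a hypothetical class, not eXist code. Two threads ask for the same pair of documents in opposite order; because ids are sorted before locking, both end up locking 1 and then 2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

final class OrderedLockingSketch {
    private final Map<Integer, ReentrantLock> locks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(int docId) {
        return locks.computeIfAbsent(docId, id -> new ReentrantLock());
    }

    // Locks every requested document in ascending id order, runs the action,
    // then unlocks in reverse order.
    void withDocuments(Collection<Integer> docIds, Runnable action) {
        List<Integer> ordered = new ArrayList<>(new TreeSet<>(docIds));
        int held = 0;
        try {
            for (int docId : ordered) {
                lockFor(docId).lock();
                held++;
            }
            action.run();
        } finally {
            for (int i = held - 1; i >= 0; i--) {
                lockFor(ordered.get(i)).unlock();
            }
        }
    }

    // Two threads request {1, 2} and {2, 1}; returns the shared counter.
    static int runOpposingThreads(int iterations) throws InterruptedException {
        OrderedLockingSketch sketch = new OrderedLockingSketch();
        int[] counter = {0};
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                sketch.withDocuments(Arrays.asList(1, 2), () -> counter[0]++);
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                sketch.withDocuments(Arrays.asList(2, 1), () -> counter[0]++);
            }
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return counter[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runOpposingThreads(10_000));
    }
}
```

Without the sort in `withDocuments`, the same two threads could each hold one lock and wait forever for the other.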
From: Wolfgang M. <wol...@ex...> - 2010-09-01 09:00:04
|
> A stored node in eXist is > currently represented by a pair <DocumentImpl doc, long nodeId>. This > will change to <int docId, long nodeId>. Sorry, it is <DocumentImpl doc, NodeId nodeId> and will become <int docId, NodeId nodeId> where NodeId is essentially a binary encoded hierarchical identifier. Wolfgang |
From: Wolfgang M. <wol...@ex...> - 2010-09-01 08:53:15
|
> If you remove collection locks, does that mean you are planning to lock at the resource level?

Yes. In the current design, a resource does physically belong to a collection. To read or write a resource, you have to access the collection object first, then acquire a lock on the resource. Removing this direct dependency between collection and resource will have a number of benefits:

1) resource updates will become faster and scale better: indexes are currently organized by collection, which introduces a dependency between collection size and update speed (this has already been dropped for the structural index in trunk). The larger the collection, the slower your updates. Removing this dependency will increase scalability. You will be able to index or reindex a single resource without touching or locking the rest of the collection.

2) queries will consume less memory: right now, a query needs to load all required collections plus the internal metadata (name, permissions, owner...) for all resources at the start. This is slow and takes a lot of space. If resources are decoupled from collections, a query will just need the document id plus the lock for every resource. We no longer have to retrieve the actual document object. Instead, a lock manager maintains a simple map of documentId -> lock and the query uses the documentId only. A stored node in eXist is currently represented by a pair <DocumentImpl doc, long nodeId>. This will change to <int docId, long nodeId>.

3) the transaction log will become much easier to maintain. Right now we have to make sure that transactional integrity is preserved for both dom.dbx and collections.dbx at the same time, which introduces a number of problems. Decoupling them simplifies the transaction log since both indexes become independent and can be maintained independently.

4) the main deadlock issue, which is caused by the hierarchy of collection and resource locks, will disappear. If the collection is just a virtual entity, you only need to lock it if you modify its metadata (name, owner, permissions). Writing or reading a resource will not require a lock on the collection anymore since the resources are just loosely assigned to a collection.

5) if collections are entirely virtual, you can assign a resource to more than one collection. On the other hand, it may become possible for a resource to not be a member of any collection at all (in which case it is handled as a direct child of the root collection?).

> Wouldn't that potentially lead to an awful lot of locks being taken? Am I misreading the idea?

You won't need to take more locks than you do right now (rather less, since the collection locks disappear).

Related to this discussion is the question of dirty reads: currently the query engine does allow dirty reads to some extent. I'm not yet sure if we should keep it like that (and just handle them more transparently) or disallow them completely.

I'll try to come up with some graphic or mind map to explain the overall picture.

Wolfgang |
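A lock manager that "maintains a simple map of documentId -> lock" might look roughly like the following Java sketch. It is a guess at the shape of the idea, not the planned eXist implementation: `DocumentLockManagerSketch` and its methods are hypothetical, and the sketch never evicts locks for deleted documents. The query side only ever handles the integer document id.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical lock manager keyed by document id only; no collection object
// and no document object is needed to take a lock.
final class DocumentLockManagerSketch {
    private final ConcurrentMap<Integer, ReadWriteLock> locks = new ConcurrentHashMap<>();

    // Returns the one lock for this document id, creating it on first use.
    ReadWriteLock lockFor(int docId) {
        return locks.computeIfAbsent(docId, id -> new ReentrantReadWriteLock());
    }

    // Runs a query under the document's read lock (many readers allowed).
    <T> T read(int docId, Supplier<T> query) {
        ReadWriteLock lock = lockFor(docId);
        lock.readLock().lock();
        try {
            return query.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    // Runs an update under the document's write lock (single writer).
    void write(int docId, Runnable update) {
        ReadWriteLock lock = lockFor(docId);
        lock.writeLock().lock();
        try {
            update.run();
        } finally {
            lock.writeLock().unlock();
        }
    }

    int trackedDocuments() {
        return locks.size();
    }

    public static void main(String[] args) {
        DocumentLockManagerSketch manager = new DocumentLockManagerSketch();
        manager.write(7, () -> System.out.println("updating doc 7"));
        System.out.println(manager.read(8, () -> "read doc 8"));
    }
}
```

Because the map is keyed per document, writing document 7 does not block a reader of document 8, which is the decoupling from collection-level locks described in the message.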
From: James F. <jam...@ex...> - 2010-09-01 06:50:45
|
2010/9/1 Leif-Jöran Olsson <lj...@ex...>:
> On 2010-09-01 00:55, Jason Smith wrote:
>> Quick OS Survey:
>>
>> What operating systems is everyone using? For purposes of finding some common white-boarding app. I'm good with Windows and/or Linux. Is anyone planning to attend Linux-only?
>
> Yes, if I will attend it will be GNU/Linux only.

me too.

J |
From: Leif-Jöran O. <lj...@ex...> - 2010-09-01 06:37:00
|
On 2010-09-01 00:55, Jason Smith wrote:
> Quick OS Survey:
>
> What operating systems is everyone using? For purposes of finding some common white-boarding app. I'm good with Windows and/or Linux. Is anyone planning to attend Linux-only?

Yes, if I will attend it will be GNU/Linux only. |
From: Jason S. <js...@in...> - 2010-08-31 22:55:20
|
Quick OS Survey: What operating systems is everyone using? For purposes of finding some common white-boarding app. I'm good with Windows and/or Linux. Is anyone planning to attend Linux-only? |
From: Jason S. <js...@in...> - 2010-08-31 22:37:46
|
No, write some more on this. I have heard you talk about removing the collection locks, but I am not sure how you plan to replace their purpose. That would be good to understand before Thursday. If you remove collection locks, does that mean you are planning to lock at the resource level? Wouldn't that potentially lead to an awful lot of locks being taken? Am I misreading the idea? |
From: Wolfgang M. <wol...@ex...> - 2010-08-31 22:04:28
|
> For example, if you have a single dom.dbx file, can you write to that from multiple threads at the same time safely? If you can't, then it doesn't make much sense to write-lock at the collection level, right?

The bigger picture is much more complex. You have to talk about transactions, recovery, caching and more. You do not write to dom.dbx directly. You write to the page cache. And there's more than just dom.dbx. Writing the actual data just takes a small part of the overall indexing time. More time is spent with indexing, maintaining the transaction log etc.

From my point of view, the next step in any redesign effort should be to remove the collection locks entirely. We have discussed this before. It will greatly simplify the locking and transaction log. My roadmap roughly looks like this:

1) remove collection dependency from core indexes:
   1a) structural index, DONE
   1b) range index, IN PROGRESS
   1c) remove document metadata from collection store and keep it separately. A collection is just a sequence of (arbitrary) document ids. A document can be linked to more than one collection.
2) drop all collection locks, except for the case where the collection metadata itself is modified

Those steps have to be completed before we address other things. We need to simplify the architecture first, then try to do further redesigns. Any help will be welcome. As an added value, 1a and 1b will improve update/write performance in general.

> Still, it seems like it would be nice to write to one document while querying against another document in the same collection...

Normally, eXist will acquire a lock on the collection, acquire one on the document, release the collection lock, continue parsing the document. In some cases (node updates), the transaction handling has forced us to keep the lock on the collection longer than desired. But this can be changed (see above).

> And this all has to be done in a system that currently does not support true transactional rollback (I think).

eXist maintains a transaction log and does redo/undo on recovery. The only limitation is that the transaction log is incomplete, i.e. it does not cover any secondary indexes. Still, transactional integrity has to be preserved and puts further requirements on the locking (this is the reason why collection locks are often not released early).

Well, I will stop writing emails now and better explain everything on Thursday.

Wolfgang |
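The sequence described above (lock the collection, lock the document, release the collection lock, carry on with the document) is a form of lock coupling. The Java sketch below shows just that ordering with two plain locks. It is an illustration with hypothetical names (`LockCouplingSketch`, `storeDocument`) and makes no claim about how eXist's own lock classes are structured.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical pair of locks standing in for one collection and one document.
final class LockCouplingSketch {
    final ReentrantLock collectionLock = new ReentrantLock();
    final ReentrantLock documentLock = new ReentrantLock();

    // Holds the collection lock only long enough to obtain the document lock,
    // then does the long-running document work without it.
    void storeDocument(Runnable documentWork) {
        collectionLock.lock();
        try {
            documentLock.lock();        // taken while the collection is held
        } finally {
            collectionLock.unlock();    // released before the slow part starts
        }
        try {
            documentWork.run();         // e.g. parsing and indexing the document
        } finally {
            documentLock.unlock();
        }
    }

    public static void main(String[] args) {
        LockCouplingSketch sketch = new LockCouplingSketch();
        sketch.storeDocument(() -> System.out.println(
                "collection locked during work: " + sketch.collectionLock.isLocked()));
    }
}
```

The case mentioned in the message, where transaction handling forces the collection lock to be kept longer, corresponds to moving `collectionLock.unlock()` after `documentWork.run()`, which is what reduces concurrency on the collection.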
From: Jason S. <js...@in...> - 2010-08-31 21:30:55
|
There is also dimdim. I've used that one before. Preferences? It would be nice to have a whiteboard app.

From: Loren Cahlander [mailto:lor...@gm...]
Sent: Tuesday, August 31, 2010 2:49 PM
To: Dannes Wessels
Cc: Loren Cahlander; Jason Smith; Wolfgang Meier; eXist development; Michael J. Pelikan; Todd Gochenour; Paul Ryan
Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

How about Google Docs? http://docs.google.com

On Aug 31, 2010, at 03:41 PM, Dannes Wessels wrote:

We should probably use http://www.doodle.com/ for these kind of international meetings :-)

On 31 Aug 2010, at 22:36 , Jason Smith wrote:

Kind of late for Wolfgang though.

Kind regards

Dannes

--
eXist-db Native XML Database - http://exist-db.org
Join us on linked-in: http://www.linkedin.com/groups?gid=35624 |
From: Jason S. <js...@in...> - 2010-08-31 21:23:32
|
I think we need to talk about how much granularity is actually valuable. For example, if you have a single dom.dbx file, can you write to that from multiple threads at the same time safely? If you can't, then it doesn't make much sense to write-lock at the collection level, right?

Also, if you have a maximally granular locking mechanism, allowing locks on collections and resources, both deep locks and shallow locks, with multiple readers and a single writer allowed on each, the deadlock detection gets really complex. Performance on deadlock detection can blow up. Still, it seems like it would be nice to write to one document while querying against another document in the same collection...

How the collections are implemented underneath isn't that important to the locking mechanism, other than that it affects how much concurrency you can actually take advantage of.

I think, though, that you **need to have the locking mechanism in place pretty early on.** If you design a wonderfully concurrent back end, but you control access through a global mutex, well, what have you got? :-)

Plus, in any non-trivial locking mechanism, there will be deadlocks and deadlock detection-and-recovery. And the software, as a whole, has to use the standards for detection-and-recovery if you want to be able to take advantage of the more concurrent locking. And this all has to be done in a system that currently does not support true transactional rollback (I think). Which means there are some additional rules when it comes to write locking...

Too much information. I'll write something up for Thursday, and hopefully all this stuff will become more clear. This is not an easy topic for anyone, including myself!

-----Original Message-----
From: Adam Retter [mailto:ad...@ex...]
Sent: Tuesday, August 31, 2010 6:42 AM
To: Wolfgang Meier
Cc: Jason Smith; Paul Ryan; eXist development; Michael J. Pelikan; Todd Gochenour
Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

> Otherwise, eXist just avoids locking multiple collections at once
> wherever possible as it is known to be an expensive operation and
> limits concurrency.

We have discussed several times replacing eXist-db's current collection mechanism with a virtualised implementation where Collections are just another number in the system. This was discussed for the purposes of performance when large collections are involved. Would this simplify the overall problem domain? If so, perhaps this work should be undertaken before a redesign of the locking system?

>> The second replacement I've come up with allows two read queries to run
>> simultaneously, even when they target the same collection, and when multiple collections
>> are used simultaneously.
>
> As I said before, I welcome any exploration in this area. As James
> just suggested, we may want to have a skype telecon on this to discuss
> the possibilities and dangers.

Skype teleconference would be good :-)

--
Adam Retter

eXist Developer
{ United Kingdom }
ad...@ex...
irc://irc.freenode.net/existdb |
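For readers following the deadlock-detection point in the message above, here is a minimal sketch of one common approach, a wait-for graph with cycle detection. It is an illustration under stated assumptions, not eXist-db's lock manager: the class WaitForGraph, its methods, and the string transaction ids are all hypothetical names invented for this example. With multiple readers and a single writer per lock, a blocked writer may wait on several readers at once, which is why each waiter maps to a set of holders.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in eXist-db.
public class WaitForGraph {
    // waiter id -> ids of the transactions holding locks the waiter needs
    private final Map<String, Set<String>> waitsFor = new HashMap<>();

    // Record that 'waiter' is blocked on a lock currently held by 'holder'.
    public void addWait(String waiter, String holder) {
        waitsFor.computeIfAbsent(waiter, k -> new HashSet<>()).add(holder);
    }

    // Remove the edge once the lock is granted or the request is abandoned.
    public void removeWait(String waiter, String holder) {
        Set<String> holders = waitsFor.get(waiter);
        if (holders != null) {
            holders.remove(holder);
            if (holders.isEmpty()) {
                waitsFor.remove(waiter);
            }
        }
    }

    // A deadlock exists exactly when the wait-for graph contains a cycle.
    public boolean hasDeadlock() {
        Set<String> done = new HashSet<>();    // fully explored, no cycle found through them
        Set<String> onPath = new HashSet<>();  // nodes on the current depth-first path
        for (String start : waitsFor.keySet()) {
            if (dfs(start, done, onPath)) {
                return true;
            }
        }
        return false;
    }

    private boolean dfs(String node, Set<String> done, Set<String> onPath) {
        if (onPath.contains(node)) {
            return true;   // reached a node already on the current path: cycle
        }
        if (done.contains(node)) {
            return false;  // already explored from here
        }
        onPath.add(node);
        for (String next : waitsFor.getOrDefault(node, Collections.<String>emptySet())) {
            if (dfs(next, done, onPath)) {
                return true;
            }
        }
        onPath.remove(node);
        done.add(node);
        return false;
    }

    public static void main(String[] args) {
        WaitForGraph graph = new WaitForGraph();
        graph.addWait("T1", "T2");  // T1 waits for a lock held by T2
        System.out.println(graph.hasDeadlock());
        graph.addWait("T2", "T1");  // T2 now waits for T1 as well
        System.out.println(graph.hasDeadlock());
    }
}
```

Each check walks the whole graph, so its cost grows with the number of blocked requests. Finer-grained locks (per resource, deep and shallow) mean more edges to walk, which is one way the detection cost mentioned above could grow. Recovery is not shown: without transactional rollback, choosing and aborting a victim needs additional rules.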
From: Adam R. <ad...@ex...> - 2010-08-31 21:03:41
|
Anything that can be opened by either OpenOffice or a Web Browser should be fine - that includes the de-facto standard MS office formats...

On 31 August 2010 21:36, Jason Smith <js...@in...> wrote:
> I'd like to prepare some materials. What is your preferred format? MS Office, Open Office, Maven XDoc... ? I will try to avoid PowerPoint.
>
> -----Original Message-----
> From: Wolfgang Meier [mailto:wol...@ex...]
> Sent: Tuesday, August 31, 2010 2:32 PM
> To: Jason Smith
> Cc: Adam Retter; Dmitriy Shabanov; eXist development; Paul Ryan; Michael J. Pelikan; Todd Gochenour; loren
> Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.
>
>> That will work too. Kind of late for Wolfgang though.
>
> No, that's fine.
>
> Wolfgang
>

--
Adam Retter

eXist Developer
{ United Kingdom }
ad...@ex...
irc://irc.freenode.net/existdb |
From: Jason S. <js...@in...> - 2010-08-31 20:55:30
|
The reason I have been sort of a pain in the tuckus about this is not to get a quick fix, but to get this on the roadmap. :-) I definitely don't want to lose it.

-----Original Message-----
From: Adam Retter [mailto:ad...@ex...]
Sent: Tuesday, August 31, 2010 6:47 AM
To: Jason Smith
Cc: Wolfgang Meier; Paul Ryan; eXist development; Michael J. Pelikan; Todd Gochenour
Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

> Again, I would not expect a final solution to be implemented the way I have done it. I just want to be able to show some possibilities.

I am really excited to see such enthusiasm and knowledge of the eXist-db internals and suggestions for improvements that could be made.

> I am not expecting this conversation to take place quickly. This is one of those "long-haul" things - perhaps over years, and several major releases. I'm not in a hurry to do the wrong thing quickly!!!
>

Let's not lose this though, I think that any performance and scalability enhancements are great. And actually there is a flip side here that no one has mentioned:

Whilst everyone has said that you should have optimised queries and appropriate indexes, which is invariably true for production systems, what about new users and developers who come to eXist-db? As a new user of eXist-db you don't necessarily understand about optimising your queries or even how to create the correct indexes. Not all eXist-db users are software developers! It would be great if these new users saw great performance from the start, even if they haven't set up indexes and their queries are doing full scans of the dbx files :-) |
From: Dannes W. <da...@ex...> - 2010-08-31 20:41:36
|
We should probably use http://www.doodle.com/ for these kinds of international meetings :-)

On 31 Aug 2010, at 22:36, Jason Smith wrote:

>> Kind of late for Wolfgang though.

Kind regards

Dannes

--
eXist-db Native XML Database - http://exist-db.org
Join us on linked-in: http://www.linkedin.com/groups?gid=35624 |
From: Jason S. <js...@in...> - 2010-08-31 20:37:20
|
I'd like to prepare some materials. What is your preferred format? MS Office, Open Office, Maven XDoc... ? I will try to avoid PowerPoint.

-----Original Message-----
From: Wolfgang Meier [mailto:wol...@ex...]
Sent: Tuesday, August 31, 2010 2:32 PM
To: Jason Smith
Cc: Adam Retter; Dmitriy Shabanov; eXist development; Paul Ryan; Michael J. Pelikan; Todd Gochenour; loren
Subject: Re: [Exist-development] [Exist-open] Performance of concurrent read queries.

> That will work too. Kind of late for Wolfgang though.

No, that's fine.

Wolfgang |