From: Adam R. <ad...@ex...> - 2010-02-02 10:20:47
Or you could call util:parse() to convert your xs:string into a node().

On 2 February 2010 01:00, Andrzej Jan Taramina <an...@ch...> wrote:
> It used to be that if you did an:
>
>     xmldb:store( $col, $doc, $content )
>
> where $content was a string, but an XML string, store() would store the
> document as XML, not as a binary document.
>
> Now it seems that if you do such a store, even though the $content string
> is valid XML, it now stores the document as a binary document.
>
> Looks like Gev checked in a change on the 30th that broke the existing
> behaviour.
>
> I'm not convinced that the prior behaviour was a bug when you were
> passing in a string that was in fact XML. But I've changed my code to
> pass in a mimetype parameter in our situation to resolve this.
>
> --
> Andrzej Taramina
> Chaeron Corporation: Enterprise System Solutions
> http://www.chaeron.com

--
Adam Retter
eXist Developer
{ United Kingdom }
ad...@ex...
irc://irc.freenode.net/existdb
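A minimal sketch of Adam's suggestion; the collection, document name and
content below are placeholders:

    xquery version "1.0";

    let $col := "/db/reports"        (: hypothetical collection :)
    let $doc := "report.xml"         (: hypothetical document name :)
    let $content := "<report><status>ok</status></report>"

    (: util:parse() converts the string into a node, so store()
       treats the resource as XML rather than binary :)
    return xmldb:store($col, $doc, util:parse($content))

(util and xmldb are built-in eXist modules, so no explicit import is
needed.)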
From: Adam R. <ad...@ex...> - 2010-02-02 09:58:52
On 1 February 2010 19:34, Thomas White <tho...@gm...> wrote:
> I hope this will be taken in the best possible way.
>
> I could never understand why a functionality that is used by some users
> could be proposed for removal by other users who either don't use it or
> don't like it.
>
> If something has got momentum it should stay. It is there, it does not
> require any additional resources and it brings other routes to more
> solutions.

This is not how good software is built, but more importantly it's not how
it is maintained. We have had plenty of contributions in the past, some of
excellent quality and some not. But if we just allowed anyone to add
anything, we would end up with a mess. Everyone thinks eXist should be
something different; at present it is a lot of different things to
different people - but the core team tries not to dilute the product too
much. Maintainability is also a huge issue for us. Basically, we like
features to be discussed by the core team.

> Richer functionalities allow more developers to find their own specific
> way of resolving their own specific problems.
> There can never be a single best way of dealing with any kind of problem.
>
> I strongly vote for 1).
>
> Kind regards,
> Thomas
>
>> Alternatives:
>>
>> 1) Leave it in and see if the world collapses
>> 2) Remove it completely
>> 3) Move it to its own extension module.
>>
>> My vote is for option #3

--
Adam Retter
eXist Developer
{ United Kingdom }
ad...@ex...
irc://irc.freenode.net/existdb
From: Wolfgang M. <wol...@ex...> - 2010-02-02 09:07:33
OK, I confirm there's an issue in SVN trunk which causes the shutdown
process to hang and blocks any checkpoints. I can reproduce it on one
machine. If you are using trunk, don't update just now...

Wolfgang
From: Thomas W. <tho...@gm...> - 2010-02-02 08:32:12
Sorry for the confusion. You are absolutely right.

Thomas

On 1 February 2010 20:14, Wolfgang Meier <wol...@ex...> wrote:
>> I could never understand why a functionality that is used by some users
>> could be proposed for removal by other users who either don't use it
>> or don't like it.
>
> Wait, you miss the point here. Andrzej's was 1) a newly committed
> feature, 2) a change to a core module. Ideally every commit to the
> repository should be reviewed by more than one person, and this is what
> happened in this case. Two people objected to the commit and commented
> on it. That's a very positive sign, even if you disagree with them. A
> commit can be criticized and sometimes even rolled back. That's what
> you have SVN for.
>
> The more people watch and comment on the commits list, the better.
>
> Wolfgang
From: Wolfgang M. <wol...@ex...> - 2010-02-02 08:17:33
The log file should be cleared by the next checkpoint, which should at
least occur during shutdown, if not before. It definitely looks like you
have a hanging transaction somewhere in the background. Can you get a stack
trace from Java to see which thread is hanging?

I made some changes to the shutdown listeners yesterday, and I wonder if
they are causing the issue - though those changes were limited to Jetty
only...

Wolfgang
From: Andrzej J. T. <an...@ch...> - 2010-02-02 02:50:45
More on this issue: on recovery I get the following exceptions in the eXist
log:

2010-02-01 20:40:01,993 [main] WARN (RecoveryManager.java [doRecovery]:223) - Exception caught while redoing transactions. Aborting recovery.
java.lang.ArrayIndexOutOfBoundsException
    at java.lang.System.arraycopy(Native Method)
    at org.exist.util.FixedByteArray.copyTo(FixedByteArray.java:41)
    at org.exist.storage.index.BFile.storeValueHelper(BFile.java:1353)
    at org.exist.storage.index.BFile.redoStoreValue(BFile.java:1044)
    at org.exist.storage.index.StoreValueLoggable.redo(StoreValueLoggable.java:94)
    at org.exist.storage.recovery.RecoveryManager.doRecovery(RecoveryManager.java:217)
    at org.exist.storage.recovery.RecoveryManager.recover(RecoveryManager.java:154)
    at org.exist.storage.txn.TransactionManager.runRecovery(TransactionManager.java:137)
    at org.exist.storage.BrokerPool.initialize(BrokerPool.java:793)
    at org.exist.storage.BrokerPool.<init>(BrokerPool.java:660)
    at org.exist.storage.BrokerPool.configure(BrokerPool.java:216)
    at org.exist.storage.BrokerPool.configure(BrokerPool.java:188)
    at org.exist.http.servlets.EXistServlet.startup(EXistServlet.java:675)
    at org.exist.http.servlets.EXistServlet.init(EXistServlet.java:116)
    at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1173)
    at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:993)
    at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:4149)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4458)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
    at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
    at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
    at org.apache.catalina.core.StandardService.start(StandardService.java:516)
    at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:583)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)
2010-02-01 20:40:02,009 [main] WARN (RecoveryManager.java [doRecovery]:225) - Log entry that caused the exception: [0, 1389117] org.exist.storage.index.StoreValueLoggable [BFile] - stored value with tid 1 on page 2
2010-02-01 20:40:02,009 [main] INFO (RecoveryManager.java [doRecovery]:228) - Redo processed 89 out of 26721 transactions.

And then the database is unavailable. Very frustrating, not being able to
reload a clean database.

Any ideas what is causing this? Are there some long-running tasks that are
not completing after my store of 25K documents? Or are there some shutdown
hooks that were put in Jetty, but aren't being executed in Tomcat? Why
would the recovery fail? Then again, I'm puzzled why I'm even in recovery
mode in the first place. Help! Thx!

--
Andrzej Taramina
Chaeron Corporation: Enterprise System Solutions
http://www.chaeron.com
From: Andrzej J. T. <an...@ch...> - 2010-02-02 02:27:17
Just rebuilt my application's eXist database from scratch, and then put 25K
XML documents into it. Latest trunk from SVN.

I seem to have a 6GB 0000000000.log file in my data directory after storing
the 25K XML documents. When does this log file get shrunk? Is there any way
to force it to be sync'ed and shrunk?

So I left the database alone for about an hour after the reload, and then
shut it down (2 minute delay on shutdown) using a normal Tomcat shutdown.
But now when I restart the database I get the long-running "Scanning
journal [=====..." task as the database starts up in the Tomcat container.

What's with this massive journal scanning on restart? It's like my
transactions never committed, so it's reprocessing the monster 6GB log
file. It never used to do this... back a month or so ago.

Some help explaining what might be going on would be appreciated. Thanks!

--
Andrzej Taramina
Chaeron Corporation: Enterprise System Solutions
http://www.chaeron.com
From: Andrzej J. T. <an...@ch...> - 2010-02-02 01:01:14
It used to be that if you did an:

    xmldb:store( $col, $doc, $content )

where $content was a string, but an XML string, store() would store the
document as XML, not as a binary document.

Now it seems that if you do such a store, even though the $content string
is valid XML, it now stores the document as a binary document.

Looks like Gev checked in a change on the 30th that broke the existing
behaviour.

I'm not convinced that the prior behaviour was a bug when you were passing
in a string that was in fact XML. But I've changed my code to pass in a
mimetype parameter in our situation to resolve this.

--
Andrzej Taramina
Chaeron Corporation: Enterprise System Solutions
http://www.chaeron.com
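For reference, xmldb:store() also has a four-argument form that takes an
explicit MIME type, which is the workaround described above; a minimal
sketch with placeholder values:

    xquery version "1.0";

    let $col := "/db/reports"        (: hypothetical collection :)
    let $doc := "report.xml"         (: hypothetical document name :)
    let $content := "<report><status>ok</status></report>"

    (: an explicit MIME type forces the string to be stored as an
       XML resource rather than a binary one :)
    return xmldb:store($col, $doc, $content, "application/xml")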
From: Andrzej J. T. <an...@ch...> - 2010-02-01 23:22:21
Loren:
> We have talked about this at our office. We are discussing hosting an
> eXist users conference in Minneapolis. How many people would be
> interested in an eXist user group meeting in Minneapolis this summer?

Count me in! In fact, I'll offer to do a detailed presentation and demo of
what we've been up to in the healthcare space.

--
Andrzej Taramina
Chaeron Corporation: Enterprise System Solutions
http://www.chaeron.com
From: Andrzej J. T. <an...@ch...> - 2010-02-01 23:17:14
It used to be that if you did an:

    xmldb:store( $col, $doc, $content )

where $content was a string, but an XML string, store() would store the
document as XML, not as a binary document.

Now it seems that if you do such a store, even though the $content string
is valid XML, it now stores the document as a binary document.

When did that behaviour change? And why? Thx!

--
Andrzej Taramina
Chaeron Corporation: Enterprise System Solutions
http://www.chaeron.com
From: Andrzej J. T. <an...@ch...> - 2010-02-01 21:10:32
Wolfgang said:
> Wait, you miss the point here. Andrzej's was 1) a newly committed
> feature, 2) a change to a core module. Ideally every commit to the
> repository should be reviewed by more than one person, and this is what
> happened in this case. Two people objected to the commit and commented
> on it. That's a very positive sign, even if you disagree with them. A
> commit can be criticized and sometimes even rolled back. That's what
> you have SVN for.
>
> The more people watch and comment on the commits list, the better.

I heartily agree! This makes eXist stronger!

Even though it was one of my features that was being commented on, and it
has fostered a lively and interesting discussion, I welcome the input, and
as a result have made it an optional extension module for the moment
(though it still might disappear down the road if it's really not
required).

--
Andrzej Taramina
Chaeron Corporation: Enterprise System Solutions
http://www.chaeron.com
From: Wolfgang M. <wol...@ex...> - 2010-02-01 20:52:05
> Sure wish I could make it to Prague. I heard rumours of an eXist
> gathering in North America this year...?

Sure, I like the idea and I think we should organize one. Probably
something to talk about in Prague as well.

Wolfgang
From: Loren C. <lor...@gm...> - 2010-02-01 20:45:58
We have talked about this at our office. We are discussing hosting an eXist
users conference in Minneapolis. How many people would be interested in an
eXist user group meeting in Minneapolis this summer?

On Feb 1, 2010, at 02:18 PM, Andrzej Jan Taramina wrote:
> Sure wish I could make it to Prague. I heard rumours of an eXist
> gathering in North America this year...?
>
> --
> Andrzej Taramina
> Chaeron Corporation: Enterprise System Solutions
> http://www.chaeron.com
From: Andrzej J. T. <an...@ch...> - 2010-02-01 20:44:03
Wolfgang:
> Andrzej: maybe you could provide a concrete example for which you would
> need your extensions. This would make it easier for other users to
> follow.

Here's a rough outline of what we've built...

We've created a generic analytics reporting module, where you specify what
analytical functions you want run, input variables with defaults, and
relationships between hierarchical metrics using a declarative XML
specification. That is, you define your report and analysis in XML. No
procedural code per se.

There is an XQuery called request.xql which will read all report
definitions stored in the database and will generate a user interface in
HTML allowing the user to request execution of one of the reports. It reads
the XML definition, creates required input parameters and the like, and
will do a post with all this, including the id of the selected report. The
post is handled by render.xql.

The issue we ran into is that some of the reports require custom XQuery
code which is fairly extensive, and others use common code. In all cases,
the user can elect to receive the output in raw XML format (for submission
to upstream systems, perhaps) or in a human-readable HTML format. The HTML
format is created from the XML output as a precursor step.

So render.xql has a dispatcher table which lists, for each report
definition in the database, which .xqm module to load dynamically, which
function to call to generate the XML, and which function to call to
generate HTML from the XML. This allowed us to split one massive render
XQuery into multiple modules which can be dynamically imported at runtime,
depending on which report the user requested. In a subsequent step, we'll
be splitting each report .xqm module into one to generate XML and another
to generate HTML, to further separate things. All of these modules are
rather large... these are extremely complicated reports which use dynamic
analytics against healthcare data.

So... if you as a user requested report "A" with human-readable HTML
output, render.xql would look up the "A" module document name (say
a-report.xqm or some such), along with the prefix and namespace the module
uses, and would load this module dynamically. Then it would look up the
function to call to generate the XML and will use eval() to call that
function. The function will return the XML as a result back to render.xql,
which will then look up the function to call to generate the final HTML,
and will call that function using eval(), passing it the XML results.

The complicating factor is that some reports can return more than one XML
document as a result, and so we needed the XML generator to do something
like this:

    return
        <xml>
            <result1>{ $result1 }</result1>
            <result2>{ $result2 }</result2>
        </xml>

from the XML generation function, and then the HTML renderer needs to index
into the two result values. That is a very slow and expensive (not to
mention memory-consuming) operation, since it's using in-memory fragments
to create the result document when we already have the two results
available. For us, it would be a lot more efficient if we did something
like:

    let $set1 := context:set-context-attribute( "result1", $result1 )
    let $set2 := context:set-context-attribute( "result2", $result2 )

in the XML function, and just did the following to get the data back in the
HTML function:

    let $get1 := context:get-context-attribute( "result1" )
    let $get2 := context:get-context-attribute( "result2" )

rather than go to the time and expense of building the composite XML
document.

This is all because we use render.xql as a dynamic dispatcher which
dynamically calls and/or chains functions together from modules that were
dynamically loaded, depending on what report the user wanted to run.

Like I said, the composite XML document approach works fine, it's just a
bit slow is all, so we don't really "need" the context attributes. But they
will come in handy. ;-)

> You both provided valid arguments. It's a difficult question and I'm not
> yet sure towards which side I tend (in particular since I have a terrible
> cold which currently limits my thinking).

Sorry to hear that.

> Adam is certainly right in that functional languages should avoid
> side-effects wherever possible. On the other hand, nearly all functional
> languages violate this principle and provide a way to set variables with
> side effects. In the XQuery world, the XQueryP extension introduces a
> "set $variable" expression
> (http://www.flworfound.org/pubs/Dana-XML-2006.pdf). MarkLogic has a
> dictionary data type, whose key/value pairs can be set from anywhere
> within a query! Apparently, users like this feature, though it encourages
> them to stick to procedural style instead of writing functional code.

I think that in the real world, as you said, all functional languages have
to violate the principle to get any "real work" done and to integrate with
external systems.

> On the other hand, I guess there are situations in which I wished I could
> set a global variable, e.g. to pass information between modules.

The newish cache extension module would work nicely for that. I've used it
to generate and cache expensive lookup tables at startup that take many
minutes and would not be very user-friendly to generate in response to a
user request.

> Sometimes my modules need to save state between function calls. Sure, you
> could use some XML fragment to save the state and pass it back and forth
> between caller and module. This leads to complex function calls though,
> which are irritating to the user. Fortunately, a module can always store
> state into the database by saving an XML fragment. However, this is no
> different from setting session variables or whatever: it again introduces
> side effects.

That is my point. Trying to support a pure functional approach just doesn't
work with real-world applications, as all functional language implementors
have discovered.

> I think it is difficult to use XQuery for complex real-world applications
> without allowing some functions to have side effects (storing a
> document).

I would venture to say it would be pretty much impossible. And even if it
were possible technically, a pure functional approach would turn off the
majority of potential users. You don't see much Haskell used for just this
reason. My leanings are towards a more pragmatic approach.

> But we have to be very cautious. I rejected the idea to provide a
> dictionary data type, because it introduces side effects through the
> backdoor. From this point of view, I tend to prefer the more radical
> approach of a "set" operation. Whether a function has side effects or not
> should at least be obvious.

I agree with you, though the cache extension module is very much like a
dictionary data type, albeit at a global level.

--
Andrzej Taramina
Chaeron Corporation: Enterprise System Solutions
http://www.chaeron.com
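A rough sketch of the dispatcher described above, using util:import-module()
and util:eval(). The module location, namespace and function names are
hypothetical, and the context:* functions are the proposed extension under
discussion, so treat this as an illustration rather than a recipe:

    xquery version "1.0";

    (: dispatcher entry for report "A", looked up from its XML
       definition; all three values are assumptions :)
    let $prefix   := "ra"
    let $ns       := xs:anyURI("http://example.com/reports/a")
    let $location := xs:anyURI("xmldb:exist:///db/modules/a-report.xqm")

    (: load the report module at runtime :)
    let $import := util:import-module($ns, $prefix, $location)

    (: invoke the generator and renderer by name; under the proposed
       extension, generate-xml() would stash partial results with
       context:set-context-attribute("result1", ...) and render-html()
       would read them back with context:get-context-attribute("result1"),
       avoiding the composite in-memory fragment :)
    let $xml := util:eval(concat($prefix, ":generate-xml()"))
    return util:eval(concat($prefix, ":render-html()"))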
From: Andrzej J. T. <an...@ch...> - 2010-02-01 20:19:03
Wolfgang:
> Since I need more time to think about it, I'd suggest moving it to its
> own extension module for now. We can further discuss the issue in Prague
> (unfortunately you won't be there, but I think I understand your point
> sufficiently well). Maybe we can come up with a clean solution which is
> acceptable to everyone.

I have moved the functionality to its own extension module (context), and
have left this module disabled in conf.xml. So a user will have to
consciously turn it on to use it for the moment.

Sure wish I could make it to Prague. I heard rumours of an eXist gathering
in North America this year...?

--
Andrzej Taramina
Chaeron Corporation: Enterprise System Solutions
http://www.chaeron.com
From: Wolfgang M. <wol...@ex...> - 2010-02-01 20:14:55
> I could never understand why a functionality that is used by some users
> could be proposed for removal by other users who either don't use it or
> don't like it.

Wait, you miss the point here. Andrzej's was 1) a newly committed feature,
2) a change to a core module. Ideally every commit to the repository should
be reviewed by more than one person, and this is what happened in this
case. Two people objected to the commit and commented on it. That's a very
positive sign, even if you disagree with them. A commit can be criticized
and sometimes even rolled back. That's what you have SVN for.

The more people watch and comment on the commits list, the better.

Wolfgang
From: Thomas W. <tho...@gm...> - 2010-02-01 19:34:48
I hope this will be taken in the best possible way.

I could never understand why a functionality that is used by some users
could be proposed for removal by other users who either don't use it or
don't like it.

If something has got momentum, it should stay. It is there, it does not
require any additional resources, and it brings other routes to more
solutions.

Richer functionalities allow more developers to find their own specific way
of resolving their own specific problems. There can never be a single best
way of dealing with any kind of problem.

I strongly vote for 1).

Kind regards,
Thomas

------

Thomas White

>> Alternatives:
>>
>> 1) Leave it in and see if the world collapses
>> 2) Remove it completely
>> 3) Move it to its own extension module.
>>
>> My vote is for option #3
From: Wolfgang M. <wol...@ex...> - 2010-02-01 18:21:05
> Alternatives:
>
> 1) Leave it in and see if the world collapses
> 2) Remove it completely
> 3) Move it to its own extension module.
>
> My vote is for option #3.

Since I need more time to think about it, I'd suggest moving it to its own
extension module for now. We can further discuss the issue in Prague
(unfortunately you won't be there, but I think I understand your point
sufficiently well). Maybe we can come up with a clean solution which is
acceptable to everyone.

Wolfgang
From: Andrzej J. T. <an...@ch...> - 2010-02-01 17:29:41
Adam:
> It breaks the tenet of side-effect free XQuery functions. Admittedly we
> have some of these, but they are extension functions and very
> specialised.

Actually, we have a lot of these: session, request, sql, ldap, email, cache
and more! So we've really moved away from the religiously functional world,
and I see that as a good thing, since real-world applications typically
need to integrate with non-functional real-world systems like servlets, SQL
databases, LDAP directories, email and more. That is a far cry from "some
of these". Obviously there is a requirement for some side-effect
generating, non-functional features in eXist. Quite a few of them, in fact.
;-)

> The problem was that I had not really discarded the shackles of
> procedural programming and not yet really *really* got my head around
> functional programming elegance. I now believe that such a facility is
> completely unnecessary once you learn the tricks of XQuery programming -
> it does take time though.

You're getting borderline preachy, Adam! ;-) I don't want to get into a
religious war on the benefits of being purely functional, since we're far
from that already and I have no interest in an academic argument. I know
the tricks and value of functional programming... but sometimes the real
world intrudes.

> Perhaps you can convince me that such a functionality is really
> necessary, but I would need to understand the use case and why that
> cannot be met currently.

It's not strictly necessary... more a performance/convenience issue. I'm
happy to remove the two get/set functions from trunk, if the others vote
that this should be the way we go. My code runs fine by using in-memory
fragment creation... it's just slow and a bit convoluted to do that, since
I have to create intermediate XML fragments that aggregate multiple XML
results so that they can be passed around. No big deal there though, since
it does work fine. I may consider putting the functions into a private
extension library though, if it buys me enough performance/convenience at
the moment, given the state of in-memory fragments.

> The request attributes are only designed for setting attributes on the
> underlying HttpServletRequest; they were never added for the purpose of
> storing and reusing data within a single XQuery. Neither should they be
> used for transporting data between XQueries. They are really for internal
> interaction of the eXist pipeline; we should come up with a better
> mechanism in future as this ties us too tightly to the Java Http Servlet
> specification.

I don't buy this argument at all. XQuery is a beautiful tool for writing
complete, self-contained web applications that run in a servlet container.
Heck... we even support REST that way. To do that you need session-based
persistence for things like authentication and the like. Regardless of the
pragmatic vs theoretical arguments, which are likely to just start a
religious war, the cat is out of the bag. These features are heavily used
in production systems, and so we need to keep them around. As for what they
were intended for, that's irrelevant. If it's available, users will use the
features... whether they have side-effects or not.

> The session attributes are just that, designed for persisting (ideally
> small) amounts of data across a user's requests during their session.
> Again, this should not be used for inter-XQuery communication. Ideally
> sessions should be avoided where possible; this leads to greater
> scalability possibilities - there are plenty of articles about developing
> stateless web applications so I won't get into a discussion here.

Should, could, would! If it's there, it will be used... and "abused" in
some people's minds. That's the reality.

> You should not misuse the request or session objects even if they are
> available.

If all you have is a hammer... everything looks like a nail! LOL

> I don't see why this would become a requirement then; you should still be
> passing data into your XQuery using a different mechanism, not the
> context. Each executing instance of an XQuery typically has its own
> context, and in a multi-threaded environment this is no different.

Good point.

> It sounds like you have a lot of complexity and issues here that need to
> be broken down into smaller problems and modularised. Perhaps if you
> could give a small specific example of why you need such context
> functionality then I could better understand your exacting requirements?

Time demands preclude me from stripping down many thousands of lines of
code to create a demonstration case. Sorry... that's my reality right now.
Too many real-world deadlines.

As for the complexity and such, I've already refactored the code into much
smaller, more maintainable, decoupled, independent units as I had
described, using a dispatcher paradigm. This has greatly simplified our
code. Don't forget, Adam, I've been writing code for nearly 4 decades, and
contrary to some opinions, I do know what I'm doing and have a long history
of writing extremely modular/maintainable code! Preaching to the converted
ain't gonna win you many points with me. ;-)

> Actually I think such a feature makes code much dirtier as you can't see
> the flow of data through the system. You just magically stuff something
> into the context using some sort of label and then later extract it
> (assuming it's still there). This is very bad in any language.

Tell that to Google, who use distributed hashmaps all over the place
(Map/Reduce is based on the concept). Same with Hadoop. memcached. This is
a proven, performant design pattern that unfortunately does not fit nicely
into the purist functional approach. I don't believe it is a bad thing, if
used appropriately. In such cases it can clarify the data flows and make
code simpler. It's all down to the skill/expertise of the individual, and
how "good" their code is. That is independent of functional, distributed
hashmap, procedural, declarative or any other approach used. Purist
approaches rarely succeed in the real world. Erlang and Lisp come to mind
as good examples. LOL

> When I got rid of all of this functionality from a large system that I
> developed in XQuery, the code actually became much simpler and more
> readable through the extensive refactoring that I performed.

That wouldn't be the case with my code. One day, I'll sit down with you and
show you what we've got... when time allows.

> Well if you can't show us a concrete example then we definitely can't
> help you ;-)

True enough. Not that I don't want to... just that there isn't enough time
at the moment.

> Perhaps you could show us something, a small exacting example, and we
> could help you refactor it? If we can't help you refactor it, then it
> shows that your use case is valid and that we do need a feature such as
> this.

How about I do some benchmarks and see if the performance gain over
building large aggregate in-memory fragments is worth the compromise? ;-)

The key point is that I am happy to remove the functionality, if the group
as a whole votes that way. Cast yer votes, maties!

Alternatives:

1) Leave it in and see if the world collapses
2) Remove it completely
3) Move it to its own extension module.

My vote is for option #3.

--
Andrzej Taramina
Chaeron Corporation: Enterprise System Solutions
http://www.chaeron.com
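For reference, the session and request attribute functions being debated
look roughly like this in use; a minimal sketch with illustrative attribute
names:

    xquery version "1.0";

    (: persist a value across a user's requests, e.g. after login :)
    let $login := session:set-attribute("user", "andrzej")

    (: request attributes live only for the current request/pipeline :)
    let $stash := request:set-attribute("report-id", "A")

    (: ...and read them back, here or later in the pipeline :)
    return (session:get-attribute("user"),
            request:get-attribute("report-id"))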
From: Evgeny G. <gaz...@gm...> - 2010-01-27 23:44:09
About my setup: I use the admin app and sandbox from the filesystem, on the
/exist/admin and /exist/sandbox URIs, and my own app stored in the db on
the root "/" URI.

We can store modules either in the db or in the filesystem. This works fine
for the app, but not when testing modules in the admin panel: for stored
modules you need to use the URI scheme "xmldb:exist:///db/....". This is
not a problem for queries, but it is a problem for imported modules which
themselves import other modules.

I also want to suggest "handling" several different dbs in one jetty/eXist
instance.

What's more, today's "standalone" and "jetty" modes are very alike. A year
ago the "jetty" mode was used for Cocoon, but today we have Cocoon as an
extension, so why do we have these two modes? I suggest deprecating
standalone mode (the URLRewriter alone is not trivial) and removing it
later.

--------
Evgeny
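A sketch of the stored-module import Evgeny describes, with hypothetical
paths: a module stored in the database must be addressed via the
xmldb:exist:/// scheme, and any modules it imports in turn hit the same
resolution problem:

    xquery version "1.0";

    import module namespace app = "http://example.com/app"
        at "xmldb:exist:///db/modules/app.xqm";

    (: a relative hint such as at "util/helpers.xqm" inside app.xqm
       resolves differently for filesystem vs stored modules, which is
       the pain point described above :)
    app:main()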
From: Thomas W. <tho...@gm...> - 2010-01-26 13:29:18
I do believe having an asynchronous mechanism for fetching data in eXist
will give us a foundation for many interesting new ideas. I will need this
functionality in about two months' time, and I really hope somebody from
the dev team will be interested in implementing the execution pipeline.

Regards,
Thomas

------

Thomas White
Mobile: +44 7711 922 966
Skype: thomaswhite
gTalk: thomas.0007
Linked-In: http://www.linkedin.com/in/thomaswhite0007
facebook: http://www.facebook.com/thomas.0007

2010/1/26 Adam Retter <ad...@ex...>
> 2010/1/25 Thomas White <tho...@gm...>:
>> Adam,
>>
>> From my point of view, there are some important differences. Scheduling
>> functions can be used to execute a function asynchronously if the
>> execution time is set, say, 1 sec after the current time, but this is
>> where the similarity ends.
>>
>> 1. So far all eXist functions are executed synchronously (except
>> scheduled jobs and triggers). If I need to get data from say 25 or 250
>> remote sources, at the moment we will need to do it one data chunk at a
>> time, one after another. What about if we need to fetch 5000 RSS feeds?
>> We do need asynchronous commands.
>
> Ah okay, I think I understand now. Yes, it seems like an interesting
> feature to add to eXist.
>
> I guess we could add this onto the end of the roadmap if the other
> developers agree, however there is some work involved in this and we are
> pretty busy.
>
> So how soon do you need this functionality?
>
> --
> Adam Retter
>
> eXist Developer
> { United Kingdom }
> ad...@ex...
> irc://irc.freenode.net/existdb
From: Adam R. <ad...@ex...> - 2010-01-26 13:03:48
2010/1/26 Thomas White <tho...@gm...>:
> I do believe having an asynchronous mechanism for fetching data in eXist
> will give us a foundation for many interesting new ideas.
> I will need this functionality in about two months' time, and I really
> hope somebody from the dev team will be interested in implementing the
> execution pipeline.

I think we will be very pushed for time to meet that deadline. The roadmap
for the next two versions is already laid out - although it is subject to
change.

Do you have any resources you could allocate to this effort?

> [...]

--
Adam Retter
eXist Developer
{ United Kingdom }
ad...@ex...
irc://irc.freenode.net/existdb
From: Adam R. <ad...@ex...> - 2010-01-26 11:59:41
2010/1/25 Thomas White <tho...@gm...>:
> Adam,
>
> From my point of view, there are some important differences. Scheduling
> functions can be used to execute a function asynchronously if the
> execution time is set, say, 1 sec after the current time, but this is
> where the similarity ends.
>
> 1. So far all eXist functions are executed synchronously (except
> scheduled jobs and triggers). If I need to get data from say 25 or 250
> remote sources, at the moment we will need to do it one data chunk at a
> time, one after another. What about if we need to fetch 5000 RSS feeds?
> We do need asynchronous commands.

Ah okay, I think I understand now. Yes, it seems like an interesting
feature to add to eXist.

I guess we could add this onto the end of the roadmap if the other
developers agree, however there is some work involved in this and we are
pretty busy.

So how soon do you need this functionality?

> [...]

--
Adam Retter
eXist Developer
{ United Kingdom }
ad...@ex...
irc://irc.freenode.net/existdb
From: Thomas W. <tho...@gm...> - 2010-01-25 23:39:22
|
Adam, >From my point of view, there are some important differences. Scheduling functions can be used to execute a function asynchronously if the execution time is set say 1 sec after the current time but this is where the similarity ends. 1. So far all eXist functions are executed asynchronously (except scheduled jobs and triggers). If I need to get data from say 25 or 250 remote sources at the moment we will need to do it one data chunk at a time, one after another. What about if we need to fetch 5000 RSS feeds? *We do need asynchronous commands.* 2.*execute-before function* - ability to conditionally execute a job with variable delay. 3. *callback function* - ability to call a specific code after the job is done. 4. *ability to group jobs in batches*, batch functions and batch callback function can provide very powerful way to perform an asynchronous final operation on a group of asynchronous functions. So far we have been thinking synchronously only - we need all data on the server and we can produce the whole page at once. But it is getting much more interesting and much more powerful when we think asynchronously on the client - we can create and/or update partially many areas on the screen simultaneously. For example use case 3. federated search, on a web client. Imagine an application where the web client has to display results from 25 servers where each of them has different latency. Whenever any of the data becomes available - it is displayed without page refresh. Some will say just AJAX in action, but I will say it is going to a very different level. Why? Because we have progress update during the request and we have batch operations and especially ability to cancel whole batch of requests when needed. Let take an application where the use will see on a same screen results from 50 servers asynchronously. It is an application that gets quotes for car insurance and it takes between 5 and 45 seconds for different quotes to be completed. *Case 1. Traditional AJAX approach* - Every quote area on the screen has its own AJAX request for a specific quote provider. When the user clicks on the button, then: - browser opens 50 TCP connections to the server until all quotes are received (Windows limits up to 2 simultaneous TCP connections to a domain, that can introduce additional delay). - On the server 50 synchronous TCP connections to the quote providers. When every data chunk is recieved two TCP ports will be released on the server. - On the server this user so far will occupy 100 TCP ports = 50(browser-server)+50( server - quote providers ). After 10 seconds we have recieved data for say 10 of the quotes and the user decides to amend something and press the "Get Quotes" button again. Then: - The browser opens new 50 synchronous requests. - On the server this user now will occupy 180 TCP ports = 50(new browser requests )+ 40(incompleted old browser requests) +50( new server-quote providers) + 40(incomplete old server-quote providers connections). - There is nothing to cancel the old 40 connections to the browser add 40 incomplete server-quote providers connections except connection timeout. - As a result the server can get out of TCP ports very quickly and the data will be delivered slowly especially if more users click the quote button earlier. It is getting worse very quickly when the users press the quote button prematurely or when the users refresh the page. Scalability is severely effected by the user behavior and by the number of users. *Case 2. 
We have batch of asynchronous quote requests on the server.* There will be no long waiting requests on TCP connections on the client. "Get Quotes" button will call a XQuery and quickly receive the batchID and an initial estimated delay. For simplicity let say the client will call getBatchStatus every second, closing the TCP connection after receiving the data.When any of the jobs is complete, a quick call to fetch the data is made. Then: - On the browser there are no long waiting opened TCP connections. We have quick fetch of the status and one call to fetch received data every second and then all TCP ports are closed. - On the server, we have 50( server - quote providers ) TCP connections + 1 every second from the browser, closed immediately + 1 connection for the received data, all quotes delivered in one call, closed immediately . Now when the user clicks the "Get Quotes" button earlier, then we first cancel all incomplete calls to quote providers on the server by calling closeAll, releasing all TCP ports and then we make the new 50 requests. The result: - On the browser still have a call or two every second. No hanging TCP ports, no timeouts. - On the server we have exactly the same 50( server - quote providers ) TCP connections + 1 or 2 every second from the browser. - The scalability of the server is not effected by the user behaviour at all and it can take much more users. 5. Use case 4 *federated search, on a the server* I believe is very important if we want to query more then one server in real time. This scenario can do a pretty good job for awhile, before the eXist real clustering is ready. I hope this explains your question. Regards, Thomas ------ Thomas White Mobile:+44 7711 922 966 Skype: thomaswhite gTalk: thomas.0007 Linked-In:http://www.linkedin.com/in/thomaswhite0007 facebook: http://www.facebook.com/thomas.0007 2010/1/25 Adam Retter <ad...@ex...>: > Can you not already do most (if not all) of this by Scheduling XQuery > jobs with eXist's Scheduler? > > > 2010/1/25 Thomas White <tho...@gm...>: >> I would like to propose a new functionality that I believe could be very >> beneficial for eXist users: >> >> Asynchronous Execution Pipeline >> >> This a mechanism for execution of number of asynchronous jobs >> simultaneously. It is very useful for executing long running jobs or in >> cases where it is impossible to predict how long it will take to perform the >> operation. Every job will run as a separated thread and the jobID and the >> estimated delay will be returned immediately to the caller. >> >> Use cases: >> >> 1. Executing long running queries >> >> Callback function will be used to store the result, at a location according >> to the function-parameters. >> A client checking periodically the status of this job will take next action. >> >> 2. Fetching data from (large) number of remote URLs >> >> An XQuery or a scheduled job creates XX execution pipeline entries for each >> remote server. >> Callback functions are used to store the results, at a location according to >> the function-parameters. >> The batch callback function will combine the result and trigger the next >> action. >> >> 3. Federated search, on a web client >> >> A web client sends a search request to a local XQuery, that creates XX >> execution pipeline entries for each remote server and returns to the web >> client a batch-id. 
>> The web client queries the status for the jobs with this batch-id >> periodically and when some of the jobs has status 'completed', web client >> gets the result for this job and displays it on the screen asynchronously. >> >> 4. Federated search, on a the server >> >> A web client sends a search request to a local XQuery, that creates XX >> execution pipeline entries for each remote server and returns to the web >> client a batch-id. >> Every job callback function will save the result at a location according to >> the function-parameters. The batch callback function will combine the >> result. >> The web client queries the status for this batch periodically and when the >> batch is completed, web client gets the result and displays combined result >> set on the screen asynchronously. >> >> 5. Data Replication >> >> An XQuery or a scheduled job creates XX execution pipeline entries for each >> remote server. >> Execute-before function will identify what needs to be replicated. >> The main function does the replication. >> The batch callback function moves the replication marker. >> >> A call to the Execution Pipe Line: >> execution-pipeline:addJob( function, function-parameters, >> pipeline-parameters ) >> returning : >> handlerID, estimated-delay, function-parameters >> >> >> To get the result we need to call another function: >> execution-pipeline:getJobResults( handlerID, autoClose ) >> returning either: >> the result data set. if autoClose is true then close the job and release >> all used resources. >> or >> same handlerID, new-estimated-delay,function -parameters >> or >> unknown-handlerID error >> >> execution-pipeline:getJobStatus( handlerID ) >> returns >> status of the job, function-parameters for this job >> >> execution-pipeline:getBatchStatus( batch-ID ) >> returns >> the status for all jobs from a particular batch ID. >> >> >> execution-pipeline:getStatus( ) >> returns >> the status for all jobs. >> >> >> execution-pipeline:closeJob( handlerID ) >> execution-pipeline:closeBatch( batchID ) >> execution-pipeline:closeAll( ) >> >> >> function-parameters: >> >> job-statistic-id: used to keep average time for execution of this function. >> average time= (previous-average-time + last-execution-time)/2. URL with >> specific parameters could be used as an ID. >> execute-before function: when provided, it will be called before calling the >> main function for this job. If the result is 0 then proceed with the main >> function, otherwise use the result as number of milliseconds to put this job >> to sleep and try later. >> callback function: when provided callback-function will be called as >> callback-function( handlerID, result, function-parameters ). if it returns >> true() the job will be closed. >> any other parameters that may be used by the callback function. >> >> pipeline-parameters: >> >> batch-ID - to group >> batch-callback-function: called when all jobs from the batch are completed. >> any other parameters that may be used by the callback function. >> >> Any comments? 
>>
>> Thomas
>>
>>
>> ------
>>
>> Thomas White
>>
>> Mobile: +44 7711 922 966
>> Skype: thomaswhite
>> gTalk: thomas.0007
>> Linked-In: http://www.linkedin.com/in/thomaswhite0007
>> facebook: http://www.facebook.com/thomas.0007
>>
>> _______________________________________________
>> Exist-development mailing list
>> Exi...@li...
>> https://lists.sourceforge.net/lists/listinfo/exist-development
>>
>>
>
>
> --
> Adam Retter
>
> eXist Developer
> { United Kingdom }
> ad...@ex...
> irc://irc.freenode.net/existdb
>
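A minimal sketch of the quote-batch flow above, written against the
*proposed* execution-pipeline API. To be clear about what is assumed:
none of the execution-pipeline functions exist in eXist yet, the provider
URLs and the local:fetch-quote helper are invented for illustration, and
the shapes of the parameter elements and of the getBatchStatus result
(a job element with status and handlerID attributes) are guesses at what
the proposal implies.

Submitting the batch, one job per quote provider, grouped under one
batch-ID:

    xquery version "1.0";

    declare namespace ep = "http://exist-db.org/xquery/execution-pipeline";
    declare namespace util = "http://exist-db.org/xquery/util";

    (: hypothetical helper: fetch one provider's quote document;
       eXist's doc() can also retrieve remote URLs :)
    declare function local:fetch-quote($url as xs:string) as node()? {
        doc($url)
    };

    for $url in ("http://quotes1.example.com/q",
                 "http://quotes2.example.com/q")
    return
        ep:addJob(
            (: util:function() is eXist's existing way to pass a
               function reference by QName and arity :)
            util:function(xs:QName("local:fetch-quote"), 1),
            <function-parameters>
                <param name="url" value="{$url}"/>
            </function-parameters>,
            <pipeline-parameters batch-ID="quotes-1"/>
        )

The once-a-second poll from the browser: fetch anything completed and
auto-close it, so the job's resources are released straight away:

    xquery version "1.0";

    declare namespace ep = "http://exist-db.org/xquery/execution-pipeline";

    (: assumed result shape: job elements carrying status/handlerID :)
    for $job in ep:getBatchStatus("quotes-1")//job[@status eq 'completed']
    return
        ep:getJobResults($job/@handlerID, true())

If the user clicks "Get Quotes" again before the batch finishes, the
handler would first call ep:closeAll() (or, more selectively,
ep:closeBatch("quotes-1")) to release the TCP ports, then submit the
fresh batch of 50 requests.
|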
From: Adam R. <ad...@ex...> - 2010-01-25 17:29:44
|
Can you not already do most (if not all) of this by scheduling XQuery
jobs with eXist's Scheduler? (A rough sketch of that approach, using the
existing scheduler module, follows the quoted proposal below.)

2010/1/25 Thomas White <tho...@gm...>:
> I would like to propose a new piece of functionality that I believe
> could be very beneficial for eXist users:
>
> Asynchronous Execution Pipeline
>
> This is a mechanism for executing a number of asynchronous jobs
> simultaneously. It is very useful for executing long-running jobs, or in
> cases where it is impossible to predict how long an operation will take.
> Every job will run as a separate thread, and the jobID and the estimated
> delay will be returned immediately to the caller.
>
> Use cases:
>
> 1. Executing long-running queries
>
> A callback function will be used to store the result at a location
> according to the function-parameters.
> A client periodically checking the status of this job will take the
> next action.
>
> 2. Fetching data from a (large) number of remote URLs
>
> An XQuery or a scheduled job creates XX execution-pipeline entries, one
> for each remote server.
> Callback functions are used to store the results at locations according
> to the function-parameters.
> The batch callback function will combine the results and trigger the
> next action.
>
> 3. Federated search, on a web client
>
> A web client sends a search request to a local XQuery, which creates XX
> execution-pipeline entries, one for each remote server, and returns a
> batch-id to the web client.
> The web client queries the status of the jobs with this batch-id
> periodically, and when one of the jobs has status 'completed', the web
> client gets the result for that job and displays it on the screen
> asynchronously.
>
> 4. Federated search, on the server
>
> A web client sends a search request to a local XQuery, which creates XX
> execution-pipeline entries, one for each remote server, and returns a
> batch-id to the web client.
> Every job callback function will save its result at a location according
> to the function-parameters. The batch callback function will combine the
> results.
> The web client queries the status of this batch periodically, and when
> the batch is completed, the web client gets the result and displays the
> combined result set on the screen asynchronously.
>
> 5. Data Replication
>
> An XQuery or a scheduled job creates XX execution-pipeline entries, one
> for each remote server.
> The execute-before function will identify what needs to be replicated.
> The main function does the replication.
> The batch callback function moves the replication marker.
>
> A call to the Execution Pipeline:
> execution-pipeline:addJob( function, function-parameters,
> pipeline-parameters )
> returning:
> handlerID, estimated-delay, function-parameters
>
> To get the result we need to call another function:
> execution-pipeline:getJobResults( handlerID, autoClose )
> returning either:
> the result data set (if autoClose is true, the job is closed and all
> used resources are released),
> or
> the same handlerID, a new estimated-delay, and the function-parameters,
> or
> an unknown-handlerID error
>
> execution-pipeline:getJobStatus( handlerID )
> returns
> the status of the job and the function-parameters for this job
>
> execution-pipeline:getBatchStatus( batch-ID )
> returns
> the status of all jobs with a particular batch-ID.
>
> execution-pipeline:getStatus( )
> returns
> the status of all jobs.
>
> execution-pipeline:closeJob( handlerID )
> execution-pipeline:closeBatch( batchID )
> execution-pipeline:closeAll( )
>
> function-parameters:
>
> job-statistic-id: used to keep the average execution time for this
> function: average-time = (previous-average-time + last-execution-time)
> / 2. A URL with specific parameters could be used as an ID.
> execute-before function: when provided, it will be called before the
> main function for this job. If the result is 0 then proceed with the
> main function; otherwise use the result as the number of milliseconds
> to put this job to sleep before trying again.
> callback function: when provided, the callback-function will be called
> as callback-function( handlerID, result, function-parameters ). If it
> returns true() the job will be closed.
> any other parameters that may be used by the callback function.
>
> pipeline-parameters:
>
> batch-ID: used to group jobs into a batch
> batch-callback-function: called when all jobs in the batch are completed.
> any other parameters that may be used by the callback function.
>
> Any comments?
>
> Thomas
>
>
> ------
>
> Thomas White
>
> Mobile: +44 7711 922 966
> Skype: thomaswhite
> gTalk: thomas.0007
> Linked-In: http://www.linkedin.com/in/thomaswhite0007
> facebook: http://www.facebook.com/thomas.0007
>
> _______________________________________________
> Exist-development mailing list
> Exi...@li...
> https://lists.sourceforge.net/lists/listinfo/exist-development
>
>

--
Adam Retter

eXist Developer
{ United Kingdom }
ad...@ex...
irc://irc.freenode.net/existdb
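For comparison, a rough sketch of the Scheduler route suggested above.
The scheduler module does ship with eXist (namespace
http://exist-db.org/xquery/scheduler), but the exact arity and argument
order of schedule-xquery-periodic-job should be checked against the
function documentation of your eXist version; the stored query path and
the parameter names here are hypothetical.

    xquery version "1.0";

    declare namespace scheduler = "http://exist-db.org/xquery/scheduler";

    (: schedule one stored query per quote provider; the stored query
       itself must persist its result (e.g. with xmldb:store()) in some
       agreed collection, because the scheduler offers no callbacks,
       no batch grouping and no result retrieval of its own :)
    scheduler:schedule-xquery-periodic-job(
        "/db/jobs/fetch-quote.xq",  (: hypothetical stored XQuery :)
        1000,                       (: period in milliseconds :)
        "fetch-quote-provider-1",   (: job name, usable as a handle :)
        <parameters>
            <param name="url" value="http://quotes1.example.com/q"/>
        </parameters>,
        0,                          (: delay before the first run :)
        0                           (: times to repeat after the first run :)
    )

A polling client could then watch scheduler:get-scheduled-jobs() and the
agreed result collection, but the estimated delays, batch status and
callback chaining from the proposal would all have to be hand-rolled on
top, which is essentially the gap the pipeline proposal aims to fill.
|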