Re: [Exist-development] [Exist-commits] SF.net SVN: exist:[11099] trunk/eXist/src/org/exist/xquery/

Wolfgang:

> Andrzej: maybe you could provide a concrete example for which you
> would need your extensions. This would make it easier for other users
> to follow.

Here's a rough outline of what we've built....

We've created a generic analytics reporting module, where you specify what analytical functions you want run, input
variables with defaults and relationships between hierarchical metrics using a declarative XML specification. That is,
you define your report and analysis in XML. No procedural code per se.

There is an XQuery called request.xql which will read all report definitions stored in the database, and will generate a
user interface in HTML allowing the user to request execution of one of the reports.  It reads the xml definition,
creates required input parameters and the like, and will do a post with all this, including the id of the selected report.

The post is handled by render.xql.  The issue we ran into is that some of the reports require custom xquery code which
is fairly extensive, and others use common code. In all cases, the user can elect to receive the output in raw xml
format (for submission to upstream systems perhaps) or in a human readable HTML format.  The HTML format is created from
the xml output, so the xml is always generated first as a precursor step.

So the render.xql has a dispatcher table which lists, for each report definition in the database, what .xqm module to
load dynamically, which function to call to generate the xml and which function to call to generate html from the xml.

This allowed us to split one massive render xquery into multiple modules which can be dynamically imported at runtime,
depending on which report the user requested.  In a subsequent step, we'll be splitting each report .xqm module into one
to generate xml and the other to generate html, to further separate things.  All of these modules are rather
large...these are extremely complicated reports which use dynamic analytics against healthcare data.

So...if you as a user requested report "A" with human readable HTML output, render.xql would look up the "A" module
document name (say a-report.xqm or some such), along with the prefix and namespace the module uses, and would load this
module dynamically.  Then it would look up the function to call to generate the XML and will use eval() to call that
function.  The function will return the xml as a result back to render.xql, which will then look up the function to
call to generate the final HTML, and will call that function using eval() passing it the xml results.
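
A minimal sketch of that dispatch step, assuming eXist's util:import-module() and util:eval() extension functions and
the request module. The table layout, the namespace, the collection path and the function names (a:generate-xml,
a:render-html) are all made up for illustration; only a-report.xqm comes from the description above, and the real
render.xql differs.

```xquery
xquery version "1.0";

(: Hypothetical dispatcher table: one entry per report definition.
   Every value below is an illustrative assumption. :)
declare variable $dispatch :=
    <dispatch>
        <report id="A"
                prefix="a"
                namespace="http://example.com/reports/a"
                location="xmldb:exist:///db/reports/a-report.xqm"
                xml-function="a:generate-xml"
                html-function="a:render-html"/>
    </dispatch>;

(: The id of the selected report arrives with the post. :)
let $id     := request:get-parameter("report", "A")
let $entry  := $dispatch/report[@id = $id]

(: Load the report module at runtime instead of importing it statically. :)
let $import := util:import-module(
                   xs:anyURI($entry/@namespace),
                   string($entry/@prefix),
                   xs:anyURI($entry/@location))

(: Build the call as a string and evaluate it to generate the xml. :)
let $xml    := util:eval(concat($entry/@xml-function, "()"))

(: Chain the xml into the html renderer. This assumes the evaluated
   expression can see $xml from the calling query's context. :)
return
    util:eval(concat($entry/@html-function, "($xml)"))
```

The point of keeping the function names in a table is that adding a report only means adding a row and a module, with
no changes to the dispatcher itself.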

The complicating factor is that some reports can return more than one xml document as a result, and so we needed the xml
generator to do something like this:

	return <xml>
		 <result1>{ $result1 }</result1>
		 <result2>{ $result2 }</result2>
	       </xml>

from the xml generation function, and the html renderer then needs to index into the two result values. That is a
very slow and expensive (not to mention memory consuming) operation since it's using in-memory fragments to create the
result document, when we already have the two results available.

For us, it would be a lot more efficient if we did something like:

	let $set1 := context:set-context-attribute( "result1", $result1 )
	let $set2 := context:set-context-attribute( "result2", $result2 )

in the xml function, and just did the following to get the data back in the html function:

	let $get1 := context:get-context-attribute( "result1" )
	let $get2 := context:get-context-attribute( "result2" )

rather than go to the time and expense of building the composite xml document.

This is all because we use render.xql as a dynamic dispatcher which dynamically calls and/or chains functions together
from modules that were dynamically loaded, depending on what report the user wanted to run.

Like I said, the composite XML document approach works fine, it's just a bit slow is all, so we don't really "need" the
context attributes. But they will come in handy. ;-)

> You both provided valid arguments. It's a difficult question and I'm
> not yet sure towards which side I tend (in particular since I have a
> terrible cold which currently limits my thinking).

Sorry to hear that.

> Adam is certainly
> right in that functional languages should avoid side-effects wherever
> possible. On the other hand, nearly all functional languages violate
> this principle and provide a way to set variables with side effects.
> In the XQuery world, the XQueryP extension introduces a "set
> $variable" expression
> (http://www.flworfound.org/pubs/Dana-XML-2006.pdf). MarkLogic has a
> dictionary data type, whose key/value pairs can be set from anywhere
> within a query! Apparently, users like this feature, though it
> encourages them to stick to procedural style instead of writing
> functional code.

I think that in the real world, as you said, all functional languages have to violate the principle, to get any "real
work" done and to integrate with external systems.

> On the other hand, I guess there are situations in which I wished I
> could set a global variable, e.g. to pass information between modules.

The newish cache extension module would work nicely for that. I've used it to generate and cache expensive lookup tables
at startup, that take many minutes, and would not be very user friendly to generate in response to a user request.

> Sometimes my modules need to save state between function calls. Sure,
> you could use some XML fragment to save the state and pass it back and
> forth between caller and module. This leads to complex function calls
> though, which are irritating to the user. Fortunately, a module can
> always store state into the database by saving an XML fragment.
> However, this is no different from setting session variables or
> whatever: it again introduces side effects.

That is my point. Trying to support a pure functional approach just doesn't work with real world applications, as all
functional language implementors have discovered.

> I think it is difficult to use XQuery for complex real-world
> application without allowing some functions to have side effects
> (storing a document).

I would venture to say it would be pretty much impossible. And even if it were possible technically, a pure functional
approach would turn off the majority of potential users. You don't see much Haskell used for just this reason.

My leanings are towards a more pragmatic approach.

> But we have to be very cautious. I rejected the
> idea to provide a dictionary data type, because it introduces side
> effects through the backdoor. From this point of view, I tend to
> prefer the more radical approach of a  "set" operation. If a function
> has side effects or not should at least be obvious.

I agree with you, though the cache extension module is very much like a dictionary data type, albeit at a global level.

-- 
Andrzej Taramina
Chaeron Corporation: Enterprise System Solutions
http://www.chaeron.com
