One clarification: the values in the table represent time in seconds for completion of 15000 searches.


On Wed, Sep 18, 2013 at 2:44 PM, David Rudel <fwqhgads@gmail.com> wrote:
Hi All,

I recently did some testing of various algorithms in Saxon 9.5.0.2 and Saxon 9.5.1.2, with and without byte generation. Some of the results looked like they may be of interest to the community and/or Saxonica. "Observations 1 and 2" in particular seem worth consideration.

Note: I was using oxYgen for this testing. For Saxon 9.5.0.2, I used the current stable build of oxYgen. For Saxon 9.5.1.2, I used the current beta release that incorporates that version.
However, for some reason this beta did not let me access as much RAM... This may mean it was using 32-bit java, or it might mean something completely different. For this reason, differences between Saxon implementations are probably less important than differences among algorithms or differences between byte-code results versus non-byte-code results.

The basic question asked was "Given a sorted sequence ($seq) of xs:doubles, find the greatest value in $seq less than or equal to a given threshold."

I tested 5 different algorithms:

A1: Using the simple mapping operator: $seq[($seq!(if(. gt $threshold) then position() else ()))[1] - 1]

A2: Straight predicate:  $seq[. lt $threshold][last()]  <--does not make full use of sorted nature of sequence

A3: Recursive XSL-Function option (create an XSL-function that executes the suggestion made by Michael Kay at http://stackoverflow.com/questions/18856302/xslt3-and-xpath2-best-way-to-identify-greatest-element-below-a-threshold-in-sor

{I tried to test an Xpath-inline-recursion version, but kept getting an error saying that the variable was not declared.}

A4: Call a function using the <xsl:iterate> command to scan the list, using <break/> to stop processing once the threshold is met.

A5: Same as A4, but without the use of a dedicated function.


To test this I made a script that generated a 5000-item sorted list and then ran the various algorithms on this sequence 15000 times using randomly generated $threshold values between 0 and the maximum value of the sequence.

In some cases, algorithms were so slow that I had to use 150 iterations instead of 15000 iterations, and extrapolate the results. In one case the algorithm was so fast that I used 150000 iterations instead to get desired precision.

I have attached the scripts and the results in a chart.

Some key observations:

1. For some reason, the mapping operator option is extremely slow. Note that in separate testing I found that this algorithm works great if you are just trying to find the _position_ of the desired item, but somehow the calling of the actual value is causing a major slowdown.

That is to say that $seq!(if(. gt $threshold) then position() else ()))[1] - 1 is just as fast any any other algorithm for finding the position of the greatest item less than $threshold, but when this value is put inside the predicate to call the entry from the sequence, there is a huge slowdown.

2. byte-code generation appears to destroy tail-recursion optimization for the recursive-function option.

3. It appears that in Saxon 9.5.2.0.2, byte generation triggered some sort of optimization when the iterate-break algorithm made use of a separate function but not when it was all done inside the main template.  [In Saxon 9.5.2.1.2, this optimization appears to be triggered in both cases.]

4. Byte-code generation helped the brute-force predicate algorithm in Saxon 9.5.2.1.2 but not in Saxon 9.5.2.0.2 [could have nothing to do with change of version, may be based on RAM differences]

I hope these results are useful, and if someone could show me how to use inline recursive functions in Saxon, I'd appreciate that.

-David






--

"A false conclusion, once arrived at and widely accepted is not dislodged easily, and the less it is understood, the more tenaciously it is held." - Cantor's Law of Preservation of Ignorance.



--

"A false conclusion, once arrived at and widely accepted is not dislodged easily, and the less it is understood, the more tenaciously it is held." - Cantor's Law of Preservation of Ignorance.