Hi all,

As discussed in a previous email, the uri-collection() function in Saxon 9.5 is rather hobbled when used with the default URI resolver that ships with the product because (as Michael explained) that resolver implicitly builds the entire tree for each document it finds. This rather defeats the point of streaming, which was (in large part) the raison d'être of the uri-collection() function.

This afternoon I tested a partial workaround for this. Since the issue described above fundamentally comes from uri-collection() using the same URI resolver as collection(), you can trick it into skipping the tree building by using the unparsed=true option. This still implicitly reads the entire file into memory, but as text rather than as an XML tree. (Technically, it is read in as an XML document with a single text node.)
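
For concreteness, here is a minimal sketch of the kind of stylesheet I mean. Treat it as an illustration under assumptions rather than a tested recipe: the directory file:///data/input and the select=*.xml filter are hypothetical, the query-string form simply appends the unparsed=true option mentioned above, xsl:stream is the instruction name from the XSLT 3.0 working drafts, and the count(/*/*) body is only a placeholder for whatever streamable processing you need.

```xslt
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Entry point: invoke as the initial named template -->
  <xsl:template name="main">
    <!-- Hypothetical directory and filter; unparsed=true asks the resolver
         not to build an XML tree for each file it finds -->
    <xsl:for-each select="uri-collection('file:///data/input?select=*.xml;unparsed=true')">
      <!-- The context item is the document URI; stream that document here -->
      <xsl:stream href="{.}">
        <!-- Placeholder body: count the children of the document element -->
        <xsl:value-of select="count(/*/*)"/>
      </xsl:stream>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The point of the sketch is only that uri-collection() supplies the URIs while xsl:stream does the actual parsing, so the tree never needs to exist in memory.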

Using this trick allows streaming, but at a cost: the time spent reading each file in as a single huge text node is wasted. It also means that source files that are not well-formed will not be identified.

This does raise a question: in the current product, is there ever a reason to use uri-collection() without unparsed=true? In my testing it seemed that the trees built implicitly during the execution of uri-collection() were lost immediately.



"A false conclusion, once arrived at and widely accepted is not dislodged easily, and the less it is understood, the more tenaciously it is held." - Cantor's Law of Preservation of Ignorance.