Generalized streaming optimization?

Jess Holle
2009-04-30
2012-10-08
  • Jess Holle

    Jess Holle - 2009-04-30

    While I realize that Saxon SA does limited streaming, I've seen no evidence of any XSLT engine that does generalized streaming.

    I realize that some XSLT stylesheets intrinsically require random access to the entire source document and thus intrinsically defeat streaming.

    On the other hand, some XSLT clearly does not require any look-ahead, look-behind, or saved past state to speak of, and could be optimized to execute in a completely streaming manner.
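    As a rough illustration of the no-look-ahead case (a Python sketch, not anything Saxon or XSLT itself provides), the function below sums a `price` attribute over repeated `item` elements in one forward pass and discards each element once it has been read. The element and attribute names are hypothetical, and it assumes the `item` elements are direct children of the root.

```python
import io
import xml.etree.ElementTree as ET


def sum_streaming(source, tag="item", attr="price"):
    """Sum a numeric attribute over every `tag` element in a single forward pass.

    Assumes `tag` elements are direct children of the root. Clearing the root
    after each one drops the elements already processed, so memory is roughly
    bounded by one element plus the parser's read-ahead, not by the document.
    """
    count, total = 0, 0.0
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the root element's start
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            total += float(elem.get(attr, "0"))
            count += 1
            root.clear()  # forget everything already processed
    return count, total


data = b"<orders>" + b"".join(b'<item price="%d"/>' % i for i in range(1, 5)) + b"</orders>"
print(sum_streaming(io.BytesIO(data)))  # (4, 10.0)
```

    By contrast, `ET.parse` would build the entire tree before any of it could be processed.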

    Some XSLT falls in between: for example, certain branches of the source document tree must effectively be loaded in their entirety for random access, sorting, etc., but outside each such branch either no non-sequential access or only minimal look-ahead, look-behind, or saved state is required.
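    A minimal Python sketch of that in-between case, assuming a hypothetical document of `order` elements (direct children of the root) whose `line` children must be sorted by `sku`: each order is materialized in full so it can be sorted, and released before the stream moves on, while nothing outside the orders is retained.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, List, Tuple


def sorted_orders(source) -> Iterator[Tuple[str, List[str]]]:
    """Stream <order> elements, fully materializing one order at a time.

    Sorting the `line` children needs random access to that subtree, so the
    whole order is kept until its end tag; it is then released. Assumes
    <order> elements are direct children of the root.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == "order":
            # The whole <order> subtree is in memory here, so sorting works.
            skus = sorted(line.get("sku", "") for line in elem.iter("line"))
            yield elem.get("id", ""), skus
            root.clear()  # release the finished order


data = (b'<orders><order id="a"><line sku="z"/><line sku="b"/></order>'
        b'<order id="b"><line sku="m"/><line sku="c"/></order></orders>')
for order_id, skus in sorted_orders(io.BytesIO(data)):
    print(order_id, skus)
# a ['b', 'z']
# b ['c', 'm']
```

    Peak memory here tracks the largest single order rather than the whole document.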

    Determining which case applies to a given XSLT stylesheet is sometimes possible without any schema information about the source document, but in other cases clearly requires it.
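    One way schema information can matter, sketched in Python under a loudly stated assumption: if a (hypothetical) schema guarantees that `currency` appears before any `item`, then pricing each item needs only one small piece of saved state; without that guarantee, items might have to be buffered until the currency turned up. The element names and the `price_lines` function are illustrative only.

```python
import io
import xml.etree.ElementTree as ET


def price_lines(source):
    """Format each <item> price with the document's <currency>.

    ASSUMPTION: a hypothetical schema guarantees <currency> precedes every
    <item>. Under that guarantee a single saved string is enough state; the
    check below fails loudly if a document breaks the assumption.
    """
    currency = None
    lines = []
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)
    for event, elem in context:
        if event != "end":
            continue
        if elem.tag == "currency":
            currency = (elem.text or "").strip()
        elif elem.tag == "item":
            if currency is None:
                raise ValueError("<item> before <currency>: ordering assumption violated")
            lines.append(f"{elem.get('price')} {currency}")
        root.clear()  # nothing already processed needs to be kept
    return lines


data = b"<prices><currency>EUR</currency><item price='3'/><item price='5'/></prices>"
print(price_lines(io.BytesIO(data)))  # ['3 EUR', '5 EUR']
```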

    Is there any effort on the part of Saxon or any other XSLT implementation to actually do the necessary static analysis and apply the maximum possible level of streaming to XSLT processing? I ask because XSLT has a lot to be said for it (and I use it a lot), but its practical application is limited by the fact that, apart from limited explicit streaming extensions (e.g. in Saxon SA), XSLT implementations insist on loading the entire source document into memory at once. However efficiently this is done, it becomes a scalability bottleneck, most especially for large, high-concurrency applications.

     
    • Michael Kay

      Michael Kay - 2009-04-30

      The XSL working group has been doing a lot of work on streaming over the last couple of years and hopefully some of this will soon be ready to be published. I'm also implementing things in Saxon in parallel as the ideas develop. In 9.2 I'll be releasing a new capability, "streaming templates", which for the first time allows recursive-descent processing using template rules in streaming mode.
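      The Python sketch below is only an analogy for template rules running over a stream, not how Saxon's streaming templates are implemented: a SAX handler picks a per-element action as each start/end event arrives, and the parser's event order takes the place of an explicit recursive descent over a tree. The `rename`/`drop` rule tables and the `StreamingRules`/`transform` names are hypothetical.

```python
import xml.sax
from xml.sax.saxutils import escape


class StreamingRules(xml.sax.ContentHandler):
    """Hypothetical sketch of rule dispatch over parse events.

    Names in `drop` discard the element and its subtree, names in `rename`
    are copied under a new name, and everything else is copied unchanged
    (loosely like a default copy-and-descend rule). Attributes are ignored
    for brevity, and no tree is ever built.
    """

    def __init__(self, rename, drop):
        super().__init__()
        self.rename = rename
        self.drop = drop
        self.skip_depth = 0  # > 0 while inside a dropped subtree
        self.out = []

    def startElement(self, name, attrs):
        if self.skip_depth or name in self.drop:
            self.skip_depth += 1
            return
        self.out.append("<%s>" % self.rename.get(name, name))

    def endElement(self, name):
        if self.skip_depth:
            self.skip_depth -= 1
            return
        self.out.append("</%s>" % self.rename.get(name, name))

    def characters(self, content):
        if not self.skip_depth:
            self.out.append(escape(content))


def transform(data: bytes, rename, drop) -> str:
    handler = StreamingRules(rename, drop)
    xml.sax.parseString(data, handler)
    return "".join(handler.out)


doc = b"<doc><para>Hi &amp; bye</para><note>draft <b>x</b></note><para>end</para></doc>"
print(transform(doc, rename={"para": "p"}, drop={"note"}))
# <doc><p>Hi &amp; bye</p><p>end</p></doc>
```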

      The approach both in the spec and in Saxon is basically that the user has to say they want streamed processing (this is associated with a mode), and static analysis is then done to verify that the code is actually streamable. Most of our work on use cases suggests that code becomes streamable only if the author takes a certain amount of care to make it streamable, which suggests that treating streaming as a pure optimization with no user involvement isn't the best strategy: that option has been open to implementors for 10 years and hasn't delivered anything.
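      To make the opt-in idea concrete, here is a toy checker in Python. Everything in it is an assumption for illustration: rules are a dict from a match pattern to the path expressions the rule selects, paths use only `/`-separated steps, and the two restrictions (only downward or current-node axes; at most one downward selection per rule, since the stream can be read only once) are deliberately simplistic stand-ins, not the analysis the working group or Saxon actually defines.

```python
from typing import Dict, List

# Toy axis classes (assumptions for illustration, not the real rules).
CONSUMING_AXES = {"child", "descendant", "descendant-or-self"}  # move forward in the stream
MOTIONLESS_AXES = {"self", "attribute"}  # available at the current node


def axis_of(step: str) -> str:
    """Return the axis of one path step in a tiny XPath-like syntax."""
    if step == ".":
        return "self"
    if step == "..":
        return "parent"
    if step.startswith("@"):
        return "attribute"
    if "::" in step:
        return step.split("::", 1)[0]
    return "child"  # an unprefixed name test uses the child axis


def check_mode(rules: Dict[str, List[str]], streamable: bool) -> List[str]:
    """Return error messages for a mode; an empty list means it passes."""
    if not streamable:
        return []  # the whole tree is available, so nothing to verify
    errors = []
    for match, paths in rules.items():
        consuming = 0
        for path in paths:
            axes = [axis_of(step) for step in path.split("/") if step]
            if any(a not in CONSUMING_AXES | MOTIONLESS_AXES for a in axes):
                errors.append(f"{match}: '{path}' uses a backward or sideways axis")
            elif any(a in CONSUMING_AXES for a in axes):
                consuming += 1
        if consuming > 1:
            errors.append(f"{match}: {consuming} downward selections, at most 1 allowed")
    return errors


print(check_mode({"order": ["line", "@id"]}, streamable=True))  # []
print(check_mode({"line": ["preceding-sibling::line"]}, streamable=True))
```

      The mode flag is what makes the check apply at all, mirroring the idea that the author declares the intent and the analysis then verifies it.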

       
