Open Stylesheet Processing

2010-04-08
2012-10-08
  • Bruno Feurer
    2010-04-08

    Hello Mr. Kay

    I'd like to process small to medium-sized XML documents with what is
    effectively a single, open-ended XSLT stylesheet. The structure of the
    input documents should be able to evolve and grow independently of the
    process itself, which should handle any new namespace or element type once
    the corresponding template rules are added. Since such an all-encompassing
    stylesheet would soon grow too big, I would like to break it down, for
    example by namespace:

    1) The process first scans the input document for all the namespaces it
    contains. It then dynamically builds a stylesheet that imports the
    corresponding stylesheet for each namespace. The result is compiled and
    the input is transformed.

    2) The process scans the input document for the namespaces it contains and
    builds a pipeline of the pre-compiled stylesheets that handle these
    namespaces. The input is transformed step by step.

    3) The process starts with the stylesheet for the root namespace. For every
    node in an unknown namespace, it transforms the node with the pre-compiled
    stylesheet for that namespace via the saxon:transform() extension. Every
    "foreign" node is transformed separately.
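
    A minimal sketch of the scanning and wrapper-building steps of approach 1,
    using only the JDK's StAX API. The class name, the method names, and the
    namespace-to-module map are hypothetical, and only element namespaces are
    collected (attribute namespaces are ignored):

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class NamespaceDispatch {

    /** Collects the namespace URIs of all elements, in order of first appearance. */
    static Set<String> scanNamespaces(String xml) throws XMLStreamException {
        Set<String> found = new LinkedHashSet<>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String uri = reader.getNamespaceURI();
                    // Elements in no namespace are skipped.
                    if (uri != null && !uri.isEmpty()) {
                        found.add(uri);
                    }
                }
            }
        } finally {
            reader.close();
        }
        return found;
    }

    /**
     * Builds a wrapper stylesheet that imports one module per known namespace.
     * Assumption: 'modules' maps a namespace URI to a stylesheet href that
     * needs no XML escaping. Namespaces without a module are ignored.
     */
    static String buildWrapper(Set<String> namespaces, Map<String, String> modules) {
        StringBuilder sb = new StringBuilder();
        sb.append("<xsl:stylesheet version=\"1.0\" ")
          .append("xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">\n");
        for (String ns : namespaces) {
            String href = modules.get(ns);
            if (href != null) {
                sb.append("  <xsl:import href=\"").append(href).append("\"/>\n");
            }
        }
        sb.append("</xsl:stylesheet>\n");
        return sb.toString();
    }
}
```

    The generated text could then be compiled with
    TransformerFactory.newTemplates(), given a base URI or URIResolver that
    can locate the imported modules.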

    Which approach do you think performs best? Do you have ideas for another
    approach?

    Any one input document uses only a small subset of all the possible
    namespaces. Some namespaces appear often, others very rarely. Proper cache
    management would save the process from having to pre-compile every
    stylesheet when it runs long-term over a wide range of documents.
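
    One way such cache management might look is a small least-recently-used
    cache of compiled stylesheets, keyed by namespace. This is only a sketch
    built on the standard JAXP Templates interface; the class name, the
    size limit, and passing the stylesheet as a string are assumptions made
    for illustration:

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

final class TemplatesCache {
    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<String, Templates> cache;
    private int compilations = 0;

    TemplatesCache(final int maxEntries) {
        // accessOrder=true keeps entries ordered from least to most recently used,
        // so the "eldest" entry is the one that has gone unused the longest.
        this.cache = new LinkedHashMap<String, Templates>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Templates> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the compiled stylesheet for a namespace, compiling it on a cache miss. */
    synchronized Templates get(String namespace, String stylesheetText)
            throws TransformerConfigurationException {
        Templates compiled = cache.get(namespace);
        if (compiled == null) {
            compiled = factory.newTemplates(
                    new StreamSource(new StringReader(stylesheetText)));
            compilations++;
            cache.put(namespace, compiled);
        }
        return compiled;
    }

    /** Number of compilations performed so far (useful for measuring hit rates). */
    synchronized int compilations() {
        return compilations;
    }
}
```

    Frequently used namespaces stay compiled, while rarely used ones are
    evicted and recompiled on demand.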

    Thanks,
    Bruno

     
  • Michael Kay
    2010-04-08

    Sorry, I'm not sure I can offer any useful design advice here. I would need to
    spend a lot more time studying the project requirements.

     
  • Bruno Feurer
    2010-04-14

    Hello Mr. Kay

    Maybe I was too specific somehow, though I don't see where. Anyway, the
    problem I'd like to get a grip on is of a more general nature; there are no
    specific project requirements. I've played around with some arbitrary test
    examples:
    http://livcos-dev.blogspot.com/2010/04/open-xslt-processing-2.html
    Maybe you can use the results. I would be happy to receive comments.

    Regards,
    Bruno

     
  • Michael Kay
    2010-04-14

    Thanks for sharing the data. Performance measurement is always illuminating,
    and it often raises a new question for every one that it answers. The
    difference for the figures with 1 "content element" versus 2 content elements
    is certainly striking, and I would think it is worth investigating further.
    Without any idea what your actual XSLT code looks like, I can't give any more
    specific advice.

     
  • Michael Kay
    2010-04-14

    Just had a very quick glance at your code (I'm afraid I won't have time to do
    more). You need to be aware that using a DOMSource with Saxon is very
    inefficient - typically 4-10 times the cost of using Saxon's native tree
    (which gets built if you supply a StreamSource or SAXSource).
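
    As an illustration of the point above, the input can be handed to the
    processor as a StreamSource through the standard JAXP API, so that the
    processor builds its own tree rather than wrapping an existing DOM. This
    is a sketch only; the class and method names are hypothetical, and it runs
    with whichever JAXP processor is on the classpath:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class StreamInput {

    /** Compiles a stylesheet once so it can be reused for many inputs. */
    static Templates compile(String stylesheetXml) throws TransformerException {
        return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(stylesheetXml)));
    }

    /** Transforms raw XML text and returns the serialized result. */
    static String transform(Templates compiled, String inputXml) throws TransformerException {
        StringWriter out = new StringWriter();
        // A StreamSource supplies unparsed XML, leaving the choice of tree
        // model to the processor instead of passing an org.w3c.dom.Document
        // wrapped in a DOMSource.
        compiled.newTransformer()
                .transform(new StreamSource(new StringReader(inputXml)),
                           new StreamResult(out));
        return out.toString();
    }
}
```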

     
  • Michael Kay
    2010-04-15

    Good, that makes sense. On occasions I've been tempted to change Saxon so that
    when given a DOMSource it defaults to converting it to a TinyTree rather than
    wrapping it - the trouble is that neither strategy is better all the time, and
    changing it will break applications.