Apply multiple XSL transforms to the same XML

Ken Tam
2010-07-18
2012-10-08
  • Ken Tam

    Ken Tam - 2010-07-18

    Hello,

    I am working on a project where multiple XSL transform scripts will be applied
    to the same XML source. These multiple XSL scripts will be written by
    different teams and deployed independently. Thus, they can't be combined into
    a single XSL script - at least not easily.

    The average size of each XML source is about 50KB, but the number of XML
    sources can be up to 20,000 an hour.

    I am currently using the JAXP interface to pre-compile the XSL scripts into
    Templates and process the XML source in a multi-threaded environment. I
    haven't tested the actual throughput of this setup yet but am wondering if
    there is a better approach because it seems like the internal XML data
    structure needs to be rebuilt for each XSL script. Here is the code fragment:

    DOMSource domRoot = new DOMSource(root);

    for (int i = 0; i < xsltCount; i++) {
        Transformer transformer = templates_.newTransformer();
        DOMResult dRes = new DOMResult();
        transformer.transform(domRoot, dRes);
    }

    That is, the internal structure of XML domRoot needs to be rebuilt for each
    transform() call. Is this correct? Is there any way to preserve the internal
    structure across different transformations? For example, some of the XSL
    scripts use the <xsl:for-each-group> construct with the same condition. This
    seems to imply the internal data structure needs to be rebuilt for each
    transformation.
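
The setup described above (stylesheets pre-compiled into Templates, then applied one after another to the same DOM) can be sketched with plain JAXP roughly as follows. This is a minimal sketch, not the poster's actual code: the class name MultiTransform and the helper names compile and applyAll are hypothetical, and it works with whichever TransformerFactory the caller supplies.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

// Hypothetical helper class, for illustration only.
public class MultiTransform {

    /** Compiles each stylesheet once; the resulting Templates can be reused for every input. */
    static List<Templates> compile(TransformerFactory factory, List<String> stylesheets)
            throws TransformerException {
        List<Templates> compiled = new ArrayList<>();
        for (String xsl : stylesheets) {
            compiled.add(factory.newTemplates(new StreamSource(new StringReader(xsl))));
        }
        return compiled;
    }

    /** Applies every compiled stylesheet, in order, to the same already-parsed document. */
    static List<DOMResult> applyAll(List<Templates> compiled, Document root)
            throws TransformerException {
        DOMSource source = new DOMSource(root);
        List<DOMResult> results = new ArrayList<>();
        for (Templates templates : compiled) {
            // A Transformer is cheap to create and is not shared between calls.
            Transformer transformer = templates.newTransformer();
            DOMResult result = new DOMResult();
            transformer.transform(source, result);
            results.add(result);
        }
        return results;
    }
}
```

With Saxon on the classpath, the factory passed to compile would be an instance of net.sf.saxon.TransformerFactoryImpl, as mentioned later in this thread.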

    Would s9api help in this case by building the internal structure as an XdmNode
    and passing it to each of the pre-compiled XsltTransformer objects?

    I am currently using saxon-9.1.0.2. Would it help to move to Saxon 9HE or EE?

    Thanks.

     
  • Michael Kay

    Michael Kay - 2010-07-18

    It's not a good idea to use a DOM for this purpose for two reasons: firstly,
    Saxon is 5-10 times slower on a DOM than on its native tree format, and
    secondly, the DOM is not thread-safe - you can't use the same DOM document in
    parallel threads, even in read mode. It's much better to use the native tree
    model. There are two ways you can do this:

    (a) build the document using Configuration.buildDocument() - this returns a
    DocumentInfo, which implements Source, and can therefore be passed to the JAXP
    Transform method. You will need to use the Configuration contained within your
    TransformerFactory, which you can get either by instantiating
    net.sf.saxon.TransformerFactoryImpl with your own Configuration, or by casting
    the TransformerFactory to its Saxon implementation class and using the
    getConfiguration() accessor method.

    (b) switch to using s9api, where you can create an XdmNode using the s9api
    DocumentBuilder, and set this as the context item on the XsltTransformer
    object.

    One thing to bear in mind when using the same document as input to several
    stylesheets is that it's best if they don't do any whitespace-stripping. It's
    most efficient to strip whitespace while building the tree, rather than while
    navigating it.

     
  • Ken Tam

    Ken Tam - 2010-07-19

    Yes, the XML source DOM will be accessed by a single thread - multiple XSL
    scripts will be applied serially. This helps to manage transactions by treating
    the XSL scripts as one unit to commit or rollback results.
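
One way to read "one unit to commit or rollback" is to buffer every result and hand the batch over only once all stylesheets have succeeded. The following is a minimal sketch under that assumption, not the poster's actual transaction code: the class name TransformUnit, the method runAsUnit, and the commit callback are all hypothetical.

```java
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper class, for illustration only.
public class TransformUnit {

    /**
     * Applies all stylesheets serially to the same document on the calling thread.
     * Results are buffered; commit is invoked only if every transformation succeeds.
     * Returns false (and commits nothing) as soon as one of them fails.
     */
    static boolean runAsUnit(List<Templates> compiled, Document root,
                             Consumer<List<String>> commit) {
        List<String> pending = new ArrayList<>();
        DOMSource source = new DOMSource(root);
        try {
            for (Templates templates : compiled) {
                StringWriter out = new StringWriter();
                templates.newTransformer().transform(source, new StreamResult(out));
                pending.add(out.toString());
            }
        } catch (TransformerException | RuntimeException e) {
            // "Rollback": the buffered results are simply discarded.
            return false;
        }
        commit.accept(pending);
        return true;
    }
}
```

Buffering serialized strings keeps the sketch short; a real deployment might buffer DOMResult objects or write to a transactional store instead.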

    I am already instantiating net.sf.saxon.TransformerFactoryImpl so option (a)
    will be easier to implement. However, I would like to confirm that using (b)
    doesn't provide any performance gain over (a). Is this correct? In addition,
    there isn't any other configuration setting to further improve performance -
    e.g. preserve internal data structure across transformations. Is this correct?

    Thanks for your help.

     
  • Michael Kay

    Michael Kay - 2010-07-19

    Using the s9api interfaces gives you cleaner code (in my opinion) but it won't
    run any faster.

    If you stick with a DOMSource rather than using a Saxon native tree, then
    apart from the thread safety issues, you should be aware that there are two
    ways you can do this in Saxon: you can wrap the DOM in a Saxon wrapper to
    implement the Saxon NodeInfo interface, or you can copy it to a native Saxon
    tree. Which is more efficient depends on (a) time vs memory trade-off, and (b)
    how much activity the transformation does. As a rule of thumb, if the
    transformation accesses each node of the source more than once, then copying
    is probably faster than wrapping. By default Saxon uses wrapping rather than
    copying, but there are many switches and options that can change this, for
    example it will always copy if validation is requested. But both approaches
    are expensive compared with using a native Saxon tree in the first place.

     
  • Ken Tam

    Ken Tam - 2010-07-20

    I don't need to stick with DOMSource. In fact, I'd prefer to use Saxon native
    tree. Here is the updated code:

    AugmentedSource domRoot =
        AugmentedSource.makeAugmentedSource(new DOMSource(root));
    domRoot.setWrapDocument(true);
    DocumentInfo source = saxonConfig.buildDocument(domRoot);

    for (int i = 0; i < xsltCount; i++) {
        Transformer transformer = templates_.newTransformer();
        DOMResult dRes = new DOMResult();
        transformer.transform(source, dRes);
    }

    I am currently using saxon-9.1.0.2 and buildDocument() only takes one
    argument. AugmentedSource is used to ensure wrapping is turned off.
    saxonConfig is saved by calling getConfiguration() from
    net.sf.saxon.TransformerFactoryImpl.

    Let me know if buildDocument(new DOMSource(root)) is sufficient to create a
    copy in native tree format.

    The above code fragment creates a native tree once per XML source and applies
    the same native tree to each XSL script.

    I am still wondering if I should combine all XSL scripts into one. Is it worth
    the effort? I guess the question comes down to how much overhead is there to
    apply the same XML in Saxon native tree format to multiple pre-compiled XSL
    templates as opposed to one combined pre-compiled XSL template.

    Thanks again for your help.

     
  • Michael Kay

    Michael Kay - 2010-07-20

    I would have expected to see domRoot.setWrapDocument(false) rather than
    domRoot.setWrapDocument(true) if you want to copy the tree rather than
    wrapping it. You can check that you have actually created a native Saxon tree
    by checking the implementation class of "source" - it should be
    TinyDocumentImpl. Otherwise the structure looks fine. I don't see any benefit
    in combining the separate transformations into a single stylesheet.

     
  • Ken Tam

    Ken Tam - 2010-07-21

    Yes, my bad. It should be domRoot.setWrapDocument(false). I see
    TinyDocumentImpl as "source".

    Thanks again for your help.

     