Saxon Configuration synchronization issue

Bartek, 2011-03-31 (last updated 2012-10-08)
  • Bartek - 2011-03-31

    It seems that net.sf.saxon.Configuration is synchronized a little bit
    incorrectly, which sometimes causes performance issues. getSourceParser() and
    reuseSourceParser() should not be synchronized. They could, for example, look
    like this:

       private volatile ConcurrentLinkedQueue<XMLReader> sourceParserPool = new ConcurrentLinkedQueue<XMLReader>();
    
        private final int maxPoolSize;
    
        @Override
        public XMLReader getSourceParser() throws TransformerFactoryConfigurationError {
    
            XMLReader parser = sourceParserPool.poll();
            if (parser != null) {
                return parser;
            }
    
            if (getSourceParserClass() != null) {
                parser = makeParser(getSourceParserClass());
            } else {
                parser = loadParser();
            }
            try {
                Sender.configureParser(parser);
            } catch (XPathException err) {
                throw new TransformerFactoryConfigurationError(err);
            }
            if (isValidation()) {
                try {
                    parser.setFeature("http://xml.org/sax/features/validation", true);
                } catch (SAXException err) {
                    throw new TransformerFactoryConfigurationError("The XML parser does not support validation");
                }
            }
    
            return parser;
    
        }
    
        @Override
        public void reuseSourceParser(XMLReader parser) {
    
            try {
                // give things back to the garbage collector
                parser.setContentHandler(null);
                parser.setEntityResolver(null);
                parser.setDTDHandler(null);
                parser.setErrorHandler(null);
                // Unfortunately setting the lexical handler to null doesn't work on Xerces, because
                // it tests "value instanceof LexicalHandler". So we set it to a lexical handler that
                // holds no references
                parser.setProperty("http://xml.org/sax/properties/lexical-handler", dummyLexicalHandler);
            } catch (Exception err) {
                //
            }
    
            // this condition isn't safe, some fluctuations in pool size might be possible; however, this should not do any harm :)
            if (sourceParserPool.size() < maxPoolSize) {
                sourceParserPool.offer(parser);
            }
    
        }
    

    Please also note that I added a (configurable) limit, though not a strict one.
    Currently there seems to be no limit at all, so the parser pool can grow
    extremely large.

    Thanks,
    Bartek

     
  • Michael Kay - 2011-03-31

    Thanks for the contribution. When you say the current code is "incorrect", I
    assume you only mean that it is sub-optimal? Do you have any evidence of this
    causing a bottleneck, or of the improvements produced by your change? I would
    expect it to make a difference only when you are parsing very large numbers of
    very small documents (which of course is the scenario that caused the parser
    pool to be introduced in the first place).

    Do you have any unit tests for your changed code?

    Finally (and sorry about this, I know it's boring) can you please confirm that
    you and your employer are happy either to license this code under MPL, or to
    relinquish all copyright claims.

     
  • Michael Kay - 2011-03-31

    A further question, about putting a limit on the pool size: this only seems
    necessary if you think people are likely to return a parser to the pool that
    was not obtained from the pool. That seems improbable to me, and not worth the
    cost of calling the size() method, which is expensive on ConcurrentLinkedQueue.

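    If a bound were still wanted, one way to avoid the size() call would be to
    give the pool a fixed capacity, for example a LinkedBlockingQueue whose
    offer() simply fails when the queue is full (at the cost of reintroducing an
    internal lock). A minimal sketch, with made-up names, not Saxon code:

        import java.util.concurrent.BlockingQueue;
        import java.util.concurrent.LinkedBlockingQueue;

        import org.xml.sax.XMLReader;

        // Sketch only: a capacity-bounded parser pool that never needs size().
        class BoundedParserPool {

            private final BlockingQueue<XMLReader> pool;

            BoundedParserPool(int maxPoolSize) {
                // offer() returns false once maxPoolSize parsers are pooled
                this.pool = new LinkedBlockingQueue<XMLReader>(maxPoolSize);
            }

            XMLReader acquire() {
                // null means the pool is empty; the caller creates a new parser
                return pool.poll();
            }

            void release(XMLReader parser) {
                // silently drops the parser when the pool is already full
                pool.offer(parser);
            }
        }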
     
  • Bartek - 2011-03-31

    Hi,

    Yes, I should have said that it is suboptimal, not incorrect.

    Similarly to what I wrote in the second thread, for this issue I only have
    thread dumps showing plenty of threads waiting on this synchronization.

    I haven't prepared any unit tests; this was just quick hacking. If I were to
    prepare a clean patch with unit tests, I would send you a diff rather than an
    ugly snippet (based on 9.1.08 and not the trunk) :).

    On limiting the size: this was an extremely simple mechanism, implemented just
    for our tests. However, I feel that limiting the size makes sense, as parsers
    might sit "idle" for a long time, which would prevent them from being gc-ed. A
    real pool with min/max/keepalive would be required.

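    Very roughly, the keepalive part could look something like the sketch below
    (nothing Saxon-specific; IdleParserPool and keepAliveMillis are made-up names,
    and min/max handling is left out):

        import java.util.concurrent.LinkedBlockingDeque;

        import org.xml.sax.XMLReader;

        // Sketch only: idle parsers remember when they were returned, and anything
        // idle longer than keepAliveMillis is dropped the next time the pool is
        // used, so the parser becomes eligible for garbage collection.
        class IdleParserPool {

            private static final class Entry {
                final XMLReader parser;
                final long returnedAt;

                Entry(XMLReader parser, long returnedAt) {
                    this.parser = parser;
                    this.returnedAt = returnedAt;
                }
            }

            private final LinkedBlockingDeque<Entry> pool = new LinkedBlockingDeque<Entry>();
            private final long keepAliveMillis;

            IdleParserPool(long keepAliveMillis) {
                this.keepAliveMillis = keepAliveMillis;
            }

            XMLReader acquire() {
                evictIdle();
                Entry e = pool.pollFirst();          // most recently returned parser first
                return e == null ? null : e.parser;  // null: caller creates a fresh parser
            }

            void release(XMLReader parser) {
                pool.offerFirst(new Entry(parser, System.currentTimeMillis()));
                evictIdle();
            }

            private void evictIdle() {
                long cutoff = System.currentTimeMillis() - keepAliveMillis;
                Entry oldest;
                // releases push to the head, so the longest-idle entries sit at the tail
                while ((oldest = pool.peekLast()) != null && oldest.returnedAt < cutoff) {
                    pool.remove(oldest);             // no-op if another thread already took it
                }
            }
        }
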
    I'm ok with licensing this code under MPL. However, I'll need to refactor it
    anyway :).

    Thanks,
    Bartek

     
  • Bartek - 2011-04-01

    Oops, I meant to write:

    "I'm ok with licensing this code under MPL. However, you'll need to refactor
    it anyway :). "

    Bartek