Hi,

 

It seems there are more and more people with this issue.

 

We had (and still have) a similar issue, but with the limit of 1024 prefixes per namespace, and it seems 9.4 won’t solve it.

 

I’m afraid it is not only the “pathological Apache application” case (for example, an Apache Axis client). Nowadays it is easy to have 1024 or more clients (client apps) using the same web service. It is true that clients should use the default prefix wherever possible, at least to reduce payload size, but it is difficult to teach clients.

 

 

What is more, this is another XML bomb.

 

Assuming that an attacker knows that the server application is written in Java and uses XSL, they can be almost sure that it is Saxon-based, so they can craft a request with more than 1024 prefixes for the same namespace and try to disable the system with a single request.

 

 

There are several ways to try to avoid that, but all of them are partial:

-          Have several JVMs

-          Have a pool of TransformerFactories

 

They don’t solve the issue of an “XML bomb” with 1025 prefixes for one namespace.
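
To make the second option concrete, here is a minimal sketch of a holder that replaces its TransformerFactory after a fixed number of uses (a rotation rather than a true pool). It assumes that each new Saxon TransformerFactory gets its own Configuration and therefore its own NamePool; the class name RotatingFactory and the maxUses parameter are made up for illustration. As said above, it is only a partial measure: it does nothing for a single document that carries too many prefixes, and it cannot be applied where a library holds a static factory.

```java
import javax.xml.transform.TransformerFactory;

/** Sketch only: hands out a TransformerFactory and replaces it after a fixed number of uses. */
public class RotatingFactory {
    private final int maxUses;            // hypothetical tuning parameter
    private TransformerFactory current;
    private int uses;

    public RotatingFactory(int maxUses) {
        this.maxUses = maxUses;
        this.current = TransformerFactory.newInstance();
    }

    /** Returns the current factory, creating a fresh one once maxUses is reached. */
    public synchronized TransformerFactory get() {
        if (uses >= maxUses) {
            // A fresh factory is assumed to start with an empty name pool.
            current = TransformerFactory.newInstance();
            uses = 0;
        }
        uses++;
        return current;
    }
}
```

Callers would ask the holder for a factory before each transformation instead of keeping a reference to one factory forever.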

 

 

Someone proposed writing a preprocessor that normalizes the prefixes.

I thought it was a complete solution, but it is not.

 

There is even net.sf.saxon.om.PrefixNormalizer in the Saxon distribution, but the code is not complete (it doesn’t handle attributes with prefixes).

I wondered why it was never completed, but after writing a complete one I realized that it doesn’t solve the problem completely.
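
For reference, a prefix-normalizing preprocessor can be sketched as a SAX filter roughly as follows. This is neither the Saxon class nor my production code, only a minimal illustration; the class name PrefixNormalizingFilter, the generated prefixes n0, n1, ... and the helper normalizedNames are made up. It assumes a namespace-aware parser with default SAX features, so that xmlns declarations are not reported as attributes. It renames prefixes on both elements and attributes.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Sketch only: rewrites every prefix to one canonical prefix per namespace URI. */
public class PrefixNormalizingFilter extends XMLFilterImpl {
    private final Map<String, String> prefixByUri = new HashMap<>();
    private final Set<String> inScope = new HashSet<>();
    private final Deque<List<String>> declared = new ArrayDeque<>();

    // Drop the mappings chosen by the sender; canonical ones are emitted instead.
    @Override public void startPrefixMapping(String prefix, String uri) { }
    @Override public void endPrefixMapping(String prefix) { }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        List<String> newlyDeclared = new ArrayList<>();
        declared.push(newlyDeclared);
        String newQName = rename(uri, local, newlyDeclared);
        AttributesImpl newAtts = new AttributesImpl();
        for (int i = 0; i < atts.getLength(); i++) {
            String aUri = atts.getURI(i);
            String aLocal = atts.getLocalName(i);
            newAtts.addAttribute(aUri, aLocal, rename(aUri, aLocal, newlyDeclared),
                    atts.getType(i), atts.getValue(i));
        }
        super.startElement(uri, local, newQName, newAtts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        super.endElement(uri, local, qualify(uri, local));
        // Close the mappings that were opened on this element.
        for (String u : declared.pop()) {
            inScope.remove(u);
            super.endPrefixMapping(prefixByUri.get(u));
        }
    }

    /** Declares the canonical prefix for uri if it is not in scope, then builds the new name. */
    private String rename(String uri, String local, List<String> newlyDeclared)
            throws SAXException {
        if (!uri.isEmpty() && !uri.equals(XMLConstants.XML_NS_URI) && inScope.add(uri)) {
            String prefix = prefixByUri.get(uri);
            if (prefix == null) {
                prefix = "n" + prefixByUri.size();
                prefixByUri.put(uri, prefix);
            }
            newlyDeclared.add(uri);
            super.startPrefixMapping(prefix, uri);
        }
        return qualify(uri, local);
    }

    private String qualify(String uri, String local) {
        if (uri.isEmpty()) return local;
        if (uri.equals(XMLConstants.XML_NS_URI)) return "xml:" + local;
        return prefixByUri.get(uri) + ":" + local;
    }

    /** Demo helper: element and attribute names as seen downstream of the filter. */
    static List<String> normalizedNames(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        PrefixNormalizingFilter filter = new PrefixNormalizingFilter();
        filter.setParent(spf.newSAXParser().getXMLReader());
        final List<String> names = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {
            @Override public void startElement(String u, String l, String q, Attributes a) {
                names.add(q);
                for (int i = 0; i < a.getLength(); i++) names.add("@" + a.getQName(i));
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return names;
    }
}
```

Note that the filter only touches names. Character data and attribute values pass through unchanged.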

 

Even assuming that the prefixes for all namespaces are unique, normalization doesn’t solve the problem with element/attribute values.

If an element or attribute value is a QName, normalization breaks it.

 

For example:

 

<n0:a xmlns:n0="uri1">

            <n1:b xmlns:n1="uri1">n1:someLocalName</n1:b>

</n0:a>

 

After normalization it looks like:

 

<n0:a xmlns:n0="uri1">

            <n0:b>n1:someLocalName</n0:b>

</n0:a>

 

When the XML Schema for this document declares the b element as “string”, both documents are valid.

If the b element is declared as “QName”, the normalized document is not valid, because the n1 prefix used in its content is no longer declared.

Rewriting the values is not a solution either, because a value that looks like a QName may be a plain string that must not be changed.
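
The difference can be checked with a namespace-aware DOM parser. The sketch below parses both documents from the example and tries to resolve the prefix that appears in the content of the b element; the class and method names are made up for illustration, and no schema validation is involved.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class QNameContentDemo {
    static final String ORIGINAL =
        "<n0:a xmlns:n0=\"uri1\"><n1:b xmlns:n1=\"uri1\">n1:someLocalName</n1:b></n0:a>";
    static final String NORMALIZED =
        "<n0:a xmlns:n0=\"uri1\"><n0:b>n1:someLocalName</n0:b></n0:a>";

    /** Returns the namespace URI bound to the prefix used in the text of b, or null. */
    static String resolveContentPrefix(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        Element b = (Element) doc.getElementsByTagNameNS("uri1", "b").item(0);
        String text = b.getTextContent();
        String prefix = text.substring(0, text.indexOf(':'));
        // Looks the prefix up in the declarations in scope at the b element.
        return b.lookupNamespaceURI(prefix);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("original:   " + resolveContentPrefix(ORIGINAL));
        System.out.println("normalized: " + resolveContentPrefix(NORMALIZED));
    }
}
```

In the original document the prefix n1 resolves to uri1; in the normalized one it resolves to nothing, which is why a QName-typed value no longer makes sense there.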

 

 

 

The purpose of this mail is to show that:

-          It must be fixed, because it is dangerous

-          The only solution is to fix the issue in Saxon by:

a) Making the NamePool non-global.

Currently there is one NamePool per Saxon Configuration; maybe there should be as many NamePools as Controllers.

b) Rewriting NamePool so that it uses “long” for addresses, or considering a Map representation instead of a matrix.
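
To illustrate what I mean by a Map representation in b), here is a toy sketch. It is not Saxon's NamePool API and ignores everything the real class has to do (fingerprints, fast lookup by code, and so on); the class and method names are hypothetical. It only shows that codes handed out from a map with a long counter have no fixed limit on prefixes per URI.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy sketch only: allocates a long code per (prefix, uri, local name) using a map. */
public class MapNamePoolSketch {
    private final Map<String, Long> codes = new HashMap<>();
    private long next = 0;

    /** Returns the existing code for this name, or allocates the next free one. */
    public synchronized long allocate(String prefix, String uri, String localName) {
        // NUL cannot occur in XML names or URIs, so it is a safe separator.
        String key = prefix + '\u0000' + uri + '\u0000' + localName;
        Long code = codes.get(key);
        if (code == null) {
            code = next++;
            codes.put(key, code);
        }
        return code;
    }

    public synchronized int size() {
        return codes.size();
    }
}
```

Such a pool still grows with every new name, so it trades a hard limit for memory growth; that is one more reason to also consider a).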

 

 

 

--

Regards,

Mateusz Nowakowski

 

From: Michael Kay [mailto:mike@saxonica.com]
Sent: Tuesday, August 09, 2011 10:37 AM
To: Scott Robey
Cc: saxon-help@lists.sourceforge.net
Subject: Re: [saxon] Saxon HE ArrayIndexOutOfBoundsException: -32768 in NamePool.allocateCodeForPrefix(NamePool.java:483)

 

As it happens the limit on the number of namespace prefixes has gone in the 9.4 development branch, though there's still a limit of 32K namespace URIs and a limit of 1K prefixes per URI.

It would be nice to have limits that are sufficiently high that no-one will ever reach them, but I'm afraid I've decided several times over the years that redesigning the name pool for the benefit of this one rogue application really can't be justified.

I vaguely recall writing an XMLFilter on one occasion to normalize the namespace prefixes, but I can't lay my hands on it now.

Using a static TransformerFactory itself seems a poor design choice.

Sorry for the inconvenience!

Michael Kay
Saxonica


On 09/08/2011 05:11, Scott Robey wrote:

Thanks for the quick response, Michael.

 

From what I can tell looking at the NamePool source, there is a minor logic error that prevents it from failing nicely. Specifically, because the prefixes array is resized to double the prefixesUsed number each time it fills up, the following condition may never be met:

 

 if (prefixesUsed >= prefixes.length) {
            if (prefixesUsed > 32000) {
                throw new NamePoolLimitException("Too many namespace prefixes");

 

The array will be doubled to something like 393216, and the prefixesUsed will exceed the max size for short before the above test fails...I think. Anyway, that's not really a big deal.
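
As a quick illustration of where the -32768 in the stack trace would come from, here is a minimal sketch. It is not the Saxon code; it just assumes the prefix code ends up in (or is cast to) a short, and the class name is made up.

```java
public class ShortOverflowDemo {
    /** Increments a counter stored as a short, the way a 16-bit code would be. */
    static short next(short code) {
        return (short) (code + 1);
    }

    public static void main(String[] args) {
        short last = Short.MAX_VALUE;      // 32767, the largest value a short can hold
        short wrapped = next(last);        // wraps around to -32768
        System.out.println(wrapped);

        String[] prefixes = new String[65536];
        try {
            // A negative index is rejected no matter how large the array is.
            prefixes[wrapped] = "axis2ns32768";
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught: " + e);
        }
    }
}
```

So growing the array does not help once the counter itself no longer fits in 16 bits.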

 

You are correct about the pathological Apache application: our clients are using an old version of Axis2 (1.2), and it appears to create a new namespace prefix for every web service request. Eventually I get prefixes that look like axis2ns32001, hence the exception once the client has sent requests with more than 32767 different namespace prefixes. Until the clients upgrade to a newer version of Axis2, however, I'm kind of stuck and need to find a workaround.

 

I understand your reasoning that the application should not be using such a vast number of namespace prefixes, but you can't always control what your clients are sending, right? There could be thousands of different clients using different data-binding tools and thus possibly using many different prefixes, so it seems that sooner or later an application will exceed the limitations imposed by the NamePool design. I'm not speaking from actual practical experience though, more theoretically, so maybe in the real world this isn't really an issue and the tools tend to reuse namespace prefixes (ns1, ns2, etc.).

 

In any case, in my situation, I don't have control over the lifecycle of the TransformerFactory, so I can't create multiple Saxon instances to handle the workload. This is because the JAXB RI is using a static TransformerFactory, see com.sun.xml.bind.v2.runtime.JAXBContextImpl. It's out of my control.

 

I'm going to pursue a workaround on the client side so that the web service requests don't contain an unbounded number of different prefixes; I think I can do this using a different Axis2 API.

 

Thanks for your time.

 

-Scott Robey

 


From: Michael Kay <mike@saxonica.com>
To: saxon-help@lists.sourceforge.net; scottrobey@yahoo.com
Sent: Monday, August 8, 2011 3:52 AM
Subject: Re: [saxon] Saxon HE ArrayIndexOutOfBoundsException: -32768 in NamePool.allocateCodeForPrefix(NamePool.java:483)

Yes, this is a known limit. I thought that these days it would fail cleanly saying the limit had been exceeded, but it seems that's not the case on this path.

There's a pathological Apache application - I've forgotten which - that generates XML in which a new namespace prefix is allocated (for the same namespace URI) on every element instance. The Saxon NamePool can't handle that. I don't know if that's the situation here. (In fact, I think that example would hit a different limit, namely a maximum of 1024 prefixes per namespace URI).

Devising a workaround involves understanding why the application is using such a vast number of different namespace prefixes. Is it generating prefixes at random for a small and stable set of namespace URIs? If that's the case, then one could insert a SAX filter to normalize the prefixes (assuming they don't also appear in content). On the other hand, if the number of namespace URIs is also very large, then one might have to look at ways of using multiple Saxon instances to handle the workload.

Michael Kay
Saxonica


On 07/08/2011 21:55, Scott Robey wrote:

I have an application using Saxon HE (9.3.0.5j) and JAX-WS/JAXB. During stress testing, after the application has processed several thousand XML messages, the following exception gets thrown repeatedly, and the application becomes unable to process XML until the JVM is cycled.

 

java.lang.ArrayIndexOutOfBoundsException: -32768
    at net.sf.saxon.om.NamePool.allocateCodeForPrefix(NamePool.java:483)
    at net.sf.saxon.om.NamePool.allocate(NamePool.java:563)
    at net.sf.saxon.event.ReceivingContentHandler.getNameCode(ReceivingContentHandler.java:405)
    at net.sf.saxon.event.ReceivingContentHandler.startElement(ReceivingContentHandler.java:289)

 

I've looked at the NamePool source, and it appears that the NamePool object referenced by the Configuration object has a limitation of only being able to handle ~32000 different namespace prefixes throughout the life of the application. My question is: is this a known and accepted limitation of the Saxon XSLT processor? Is there any way I can work around the issue given that JAXB is creating the TransformerFactory, and I have no control over its lifecycle?

 

The following JUnit test demonstrates the problem. The test case fails consistently when using the Saxon processor, and the test case passes reliably when using the standard JDK's TransformerFactory.

 

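  import java.io.ByteArrayInputStream;
  import javax.xml.transform.TransformerFactory;
  import javax.xml.transform.dom.DOMResult;
  import javax.xml.transform.stream.StreamSource;
  import org.junit.Assert;
  import org.junit.Test;

  // Place this method in any JUnit 4 test class.
  @Test
  public void testArrayIndexOutOfBoundsException() throws Exception {
    final long MAX = 33000;
    // Every document binds a different prefix (ns0, ns1, ...) to the same namespace URI.
    final String XML_TEMPLATE = "<ns%1$s:element xmlns:ns%1$s=\"urn:ns1\">test</ns%1$s:element>";

    TransformerFactory factory = TransformerFactory.newInstance();
    System.out.println("TransformerFactory: " + factory);
    for ( long i = 0; i < MAX; i++ ) {
      String xml = String.format(XML_TEMPLATE, i);
      DOMResult result = new DOMResult();
      try {
        // Identity transform; the source document is parsed on each call.
        factory.newTransformer().transform(
          new StreamSource(new ByteArrayInputStream(xml.getBytes())), result);
      }
      catch ( Exception e ) {
        e.printStackTrace();
        Assert.fail("After: " + i + " iterations. " + e);
      }
    }
  }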

 

Any insight is appreciated.


thanks,

Scott

 

 

 

 
_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
saxon-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/saxon-help