Hi,

It seems there are more and more people with this issue.

We had (and still have) a similar issue, but with the limit of 1024 prefixes per namespace, and it seems 9.4 won’t solve it.

I’m afraid it is not only a “pathological Apache application” case (for example, the Apache Axis client): nowadays it is easy to end up with even 1024 clients (client apps) that use the same web service. It is true that clients should use a default prefix wherever possible, at least to reduce payload size, but it is difficult to teach clients to do that.

What is more, this is another kind of XML bomb.

If an attacker knows that the server app is written in Java and uses XSLT, he can be almost sure it is Saxon-based, so he can craft a request with 1024 prefixes for the same namespace and try to disable the system with a single request.
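To reproduce the problem, or to regression-test a mitigation, a document of that shape is easy to generate. The sketch below is a hypothetical test helper: the class name, the p0, p1, ... prefix scheme and the URI are illustrative choices. `build(1025, "uri1")` is expected to go past the 1K-prefixes-per-URI limit mentioned for the 9.4 branch, but that has not been verified here.

```java
/**
 * Hypothetical test helper (not part of Saxon): builds a flat document in which
 * `count` sibling elements each declare a fresh prefix (p0, p1, ...) bound to
 * the same namespace URI. Assumes the URI needs no XML escaping.
 */
final class ManyPrefixesDoc {

    static String build(int count, String uri) {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < count; i++) {
            // Emits <pN:e xmlns:pN="uri"/> for N = i
            sb.append("<p").append(i).append(":e xmlns:p").append(i)
              .append("=\"").append(uri).append("\"/>");
        }
        return sb.append("</root>").toString();
    }

    public static void main(String[] args) {
        System.out.println(build(3, "uri1"));
    }
}
```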

There are several ways to try to avoid this, but all of them are partial:

- Have several JVMs
- Have a pool of TransformerFactories

Neither of them solves the problem of an “XML bomb with 1025 prefixes for one namespace”.
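The pool option could look roughly like the sketch below. `TransformerFactoryPool` is a hypothetical class, not part of JAXP or Saxon. It rests on the assumption that, with Saxon, each separately constructed factory gets its own Configuration and therefore its own NamePool, so a polluted factory can be thrown away and replaced. As stated above, this limits the damage but does not stop a single request from exceeding the limit.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;
import javax.xml.transform.TransformerFactory;

/**
 * Hypothetical fixed-size pool of TransformerFactory instances (not a JAXP or
 * Saxon class). A factory whose name pool may be polluted can be discarded and
 * replaced by a freshly created one when it is returned.
 */
final class TransformerFactoryPool {
    private final BlockingQueue<TransformerFactory> idle;
    private final Supplier<TransformerFactory> creator;

    TransformerFactoryPool(int size, Supplier<TransformerFactory> creator) {
        this.idle = new ArrayBlockingQueue<>(size);
        this.creator = creator;
        for (int i = 0; i < size; i++) {
            idle.add(creator.get());
        }
    }

    /** Takes a factory out of the pool, blocking until one is free. */
    TransformerFactory borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a TransformerFactory", e);
        }
    }

    /** Returns a factory, or replaces it with a new one when discard is true. */
    void giveBack(TransformerFactory factory, boolean discard) {
        idle.add(discard ? creator.get() : factory);
    }
}
```

With Saxon HE on the classpath the pool might be created as `new TransformerFactoryPool(4, net.sf.saxon.TransformerFactoryImpl::new)`, and `giveBack(factory, true)` would swap in a fresh factory after a name-pool failure.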

Smd proposes writing a preprocessor that normalizes the prefixes…

I thought it was a complete solution, but it is not.

There is even net.sf.saxon.om.PrefixNormalizer in the Saxon distribution, but its code is incomplete (it doesn’t handle attributes with prefixes).

I wondered why it was never completed, but after writing a complete version I realized that it doesn’t solve the problem entirely.
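A complete normalizer of this kind, handling prefixed attributes as well as elements, can be written as a SAX XMLFilter. The sketch below is hypothetical: it is not the code of net.sf.saxon.om.PrefixNormalizer, and the n0, n1, ... naming is an arbitrary choice. It assumes a namespace-aware parent XMLReader with the SAX namespace-prefixes feature off, and it deliberately leaves character data and attribute values untouched, which is exactly where the QName problem comes from.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.XMLConstants;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical SAX filter: renames every non-default prefix to one canonical
 * prefix per namespace URI (n0, n1, ...) in element AND attribute names.
 * Text content and attribute values are passed through unchanged.
 */
final class PrefixNormalizingFilter extends XMLFilterImpl {
    private final Map<String, String> canonical = new HashMap<>();      // URI -> canonical prefix
    private final Map<String, String> pending = new LinkedHashMap<>();  // prefix -> URI for the next element
    private final Set<String> inScope = new HashSet<>();                // canonical prefixes declared on open ancestors
    private final Deque<List<String>> declared = new ArrayDeque<>();    // prefixes declared per open element

    private String canonicalPrefix(String uri) {
        if (XMLConstants.XML_NS_URI.equals(uri)) {
            return "xml"; // predeclared prefix, must not be renamed
        }
        String prefix = canonical.get(uri);
        if (prefix == null) {
            prefix = "n" + canonical.size();
            canonical.put(uri, prefix);
        }
        return prefix;
    }

    /** Unprefixed names (default namespace or no namespace) keep their qName. */
    private String rename(String uri, String localName, String qName) {
        return qName.indexOf(':') < 0 ? qName : canonicalPrefix(uri) + ":" + localName;
    }

    @Override
    public void startDocument() throws SAXException {
        canonical.clear();
        pending.clear();
        inScope.clear();
        declared.clear();
        super.startDocument();
    }

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        if (prefix.isEmpty()) {
            pending.put("", uri);                   // default namespace passes through
        } else if (!uri.isEmpty()) {                // ignore XML 1.1 prefix undeclarations
            pending.put(canonicalPrefix(uri), uri); // several prefixes for one URI collapse here
        }
    }

    @Override
    public void endPrefixMapping(String prefix) {
        // Swallowed: endElement emits the matching events for the renamed prefixes.
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        List<String> here = new ArrayList<>();
        for (Map.Entry<String, String> m : pending.entrySet()) {
            String p = m.getKey();
            if (!p.isEmpty() && !inScope.add(p)) {
                continue; // same binding already declared on an open ancestor
            }
            super.startPrefixMapping(p, m.getValue());
            here.add(p);
        }
        pending.clear();
        declared.push(here);

        AttributesImpl renamed = new AttributesImpl(atts);
        for (int i = 0; i < renamed.getLength(); i++) {
            renamed.setQName(i, rename(renamed.getURI(i), renamed.getLocalName(i), renamed.getQName(i)));
        }
        super.startElement(uri, localName, rename(uri, localName, qName), renamed);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName, rename(uri, localName, qName));
        for (String p : declared.pop()) {
            super.endPrefixMapping(p);
            inScope.remove(p);
        }
    }
}
```

To feed it to a transformation, one would call `filter.setParent(...)` with a namespace-aware XMLReader and pass `new javax.xml.transform.sax.SAXSource(filter, inputSource)` to the Transformer.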

Even assuming that every namespace ends up with a single, unique prefix, normalization doesn’t solve the problem of element/attribute values.

If an element or attribute value is a QName, normalization can break it.

For example:

<n0:a xmlns:n0="uri1">
    <n1:b xmlns:n1="uri1">n1:someLocalName</n1:b>
</n0:a>

After normalization, it looks like:

<n0:a xmlns:n0="uri1">
    <n0:b>n1:someLocalName</n0:b>
</n0:a>

If the XML Schema for the document declares the b element as “string”, both documents are valid.

If the b element is declared as “QName”, the normalized document is not valid, because the prefix n1 is no longer bound.

Rewriting the values as well is not a solution either, because a value that looks like a QName may be a plain string that must not be changed.
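The failure can be made concrete by resolving the content of b against its in-scope namespaces before and after normalization. `resolve()` is a hypothetical, simplified lexical-QName resolver (not a Saxon or JAXP method); `javax.xml.namespace.QName` is the standard JAXP class.

```java
import java.util.Map;
import javax.xml.namespace.QName;

/** Hypothetical helper: resolves a lexical QName such as "n1:someLocalName". */
final class QNameContentCheck {

    /** Returns the expanded name, or null when the prefix is not bound in inScope. */
    static QName resolve(String lexical, Map<String, String> inScope) {
        int colon = lexical.indexOf(':');
        String prefix = colon < 0 ? "" : lexical.substring(0, colon);
        String local = lexical.substring(colon + 1);
        String uri = inScope.get(prefix);
        if (uri == null) {
            // Unprefixed with no default namespace: no namespace. Prefixed but unbound: error (null).
            return colon < 0 ? new QName(local) : null;
        }
        return new QName(uri, local, prefix);
    }

    public static void main(String[] args) {
        // In-scope namespaces of element b in the original and in the normalized document.
        Map<String, String> before = Map.of("n0", "uri1", "n1", "uri1");
        Map<String, String> after = Map.of("n0", "uri1");
        System.out.println(resolve("n1:someLocalName", before)); // prints {uri1}someLocalName
        System.out.println(resolve("n1:someLocalName", after));  // prints null
    }
}
```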

The purpose of this mail is to show that:

- It must be fixed, because it is dangerous.
- The only solution is to fix the issue in Saxon by:
  a) Making the NamePool non-global. Currently there is one NamePool per Saxon Configuration; perhaps there should be one NamePool per Controller.
  b) Rewriting the NamePool so that it uses “long” for addresses, or using a Map representation instead of a matrix.
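Option b) can be sketched as follows. `MapNamePool` is a hypothetical class, not Saxon's NamePool API, and the code layout (URI code in the high 32 bits of a long, prefix index within that URI in the low 32 bits) is an assumption chosen only to illustrate “long” codes backed by growable maps instead of a fixed-size matrix.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical map-based pool of namespace URIs and prefixes (NOT Saxon's
 * NamePool API). Codes are longs: URI code in the high 32 bits, prefix index
 * within that URI in the low 32 bits. All tables grow on demand.
 */
final class MapNamePool {
    private final Map<String, Integer> uriCodes = new HashMap<>();
    private final List<String> uris = new ArrayList<>();
    private final List<Map<String, Integer>> prefixCodes = new ArrayList<>(); // per URI: prefix -> index
    private final List<List<String>> prefixes = new ArrayList<>();            // per URI: index -> prefix

    /** Returns the code for uri, allocating one if needed. */
    synchronized int allocateUriCode(String uri) {
        Integer code = uriCodes.get(uri);
        if (code == null) {
            code = uris.size();
            uris.add(uri);
            uriCodes.put(uri, code);
            prefixCodes.add(new HashMap<>());
            prefixes.add(new ArrayList<>());
        }
        return code;
    }

    /** Returns the code for (prefix, uri), allocating one if needed. */
    synchronized long allocatePrefixCode(String prefix, String uri) {
        int uriCode = allocateUriCode(uri);
        Map<String, Integer> byName = prefixCodes.get(uriCode);
        Integer index = byName.get(prefix);
        if (index == null) {
            List<String> names = prefixes.get(uriCode);
            index = names.size();
            names.add(prefix);
            byName.put(prefix, index);
        }
        return ((long) uriCode << 32) | Integer.toUnsignedLong(index);
    }

    synchronized String getURI(long code) {
        return uris.get((int) (code >>> 32));
    }

    synchronized String getPrefix(long code) {
        return prefixes.get((int) (code >>> 32)).get((int) code);
    }
}
```

Here 2000 prefixes for one URI are simply accepted; the cost is extra hashing and memory per lookup. A growing map alone does not bound memory under hostile input, which is where option a) comes in.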

--

Regards,

Mateusz Nowakowski

From: Michael Kay [mailto:mike@saxonica.com]
Sent: Tuesday, August 09, 2011 10:37 AM
To: Scott Robey
Cc: saxon-help@lists.sourceforge.net
Subject: Re: [saxon] Saxon HE ArrayIndexOutOfBoundsException: -32768 in NamePool.allocateCodeForPrefix(NamePool.java:483)

As it happens the limit on the number of namespace prefixes has gone in the 9.4 development branch, though there's still a limit of 32K namespace URIs and a limit of 1K prefixes per URI.
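The -32768 in the subject line, together with the 32K limit on namespace URIs (32K = 2^15), is consistent with a 16-bit signed index overflowing. The sketch below only demonstrates the Java arithmetic; whether NamePool.allocateCodeForPrefix actually keeps its index in a `short` is an assumption inferred from the exception message, not something stated in this thread.

```java
/**
 * Illustration only: a 16-bit signed counter wraps from 32767 to -32768, and a
 * negative array index then throws ArrayIndexOutOfBoundsException. That the
 * NamePool uses such a short index is an ASSUMPTION.
 */
final class ShortOverflowDemo {

    static short increment(short counter) {
        return (short) (counter + 1); // computed as int, then narrowed to 16 bits
    }

    public static void main(String[] args) {
        short next = increment(Short.MAX_VALUE);
        System.out.println(next); // prints -32768
        int[] table = new int[Short.MAX_VALUE + 1];
        try {
            table[next] = 1;
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```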

It would be nice to have limits that are sufficiently high that no-one will ever reach them, but I'm afraid I've decided several times over the years that redesigning the name pool for the benefit of this one rogue application really can't be justified.

I vaguely recall writing an XMLFilter on one occasion to normalize the namespace prefixes, but I can't lay my hands on it now.

Using a static TransformerFactory itself seems a poor design choice.

Sorry for the inconvenience!

Michael Kay
Saxonica