Thanks for the contribution.

For clarification, Saxon 9.4 will remove the limit of N prefixes per namespace URI, but there will still be other limits imposed by the NamePool, for example 1024 prefixes per (uri, local-name) pair.

The NamePool design works extremely well in the large majority of cases, and its limitations only affect a very small number of users (it only generates three or four support issues per year, and this figure is not increasing). This makes any redesign a significant challenge, because we don't want to create the situation where 99% of users are worse off.

Moving to a 64-bit namecode rather than a 32-bit namecode is a relatively modest change. It would solve the problem of limits for nearly all practical situations, but it would not solve other known problems with the NamePool design - the contention caused by synchronizing on the NamePool when allocating codes, and the indefinite growth in memory usage for workloads where the vocabulary is unbounded (e.g. when names are generated at random). It would impose a small overhead on all users, but the effect would probably be unnoticeable to nearly all of them.
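
To make the 32-bit versus 64-bit trade-off concrete, here is a minimal sketch of how a name code might pack a prefix index and a name index into a single integer. The field widths (10 bits for the prefix index, 20 bits for the name index) are assumptions chosen to match the 1024 figure mentioned above, and the class and method names are hypothetical; this is not the actual NamePool source.

```java
/** Illustrative only: the field widths and names are assumptions, not Saxon's real layout. */
final class NameCodeSketch {
    // Assumed 32-bit layout: a 10-bit prefix index stored above a 20-bit name index.
    static final int NAME_BITS_32 = 20;
    static final int PREFIX_BITS_32 = 10;

    // Assumed 64-bit layout: 32 bits for each of the two fields.
    static final int NAME_BITS_64 = 32;

    /** Packs both indexes into an int; fails once either field overflows its width. */
    static int pack32(int prefixIndex, int nameIndex) {
        if (prefixIndex < 0 || prefixIndex >= (1 << PREFIX_BITS_32)) {
            throw new IllegalArgumentException("prefix index out of range: " + prefixIndex);
        }
        if (nameIndex < 0 || nameIndex >= (1 << NAME_BITS_32)) {
            throw new IllegalArgumentException("name index out of range: " + nameIndex);
        }
        return (prefixIndex << NAME_BITS_32) | nameIndex;
    }

    static int prefixIndex32(int code) {
        return code >>> NAME_BITS_32;
    }

    static int nameIndex32(int code) {
        return code & ((1 << NAME_BITS_32) - 1);
    }

    /** Same idea with a long: any non-negative int fits in either field. */
    static long pack64(int prefixIndex, int nameIndex) {
        if (prefixIndex < 0 || nameIndex < 0) {
            throw new IllegalArgumentException("indexes must be non-negative");
        }
        return ((long) prefixIndex << NAME_BITS_64) | (nameIndex & 0xFFFFFFFFL);
    }

    static int prefixIndex64(long code) {
        return (int) (code >>> NAME_BITS_64);
    }

    static int nameIndex64(long code) {
        return (int) code;
    }
}
```

With this assumed layout, the 1025th prefix (index 1024) cannot be represented in the 32-bit form but fits easily in the 64-bit form. The cost is a wider value wherever a name code is stored.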

Partitioning the NamePool into one object per namespace, perhaps at the same time as increasing name codes to 64 bits, might also solve the contention issues - but only for some users.

The only way I know of to remove all the known disadvantages of the NamePool design would also involve losing its biggest benefit - the fact that queries and stylesheets can be compiled to search for specific integer codes present in the instance documents, making evaluation of typical path expressions extremely fast.
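
As a rough illustration of that benefit, the sketch below interns each expanded name to an integer once, so that matching a name test against instance nodes is a plain integer comparison. The class, its methods and the "{uri}local" key format are hypothetical and much simpler than the real NamePool. The synchronized allocation method also hints at where the contention mentioned earlier comes from.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: a toy name interner, not Saxon's NamePool. */
final class FingerprintMatchSketch {
    private final Map<String, Integer> codes = new HashMap<>();

    /** Returns the integer code for an expanded name, allocating one on first use. */
    synchronized int allocate(String uri, String local) {
        String key = "{" + uri + "}" + local;
        Integer existing = codes.get(key);
        if (existing != null) {
            return existing;
        }
        int code = codes.size(); // codes are handed out sequentially and never reclaimed
        codes.put(key, code);
        return code;
    }

    /** Counts matching nodes by comparing integers only; no string comparison is needed. */
    static int countMatches(int[] elementCodes, int wantedCode) {
        int count = 0;
        for (int code : elementCodes) {
            if (code == wantedCode) {
                count++;
            }
        }
        return count;
    }
}
```

A compiled expression would call allocate once for the name it tests, and each document builder would call it once per distinct name it encounters. Because entries are never removed, the map also grows without bound when names are generated at random.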

I take your point that the limits could be exploited in a DoS attack. It might be that there are ways of defending against that other than redesigning the NamePool; this requires further thought.
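
One possible defence of that kind, offered only as a sketch, is to screen incoming documents with a SAX handler that counts the distinct prefixes bound to each namespace URI and rejects the document once a threshold is exceeded. The class name, the threshold parameter and the accepts helper are hypothetical; only standard JAXP/SAX calls are used. It bounds what a single document can contribute, but it does not stop prefixes accumulating in a shared pool over many requests.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative only: rejects documents that bind too many prefixes to one namespace URI. */
final class PrefixLimitGuard extends DefaultHandler {
    private final int maxPrefixesPerUri;
    private final Map<String, Set<String>> prefixesByUri = new HashMap<>();

    PrefixLimitGuard(int maxPrefixesPerUri) {
        this.maxPrefixesPerUri = maxPrefixesPerUri;
    }

    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        Set<String> prefixes = prefixesByUri.computeIfAbsent(uri, k -> new HashSet<>());
        prefixes.add(prefix);
        if (prefixes.size() > maxPrefixesPerUri) {
            throw new SAXException("Too many prefixes for namespace " + uri);
        }
    }

    /** Returns false if the document exceeds the limit or is not well-formed. */
    static boolean accepts(String xml, int maxPrefixesPerUri) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefix-mapping events
            factory.newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new PrefixLimitGuard(maxPrefixesPerUri));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```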

So it's far from obvious what the best way forward is.

Michael Kay
Saxonica

 
On 19/08/2011 15:57, Nowakowski, Mateusz wrote:

Hi,

 

It seems there are more and more people with this issue.

 

We had (and still have) a similar issue, but with the limit of 1024 prefixes per namespace, and it seems 9.4 won’t solve it.

 

I’m afraid this is not only a “pathological Apache application” case (for example, an Apache Axis client). Nowadays it is easy to have 1024 or more client applications using the same web service. It is true that clients should use the default prefix wherever possible, if only to reduce payload size, but it is difficult to teach clients to do so.

 

 

What is more, this is another XML bomb.

 

Assuming that an attacker knows that the server application is written in Java and uses XSL, he can be almost sure that it is Saxon-based, so he can generate a request with 1024 prefixes for the same namespace and try to disable the system with a single request.

 

 

There are several ways to try to avoid that, but all of them are only partial:

-          Have several JVMs

-          Have a pool of TransformerFactories

 

Neither of them solves the issue of an “XML bomb with 1025 prefixes for one namespace”.

 

 

Smd proposes to write a preprocessor which normalizes the prefixes.

I thought it was a complete solution, but it is not.

 

There is even net.sf.saxon.om.PrefixNormalizer in the Saxon distribution, but the code is not complete (it doesn’t handle attributes with prefixes).

I wonder why it is not completed, but after writing a complete one I realized that it doesn’t solve the problem completely.

 

Even assuming that the prefixes for all namespaces are made unique, normalization doesn’t solve the problem with element/attribute values.

If the element/attribute value is a QName, it is a problem.

 

 For example:

 

<n0:a xmlns:n0="uri1">

            <n1:b xmlns:n1="uri1">n1:someLocalName</n1:b>

</n0:a>

 

After normalization it looks like:

 

<n0:a xmlns:n0="uri1">

            <n0:b>n1:someLocalName</n0:b>

</n0:a>

 

If the XML Schema for the document declares the b element as “string”, both documents are valid.

If the b element is declared as “QName”, the normalized document is not valid, because the prefix n1 used in the value is no longer declared.

Rewriting the values is not a solution either, because a value that looks like a QName may be a plain string which must not be changed.
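
The effect can be checked with a small sketch using the JDK’s namespace-aware DOM parser: it takes the prefix from the text content of the b element and asks whether that prefix is in scope at that element. The class and method names are hypothetical, and the check is hard-wired to the example documents above (namespace “uri1”, element b).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Illustrative only: resolves the prefix used inside the content of the b element. */
final class QNameContentCheck {
    /** Returns the namespace URI bound to the prefix in b's text content, or null if unbound. */
    static String resolveContentPrefix(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Element b = (Element) root.getElementsByTagNameNS("uri1", "b").item(0);
            String value = b.getTextContent().trim();
            String prefix = value.substring(0, value.indexOf(':'));
            // Looks the prefix up in the namespace declarations in scope at b.
            return b.lookupNamespaceURI(prefix);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For the original document the prefix n1 resolves to “uri1”; for the normalized document it resolves to nothing, which is why the value can no longer be interpreted as a QName.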

 

 

 

The purpose of this mail is to show that:

-          It must be fixed, because it is dangerous

-          The only solution is to fix the issue in Saxon by:

a) Making NamePool non-global.

Currently there is one NamePool per Saxon Configuration; maybe there should be as many NamePools as Controllers.

b) Rewriting NamePool so that it uses “long” for addresses, or considering a Map representation instead of a matrix.

 

 

--

Regards,

Mateusz Nowakowski

 

From: Michael Kay [mailto:mike@saxonica.com]
Sent: Tuesday, August 09, 2011 10:37 AM
To: Scott Robey
Cc: saxon-help@lists.sourceforge.net
Subject: Re: [saxon] Saxon HE ArrayIndexOutOfBoundsException: -32768 in NamePool.allocateCodeForPrefix(NamePool.java:483)

 

As it happens the limit on the number of namespace prefixes has gone in the 9.4 development branch, though there's still a limit of 32K namespace URIs and a limit of 1K prefixes per URI.

It would be nice to have limits that are sufficiently high that no-one will ever reach them, but I'm afraid I've decided several times over the years that redesigning the name pool for the benefit of this one rogue application really can't be justified.

I vaguely recall writing an XMLFilter on one occasion to normalize the namespace prefixes, but I can't lay my hands on it now.

Using a static TransformerFactory itself seems a poor design choice.

Sorry for the inconvenience!

Michael Kay
Saxonica
