Thanks for the contribution.
For clarification, Saxon 9.4 will remove the limit of N prefixes per
namespace URI, but there will still be other limits imposed by the
NamePool, for example 1024 prefixes per (uri, local-name) pair.
The NamePool design works extremely well in the large majority of cases,
and its limitations only affect a very small number of users (it only
generates three or four support issues per year, and this figure is not
increasing). This makes any redesign a significant challenge, because we
don't want to create the situation where 99% of users are worse off.
Moving to a 64-bit namecode rather than a 32-bit namecode is a
relatively modest change. It would solve the problem of limits for
nearly all practical situations, but it would not solve other known
problems with the NamePool design - the contention caused by
synchronizing on the NamePool when allocating codes, and the indefinite
growth in memory usage for workloads where the vocabulary is unbounded
(e.g. when names are generated at random). It would impose a small
overhead on all users, but the effect would probably be unnoticeable to
nearly all of them.
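To make the 32-bit versus 64-bit trade-off concrete, here is a minimal sketch of how a name code might pack a prefix index together with an index for the (uri, local-name) pair. The bit widths and the names `NameCodeSketch`, `pack32` and `pack64` are assumptions chosen for illustration so that a 10-bit prefix field gives the 1024 limit mentioned above. This is not Saxon's actual NamePool code.

```java
// Illustrative only: a hypothetical packing, not Saxon's real NamePool layout.
final class NameCodeSketch {
    // Assumed 32-bit layout: 10 bits of prefix index above 20 bits of name index.
    static final int PREFIX_BITS_32 = 10;
    static final int NAME_BITS_32 = 20;

    // Packs into an int; fails once the prefix index needs more than 10 bits.
    static int pack32(int prefixIndex, int nameIndex) {
        if (prefixIndex < 0 || prefixIndex >= (1 << PREFIX_BITS_32)) {
            throw new IllegalArgumentException("prefix index out of range: " + prefixIndex);
        }
        if (nameIndex < 0 || nameIndex >= (1 << NAME_BITS_32)) {
            throw new IllegalArgumentException("name index out of range: " + nameIndex);
        }
        return (prefixIndex << NAME_BITS_32) | nameIndex;
    }

    // Assumed 64-bit layout: 32 bits for each field, so the practical limits vanish.
    static long pack64(int prefixIndex, int nameIndex) {
        if (prefixIndex < 0 || nameIndex < 0) {
            throw new IllegalArgumentException("indexes must be non-negative");
        }
        return ((long) prefixIndex << 32) | (nameIndex & 0xFFFFFFFFL);
    }

    static int prefixIndex64(long code) {
        return (int) (code >>> 32);
    }

    static int nameIndex64(long code) {
        return (int) code;
    }
}
```

In this sketch the 1025th prefix for a name is rejected by `pack32` but fits in `pack64`. The cost is that every stored name code doubles in size, which is the small overhead on all users referred to above.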
Partitioning the NamePool into one object per namespace, perhaps at the
same time as increasing name codes to 64 bits, might also solve the
contention issues - but only for some users.
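As a rough illustration of the partitioning idea, the sketch below keeps one independently locked table per namespace URI. The class and method names are hypothetical, and the codes it hands out are unique only within a partition. It is a sketch of the general technique under those assumptions, not a proposed Saxon design.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one lock per namespace instead of one global lock.
final class PartitionedPoolSketch {

    // Holds the local names of a single namespace; allocation locks only this object.
    static final class Partition {
        private final Map<String, Integer> localCodes = new HashMap<>();

        synchronized int allocate(String localName) {
            Integer existing = localCodes.get(localName);
            if (existing != null) {
                return existing;
            }
            int code = localCodes.size();
            localCodes.put(localName, code);
            return code;
        }
    }

    private final ConcurrentHashMap<String, Partition> partitions = new ConcurrentHashMap<>();

    // Finds or creates the partition for a namespace URI.
    Partition partitionFor(String uri) {
        return partitions.computeIfAbsent(uri, k -> new Partition());
    }

    // Threads working in different namespaces do not block each other here.
    int allocate(String uri, String localName) {
        return partitionFor(uri).allocate(localName);
    }
}
```

Threads allocating names in different namespaces no longer share a lock, but a workload concentrated in a single namespace contends exactly as before, which is why this helps only some users.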
The only way I know of to remove all the known disadvantages of the
NamePool design would also involve losing its biggest benefit - the fact
that queries and stylesheets can be compiled to search for specific
integer codes present in the instance documents, making evaluation of
typical path expressions extremely fast.
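The benefit described here can be shown with a toy model: names are interned to integers once, and a compiled name test then matches nodes by integer comparison rather than by comparing strings. `ToyNamePool` and `ToyNameTest` are hypothetical names for this sketch and are far simpler than the real NamePool.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a shared pool that interns (uri, local-name) pairs as integers.
final class ToyNamePool {
    private final Map<String, Integer> codes = new HashMap<>();

    // Synchronized allocation: the single lock is the contention point,
    // and the map never shrinks, so an unbounded vocabulary grows it forever.
    synchronized int allocate(String uri, String localName) {
        String key = "{" + uri + "}" + localName;
        Integer existing = codes.get(key);
        if (existing != null) {
            return existing;
        }
        int code = codes.size();
        codes.put(key, code);
        return code;
    }
}

// A name test resolved to an integer code at compile time.
final class ToyNameTest {
    private final int requiredCode;

    ToyNameTest(ToyNamePool pool, String uri, String localName) {
        this.requiredCode = pool.allocate(uri, localName);
    }

    // Matching a node at run time is a single integer comparison.
    boolean matches(int nodeNameCode) {
        return nodeNameCode == requiredCode;
    }
}
```

The fast path depends on the query and the document agreeing on the same codes, which is why the pool has to be shared and why making it private to each transformation would give up this benefit.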
I take your point that the limits could be exploited in a DoS attack. It
might be that there are ways of defending against that other than
redesigning the NamePool; this requires further thought.
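One possible defence of this kind, sketched here purely as an assumption, is a SAX filter that rejects a document declaring too many distinct prefixes for one namespace URI before it reaches the transformer. `PrefixLimitFilter` and its limit parameter are hypothetical; `XMLFilterImpl` and `startPrefixMapping` are the standard SAX API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical guard: aborts parsing when one namespace URI is bound
// to more than a configured number of distinct prefixes in a document.
final class PrefixLimitFilter extends XMLFilterImpl {
    private final int maxPrefixesPerUri;
    private final Map<String, Set<String>> prefixesByUri = new HashMap<>();

    PrefixLimitFilter(int maxPrefixesPerUri) {
        this.maxPrefixesPerUri = maxPrefixesPerUri;
    }

    @Override
    public void startDocument() throws SAXException {
        // Counts are per document, so reset them for each new parse.
        prefixesByUri.clear();
        super.startDocument();
    }

    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        Set<String> seen = prefixesByUri.computeIfAbsent(uri, k -> new HashSet<>());
        seen.add(prefix);
        if (seen.size() > maxPrefixesPerUri) {
            throw new SAXException("Too many prefixes (" + seen.size()
                    + ") declared for namespace " + uri);
        }
        super.startPrefixMapping(prefix, uri);
    }
}
```

The filter would be wired in by calling `setParent` with the real `XMLReader` and passing the filter to a `SAXSource`. It only guards against a single hostile document; prefixes that accumulate across many well-behaved documents sharing one NamePool are not addressed by this sketch.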
So it's far from obvious what the best way forward is.
On 19/08/2011 15:57, Nowakowski, Mateusz wrote:
> It seems there are more and more people with this issue.
> We had (and still have) a similar issue but with the 1024 limit of
> prefixes per namespace and it seems 9.4 won't solve it.
> I'm afraid it is not only a "pathological Apache application" case
> (for example an Apache Axis client); nowadays it is easy to get
> even 1024 clients (client apps) which use the same web service. It is
> true that clients should use the default prefix wherever possible, at
> least to decrease payload size, but it is difficult to teach clients.
> What is more, this is another XML bomb.
> Assuming that a bad guy knows that the server app is Java and uses XSL,
> he can be almost sure that it is a Saxon-based app, so he can try to
> generate a request with 1024 prefixes for the same namespace and try
> to disable the system with a single request.
> There are several ways to try avoiding that but all of them are partial:
> -Have several JVMs
> -Have a pool of TransformerFactories
> They don't solve the issue of an "xml bomb with 1025 prefixes for one
> Smd proposes to write a preprocessor which normalizes the prefixes....
> I thought it was a complete solution, but it is not.
> There is even net.sf.saxon.om.PrefixNormalizer in the Saxon
> distribution, but the code is not complete (it doesn't handle
> attributes with prefixes).
> I wonder why it is not completed, but after writing a complete one I
> realized that it doesn't solve the problem completely.
> Even assuming that prefixes for all namespaces are unique it doesn't
> solve the problem with element/attribute values.
> If the element/attribute value is a QName, it is a problem.
> For example:
> After normalization looks like:
> When XML Schema for the xml says that the b element is "string" --
> both xmls are valid.
> If the b element is described as "QName" -- the normalized xml is not
> Changing the values is not a solution either, because a value
> which looks like a QName may be a string which shouldn't be changed.
> The purpose of this mail is to show that:
> -It must be fixed, because it is dangerous
> -The only solution is to fix the issue in Saxon by:
> a) Making NamePool non global.
> Currently there is one NamePool per Saxon Configuration, maybe there
> should be as many NamePools as Controllers.
> b) Rewrite NamePool so that it uses "long" for addresses or consider
> Map representation instead of matrix.
> Mateusz Nowakowski
> *From:*Michael Kay [mailto:mike@...]
> *Sent:* Tuesday, August 09, 2011 10:37 AM
> *To:* Scott Robey
> *Cc:* saxon-help@...
> *Subject:* Re: [saxon] Saxon HE ArrayIndexOutOfBoundsException: -32768
> in NamePool.allocateCodeForPrefix(NamePool.java:483)
> As it happens the limit on the number of namespace prefixes has gone
> in the 9.4 development branch, though there's still a limit of 32K
> namespace URIs and a limit of 1K prefixes per URI.
> It would be nice to have limits that are sufficiently high that no-one
> will ever reach them, but I'm afraid I've decided several times over
> the years that redesigning the name pool for the benefit of this one
> rogue application really can't be justified.
> I vaguely recall writing an XMLFilter on one occasion to normalize the
> namespace prefixes, but I can't lay my hands on it now.
> Using a static TransformerFactory itself seems a poor design choice.
> Sorry for the inconvenience!
> Michael Kay
> saxon-help mailing list archived at http://saxon.markmail.org/