I've just tried using Saxonica's schema processor with a fairly complicated
schema that takes about 5 minutes to parse (the same time whether it is
loaded via xsl:import-schema or via a Java addSchema call on the factory).

Could this be due to redundant includes? Any pointers on how to debug
schema-parsing slowness?
 

I think the most likely cause of such poor performance is a very complex
content model for a complex type: the algorithms for building the finite
state machine scale worse than linearly (I forget the details) with the size
of the grammar. A content model with many particles, deeply nested choices,
or large minOccurs/maxOccurs values, for example, can make the number of
states grow much faster than the size of the schema text.

However, that's a guess. There's always a possibility of a performance bug
that's due to something quite trivial that can be easily fixed. So the best
thing is to send me the source and let me study what's going on internally.

I've also seen poor performance from the Java VM in compiling complex
regular expressions - however, I haven't seen that recently; it may have
been a performance bug in an early version of JDK 1.4.
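One quick check is to time the pattern compilation on its own. The pattern
below is just a made-up stand-in for whatever xs:pattern facets the schema
uses; paste in a real facet value and see how long the compile takes:

import java.util.regex.Pattern;

public class RegexCompileTimer {
    public static void main(String[] args) {
        // Made-up stand-in for a pattern facet from the schema;
        // replace it with the real facet value to test.
        String facet = "([A-Z]{2}[0-9]{2}[A-Z0-9]{1,30})?";

        long start = System.currentTimeMillis();
        Pattern p = Pattern.compile(facet);
        long elapsed = System.currentTimeMillis() - start;

        // Compilation should normally take milliseconds; anything much
        // longer points at the regex engine rather than the schema model.
        System.out.println("Compiled " + p.pattern() + " in " + elapsed + " ms");
    }
}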

I did try switching from JRE 1.4 to JRE 1.5 and saw roughly a twofold
improvement.  Still, parsing was taking several minutes.

Then I attacked the xs:include directives in the schema files.  This schema
is an FPML-based model with about 40 different files, designed so that they
are valid either as simple standalone models or combined into more complex
ones.  The upshot is that many files are included redundantly.
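A quick way to see the duplication is something like the following walk over
the schemaLocation references, counting how often each file is reached. It's
a sketch only: plain DOM, relative file paths assumed, and the root file name
is a placeholder.

import java.io.File;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class IncludeCounter {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";
    static final Map<String, Integer> counts = new HashMap<String, Integer>();
    static DocumentBuilder builder;

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        builder = dbf.newDocumentBuilder();

        // Placeholder for the real top-level schema document.
        walk(new File("top-level.xsd"));

        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                System.out.println(e.getKey() + " reached " + e.getValue() + " times");
            }
        }
    }

    static void walk(File xsd) throws Exception {
        File canonical = xsd.getCanonicalFile();
        Integer seen = counts.get(canonical.getPath());
        counts.put(canonical.getPath(), seen == null ? 1 : seen + 1);
        if (seen != null) {
            return; // count the duplicate reference, but don't recurse again
        }
        Document doc = builder.parse(canonical);
        // Follow both xs:include and xs:import directives.
        String[] kinds = {"include", "import"};
        for (String kind : kinds) {
            NodeList refs = doc.getElementsByTagNameNS(XSD_NS, kind);
            for (int i = 0; i < refs.getLength(); i++) {
                String loc = ((Element) refs.item(i)).getAttribute("schemaLocation");
                if (loc.length() > 0) {
                    // Assumes schemaLocation is a path relative to the
                    // including document (not an absolute URL).
                    walk(new File(canonical.getParentFile(), loc));
                }
            }
        }
    }
}

Any file reported more than once is a candidate for pulling up into a flat,
top-level include list.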

So I changed the model to a flat, top-level inclusion of the files, with no
nested includes.  The schema parsing time dropped from several minutes (with
a constant memory profile throughout, by the way) to 2 seconds, which is a
pretty dramatic improvement.

I can send along the original pathological case in case it's interesting.
I'm curious why redundant includes would produce such a dramatic increase in
parsing time, but mostly I'm happy that the non-pathological case is so
snappy.

thanks

-alan