> > I've just tried using Saxonica with a fairly complicated schema
> > that required about 5 minutes to parse in Saxonica
> > (same time whether performed as an xsl:import-schema or as a Java
> > addSchema in the factory).
> > Could this be due to redundant includes? Any pointers on how to debug
> > schema-parsing slowness?
> I think the most likely cause of such poor performance is a very complex
> content model for a complex type: the algorithms for building the finite
> state machine are worse than linear (I forget the details) in relation to
> the size of the grammar.
> However, that's a guess. There's always a possibility of a performance bug
> that's due to something quite trivial that can be easily fixed. So the
> thing to do is send me the source and let me study what's going on.
> I've also seen poor performance from the Java VM when compiling complex
> regular expressions; however, I haven't seen that recently, and it may
> have been a performance bug in an early version of JDK 1.4.
I did try switching from JRE 1.4 to JRE 1.5 and saw roughly a twofold
improvement. Still, parsing was taking several minutes.
Then I attacked the xs:include elements in the schema files. This schema is
an FpML-based model with about 40 different files, designed to be valid
either as simple standalone models or as more complex, combined models.
The upshot is that many files are included redundantly.
So I changed the model to a flat, top-level inclusion of files, with no
more nested includes. Schema parsing time dropped from several minutes
(with a constant memory profile, by the way) to 2 seconds. This is a pretty
I can send along the original pathological case in case it's
I'm curious why redundant includes would produce such an exponential
increase in parsing time, but mostly I'm happy that the non-pathological
case is so snappy.
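One possible explanation for an exponential (rather than linear) blow-up: if a schema processor reprocessed an included file every time it was reached, rather than once per distinct document, the work would scale with the number of include *paths*, which can grow exponentially even when the number of files stays small. The thread does not confirm that this is what Saxon does; the sketch below only illustrates the arithmetic. The class `IncludeFanout`, the "ladder" include graph, and the file names are hypothetical, not taken from the FpML schemas.

```java
import java.util.*;

public class IncludeFanout {
    // Hypothetical model: each schema file maps to the files it xs:includes.
    // Naive count: every include edge triggers a fresh load of the target,
    // so the total equals the number of paths through the include graph.
    public static long naiveLoads(String file, Map<String, List<String>> includes) {
        long count = 1; // this file itself
        for (String child : includes.getOrDefault(file, List.of())) {
            count += naiveLoads(child, includes);
        }
        return count;
    }

    // Deduplicated count: each distinct file is loaded once, whatever the nesting.
    public static int distinctLoads(String root, Map<String, List<String>> includes) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            String f = stack.pop();
            if (seen.add(f)) {
                for (String c : includes.getOrDefault(f, List.of())) {
                    stack.push(c);
                }
            }
        }
        return seen.size();
    }

    // Hypothetical "ladder": n levels of two files each, where both files
    // at level i include both files at level i+1 (redundant nested includes).
    public static Map<String, List<String>> ladder(int n) {
        Map<String, List<String>> g = new HashMap<>();
        g.put("root.xsd", List.of("a0.xsd", "b0.xsd"));
        for (int i = 0; i < n - 1; i++) {
            List<String> next = List.of("a" + (i + 1) + ".xsd", "b" + (i + 1) + ".xsd");
            g.put("a" + i + ".xsd", next);
            g.put("b" + i + ".xsd", next);
        }
        return g;
    }

    public static void main(String[] args) {
        for (int n : new int[] {5, 10, 20}) {
            Map<String, List<String>> g = ladder(n);
            System.out.println(n + " levels: distinct=" + distinctLoads("root.xsd", g)
                    + " naive=" + naiveLoads("root.xsd", g));
        }
    }
}
```

For 5 levels this prints `5 levels: distinct=11 naive=63`: the distinct count grows as 2n + 1 while the naive count grows as 2^(n+1) - 1. Flattening the includes to a single top-level list, as described above, removes the repeated paths entirely, which would be consistent with the drop to a couple of seconds.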