I haven't confirmed this, but I suspect that Ant is supplying Saxon with a SAXSource whose XMLReader is already initialized, in which case Saxon will use that XMLReader rather than the configured source parser.

Could you investigate this by setting the system property jaxp.debug=1? This will show whether the parser is being located using the JAXP search mechanism. If it is, that almost certainly means that Ant, rather than Saxon, is loading the XML parser.

Another useful experiment would be to see whether other Saxon configuration properties work, e.g. set http://saxon.sf.net/feature/timing to "true".
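
As a rough sketch of that experiment, the xslt task from the original message could be adapted as follows. The input file name in.xml is an assumption (a well-formed document, so that the run gets far enough to show whether the timing feature takes effect); everything else is copied from the original build file.

```xml
<!-- Sketch only: checks whether a Saxon configuration property set via Ant is honoured. -->
<!-- in.xml is a hypothetical well-formed input, not a file from the original post. -->
<xslt in="in.xml" out="out.xml" style="test.xsl" force="true">
  <factory name="net.sf.saxon.TransformerFactoryImpl">
    <attribute name="http://saxon.sf.net/feature/timing" value="true"/>
  </factory>
  <classpath location="${saxon9.jar}"/>
</xslt>
```

If timing information shows up in the Ant output, attributes are reaching the Saxon configuration and the problem is specific to how the source parser is chosen.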

A workaround would probably be to use the exec task to run Saxon, rather than the xslt task.
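
A minimal sketch of that workaround, assuming a java executable is on the PATH and reusing the property and file names from the original message. It passes the same -x option that already works on the command line:

```xml
<!-- Sketch only: runs Saxon in a separate JVM so that Saxon itself creates the parser. -->
<!-- Assumes "java" is on the PATH; ${saxon9.jar} and ${tagsoup.jar} are defined as in the original build file. -->
<exec executable="java" failonerror="true">
  <arg value="-cp"/>
  <arg value="${saxon9.jar}${path.separator}${tagsoup.jar}"/>
  <arg value="net.sf.saxon.Transform"/>
  <arg value="-x:org.ccil.cowan.tagsoup.Parser"/>
  <arg value="-s:in.html"/>
  <arg value="-xsl:test.xsl"/>
  <arg value="-o:out.xml"/>
</exec>
```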

Michael Kay
Saxonica

On 08/01/2013 14:32, Jirka Kosek wrote:
Hi,

I need to process HTML sources with Saxon. I have no problems reading
HTML sources when I specify an alternative parser (TagSoup from John Cowan
or the HTML parser from Henri Sivonen) using the -x command-line option.

However, when I try to specify the same thing inside Ant, the parser
switch seems to be ignored by Saxon:

    <xslt in="in.html" out="out.xml" style="test.xsl" force="true">
      <factory name="net.sf.saxon.TransformerFactoryImpl">
        <attribute name="http://saxon.sf.net/feature/sourceParserClass"
                   value="org.ccil.cowan.tagsoup.Parser"/>
      </factory>
      <classpath location="${tagsoup.jar}"/>
      <classpath location="${saxon9.jar}"/>
    </xslt>

The transformation fails on the first well-formedness error, because the
normal XML parser is used instead of the HTML parser/TagSoup. There is
nothing suspicious in the Ant debug log.

I tried to specify the alternative parser in a config file and to
reference the config file from Ant using:

    <xslt in="in.html" out="out.xml" style="test.xsl" force="true">
      <factory name="net.sf.saxon.TransformerFactoryImpl">
        <attribute name="http://saxon.sf.net/feature/configuration-file"
                   value="config.xml"/>
      </factory>
      <classpath location="${tagsoup.jar}"/>
      <classpath location="${saxon9.jar}"/>
    </xslt>

However, this resulted in the error message:

java.lang.IllegalArgumentException: Unknown configuration option http://saxon.sf.net/feature/configuration-file
        at net.sf.saxon.Configuration.setConfigurationProperty(Configuration.java:4044)
        at net.sf.saxon.TransformerFactoryImpl.setAttribute(TransformerFactoryImpl.java:268)
        at org.apache.tools.ant.taskdefs.optional.TraXLiaison.getFactory(TraXLiaison.java:424)

So it seems that although this is documented at
http://www.saxonica.com/documentation/configuration/config-features.xml
it doesn't work.

Out of curiosity, I tried my simple config file:

<configuration xmlns="http://saxon.sf.net/ns/configuration"
  edition="HE">
  <global
    sourceParser="org.ccil.cowan.tagsoup.Parser"
  />
</configuration>

with the command line (using the -config option), and the parser option was
ignored; the normal XML parser was used for the source document instead.

I was using the latest release of HE (Java version). Is this a limitation
of HE, a bug, or have I missed something?

TIA,

					Jirka





_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
saxon-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/saxon-help