I mentioned how to do it from the command line, but there's underlying support in the Java API as well. The approach that gives the most control is to supply the Source of the transformation as a SAXSource: this contains an XMLReader, which can be an instance of the TagSoup XMLReader. That way you can configure the TagSoup XMLReader any way you like before invoking Saxon to do the transformation.
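
A minimal sketch of the SAXSource approach, using only the standard JAXP and SAX interfaces. It rests on a few assumptions: that TagSoup's XMLReader implementation is the class org.ccil.cowan.tagsoup.Parser, and that Saxon is the TransformerFactory that JAXP resolves at run time. The names SaxSourceTransform, newReader, transform and TITLE_XSL are made up for illustration. If TagSoup is not on the classpath, the sketch falls back to the JDK's own parser, so it still runs, but only on well-formed input.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class SaxSourceTransform {

    // Illustrative stylesheet: prints the text of the first <title> element.
    // local-name() is used so the match does not depend on the namespace
    // (if any) that the parser reports for HTML elements.
    static final String TITLE_XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select=\"//*[local-name()='title']\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Returns a TagSoup reader if available, otherwise the JDK's SAX parser. */
    static XMLReader newReader() throws Exception {
        try {
            // Assumption: this is TagSoup's parser class; loaded by name so the
            // sketch compiles without TagSoup on the classpath.
            return (XMLReader) Class.forName("org.ccil.cowan.tagsoup.Parser")
                    .getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException e) {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            return spf.newSAXParser().getXMLReader();
        }
    }

    /** Transforms input parsed by the given reader and returns the result as a string. */
    static String transform(XMLReader reader, String input, String stylesheet)
            throws Exception {
        // Any reader configuration (setFeature / setProperty) would go here,
        // before the reader is handed to the transformer.
        SAXSource source =
                new SAXSource(reader, new InputSource(new StringReader(input)));

        // Standard JAXP lookup; yields Saxon only if Saxon is the
        // implementation JAXP is configured to find (an assumption here).
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(stylesheet)));

        StringWriter out = new StringWriter();
        transformer.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><head><title>Hello</title></head><body/></html>";
        System.out.println(transform(newReader(), html, TITLE_XSL));
    }
}
```

The point of passing the XMLReader in explicitly is that the caller keeps full control over it: the transformer only sees SAX events, so it does not need to know that the input was HTML rather than XML.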
Michael Kay

From: saxon-help-bounces@lists.sourceforge.net [mailto:saxon-help-bounces@lists.sourceforge.net] On Behalf Of Brett Zamir
Sent: 18 January 2008 11:05
To: Mailing list for SAXON XSLT queries
Subject: Re: [saxon] Feature request

Hello all,

Thanks for the help.

My interest is in integrating TagSoup/Tidy with Saxon-B in an open-source application (a Firefox extension), and I have no idea how, or whether, I could script the command line alongside using the API.

I do see that the TagSoup site has a combined version, but it says that due to a bug in Java 5 or 6, one must use Saxon 6.5.5... Not sure what trade-offs that entails, though I'd of course like to incorporate the latest version if possible...


Michael Kay wrote:
TagSoup has a version of Saxon which incorporates the TagSoup
parser; you can use 'saxon' as normal, and get the benefits
of treating bad HTML as well-formed XML.

In fact, you can use the standard Saxon distribution: it's possible to
nominate TagSoup as the XML source parser using the -x option on the Saxon
command line.

Michael Kay
