My interest is in integrating TagSoup/Tidy with Saxon-B in an open-source application (a Firefox extension), and I have no idea how, or even whether, I could script the command line alongside using the API.

I do see that the TagSoup site has a combined version, but it says that, due to a bug in Java 5 or 6, one must use Saxon 6.5.5. I'm not sure what trade-offs that entails, though I'd of course like to incorporate the latest version if possible.


Michael Kay wrote:
TagSoup has a version of Saxon which incorporates the TagSoup
parser; you can use 'saxon' as normal, and get the benefits
of treating bad HTML as well-formed XML.

In fact, you can use the standard Saxon distribution: it's possible to
nominate TagSoup as the XML source parser using the -x option on the Saxon
command line.
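
For the API side of the question, the same idea can be sketched through standard JAXP: hand the transformer a SAXSource whose XMLReader is the TagSoup parser. This is only a minimal sketch under stated assumptions, not a tested recipe for a Firefox extension. It assumes the TagSoup jar is on the classpath and that its parser class is org.ccil.cowan.tagsoup.Parser; the class name TagSoupTransform, the helper methods and the sample stylesheet are hypothetical names made up for illustration. TransformerFactory.newInstance() should pick up Saxon when its jar is on the classpath, and otherwise falls back to the JDK's built-in XSLT 1.0 processor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper class, for illustration only.
class TagSoupTransform {

    // Sample stylesheet: emits the whitespace-normalized text of the document.
    static final String TEXT_XSLT =
        "<xsl:stylesheet version='1.0' "
        + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select='normalize-space(.)'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Instantiate the named XMLReader implementation, e.g.
    // "org.ccil.cowan.tagsoup.Parser" (assumes the TagSoup jar is available).
    // A null name selects the JDK's default, strict XML parser instead.
    static XMLReader loadReader(String className) throws Exception {
        if (className == null) {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // XSLT needs namespace-aware events
            return factory.newSAXParser().getXMLReader();
        }
        return (XMLReader) Class.forName(className)
                .getDeclaredConstructor().newInstance();
    }

    // Parse the markup with the chosen reader and run the stylesheet over it.
    static String transform(String readerClass, String markup, String stylesheet)
            throws Exception {
        XMLReader reader = loadReader(readerClass);
        SAXSource source =
            new SAXSource(reader, new InputSource(new StringReader(markup)));
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        transformer.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Pass org.ccil.cowan.tagsoup.Parser as the first argument to parse
        // with TagSoup; with no argument the default XML parser is used.
        String readerClass = args.length > 0 ? args[0] : null;
        System.out.println(
            transform(readerClass, "<p>Hello <b>world</b></p>", TEXT_XSLT));
    }
}
```

With the default parser the input must already be well-formed; naming the TagSoup class as the reader is what would let malformed HTML through. Note that TagSoup reports elements in the XHTML namespace, so stylesheet match patterns on element names would need a namespace prefix.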

Michael Kay

