Dear All:
I am currently using TagSoup with XOM to get XPath support as described here:
http://nicklothian.com/blog/2006/09/11/using-xpath-on-real-world-html-documents/
It seems to work well, except for the following namespace problem:
http://www.supermind.org/blog/613/dom4j-xpath-tagsoup-namespaces-sweet
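To make the namespace problem concrete, here is a minimal sketch. It deliberately uses only the JDK's built-in DOM and XPath classes rather than XOM or TagSoup, and a hand-written XHTML string stands in for parser output, so the class name `NamespaceDemo`, the prefix `h`, and the sample markup are all my own assumptions for illustration. The point is that once elements sit in the XHTML namespace, an unprefixed XPath step no longer matches them; the expression needs a prefix bound to that namespace.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical sample: a stand-in for the namespaced output of an HTML parser.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
        + "<body><p>one</p><p>two</p></body></html>";

    // Counts the nodes that an XPath 1.0 expression selects in the given XML.
    static int count(String xml, String expr) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // keep namespace information in the DOM
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the (arbitrary) prefix "h" to the XHTML namespace.
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "h".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return XHTML_NS.equals(uri) ? "h" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                String prefix = getPrefix(uri);
                return (prefix == null
                    ? Collections.<String>emptyList()
                    : Collections.singletonList(prefix)).iterator();
            }
        });

        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count(SAMPLE, "//p"));   // prints 0: unprefixed name means "no namespace"
        System.out.println(count(SAMPLE, "//h:p")); // prints 2: prefix is bound to the XHTML namespace
    }
}
```

As far as I understand, XOM handles the same situation by passing an XPathContext that binds a prefix to the XHTML namespace when calling query, but the sketch above does not exercise XOM itself.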
I noticed that HTMLParser is, in my tests, the fastest parser available, and it has SAX parser support:
http://htmlparser.sourceforge.net/javadoc/org/htmlparser/sax/package-summary.html
Has anyone used it with XOM? Any luck? Is it better or worse (i.e., faster or slower) than TagSoup or other alternatives?
Thank you
Misha