Re: [Htmlparser-user] HTMLParser: package.
From: Somik R. <so...@ya...> - 2003-01-25 18:48:21
> I came across this interesting tool for parsing HTML
> files using HTMLparser and have a couple of questions.
>
> Q1. How is it better than XML parsers, SAX parsers etc?

There are some good SAX parsers out there, but they are for parsing XML, not HTML. HTMLParser is primarily a tool for parsing HTML, and HTML is usually dirty, with missing end tags, etc. Of late, we have also started using HTMLParser to parse XML, and it is useful because it is so compact and very fast. However, keep in mind that HTMLParser is a tolerant parser. Unlike most SAX parsers, it cannot tell you whether your XML file has errors (at least not yet).

> Q2. Does it focus primarily on HTML files, or is it generic to
> other files as well?

HTML files primarily. Lots of folks use it in their search engines and crawlers. I use it for unit testing HTML (actually, unit testing XSL stylesheets) for web applications at my workplace.

> Q3. Where can I download the package for HTMLparser?

http://htmlparser.sourceforge.net - there is a download link. You are advised to go with an integration release, as HTMLParser is a 100% tested project. We do not add new bugs with every new release (we try not to, and succeed to a good extent).

> I get the following message while compiling a sample
> program:
> LinkExtractor.java:3: package org.htmlparser does not exist
> import org.htmlparser.HTMLParser;

Remove the package declaration from the sample if you are compiling it as your own application. By the way, this sample is already included in the parser distribution, so when you download it, you shouldn't face any problems.

Regards,
Somik
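The point made in the answer to Q1, that a strict SAX parser reports errors in the markup while a tolerant HTML parser does not, can be illustrated with the SAX parser bundled with the JDK (`javax.xml.parsers`). This is a minimal sketch that uses only the standard library, not HTMLParser's own API; the class name `StrictnessDemo` and the helper `isWellFormed` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: shows that a strict XML (SAX) parser rejects
// typical "dirty" HTML that a tolerant HTML parser would accept.
public class StrictnessDemo {

    // Returns true if the JDK's SAX parser accepts the input as well-formed XML.
    static boolean isWellFormed(String markup) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler ignores content events; its fatalError() rethrows,
            // so malformed input surfaces as a SAXParseException.
            parser.parse(new InputSource(new StringReader(markup)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Malformed markup, e.g. unclosed or mismatched tags.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Parser setup or I/O problems are not about well-formedness.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<html><body><p>Closed</p></body></html>";
        String dirtyHtml = "<html><body><p>Unclosed paragraph<br></body></html>";
        System.out.println(isWellFormed(xml));       // true
        System.out.println(isWellFormed(dirtyHtml)); // false: <br> and <p> are never closed
    }
}
```

The second input is ordinary HTML as browsers accept it, yet the SAX parser stops at the first mismatched end tag. That error reporting is what a strict parser gives you, and what a tolerant parser such as HTMLParser deliberately trades away in order to cope with real-world HTML.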