Thread: [Htmlparser-user] Help with Filters


Greetings. I have started developing a solution with the HTMLParser and
wanted to ask about a few specifics.

The application extracts text, the title and some metadata (author,
description, keywords - if present) from HTML documents for indexing
purposes. I have successfully written code to access the content, title,
and meta information but now need to put it in context. To do this, I
would like to recognize the BODY tag's start and end. If I understand
the architecture correctly, HTMLParser should allow me to register a
simple HTMLTagScanner, but since this is an abstract class and the
existing scanners don't suit my purpose, I presume I need to implement a
subclass.

Can someone show me how to subclass HTMLTagScanner to watch for a
specific tag?
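
To make the goal concrete, here is a rough sketch of the behaviour I am
after, written in plain Java with regular expressions rather than with
the HTMLParser API. The names BodyLocator and bodyContent are made up
for illustration, and the matching is deliberately naive (it would be
confused by a ">" inside an attribute value), so treat it as a stand-in
for what a proper scanner subclass would do, not as a solution.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of HTMLParser: finds the BODY element by
// scanning the raw HTML text with regular expressions.
class BodyLocator {
    // Opening <body> tag, with or without attributes, case-insensitive.
    private static final Pattern OPEN =
            Pattern.compile("<body(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);
    // Closing </body> tag, case-insensitive.
    private static final Pattern CLOSE =
            Pattern.compile("</body\\s*>", Pattern.CASE_INSENSITIVE);

    /**
     * Returns the text between the first opening BODY tag and the first
     * closing BODY tag after it, or null if either tag is missing.
     */
    static String bodyContent(String html) {
        Matcher open = OPEN.matcher(html);
        if (!open.find()) {
            return null; // no BODY start tag
        }
        Matcher close = CLOSE.matcher(html);
        if (!close.find(open.end())) {
            return null; // no BODY end tag after the start tag
        }
        return html.substring(open.end(), close.start());
    }

    public static void main(String[] args) {
        String html = "<html><head><title>T</title></head>"
                + "<BODY class=\"x\">Hello</BODY></html>";
        System.out.println(bodyContent(html)); // prints "Hello"
    }
}
```

What I would like from a scanner is the same two events (BODY start,
BODY end) delivered by the parser itself, so that I do not have to
re-scan the raw text.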

PS: I've found the design and implementation to be quite nice as I use
it, and very simple to apply in practice. If the download bundle had
included source I would probably have just taken a look. I'm not averse
to using CVS but the setup time is sometimes prohibitive. Having a source
bundle for download might be useful in future distributions. Thanks.
