[Htmlparser-user] Help with Filters
From: Claude D. <CD...@ar...> - 2002-06-11 16:05:38
Greetings.

I have started developing a solution with the HTMLParser and wanted to ask about a few specifics. The application extracts text, the title, and some metadata (author, description, keywords - if present) from HTML documents for indexing purposes. I have successfully written code to access the content, title, and meta information, but now need to put it in context. To do this, I would like to recognize the BODY tag's start and end.

If I understand the architecture correctly, HTMLParser should allow me to register a simple HTMLTagScanner, but since this is an abstract class and the existing scanners don't suit my purpose, I presume I need to implement a subclass. Can someone show me how to subclass HTMLTagScanner to watch for a specific tag?

PS: I've found the design and implementation to be quite nice as I use it, and very simple to apply in practice. If the download bundle had included source, I would probably have just taken a look. I'm not averse to using CVS, but the setup time is sometimes prohibitive. Having a source bundle for download might be useful in future distributions.

Thanks.