Re: [Htmlparser-user] Parsing query
From: Somik R. <so...@ya...> - 2002-08-01 02:33:09
> I am evaluating the HTMLParser for suitability in my project. I have gone
> through the Javadocs and observed that there is support only for some
> tags, for example IMAGE, LINK, SCRIPT etc. Suppose I want to support
> another tag, say <INPUT>: will I need to write my own tag-scanner pair?
> And if I need to write it, how do I do it?

You could do it in two ways:

[1] Handle it directly in your application, like this:

    HTMLNode node;
    HTMLTag tag;
    for (Enumeration e = parser.elements(); e.hasMoreElements();) {
        node = (HTMLNode) e.nextElement();
        if (node instanceof HTMLTag) {
            tag = (HTMLTag) node;
            if (tag.getText().indexOf("INPUT") == 0) {
                // It's an input tag
                // Your code here
                // The conditional above can be made more robust.
            }
        }
    }

[2] Write your own scanner-tag pair. This is very easy for the input tag, as no additional parsing is needed. Have an HTMLInputTagScanner that extends HTMLTagScanner. Implement evaluate(), which decides when this tag scanner should activate, i.e. when the string contains INPUT in a certain location (the first location). HTMLTagScanner has some utility methods, like absorbLeadingBlanks(), which you should use to make the checking simpler and more robust.

The scan method is given control once evaluate() has returned true. You have to create an object of type HTMLInputTag (extends HTMLTag), and this is really very simple. Not much in your tag changes, so use the interface to extract data and create the input tag object. To see how easy this can be, look at HTMLMetaTagScanner.

Finally, make sure that you don't have HTMLFormScanner registered, because HTMLFormScanner automatically picks up the input tags. Actually, we have taken HTMLFormScanner out of the default registry, because it's very hard to auto-correct a complete form block: we don't know how to predict when a form has ended. So this shouldn't be a problem at all for you. The class is just there for people who need to parse forms (some of the users of the parser are using it).

Cheers,
Somik
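
P.S. On making the INPUT check more robust (the same test you would want inside evaluate()): here is a minimal sketch in plain Java. It is only an illustration and does not use the HTMLParser API at all. TagNameCheck and isTagNamed are hypothetical names, and the sketch assumes you hand it the text found between '<' and '>'. It skips leading blanks, ignores case, and requires that the name is not just a prefix of a longer tag name.

```java
// Hypothetical helper, NOT part of HTMLParser: it only inspects a tag's text.
final class TagNameCheck {

    private TagNameCheck() {
    }

    /**
     * Returns true when tagText (assumed to be the text between '<' and '>')
     * starts with the given tag name, ignoring leading whitespace and letter
     * case, and the name is followed by whitespace, '/', or the end of text.
     */
    static boolean isTagNamed(String tagText, String name) {
        if (tagText == null || name == null || name.isEmpty()) {
            return false;
        }
        // Skip leading blanks so "  INPUT ..." is treated like "INPUT ...".
        int start = 0;
        while (start < tagText.length()
                && Character.isWhitespace(tagText.charAt(start))) {
            start++;
        }
        // Compare the name case-insensitively at that position.
        if (!tagText.regionMatches(true, start, name, 0, name.length())) {
            return false;
        }
        // Require a boundary so that "INPUTX" does not count as INPUT.
        int end = start + name.length();
        if (end == tagText.length()) {
            return true;
        }
        char next = tagText.charAt(end);
        return Character.isWhitespace(next) || next == '/';
    }
}
```

In approach [1] the condition would then read something like if (TagNameCheck.isTagNamed(tag.getText(), "INPUT")), assuming getText() gives you the tag contents as in the loop above.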