I am evaluating the HTMLParser for suitability in my project. I have
gone through the Javadocs and observed that only some tags are supported,
for example IMAGE, LINK, and SCRIPT. I wanted to know: suppose I want to
support another tag, say <INPUT>. Will I need to write my own
tag-scanner pair? And if so, how do I do it?
You could do it in two ways:
[1] Handle it directly in your application, like this:
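import java.util.Enumeration;

// parser is your HTMLParser instance
HTMLNode node;
HTMLTag tag;
for (Enumeration e = parser.elements(); e.hasMoreElements();) {
    node = (HTMLNode) e.nextElement();
    if (node instanceof HTMLTag) {
        tag = (HTMLTag) node;
        // Assumes getText() returns the tag contents without the leading '<'.
        // trim() and toUpperCase() tolerate leading blanks and lower-case tags;
        // startsWith() still also matches longer names that begin with INPUT.
        if (tag.getText().trim().toUpperCase().startsWith("INPUT")) {
            // It's an input tag
            // Your code here
        }
    }
}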
[2] Write your own scanner-tag pair. This is very easy for the input
tag, as no additional parsing is needed. Have an HTMLInputTagScanner
extend HTMLTagScanner. Implement evaluate(): when should this tag
scanner activate? That is, when the string contains INPUT at a certain
location (the first location). HTMLTagScanner has some utility methods,
like absorbLeadingBlanks(), which you should call to make the checking
simpler and more robust.
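A self-contained sketch of the evaluate() logic described above follows. SimpleTagScanner, its absorbLeadingBlanks() helper, and the evaluate(String) signature are hypothetical stand-ins written for illustration, not HTMLTagScanner's actual API; check the Javadocs for the real signatures before adapting it.

```java
/**
 * Hypothetical stand-in for a tag scanner base class. The real
 * HTMLTagScanner's method names and signatures may differ.
 */
abstract class SimpleTagScanner {
    /** Decides whether this scanner should handle the given tag text. */
    public abstract boolean evaluate(String tagText);

    /** Hypothetical helper in the spirit of absorbLeadingBlanks(). */
    protected static String absorbLeadingBlanks(String s) {
        int i = 0;
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return s.substring(i);
    }
}

/** Activates only when INPUT is the first word of the tag text. */
class InputTagScanner extends SimpleTagScanner {
    private static final String NAME = "INPUT";

    @Override
    public boolean evaluate(String tagText) {
        String s = absorbLeadingBlanks(tagText);
        // INPUT must come first, in any letter case.
        if (!s.regionMatches(true, 0, NAME, 0, NAME.length())) {
            return false;
        }
        // Reject longer names that merely start with INPUT, e.g. "INPUTFIELD".
        if (s.length() == NAME.length()) {
            return true;
        }
        char next = s.charAt(NAME.length());
        return Character.isWhitespace(next) || next == '/';
    }
}

public class EvaluateDemo {
    public static void main(String[] args) {
        InputTagScanner scanner = new InputTagScanner();
        String[] samples = {"INPUT TYPE=checkbox", "   input name=q", "IMG SRC=x.gif", "INPUTFIELD"};
        for (String sample : samples) {
            System.out.println("\"" + sample + "\" -> " + scanner.evaluate(sample));
        }
    }
}
```

Stripping the leading blanks first means the name check can always look at position 0, which is what makes the test both simpler and harder to fool.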
The scan() method is given control once your evaluate() has returned
true. There you create an object of type HTMLInputTag (which extends
HTMLTag), and this is really very simple. Not much in your tag changes,
so use the existing tag's interface to extract the data and create the
input tag object. To see how easy this can be, look at HTMLMetaTagScanner.
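The scan() step can be sketched the same way. Tag, InputTag, and the getTagBegin()/getTagEnd()/getText() accessors below are hypothetical stand-ins for HTMLTag and HTMLInputTag, chosen only to show the "copy the data across, no extra parsing" idea; HTMLMetaTagScanner remains the real reference. evaluate() is left out to keep the focus on scan().

```java
/** Hypothetical generic tag: where it starts and ends, plus its text. */
class Tag {
    private final int tagBegin;
    private final int tagEnd;
    private final String text;

    Tag(int tagBegin, int tagEnd, String text) {
        this.tagBegin = tagBegin;
        this.tagEnd = tagEnd;
        this.text = text;
    }

    int getTagBegin() { return tagBegin; }
    int getTagEnd() { return tagEnd; }
    String getText() { return text; }

    /** Rebuilds the tag as it appeared in the page. */
    String toHTML() { return "<" + text + ">"; }
}

/** Hypothetical specialised tag: it carries exactly the same data as Tag. */
class InputTag extends Tag {
    InputTag(int tagBegin, int tagEnd, String text) {
        super(tagBegin, tagEnd, text);
    }
}

/** Only scan() is shown; it would run after evaluate() accepted the tag. */
class InputTagScanner {
    Tag scan(Tag tag) {
        // No extra parsing: copy the data out of the generic tag via its getters.
        return new InputTag(tag.getTagBegin(), tag.getTagEnd(), tag.getText());
    }
}

public class ScanDemo {
    public static void main(String[] args) {
        Tag generic = new Tag(0, 23, "INPUT TYPE=text NAME=q");
        Tag result = new InputTagScanner().scan(generic);
        System.out.println(result.getClass().getSimpleName() + " " + result.toHTML()
                + " [" + result.getTagBegin() + ", " + result.getTagEnd() + "]");
    }
}
```

Because the input tag needs no additional parsing, InputTag adds no fields; the subclass still earns its keep because application code can then pick it out with instanceof.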
Finally, make sure that you don't have HTMLFormScanner registered,
because HTMLFormScanner automatically picks up the input tags.
Actually, we have taken HTMLFormScanner out of the default registry,
because it's very hard to auto-correct a complete form block: we don't
know how to predict when a form has ended. So this shouldn't be a problem
for you at all. The class is just there for people who need to parse
forms (some users of the parser rely on it).
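To see why a registered form scanner hides input tags from your own scanner, here is a deliberately simplified model of scanner dispatch, not the parser's real mechanism: tags arrive as a list of tag texts, and the form scanner, when registered, claims every tag from FORM up to /FORM. FormScannerDemo, inputsSeen(), and startsWithWord() are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model: a registered form scanner swallows every
 * tag between FORM and /FORM, so nested INPUT tags never reach the input scanner.
 */
public class FormScannerDemo {

    /** True if the tag text starts with the given tag name as a whole word. */
    static boolean startsWithWord(String text, String word) {
        return text.regionMatches(true, 0, word, 0, word.length())
                && (text.length() == word.length()
                    || Character.isWhitespace(text.charAt(word.length())));
    }

    /** Returns the INPUT tags that actually reach the input scanner. */
    static List<String> inputsSeen(List<String> tags, boolean formScannerRegistered) {
        List<String> seen = new ArrayList<>();
        boolean insideForm = false;
        for (String tag : tags) {
            if (formScannerRegistered) {
                if (startsWithWord(tag, "FORM")) {
                    insideForm = true;
                    continue;
                }
                if (insideForm) {
                    // The form scanner owns this tag until the closing /FORM.
                    if (startsWithWord(tag, "/FORM")) {
                        insideForm = false;
                    }
                    continue;
                }
            }
            if (startsWithWord(tag, "INPUT")) {
                seen.add(tag);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList(
                "FORM ACTION=go", "INPUT NAME=a", "INPUT NAME=b", "/FORM", "INPUT NAME=c");
        System.out.println("with a form scanner: " + inputsSeen(tags, true));
        System.out.println("without it:          " + inputsSeen(tags, false));
    }
}
```

In this model a form with no closing /FORM would swallow every later tag, which is the auto-correction problem that kept HTMLFormScanner out of the default registry.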
Cheers,
Somik