I'm experimenting with the HtmlParser library to see whether I can use it
to transform web pages. One thing I'm trying is to inject some JavaScript
into the HTML page.
My test app uses the PrototypicalNodeFactory to register some overridden
tags like MyHeadTag and MyBodyTag (which derive from the HeadTag and
BodyTag classes respectively) and then I run the parser. I then locate
the MyHeadTag object found during the parsing and do the following:
    ScriptTag script = new ScriptTag();
    script.setAttribute("SRC", "blah.js");
    script.setLanguage("javascript");
    NodeList childNodes = headTag.getChildren();
    childNodes.add(script);
I then loop through the parser-generated node list (listHtmlNodes), calling
toHtml() on each node and appending the result to a StringBuffer.
But looking at the resulting StringBuffer contents, I see that the
<script> tag is not terminated with a </script>:
    <html>
    <head>
    <title>Testing...</title>
    <SCRIPT LANGUAGE=javascript SRC="blah.js">
    </head>
    <body>
    <p>Testing...</p>
    </body>
    </html>
All the other tags that were in the original HTML file with end tags are
fine. It is just the newly injected ScriptTag that is not properly
terminated.
This happens with any "container" tag I try to insert into the
parser-generated "DOM" tree.
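My working guess, which I have not confirmed against the HtmlParser source,
is that a container tag only writes an end tag when it is holding an end-tag
node that the parser attached while reading the input, so a tag I construct
by hand has nothing to write. The toy model below shows that idea using only
the standard library. Every class and method in it (ToyNode, ToyText,
ToyCompositeTag, render) is hypothetical and made up for illustration; none
of it is HtmlParser's API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only. Every class here is hypothetical; none of this is HtmlParser code.
public class EndTagSketch {

    interface ToyNode {
        String toHtml();
    }

    /** Plain text or already-serialised markup. */
    static class ToyText implements ToyNode {
        private final String text;
        ToyText(String text) { this.text = text; }
        public String toHtml() { return text; }
    }

    /** A container tag: start tag, children, and an optional end tag. */
    static class ToyCompositeTag implements ToyNode {
        private final String startTag;
        private final List<ToyNode> children = new ArrayList<>();
        private String endTag; // stays null unless set explicitly

        ToyCompositeTag(String startTag) { this.startTag = startTag; }
        void add(ToyNode child) { children.add(child); }
        void setEndTag(String endTag) { this.endTag = endTag; }

        public String toHtml() {
            StringBuilder sb = new StringBuilder(startTag);
            for (ToyNode child : children) {
                sb.append(child.toHtml());
            }
            // Assumption being modelled: no stored end tag, nothing written.
            if (endTag != null) {
                sb.append(endTag);
            }
            return sb.toString();
        }
    }

    /** Builds a parsed-looking head, injects a script, and serialises it. */
    static String render(boolean attachEndTag) {
        ToyCompositeTag head = new ToyCompositeTag("<head>");
        head.setEndTag("</head>"); // a parser would have seen this in the input
        head.add(new ToyText("<title>Testing...</title>"));

        ToyCompositeTag script = new ToyCompositeTag("<SCRIPT SRC=\"blah.js\">");
        if (attachEndTag) {
            script.setEndTag("</SCRIPT>");
        }
        head.add(script);
        return head.toHtml();
    }

    public static void main(String[] args) {
        System.out.println(render(false)); // injected tag left without an end tag
        System.out.println(render(true));  // injected tag given an explicit end tag
    }
}
```

If the real library behaves like this model, then the fix would be to give
the new tag an end-tag node before adding it to the tree, but I don't know
whether that is the intended way to do it.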
Does anyone know why? Any hints on how to fix this? Is this an
unreasonable thing to do with HtmlParser?
Thanks,
Tom