[Htmlparser-user] Transformation limitations?
From: Tom H. <thj...@ri...> - 2007-12-06 08:19:30
I'm experimenting with the HtmlParser library to see if I can use it to transform webpages. One thing I'm trying is to see if I can inject some javascript into the HTML page.

My test app uses the PrototypicalNodeFactory to register some overridden tags like MyHeadTag and MyBodyTag (which derive from the HeadTag and BodyTag classes respectively) and then I run the parser. I then locate the MyHeadTag object found during the parsing and do the following:

    ScriptTag script = new ScriptTag();
    script.setAttribute("SRC", "blah.js");
    script.setLanguage("javascript");
    NodeList childNodes = headTag.getChildren();
    childNodes.add(script);

I then loop through the parser-generated listHtmlNodes calling toHtml() on each node and appending the result in a StringBuffer. But looking at the resulting StringBuffer contents, I see that the <script> tag is not terminated with a </script>:

    <html>
    <head>
    <title>Testing...</title>
    <SCRIPT LANGUAGE=javascript SRC="blah.js">
    </head>
    <body>
    <p>Testing...</p>
    </body>
    </html>

All the other tags that were in the original HTML file with end tags are fine. It is just the newly injected ScriptTag that is not properly terminated. This happens with any "container" tag I try to insert into the parser-generated "DOM" tree.

Does anyone know why? Any hints on how to fix this? Is this an unreasonable thing to do with HtmlParser?

thanks,
Tom
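
One possible explanation, offered as an assumption rather than confirmed library behavior: a container tag created with `new` has no end tag attached, and serialization writes an end tag only when one has been set, whereas tags built by the parser get theirs from the source HTML. The sketch below models that idea with a hypothetical stand-in class (`SketchTag` is invented for illustration and is not an HtmlParser class). In HtmlParser itself the analogous step would presumably be to create a `TagNode`, name it `/SCRIPT` via `setTagName`, and hand it to the script tag with `setEndTag`; those calls should be checked against the library's javadoc before relying on them.

```java
import java.util.ArrayList;
import java.util.List;

class EndTagSketch {
    /** Hypothetical stand-in for a container tag; NOT an HtmlParser class. */
    static class SketchTag {
        final String startHtml;
        String endHtml; // stays null until an end tag is attached explicitly
        final List<SketchTag> children = new ArrayList<>();

        SketchTag(String startHtml) {
            this.startHtml = startHtml;
        }

        /** Serializes start tag, children, and the end tag only if one was set. */
        String toHtml() {
            StringBuilder sb = new StringBuilder(startHtml);
            for (SketchTag child : children) {
                sb.append(child.toHtml());
            }
            if (endHtml != null) {
                sb.append(endHtml);
            }
            return sb.toString();
        }
    }

    /** Builds a head element containing an injected script element. */
    static String render(boolean attachEndTag) {
        SketchTag head = new SketchTag("<head>");
        head.endHtml = "</head>"; // assumed to come from the parsed source

        SketchTag script = new SketchTag("<SCRIPT LANGUAGE=javascript SRC=\"blah.js\">");
        if (attachEndTag) {
            script.endHtml = "</SCRIPT>"; // the step the injected tag was missing
        }
        head.children.add(script);
        return head.toHtml();
    }

    public static void main(String[] args) {
        System.out.println(render(false)); // script element left unterminated
        System.out.println(render(true));  // script element closed
    }
}
```

If this assumption holds, the same approach would apply to any other container tag inserted into the tree by hand: attach a matching end tag before calling toHtml().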