Thread: [Htmlparser-user] Transformation limitations?


I'm experimenting with the HtmlParser library to see if I can use it to 
transform web pages.  One thing I'm trying is to inject some 
JavaScript into the HTML page.

My test app uses the PrototypicalNodeFactory to register some overridden 
tags like MyHeadTag and MyBodyTag (which derive from the HeadTag and 
BodyTag classes respectively) and then I run the parser.  I then locate 
the MyHeadTag object found during the parsing and do the following:

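   // Build the new <script> element to inject.
   ScriptTag script = new ScriptTag();
   script.setAttribute("SRC", "blah.js");
   script.setLanguage("javascript");

   // headTag is the MyHeadTag instance located after parsing.
   NodeList childNodes = headTag.getChildren();
   childNodes.add(script);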

I then loop through the parser-generated listHtmlNodes, calling toHtml() 
on each node and appending the result to a StringBuffer.

But looking at the resulting StringBuffer contents, I see that the 
<script> tag is not terminated with a </script>:

<html>
<head>
<title>Testing...</title>
<SCRIPT LANGUAGE=javascript SRC="blah.js">
</head>
<body>
<p>Testing...</p>
</body>
</html>

All the other tags that were in the original HTML file with end tags are 
fine.  It is just the newly injected ScriptTag that is not properly 
terminated.

This happens with any "container" tag I try to insert into the 
parser-generated "DOM" tree.
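
To make the symptom concrete, here is a minimal, self-contained sketch of what may be going on. It is a guess, not something confirmed in this thread. The ContainerNode class and its fields are hypothetical stand-ins and are not the HtmlParser API. The assumption is that a container tag only prints a closing tag when an end-tag object has been attached to it, which the parser would do for tags it reads from the page but which a tag built by hand with "new" would lack.

```java
import java.util.ArrayList;
import java.util.List;

public class EndTagSketch {

    /** Hypothetical stand-in for a container tag; NOT the HtmlParser API. */
    static class ContainerNode {
        final String startTag;
        String endTag; // stays null unless something sets it explicitly
        final List<ContainerNode> children = new ArrayList<>();

        ContainerNode(String startTag) {
            this.startTag = startTag;
        }

        /** Emits the start tag, then the children, then the end tag if one is attached. */
        String toHtml() {
            StringBuilder sb = new StringBuilder(startTag);
            for (ContainerNode child : children) {
                sb.append(child.toHtml());
            }
            if (endTag != null) {
                sb.append(endTag);
            }
            return sb.toString();
        }
    }

    /** Builds a head node containing an injected script node, with or without an end tag. */
    static String render(boolean withEndTag) {
        ContainerNode head = new ContainerNode("<head>");
        head.endTag = "</head>"; // assumed to be attached during parsing

        ContainerNode script = new ContainerNode("<SCRIPT SRC=\"blah.js\">");
        if (withEndTag) {
            script.endTag = "</SCRIPT>"; // attached by hand for the injected node
        }
        head.children.add(script);
        return head.toHtml();
    }

    public static void main(String[] args) {
        System.out.println(render(false)); // injected node without an end tag
        System.out.println(render(true));  // injected node with an end tag
    }
}
```

If that assumption holds, the analogous step in HtmlParser would presumably be to create an end-tag node and attach it to the new ScriptTag before serializing (the composite tag classes appear to offer a setEndTag method for this), but I have not verified that against the library.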

Does anyone know why?  Any hints on how to fix this?  Is this an 
unreasonable thing to do with HtmlParser?

thanks,
Tom
