Thread: Re: [Htmlparser-user] Transformation limitations?
From: Derrick O. <der...@ro...> - 2007-12-06 11:44:04
You will need to add your own end tag to the script tag you are injecting. I believe it's something like this:

    TagNode end = new TagNode ();
    end.setTagName ("/SCRIPT");
    script.setEndTag (end);

I guess this could be made much easier.

----- Original Message ----
From: Tom Hjellming <thj...@ri...>
To: htm...@li...
Sent: Thursday, December 6, 2007 3:19:22 AM
Subject: [Htmlparser-user] Transformation limitations?

I'm experimenting with the HtmlParser library to see if I can use it to transform webpages. One thing I'm trying is to see if I can inject some javascript into the HTML page.

My test app uses the PrototypicalNodeFactory to register some overridden tags like MyHeadTag and MyBodyTag (which derive from the HeadTag and BodyTag classes respectively) and then I run the parser. I then locate the MyHeadTag object found during the parsing and do the following:

    ScriptTag script = new ScriptTag();
    script.setAttribute("SRC", "blah.js");
    script.setLanguage("javascript");

    NodeList childNodes = headTag.getChildren();
    childNodes.add(script);

I then loop through the parser-generated listHtmlNodes calling toHtml() on each node and appending the result in a StringBuffer.

But looking at the resulting StringBuffer contents, I see that the <script> tag is not terminated with a </script>:

    <html>
    <head>
    <title>Testing...</title>
    <SCRIPT LANGUAGE=javascript SRC="blah.js">
    </head>
    <body>
    <p>Testing...</p>
    </body>
    </html>

All the other tags that were in the original HTML file with end tags are fine. It is just the newly injected ScriptTag that is not properly terminated.

This happens with any "container" tag I try to insert into the parser-generated "DOM" tree.

Does anyone know why? Any hints on how to fix this? Is this an unreasonable thing to do with HtmlParser?

thanks,
Tom

_______________________________________________
Htmlparser-user mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-user
From: <at...@gm...> - 2007-12-06 19:10:43
Hi,

I need some help with the TagNameFilter. I have a function to get all the p tags out of an HTML document:

    NodeList nl = parser.extractAllNodesThatMatch(new TagNameFilter("p"));

But now I want to remove from the NodeList all entries that do not match a special string. I guess the key would be the accept() function, but I'm unsure how to implement it (the string comparison itself is clear, but not the usage of accept() + Tag.class).

Furthermore, I have problems with doubled entries because of nested p tags.

Thanks,
Alex
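Editor's note: the question above is about combining a tag-name test with a text test inside accept(), and about avoiding duplicates from nested p tags. The following is a minimal, self-contained sketch of that idea, not HtmlParser code. The Node class, the NodeFilter interface, and the helpers paragraphContaining() and collect() are hypothetical stand-ins invented for illustration. Only the general shape of a filter callback (an accept() method that returns true to keep a node) is borrowed from the question. Keeping only the outermost matching p is one possible way to handle the nested duplicates, chosen here as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the HtmlParser classes.
final class ParagraphFilterSketch {

    /** Minimal node: a tag name, its own text, and child nodes. */
    static final class Node {
        final String tagName;
        final String text;
        final List<Node> children = new ArrayList<Node>();

        Node(String tagName, String text) {
            this.tagName = tagName;
            this.text = text;
        }

        /** Appends a child and returns this node so calls can be chained. */
        Node add(Node child) {
            children.add(child);
            return this;
        }

        /** Text of this node followed by the text of all descendants. */
        String plainText() {
            StringBuilder sb = new StringBuilder(text);
            for (Node child : children) {
                sb.append(child.plainText());
            }
            return sb.toString();
        }
    }

    /** Filter callback: return true to keep the node. */
    interface NodeFilter {
        boolean accept(Node node);
    }

    /** Accepts p nodes whose text (including descendants) contains the needle. */
    static NodeFilter paragraphContaining(final String needle) {
        return new NodeFilter() {
            public boolean accept(Node node) {
                return "p".equalsIgnoreCase(node.tagName)
                        && node.plainText().contains(needle);
            }
        };
    }

    /**
     * Depth-first collection that stops descending once a node matches,
     * so a p nested inside a matching p is not reported a second time.
     */
    static void collect(Node node, NodeFilter filter, List<Node> out) {
        if (filter.accept(node)) {
            out.add(node);
            return;
        }
        for (Node child : node.children) {
            collect(child, filter, out);
        }
    }

    /** Builds a small tree with a nested p and returns the text of each hit. */
    static List<String> demo() {
        Node body = new Node("body", "")
                .add(new Node("p", "outer special ").add(new Node("p", "inner special")))
                .add(new Node("p", "nothing to see"));
        List<Node> hits = new ArrayList<Node>();
        collect(body, paragraphContaining("special"), hits);
        List<String> texts = new ArrayList<String>();
        for (Node hit : hits) {
            texts.add(hit.plainText());
        }
        return texts;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With the real library the same two conditions would go into a custom filter's accept() method; whether to keep the outermost or the innermost p depends on what the extracted text is used for.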
From: Tom H. <thj...@ri...> - 2007-12-06 19:25:20
Hi Derrick,

That doesn't sound too bad at all to me. I created a single utility function that handles that for any tag I need to inject:

    void setEndTag(Tag tag)
    {
        TagNode endTag = new TagNode();
        endTag.setTagName("/" + tag.getTagName());
        tag.setEndTag(endTag);
    }

That works like a charm!

Thanks,
Tom

Derrick Oswald wrote:
> You will need to add your own end tag to the script tag you are
> injecting. I believe it's something like this:
>     TagNode end = new TagNode ();
>     end.setTagName ("/SCRIPT");
>     script.setEndTag (end);
>
> I guess this could be made much easier.
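Editor's note: the fix discussed in this thread is to attach an explicit end tag to a programmatically created container tag before serializing it. The sketch below models that behavior in a self-contained way. The Tag class and attachEndTag() are hypothetical stand-ins, not the HtmlParser TagNode or ScriptTag classes. The model assumes that serialization emits an end tag only when one has been attached, which matches the output reported in the thread but is not taken from the library source.

```java
// Hypothetical stand-in for illustration only; NOT the HtmlParser TagNode class.
final class EndTagSketch {

    static final class Tag {
        private final String name;
        private final String attributes;
        private Tag endTag; // stays null until an end tag is attached explicitly

        Tag(String name, String attributes) {
            this.name = name;
            this.attributes = attributes;
        }

        String getTagName() {
            return name;
        }

        void setEndTag(Tag end) {
            this.endTag = end;
        }

        /** Serializes the start tag, then the end tag only if one was attached. */
        String toHtml() {
            StringBuilder sb = new StringBuilder("<").append(name);
            if (!attributes.isEmpty()) {
                sb.append(' ').append(attributes);
            }
            sb.append('>');
            if (endTag != null) {
                sb.append(endTag.toHtml());
            }
            return sb.toString();
        }
    }

    /** Same idea as the helper in the message: build "/NAME" and attach it. */
    static void attachEndTag(Tag tag) {
        tag.setEndTag(new Tag("/" + tag.getTagName(), ""));
    }

    public static void main(String[] args) {
        Tag script = new Tag("SCRIPT", "SRC=\"blah.js\"");
        System.out.println(script.toHtml()); // start tag only, no end tag yet
        attachEndTag(script);
        System.out.println(script.toHtml()); // start tag followed by the end tag
    }
}
```

Deriving the end tag name from getTagName() is what lets one helper serve every injected container tag, rather than hard-coding "/SCRIPT".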