Re: [Htmlparser-user] Excluding some tags
From: Manish K. <ma...@we...> - 2010-11-17 06:44:40
Thanks for the reply, Derrick.

So, here's the real problem: I do want to retain the script tag. At the same time, I want to override all the links in the page. The parser doesn't play nice. Consider the HTML below:

    <script>
    document.write("<a href='/jslink'>JS Link</a>")
    </script>
    <a href="/somelink">Some link</a>

To me, the string literal inside the script tag above is not a link at all. However, when I try to fetch all the <a> tags using the parser, it gives me both of the above. Is there a way to skip the <a>s that are inside the <script> tag?

Thanks
Manish

On Tue, Nov 16, 2010 at 11:39 PM, Derrick Oswald <der...@gm...> wrote:

> Although the filter is correct, the tag enclosing the <script> tag is
> accepted, and with it its child tags, including the <script> tag.
> Maybe a way to do it is to override the ScriptTag class with MyScriptTag so
> that it prints nothing in the toHtml() call.
> Add the overridden class to the PrototypicalNodeFactory as described
> here: http://htmlparser.sourceforge.net/faq.html#composite, and then get
> all tags and print the whole thing with
> System.out.println(this.parser.parse(null).toHtml());
>
> On Tue, Nov 16, 2010 at 8:19 AM, Manish Kashyap <ma...@we...> wrote:
>
>> This indeed is a newbie question. I could not find a workaround to
>> exclude some tags (<script> in my case) while parsing.
>>
>> I tried using the NotFilter as below, but it didn't work, as I got all
>> the <script> tags in my NodeList:
>>
>>     NotFilter noScriptFilter = new NotFilter();
>>     noScriptFilter.setPredicate(new NodeFilter() {
>>         public boolean accept(Node currNode) {
>>             if (currNode instanceof TagNode) {
>>                 if (((TagNode) currNode).getRawTagName().equalsIgnoreCase("script")) {
>>                     return true;
>>                 }
>>             }
>>             return false;
>>         }
>>     });
>>     NodeList allNodes = this.parser.parse(noScriptFilter);
>>
>> I would appreciate it if someone could guide me through this.
>>
>> Thanks
>> Manish
>>
>> _______________________________________________
>> Htmlparser-user mailing list
>> Htm...@li...
>> https://lists.sourceforge.net/lists/listinfo/htmlparser-user
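
For anyone who only needs the effect described above (rewrite the href of every <a> outside <script>, while copying script bodies through untouched), here is a rough standalone sketch. It does not use HTML Parser at all; it swaps in java.util.regex from the JDK, which is only adequate for simple, reasonably well-formed markup. The class name LinksOutsideScript, the method rewriteLinksOutsideScript, and the "/proxy" prefix are made up for illustration and appear nowhere in the thread.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HTML Parser: a regex-based approximation.
public class LinksOutsideScript {

    // A whole <script>...</script> element, case-insensitive, spanning lines.
    private static final Pattern SCRIPT = Pattern.compile(
        "<script\\b[^>]*>.*?</script\\s*>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // An <a ...> start tag with a quoted href; group 3 is the href value.
    private static final Pattern ANCHOR_HREF = Pattern.compile(
        "(<a\\b[^>]*?\\bhref\\s*=\\s*)([\"'])(.*?)\\2",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Rewrites hrefs in the text between script elements; script elements
    // themselves are appended verbatim, so string literals inside them survive.
    static String rewriteLinksOutsideScript(String html, UnaryOperator<String> rewrite) {
        StringBuilder out = new StringBuilder();
        Matcher script = SCRIPT.matcher(html);
        int pos = 0;
        while (script.find()) {
            out.append(rewriteSegment(html.substring(pos, script.start()), rewrite));
            out.append(script.group());
            pos = script.end();
        }
        out.append(rewriteSegment(html.substring(pos), rewrite));
        return out.toString();
    }

    // Replaces only the href value of each matched anchor in a script-free segment.
    private static String rewriteSegment(String segment, UnaryOperator<String> rewrite) {
        StringBuilder sb = new StringBuilder();
        Matcher a = ANCHOR_HREF.matcher(segment);
        int pos = 0;
        while (a.find()) {
            sb.append(segment, pos, a.start(3));   // everything up to the href value
            sb.append(rewrite.apply(a.group(3)));  // the rewritten href value
            pos = a.end(3);
        }
        sb.append(segment, pos, segment.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<script>\n"
            + "document.write(\"<a href='/jslink'>JS Link</a>\")\n"
            + "</script>\n"
            + "<a href=\"/somelink\">Some link</a>";
        // The "/proxy" prefix is an arbitrary example of "overriding" a link.
        System.out.println(rewriteLinksOutsideScript(html, href -> "/proxy" + href));
    }
}
```

The design choice is to split the page into script and non-script segments first, so the anchor pattern never sees script content. Unquoted href values, comments, and CDATA sections are not handled; a real parser-based solution would be more robust.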