Re: [Htmlparser-user] Excluding some tags
From: Manish K. <ma...@we...> - 2010-11-17 06:46:34
Sorry, I have modified my question; please ignore the previous one.

Is there a way to get the <a>s which are not in the <script> tag?

Thanks,
Manish

On Wed, Nov 17, 2010 at 12:14 PM, Manish Kashyap <ma...@we...> wrote:

> Thanks for the reply, Derrick. So, here's the real problem:
> I do want to retain the script tag. At the same time, I want to override
> all the links in the page. The parser doesn't play nice. Consider the
> HTML below:
>
>> <script>
>> document.write("<a href='/jslink'>JS Link</a>")
>> </script>
>> <a href="/somelink">Some link</a>
>
> To me the string literal inside the script tag above is not a link at all.
> However, when I try to fetch all the <a>s using the parser, it gives me
> both of the above. Is there a way to not get the <a>s which are not in the
> <script> tag?
>
> Thanks
> Manish
>
> On Tue, Nov 16, 2010 at 11:39 PM, Derrick Oswald <der...@gm...> wrote:
>
>> Although the filter is correct, the tag enclosing the <script> tag is
>> accepted, and with it its child tags, including the <script> tag.
>> Maybe a way to do it is to override the ScriptTag class with MyScriptTag
>> so that it prints nothing in the toHtml () call.
>> Add the overridden class to the PrototypicalNodeFactory as described
>> here: http://htmlparser.sourceforge.net/faq.html#composite, and then get
>> all tags and print the whole thing with
>> System.out.println (this.parser.parse(null).toHtml ());
>>
>> On Tue, Nov 16, 2010 at 8:19 AM, Manish Kashyap <ma...@we...> wrote:
>>
>>> This indeed is a newbie question. I could not find a workaround to
>>> exclude some tags (<script> in my case) while parsing.
>>>
>>> I tried using the NotFilter as shown below, but it didn't work: I still
>>> got all the <script> tags in my NodeList.
>>>
>>>> NotFilter noScriptFilter = new NotFilter();
>>>> noScriptFilter.setPredicate(new NodeFilter() {
>>>>     public boolean accept(Node currNode) {
>>>>         if (currNode instanceof TagNode) {
>>>>             if (((TagNode) currNode).getRawTagName().equalsIgnoreCase("script")) {
>>>>                 return true;
>>>>             }
>>>>         }
>>>>         return false;
>>>>     }
>>>> });
>>>> NodeList allNodes = this.parser.parse(noScriptFilter);
>>>
>>> I would appreciate it if someone could guide me through this.
>>>
>>> Thanks
>>> Manish
>>>
>>> ------------------------------------------------------------------------------
>>> Beautiful is writing same markup. Internet Explorer 9 supports
>>> standards for HTML5, CSS3, SVG 1.1, ECMAScript5, and DOM L2 & L3.
>>> Spend less time writing and rewriting code and more time creating great
>>> experiences on the web. Be a part of the beta today
>>> http://p.sf.net/sfu/msIE9-sfdev2dev
>>> _______________________________________________
>>> Htmlparser-user mailing list
>>> Htm...@li...
>>> https://lists.sourceforge.net/lists/listinfo/htmlparser-user
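
For the modified question, one way to frame it is as a filter that accepts an <a> only when none of its enclosing tags is a <script>. The sketch below is a plain-Java model of that idea, and of the behaviour Derrick describes, where a recursive collect accepts the enclosing tag and that tag's toHtml() still carries the <script>. It does not use the HTML Parser classes: Tag, collect, sampleTree, NOT_SCRIPT and LINK_OUTSIDE_SCRIPT are hypothetical names made up for illustration, and the sample tree assumes, as reported in the thread, that the <a> inside the script shows up as a child node of the script tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class ScriptFilterSketch {

    /** Hypothetical stand-in for a parsed tag; NOT the library's TagNode. */
    public static final class Tag {
        public final String name;
        public final String href;   // null for tags without an href
        public final Tag parent;    // null for the root
        public final List<Tag> children = new ArrayList<>();

        public Tag(String name, String href, Tag parent) {
            this.name = name;
            this.href = href;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }

        /** True if any enclosing tag has the given name (case-insensitive). */
        public boolean hasAncestor(String ancestorName) {
            for (Tag p = parent; p != null; p = p.parent) {
                if (p.name.equalsIgnoreCase(ancestorName)) {
                    return true;
                }
            }
            return false;
        }

        /** Serializes this tag together with ALL of its children. */
        public String toHtml() {
            StringBuilder sb = new StringBuilder("<").append(name);
            if (href != null) {
                sb.append(" href='").append(href).append("'");
            }
            sb.append(">");
            for (Tag child : children) {
                sb.append(child.toHtml());
            }
            return sb.append("</").append(name).append(">").toString();
        }
    }

    /** Models the original attempt: accept everything that is not a script tag. */
    public static final Predicate<Tag> NOT_SCRIPT =
            t -> !t.name.equalsIgnoreCase("script");

    /** Models the modified question: links that have no script ancestor. */
    public static final Predicate<Tag> LINK_OUTSIDE_SCRIPT =
            t -> t.name.equalsIgnoreCase("a") && !t.hasAncestor("script");

    /** Walks the whole tree and returns every tag the filter accepts. */
    public static List<Tag> collect(Tag root, Predicate<Tag> filter) {
        List<Tag> out = new ArrayList<>();
        collectInto(root, filter, out);
        return out;
    }

    private static void collectInto(Tag tag, Predicate<Tag> filter, List<Tag> out) {
        if (filter.test(tag)) {
            out.add(tag);
        }
        for (Tag child : tag.children) {
            collectInto(child, filter, out);
        }
    }

    /** Assumed shape of the page from the thread: one link inside a script, one outside. */
    public static Tag sampleTree() {
        Tag html = new Tag("html", null, null);
        Tag body = new Tag("body", null, html);
        Tag script = new Tag("script", null, body);
        new Tag("a", "/jslink", script);
        new Tag("a", "/somelink", body);
        return html;
    }

    public static void main(String[] args) {
        Tag root = sampleTree();

        // The script tag itself is rejected, but its accepted parent still prints it.
        List<Tag> kept = collect(root, NOT_SCRIPT);
        System.out.println(kept.get(0).toHtml());

        // Only links without a script ancestor are returned.
        for (Tag link : collect(root, LINK_OUTSIDE_SCRIPT)) {
            System.out.println(link.href);
        }
    }
}
```

With the library itself, the same composition could presumably be built from the classes in org.htmlparser.filters, for example an AndFilter combining a TagNameFilter("a") with a NotFilter that wraps a recursive HasParentFilter on TagNameFilter("script"). That combination is not tested here and should be checked against the library version in use.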