Re: [Htmlparser-user] Excluding some tags
From: Derrick O. <der...@gm...> - 2010-11-17 17:35:31
That's not valid HTML. You'll want to turn strict script scanning off then.
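
To illustrate what the strict setting changes, here is a toy model in plain Java. It is not HTMLParser's scanner, and the class and method names are made up for this sketch. Under the strict HTML 4 rule, script content ends at the first "</" followed by a letter, even inside a string literal, so the quoted </a> cuts the script short. The lenient variant skips over quoted strings and stops only at </script. In the library itself the switch is, as far as I recall, the static field org.htmlparser.scanners.ScriptScanner.STRICT; check the javadoc for your version.

```java
// Toy model only: shows where a script's content would end under a
// strict rule versus a quote-aware rule. Names here are hypothetical.
final class ScriptEndDemo {

    /** Strict rule: stop at the first "</" followed by a letter, quoted or not. */
    static int strictEnd(String s, int from) {
        for (int i = from; i + 2 < s.length(); i++) {
            if (s.charAt(i) == '<' && s.charAt(i + 1) == '/'
                    && Character.isLetter(s.charAt(i + 2))) {
                return i;
            }
        }
        return -1; // no end tag found
    }

    /** Lenient rule: ignore anything inside '...' or "...", stop at "</script". */
    static int lenientEnd(String s, int from) {
        char quote = 0; // 0 means "not inside a string literal"
        for (int i = from; i < s.length(); i++) {
            char c = s.charAt(i);
            if (quote != 0) {
                if (c == '\\') {
                    i++;            // skip the escaped character
                } else if (c == quote) {
                    quote = 0;      // closing quote
                }
            } else if (c == '"' || c == '\'') {
                quote = c;          // opening quote
            } else if (s.regionMatches(true, i, "</script", 0, 8)) {
                return i;
            }
        }
        return -1; // no end tag found
    }

    public static void main(String[] args) {
        String body = "document.write(\"<a href='/jslink'>JS Link</a>\")\n</script>";
        System.out.println("strict : " + body.substring(0, strictEnd(body, 0)));
        System.out.println("lenient: " + body.substring(0, lenientEnd(body, 0)));
    }
}
```

The lenient variant only understands simple string literals; JavaScript comments and regex literals containing quotes would need extra handling.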
On Wed, Nov 17, 2010 at 7:44 AM, Manish Kashyap <ma...@we...> wrote:
> Thanks for the reply, Derrick. So, here's the real problem:
> I do want to retain the script tag. At the same time, I want to override
> all the links in the page. The parser doesn't play nice. Consider the
> following HTML:
>
>   <script>
>   document.write("<a href='/jslink'>JS Link</a>")
>   </script>
>   <a href="/somelink">Some link</a>
>
>
> To me, the string literal inside the script tag above is not a link at all.
> However, when I fetch all the <a> tags using the parser, it gives me
> both of the above. Is there a way to skip the <a>s that are inside the
> <script> tag?
>
> Thanks
> Manish
>
> On Tue, Nov 16, 2010 at 11:39 PM, Derrick Oswald <der...@gm...
> > wrote:
>
>> Although the filter is correct, the tag enclosing the <script> tag is
>> accepted, and with it its child tags, including the <script> tag.
>> Maybe a way to do it is to override the ScriptTag class with MyScriptTag
>> so that it prints nothing in the toHtml () call.
>> Add the overridden class to the PrototypicalNodeFactory as described
>> here: http://htmlparser.sourceforge.net/faq.html#composite, and then get
>> all tags and print the whole thing with
>> System.out.println (this.parser.parse(null).toHtml ());
>>
>> On Tue, Nov 16, 2010 at 8:19 AM, Manish Kashyap <ma...@we...> wrote:
>>
>>> This is indeed a newbie question. I could not find a workaround to
>>> exclude some tags (<script> in my case) while parsing.
>>>
>>> I tried using the NotFilter as shown below, but it didn't work: I still
>>> got all the <script> tags in my NodeList.
>>>
>>>> NotFilter noScriptFilter = new NotFilter();
>>>> noScriptFilter.setPredicate(new NodeFilter(){
>>>>     public boolean accept(Node currNode){
>>>>         if(currNode instanceof TagNode){
>>>>             if(((TagNode)currNode).getRawTagName().equalsIgnoreCase("script")){
>>>>                 return true;
>>>>             }
>>>>         }
>>>>         return false;
>>>>     }
>>>> });
>>>> NodeList allNodes = this.parser.parse(noScriptFilter);
>>>>
>>>
>>> I would appreciate it if someone could guide me through this.
>>>
>>> Thanks
>>> Manish
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Beautiful is writing same markup. Internet Explorer 9 supports
>>> standards for HTML5, CSS3, SVG 1.1, ECMAScript5, and DOM L2 & L3.
>>> Spend less time writing and rewriting code and more time creating great
>>> experiences on the web. Be a part of the beta today
>>> http://p.sf.net/sfu/msIE9-sfdev2dev
>>> _______________________________________________
>>> Htmlparser-user mailing list
>>> Htm...@li...
>>> https://lists.sourceforge.net/lists/listinfo/htmlparser-user
>>>
>>>