Re: [Htmlparser-developer] HTML Comments/Remarks

Hi Sam,

> I'm not sure what the downside is to having a toPlainTextString() call
> in the HTMLRemarkNode.  Remember I don't have such a wonderful
> understanding of the HTMLParser itself.  For example I'm not sure what
> you mean when you say that the remark text data would appear in your
> string filter.  I'm not sure what a string filter is ...   At the moment
> it seems I have to explicitly check for HTMLRemarkNodes and then process
> them if I want to ....

A string filter is a program that walks through a page and gives you its
text as a string. To create a string filter, I'd write a loop that calls
node.toPlainTextString() on each node. There is no downside to having
toPlainTextString() implemented in HTMLRemarkNode. It was there until last
week. I took it out on a mistaken notion: people very often comment out HTML
tags, those tags are not "plain text", and yet they would show up in the
output of HTMLRemarkNode's toPlainTextString() method.
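
To make the string filter idea concrete, here is a rough sketch of such a loop. Note that Node, TextNode, RemarkNode and StringFilterSketch are hypothetical stand-ins I made up for this mail, not the parser's real classes; the only thing borrowed from the parser is the method name toPlainTextString().

```java
import java.util.List;

// Hypothetical stand-in for a parsed node; not the parser's real type.
interface Node {
    String toPlainTextString();
}

// Hypothetical node holding ordinary page text.
class TextNode implements Node {
    private final String text;

    TextNode(String text) {
        this.text = text;
    }

    public String toPlainTextString() {
        return text;
    }
}

// Hypothetical remark node that hands back the comment's contents.
class RemarkNode implements Node {
    private final String remark;

    RemarkNode(String remark) {
        this.remark = remark;
    }

    public String toPlainTextString() {
        return remark;
    }
}

class StringFilterSketch {
    // The string filter itself: one loop, no per-type checks.
    static String filter(List<Node> nodes) {
        StringBuilder out = new StringBuilder();
        for (Node node : nodes) {
            out.append(node.toPlainTextString());
        }
        return out.toString();
    }
}
```

Because every node type answers toPlainTextString(), the loop never needs an instanceof check; whatever the remark node returns simply lands in the output.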

Once we put this functionality back in, you won't need to check for
HTMLRemarkNodes explicitly.
What might have sounded confusing in my earlier mail was this: if tags are
present inside the HTMLRemarkNode, as in:
<!--
    <sometag>
    <someothertag>
    ...
    Text
    <blah></blah>
-->
we could recursively parse it to get the actual text. But that is a
developer's flight of fancy, so you can ignore it.
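
Purely as an illustration of that idea, here is a crude sketch that pulls the text out of a commented-out block. It does not parse recursively at all; it swaps in a plain regular expression that strips anything shaped like a tag, which is a much weaker technique than running the parser over the remark again. RemarkTextSketch and textInsideRemark are hypothetical names, not part of the parser.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the parser.
class RemarkTextSketch {
    // Anything that looks like <...>; a simplification, not real HTML parsing.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String textInsideRemark(String remark) {
        String body = remark.trim();
        // Drop the comment delimiters if they are present.
        if (body.startsWith("<!--")) {
            body = body.substring(4);
        }
        if (body.endsWith("-->")) {
            body = body.substring(0, body.length() - 3);
        }
        // Replace tags with spaces, then collapse runs of whitespace.
        String withoutTags = TAG.matcher(body).replaceAll(" ");
        return WHITESPACE.matcher(withoutTags).replaceAll(" ").trim();
    }
}
```

A regex like this will trip over a ">" inside an attribute value or over script content, so treat it only as a picture of the idea.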

Regards,
Somik