Re: [Htmlparser-developer] HTML Comments/Remarks
From: Sam J. <ga...@yh...> - 2002-12-12 13:40:21
Hi Somik,

Somik Raha wrote:
>>Thanks for the help. I think I would like to see the
>>toPlainTextString() method remain. Although I'm not quite sure of the
>>difference between HTMLRemarkNode.toString and
>>HTMLRemarkNode.toPlainTextString.
>
>This is actually based on your suggestion (eons back..) -
>toPlainTextString() is the uniform way of getting string representation of a
>page - meaningful and hopefully semantic data. I think you'd probably want
>to use toPlainTextString() instead of toString() - as toString() always
>gives some output for all the tags, while toPlainTextString() works only for
>specific ones like string nodes, link text and strings inside forms. It was
>also enabled earlier for comments, but was taken out last week. I am
>thinking of putting it back in. What this will mean is that if folks have
>commented tags - you will get that sort of data in your string filter. I
>think you can live with that (?)
>
>Also - I am thinking of a better approach - wherein, should one require pure
>strings within a comment, one could create a new parser, that operates on
>the contents of the string node (it would be an interesting approach to
>try..)

I'm not sure that I'm following you. But then it's late here....

It would seem that, whatever other considerations there might be, one would want some method on HTMLRemarkNode that lets you grab the pure, unadulterated text of the remark and nothing else. The HTMLRemarkNode.toString() method I'm using now seems to prepend the string "Comment Tag :" to the string it returns.

It's nice to have convenience methods to pretty-print things. But shouldn't the two default methods on any node be to:

1. return the original HTML
2. return the text appearing within it that is not a default part of the tag

Naturally there will be variation depending on the node, but it seems odd to have prettified output as the default (maybe it isn't, and I'm just getting confused). Ideally it would be requested with a parameter or a dedicated method like prettyPrint().

I'm not sure what the downside is to having a toPlainTextString() call in HTMLRemarkNode. Remember, I don't have a very deep understanding of HTMLParser itself. For example, I'm not sure what you mean when you say that the remark text data would appear in your string filter; I'm not sure what a string filter is.... At the moment it seems I have to explicitly check for HTMLRemarkNodes and then process them if I want their text....

CHEERS
SAM
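The two defaults proposed above (original HTML, and the text inside the node) can be made concrete with a small sketch. The types below (`SketchNode`, `SketchRemarkNode`, `SketchTextNode`, `PlainText`) are hypothetical and are not HTMLParser's actual classes or API; they only illustrate separating raw HTML, plain text, and an opt-in `prettyPrint()`, and how a uniform `toPlainTextString()` lets callers collect text without explicit type checks.

```java
import java.util.List;

// Hypothetical sketch only: these types are NOT part of the HTMLParser API.
interface SketchNode {
    /** Default 1: the original HTML source for this node, unchanged. */
    String toHtml();

    /** Default 2: the text inside the node, without the tag's own markup. */
    String toPlainTextString();

    /** Decorated output is opt-in rather than the toString() default. */
    default String prettyPrint() {
        return getClass().getSimpleName() + ": " + toPlainTextString();
    }
}

// A comment/remark node: stores only the text between "<!--" and "-->".
final class SketchRemarkNode implements SketchNode {
    private final String text;

    SketchRemarkNode(String text) { this.text = text; }

    @Override public String toHtml() { return "<!--" + text + "-->"; }
    @Override public String toPlainTextString() { return text; }
    // toString() returns the raw HTML, with no "Comment Tag :" prefix.
    @Override public String toString() { return toHtml(); }
}

// A plain string node: its HTML and its text are the same.
final class SketchTextNode implements SketchNode {
    private final String text;

    SketchTextNode(String text) { this.text = text; }

    @Override public String toHtml() { return text; }
    @Override public String toPlainTextString() { return text; }
    @Override public String toString() { return toHtml(); }
}

final class PlainText {
    private PlainText() {}

    /** Concatenates the plain text of every node; no instanceof checks needed. */
    static String of(List<? extends SketchNode> nodes) {
        StringBuilder sb = new StringBuilder();
        for (SketchNode n : nodes) {
            sb.append(n.toPlainTextString());
        }
        return sb.toString();
    }
}
```

With this split, a comment's text flows into any code that gathers plain text, which is the trade-off Somik raises about string filters; a caller that wants to skip comments would still need a type check (or a filter) for that one case.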