Re: [Htmlparser-developer] Method to check if TextNode is just whitespace

Conversion of character references like &nbsp; is already performed by 
the util.Translate class.
There is no &tab; character reference as far as I'm aware (see 
http://www.w3.org/TR/REC-html40/sgml/entities.html).
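
For illustration, here is a rough sketch of the kind of check being discussed. It is not code from the library: the class name WhiteSpaceCheck and the method below are hypothetical, and the sketch assumes the text has already had its character references decoded (for example by util.Translate), so that &nbsp; and &#160; arrive as the character U+00A0. Returning true for null is only one of the three options raised in the quoted message, chosen here to keep the caller's code short.

```java
public final class WhiteSpaceCheck {

    private WhiteSpaceCheck() {
        // Hypothetical helper class, not part of the htmlparser API.
    }

    /**
     * Returns true if the already-decoded text is null, empty, or made up
     * only of white space characters.
     */
    public static boolean isWhiteSpace(String decoded) {
        if (decoded == null) {
            // Assumption: null is treated as "no characters".
            return true;
        }
        for (int i = 0; i < decoded.length(); i++) {
            char c = decoded.charAt(i);
            // Character.isWhitespace() deliberately excludes the
            // non-breaking space (U+00A0), so it is tested separately.
            if (!Character.isWhitespace(c) && c != '\u00A0') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isWhiteSpace(" \t\r\n"));      // ordinary white space
        System.out.println(isWhiteSpace("\u00A0 "));      // decoded &nbsp; plus a space
        System.out.println(isWhiteSpace(" x "));          // contains real text
        System.out.println("\u00A0".trim().isEmpty());    // trim() alone leaves U+00A0 in place
    }
}
```

The last line of main shows why String.trim().equals("") on its own would not be enough: trim() only strips characters up to U+0020, so a decoded non-breaking space survives it.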

Ian Macfarlane wrote:

>Thanks for your reply,
>
>I wasn't suggesting trimming the actual text of the text nodes
>permanently, merely wondering if using the trim() method to see if the
>resulting string was empty would be sufficient, or whether we should
>also look for various white-space HTML entities (e.g. &tab; also) for
>purposes of determining this.
>
>Now I think about it some more, white space alone is probably what we
>want to do. If we want to get things like &tab; we ought to write some
>sort of method that would replace those types of HTML character
>references with the actual characters, if that's feasible.
>
>The only other question I've got - what do you all think should happen
>if the contents of the text node are null? Should it return true
>(because there are no characters), false (because it's not actually a
>white space String) or throw a NullPointerException (which would
>negate the value of this method by forcing the end-user to write lots
>of code to use this method)? Can a text node ever be null without the
>user changing the text to be null?
>
>Ian
>
>String is immutable so String.trim().equals("") won't change the
>original String object.
>
>On 11/2/05, Axel <ax...@gm...> wrote:
>  
>
>>On 11/1/05, Ian Macfarlane <ian...@gm...> wrote:
>>    
>>
>>>I was thinking it might be worthwhile adding a method to Text/TextNode
>>>along the lines of:
>>>
>>>boolean isWhiteSpace()
>>>
>>>Which would return if the TextNode consisted of solely white space
>>>characters (or was the empty String).
>>>
>>>Now this could simply be done using String.trim().equals(""), however
>>>that wouldn't account for:
>>>
>>>- the non-breaking space character (#160)
>>>- The HTML code &nbsp; (also &nbsp as Firefox/IE do)
>>>- The HTML code &#160; (also &#160 as Firefox/IE do)
>>>
>>>So my question is, do you think this method should treat those
>>>as spaces and remove/ignore them also for purposes of determining if
>>>the TextNode is white space? Or should it only trim normal whitespace
>>>(space, tab, carriage returns, etc.)?
>>>      
>>>
>>I think that if Character#isWhitespace() returns true for every
>>character (or entity converted to a Unicode character) in the
>>TextNode, then boolean isWhiteSpace() should return true.
>>IMO the TextNode shouldn't be trimmed automatically. Only a dedicated
>>method should do that.
>>
>>--
>>Axel Kramer
>>http://www.plog4u.org - Wikipedia Eclipse Plugin
>>
>>
>>-------------------------------------------------------
>>SF.Net email is sponsored by:
>>Tame your development challenges with Apache's Geronimo App Server. Download
>>it for free - -and be entered to win a 42" plasma tv or your very own
>>Sony(tm)PSP.  Click here to play: http://sourceforge.net/geronimo.php
>>_______________________________________________
>>Htmlparser-developer mailing list
>>Htm...@li...
>>https://lists.sourceforge.net/lists/listinfo/htmlparser-developer