Re: [Htmlparser-developer] Method to check if TextNode is just whitespace
From: Ian M. <ian...@gm...> - 2005-11-03 12:44:59
Thanks for your reply,
I wasn't suggesting trimming the actual text of the text nodes
permanently, merely wondering if using the trim() method to see if the
resulting string was empty would be sufficient, or whether we should
also look for various white-space HTML entities (e.g. &tab;) for
purposes of determining this.
Now that I think about it some more, checking for white space alone is
probably what we want. If we want to catch things like &tab;, we ought
to write some sort of method that replaces those HTML character
references with the actual characters, if that's feasible.
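To make the idea above concrete, here is a rough sketch of such a replacement step. It handles only the non-breaking space references (with or without the trailing semicolon) and then tests every character. The class and method names are hypothetical and not part of HTMLParser. Note that Character.isWhitespace() returns false for the non-breaking space (U+00A0), so that character has to be checked explicitly.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the HTMLParser API.
final class WhitespaceReferences {
    // Matches "&nbsp;" / "&nbsp" and "&#160;" / "&#160" (semicolon optional).
    // The lookahead keeps longer numbers such as "&#1600;" from matching.
    private static final Pattern NBSP_REF =
            Pattern.compile("&(?:nbsp|#160(?![0-9]));?");

    // Replaces the references above with the actual non-breaking space character.
    static String decodeNbsp(String text) {
        return NBSP_REF.matcher(text).replaceAll("\u00A0");
    }

    // True if, after decoding, every character is white space or a
    // non-breaking space. An empty string also yields true.
    static boolean isBlankAfterDecoding(String text) {
        String decoded = decodeNbsp(text);
        for (int i = 0; i < decoded.length(); i++) {
            char c = decoded.charAt(i);
            if (!Character.isWhitespace(c) && c != '\u00A0') {
                return false;
            }
        }
        return true;
    }
}
```

A fuller version would need a table of all the character references we care about; this one covers only the single case to show the shape of the approach.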
The only other question I've got: what do you all think should happen
if the contents of the text node are null? Should it return true
(because there are no characters), false (because it's not actually a
white space String) or throw a NullPointerException (which would
negate the value of this method by forcing the end-user to write lots
of code to use this method)? Can a text node ever be null without the
user changing the text to be null?
Ian
String is immutable so String.trim().equals("") won't change the
original String object.
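
For illustration, a minimal sketch of the trim()-based check. The class name is hypothetical, and treating null as "white space" is only one of the three options raised above, chosen here as an assumption and not a settled decision.

```java
// Hypothetical helper for illustration only; not part of the HTMLParser API.
final class TextWhitespaceCheck {
    static boolean isWhiteSpace(String text) {
        // Assumption: null is treated like the empty String (returns true).
        if (text == null) {
            return true;
        }
        // trim() returns a new String; the caller's String is left untouched.
        // It strips only characters up to U+0020, so a non-breaking space
        // (U+00A0) still counts as content here.
        return text.trim().length() == 0;
    }
}
```

With this version, a String such as " \t\n" gives true and is itself unchanged afterwards, while a String holding only the #160 character gives false.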
On 11/2/05, Axel <ax...@gm...> wrote:
> On 11/1/05, Ian Macfarlane <ian...@gm...> wrote:
> > I was thinking it might be worthwhile adding a method to Text/TextNode
> > along the lines of:
> >
> > boolean isWhiteSpace()
> >
> > which would return true if the TextNode consisted solely of white
> > space characters (or was the empty String).
> >
> > Now this could simply be done using String.trim().equals(""), however
> > that wouldn't account for:
> >
> > - the non-breaking space character (#160)
> > - The HTML code &nbsp; (also &nbsp as Firefox/IE do)
> > - The HTML code &#160; (also &#160 as Firefox/IE do)
> >
> > So my question is, do you think this method should treat those
> > as spaces and remove/ignore them also for purposes of determining if
> > the TextNode is white space? Or should it only trim normal whitespace
> > (space, tab, carriage returns, etc.)?
> I think that if Character#isWhitespace() is true for every character
> (or entity converted to a unicode character) in the TextNode, then
> boolean isWhiteSpace() should return true.
> IMO the TextNode shouldn't be trimmed automatically. Only a dedicated
> function should do that.
>
> --
> Axel Kramer
> http://www.plog4u.org - Wikipedia Eclipse Plugin