Re: [Htmlparser-developer] Method to check if TextNode is just whitespace
From: Ian M. <ian...@gm...> - 2005-11-03 12:44:59
Thanks for your reply. I wasn't suggesting trimming the actual text of the text nodes permanently, merely wondering if using the trim() method to see if the resulting string was empty would be sufficient, or whether we should also look for various white-space HTML entities (e.g. &tab; also) for purposes of determining this.

Now that I think about it some more, checking for white space alone is probably what we want to do. If we want to catch things like &tab;, we ought to write some sort of method that would replace those types of HTML character references with the actual characters, if that's feasible.

The only other question I've got: what do you all think should happen if the contents of the text node are null? Should it return true (because there are no characters), false (because it's not actually a white space String), or throw a NullPointerException (which would negate the value of this method by forcing the end-user to write lots of code to use it)? Can the contents of a text node ever be null without the user changing the text to be null?

Ian

String is immutable, so String.trim().equals("") won't change the original String object.

On 11/2/05, Axel <ax...@gm...> wrote:
> On 11/1/05, Ian Macfarlane <ian...@gm...> wrote:
> > I was thinking it might be worthwhile adding a method to Text/TextNode
> > along the lines of:
> >
> > boolean isWhiteSpace()
> >
> > Which would return whether the TextNode consisted solely of white space
> > characters (or was the empty String).
> >
> > Now this could simply be done using String.trim().equals(""), however
> > that wouldn't account for:
> >
> > - the non-breaking space character (#160)
> > - The HTML code &nbsp; (also &nbsp as Firefox/IE do)
> > - The HTML code &#160; (also &#160 as Firefox/IE do)
> >
> > So my question is, do you think this method should treat those
> > as spaces and remove/ignore them also for purposes of determining if
> > the TextNode is white space? Or should it only trim normal whitespace
> > (space, tab, carriage returns, etc.)?
> I think, if every character (or entity converted to a
> unicode-character) in the TextNode is true for
> Character#isWhitespace() the boolean isWhiteSpace() should return
> true;
> IMO the TextNode shouldn't be trimmed automatically. Only a special
> function should be allowed to do this.
>
> --
> Axel Kramer
> http://www.plog4u.org - Wikipedia Eclipse Plugin
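
For reference, here is a rough sketch of the kind of check discussed in this thread. It is not code from htmlparser: the class name WhitespaceCheck and the helper decodeNbsp are made up for illustration, and it works on a plain String rather than a TextNode. It assumes that null counts as white space (one of the three options raised above, not a settled decision) and that only the two non-breaking-space references mentioned in the thread need decoding. Note that Character.isWhitespace() returns false for the non-breaking space (U+00A0), so that character is tested separately.

```java
// Hypothetical helper, not part of htmlparser.
final class WhitespaceCheck {

    private WhitespaceCheck() {}

    /**
     * Replaces the non-breaking-space references "&nbsp;" and "&#160;"
     * (with or without the trailing semicolon) by the character U+00A0.
     * The lookahead keeps longer numbers such as "&#1600;" untouched.
     */
    static String decodeNbsp(String text) {
        return text.replaceAll("&(?:nbsp|#160(?![0-9]));?", "\u00A0");
    }

    /**
     * Returns true if the text is null, empty, or made up only of
     * white space and non-breaking spaces. The input is never modified,
     * since String is immutable.
     */
    static boolean isWhiteSpace(String text) {
        if (text == null) {
            return true; // assumption: treat "no text" as white space
        }
        String decoded = decodeNbsp(text);
        for (int i = 0; i < decoded.length(); i++) {
            char c = decoded.charAt(i);
            // Character.isWhitespace() excludes U+00A0, so check it explicitly.
            if (!Character.isWhitespace(c) && c != '\u00A0') {
                return false;
            }
        }
        return true;
    }
}
```

Returning true for null keeps callers free of null checks; throwing a NullPointerException instead would only need the first branch changed.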