Re: [Htmlparser-developer] Method to check if TextNode is just whitespace
From: Derrick O. <Der...@Ro...> - 2005-11-03 14:28:06
Conversion of character references like &nbsp; is already performed by
the util.Translate class. There is no &tab; character reference as far
as I'm aware (see http://www.w3.org/TR/REC-html40/sgml/entities.html).

Ian Macfarlane wrote:

>Thanks for your reply,
>
>I wasn't suggesting trimming the actual text of the text nodes
>permanently, merely wondering if using the trim() method to see if the
>resulting string was empty would be sufficient, or whether we should
>also look for various white-space HTML entities (e.g. &tab; also) for
>purposes of determining this.
>
>Now I think about it some more, white space alone is probably what we
>want to do. If we want to get things like &tab; we ought to write some
>sort of method that would replace those types of HTML character
>references with the actual characters, if that's feasible.
>
>The only other question I've got - what do you all think should happen
>if the contents of the text node is null? Should it return true
>(because there are no characters), false (because it's not actually a
>white space String) or throw a NullPointerException (which would
>negate the value of this method by forcing the end-user to write lots
>of code to use this method)? Can a text node ever be null without the
>user changing the text to be null?
>
>Ian
>
>String is immutable so String.trim().equals("") won't change the
>original String object.
>
>On 11/2/05, Axel <ax...@gm...> wrote:
>
>>On 11/1/05, Ian Macfarlane <ian...@gm...> wrote:
>>
>>>I was thinking it might be worthwhile adding a method to Text/TextNode
>>>along the lines of:
>>>
>>>boolean isWhiteSpace()
>>>
>>>Which would return true if the TextNode consisted solely of white space
>>>characters (or was the empty String).
>>>
>>>Now this could simply be done using String.trim().equals(""), however
>>>that wouldn't account for:
>>>
>>>- the non-breaking space character (#160)
>>>- The HTML code &nbsp; (also &nbsp as Firefox/IE do)
>>>- The HTML code &#160; (also &#160 as Firefox/IE do)
>>>
>>>So my question is, do you think this method should treat those
>>>as spaces and remove/ignore them also for purposes of determining if
>>>the TextNode is white space? Or should it only trim normal whitespace
>>>(space, tab, carriage returns, etc).
>>>
>>I think, if every character (or entity converted to a
>>unicode-character) in the TextNode is true for
>>Character#isWhitespace() the boolean isWhiteSpace() should return
>>true;
>>IMO the TextNode shouldn't be trimmed automatically. Only a special
>>function should be allowed to do this.
>>
>>--
>>Axel Kramer
>>http://www.plog4u.org - Wikipedia Eclipse Plugin
>>
>>_______________________________________________
>>Htmlparser-developer mailing list
>>Htm...@li...
>>https://lists.sourceforge.net/lists/listinfo/htmlparser-developer
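
For illustration, the check discussed in this thread could be sketched roughly as below. This is a standalone helper, not part of the HTMLParser API: the class name WhitespaceCheck and the static isWhiteSpace(String) signature are hypothetical. It assumes character references have already been decoded to characters (the job Derrick attributes to util.Translate), it returns true for null (only one of the three options Ian lists), and it counts U+00A0 explicitly, because Character.isWhitespace() returns false for the non-breaking space and String.trim() does not strip it either.

```java
final class WhitespaceCheck {

    /** Non-breaking space, U+00A0 (decimal 160). */
    private static final char NBSP = '\u00A0';

    /**
     * Returns true if the text is null, empty, or made up only of
     * white space characters and non-breaking spaces.
     * Assumption: null counts as white space; the thread leaves this open.
     */
    static boolean isWhiteSpace(String text) {
        if (text == null) {
            return true;
        }
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Character.isWhitespace() deliberately excludes U+00A0,
            // so the non-breaking space is tested separately.
            if (!Character.isWhitespace(c) && c != NBSP) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isWhiteSpace(" \t\r\n"));              // true
        System.out.println(isWhiteSpace("\u00A0"));               // true
        System.out.println("\u00A0".trim().equals(""));           // false
        System.out.println(isWhiteSpace(" a "));                  // false
    }
}
```

The third line of main shows why trim().equals("") alone would miss a text node holding only a non-breaking space. Whether null should yield true, false, or an exception is a design choice the thread does not settle; the first branch is the only place that would need to change.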