#86 string.trimWhitespace misses some chars

Status: open
Owner: nobody
Labels: Other (15)
Priority: 5
Updated: 2006-05-29
Created: 2006-05-29
Private: No

string.trimWhitespace ignores:
vtab, char(11) - which is FileMaker's "newline"
ff, char(12) - form feed, sometimes a page break

and nonBreakingSpace
Mac: char(202)
Win: char(160)

Note:
vtab = "vertical" tab
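
For reference, the character codes listed above can be checked against the platform encodings. This is only a sketch in Python, and it assumes that "Mac" means Mac OS Roman and "Win" means Windows-1252, which the report does not state explicitly; the helper name `decode_byte` is made up for illustration.

```python
# Character codes from the report, expressed as single byte values.
VTAB = 11        # vertical tab
FORM_FEED = 12   # form feed
NBSP_MAC = 202   # non-breaking space, assuming Mac OS Roman
NBSP_WIN = 160   # non-breaking space, assuming Windows-1252


def decode_byte(code: int, encoding: str) -> str:
    """Decode one byte value using the named Python codec (hypothetical helper)."""
    return bytes([code]).decode(encoding)


# Each platform's code, read with that platform's own encoding.
mac_nbsp = decode_byte(NBSP_MAC, "mac_roman")
win_nbsp = decode_byte(NBSP_WIN, "cp1252")

# The same byte values, read with the other platform's encoding.
mac_byte_on_win = decode_byte(NBSP_MAC, "cp1252")
win_byte_on_mac = decode_byte(NBSP_WIN, "mac_roman")

for label, ch in [("202 as mac_roman", mac_nbsp), ("160 as cp1252", win_nbsp),
                  ("202 as cp1252", mac_byte_on_win), ("160 as mac_roman", win_byte_on_mac)]:
    print(f"{label}: U+{ord(ch):04X}")
```

Under those assumptions, both platform codes decode to the same Unicode character (U+00A0), while each byte read with the other platform's encoding is an ordinary printable character.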

I realize that "fixing" this could break current
scripts, though I suspect that's rare. One
observation: on the Mac, the regex (builtins.re)
whitespace token \s does NOT match vtab but DOES match
nonBreakingSpace. I should test on Windows too.... My
recollection is vague, but I think that's true for Perl
too, i.e. it's more likely to be a "bug" in the spec
rather than in the regex library that Frontier uses.
Of course that would be easy enough to test with a
small Perl script....

If PCRE and Perl really do omit vtab, one could make a
good case for leaving vtab out of
string.trimWhitespace. There are some benefits to "bug
for bug compatible" (even though this verb isn't
implemented using regex). And, despite the use of vtab
for FileMaker, there may well be cases where there's a
tangible benefit to treating vtab as something other
than whitespace. (Though, until I see those cases, I
still consider it a "spec bug" ... or of course faulty
memory on my part!)
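
The kind of quick test described above might look like the following. Note that this sketch exercises Python's own `re` module, not Perl, PCRE, or Frontier's builtins.re, so its results say nothing about how those engines treat vtab; it only shows the shape of the check. The function name `matches_whitespace` is made up for illustration.

```python
import re

# Candidate characters from the report, by character code.
CANDIDATES = {"vtab": 11, "ff": 12, "nbsp": 160}


def matches_whitespace(code: int, as_bytes: bool) -> bool:
    """Return True if \\s matches the character with this code.

    as_bytes=True tests a bytes pattern (ASCII-only whitespace in Python);
    as_bytes=False tests a str pattern (Unicode-aware whitespace in Python).
    """
    if as_bytes:
        return re.fullmatch(rb"\s", bytes([code])) is not None
    return re.fullmatch(r"\s", chr(code)) is not None


# Map each name to (str-pattern result, bytes-pattern result).
results = {
    name: (matches_whitespace(code, False), matches_whitespace(code, True))
    for name, code in CANDIDATES.items()
}

for name, (as_str, as_raw) in results.items():
    print(f"{name}: str pattern={as_str}, bytes pattern={as_raw}")
```

The same three characters would need to be run through Perl and PCRE themselves to settle the question raised here.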

Discussion

  • Seth Dillingham

    Seth Dillingham - 2006-05-30

    I'd be happy to fix this *right this minute* by adding a
    couple of optional params to add support for treating vtabs
    and non-breaking spaces as white space.

    However: in our cross-platform, multi-encoding world, I'm
    not completely sure about the right way to handle this. The
    vtab is easy, it's the same in almost all character sets
    (encodings). Handling the non-breaking space isn't so easy.
    It can't just test the platform, because I frequently have
    text on my Mac which started in Windows, and vice versa.

    So in other words, if string.trimWhiteSpace on the Mac
    treated 202 as a non-breaking space, but the text I'm
    working with actually came from an email or web submission
    that started on a PC, it's going to delete characters that
    really are NOT white space.

    Perhaps another optional param to specify the character set?
    So the declaration line in the glue script would look like this:

    on trimWhiteSpace( s, flVtab = false, charset = "" )

    ... so if you didn't specify the character set, it would
    assume it was platform native.

    How does that look? (Anybody?)

    Seth

    P.S. Incidentally, Karsten is right. The string type does
    need some additional properties. If our strings specified
    the encoding, this problem would be much more easily solved.
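
A rough Python analogue of the signature Seth proposes might look like this. It is a sketch only, not Frontier code: the default whitespace set, the `fl_nbsp` flag, the codec names, and the `native_charset` helper are all assumptions made for illustration, since the thread does not spell them out.

```python
import sys

# Assumed default set; the thread does not list what the verb trims today.
BASE_WHITESPACE = b" \t\r\n"

# Byte value of the non-breaking space per charset (codes from the report,
# codec names assumed).
NBSP_BYTE = {"mac_roman": 202, "cp1252": 160}


def native_charset() -> str:
    """Hypothetical stand-in for 'platform native'."""
    return "mac_roman" if sys.platform == "darwin" else "cp1252"


def trim_whitespace(data: bytes, fl_vtab: bool = False,
                    fl_nbsp: bool = False, charset: str = "") -> bytes:
    """Trim leading and trailing whitespace bytes from data.

    fl_vtab adds the vertical tab (11) to the trimmed set.
    fl_nbsp adds the non-breaking space byte for the given charset,
    falling back to the platform-native charset when charset is empty.
    """
    chars = BASE_WHITESPACE
    if fl_vtab:
        chars += b"\x0b"
    if fl_nbsp:
        chars += bytes([NBSP_BYTE[charset or native_charset()]])
    return data.strip(chars)


# The same bytes trimmed under two different charset assumptions.
sample = b"\xa0hi\xca"
print(trim_whitespace(sample, fl_nbsp=True, charset="cp1252"))
print(trim_whitespace(sample, fl_nbsp=True, charset="mac_roman"))
```

Passing the charset explicitly is what keeps text that started on the other platform from losing a byte that is not white space there.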

     
  • PreFab Software

    PreFab Software - 2006-05-30

    Did you happen to test Perl with vtab? (I haven't done Perl
    in ages or I would try to figure out the one-liner.)
    Assuming Perl omits it, I think an optional param (default
    to false) is fine for Frontier.

    As for nonBreakingSpace, I agree that a param would be good
    -- and defaulting to the native platform is of course important.

    Just for completeness: I think ff should be added without
    any params -- or with the default set to true. Then the
    defaults will match Perl.

     
  • Karsten Wolf

    Karsten Wolf - 2006-05-30

    Hi Scott,
    just out of curiosity: which version of FileMaker do you use?

    Hi Seth,
    for string.trimWhiteSpace() I would consider a second argument that lists the
    needed whitespace or some predefined class of whitespace.

    string.trimWhitespace(s, "\r\n\t")

    or something like

    string.trimWhitespace(s, system.blah.asciiWhitespace +
    system.blah.macRomanWhitespace)

    I'm not sure it shouldn't ignore the non-breaking space. It's a kind of
    meta-character (it carries layout information). If you want it removed, you
    should explicitly run a regex on the start and end of the string. My 2 cents.
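
Karsten's second-argument idea can be sketched in Python, where `str.strip` already accepts the set of characters to trim. The constant names below are hypothetical stand-ins for the `system.blah.*` placeholders in his comment, and the default set is an assumption.

```python
# Hypothetical predefined classes, standing in for system.blah.asciiWhitespace
# and system.blah.macRomanWhitespace from the comment.
ASCII_WHITESPACE = " \t\r\n\x0b\x0c"
NBSP = "\u00a0"


def trim_chars(s: str, chars: str = " \t\r\n") -> str:
    """Trim any of the characters in chars from both ends of s."""
    return s.strip(chars)


# Caller lists exactly the characters to trim; the space is left alone here.
print(repr(trim_chars("\r\n\tabc \t", "\r\n\t")))

# Caller combines predefined classes, as in the second example in the comment.
print(repr(trim_chars("\x0babc\u00a0", ASCII_WHITESPACE + NBSP)))
```

With this shape the verb needs no per-character flags: the non-breaking space is only trimmed when the caller asks for it.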

     
  • Andre Radke

    Andre Radke - 2006-05-30

    I prefer Karsten's solution, i.e. passing a string that contains the chars to be
    trimmed as a second optional parameter.

     
