string.trimWhitespace ignores (does not trim):
vtab, char(11) - which is FileMaker's "newline"
ff, char(12) - form feed, sometimes a page break
and nonBreakingSpace:
Mac: char(202)
Win: char(160)
Note: vtab = "vertical" tab
I realize that "fixing" this could break current
scripts, though I suspect that's rare. One
observation: on the Mac, the regex (builtins.re)
whitespace token \s does NOT match vtab but DOES match
nonBreakingSpace. I should test on Windows too.... My
recollection is vague, but I think that's true for Perl
too, i.e. it's more likely to be a "bug" in the spec
rather than in the regex library that Frontier uses.
Of course that would be easy enough to test with a
small Perl script....
If PCRE and Perl really do omit vtab, one could make a
good case for leaving vtab out of
string.trimWhitespace. There are some benefits to "bug
for bug compatible" (even though this verb isn't
implemented using regex). And, despite the use of vtab
for FileMaker, there may well be cases where there's a
tangible benefit to treating vtab as something other
than whitespace. (Though, until I see those cases, I
still consider it a "spec bug" ... or of course faulty
memory on my part!)
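For what it's worth, here is a rough sketch of the kind of quick check I mean, written in Python rather than Perl. It only reports what Python's own re module does with \s, so it says nothing about Perl or PCRE; the function name and the candidate table are made up for illustration.

```python
import re

# Candidate characters from this report (nbsp given as its Unicode code point).
CANDIDATES = {
    "vtab": "\x0b",   # char(11)
    "ff": "\x0c",     # char(12)
    "nbsp": "\u00a0",
}


def whitespace_report(flags: int = 0) -> dict:
    """Return, per candidate, whether this engine's \\s token matches it."""
    pattern = re.compile(r"\s", flags)
    return {name: pattern.fullmatch(ch) is not None
            for name, ch in CANDIDATES.items()}


if __name__ == "__main__":
    print("default:", whitespace_report())
    print("ASCII only:", whitespace_report(re.ASCII))
```

The same three-character probe, ported to a Perl one-liner or run through builtins.re, would settle the question for those engines.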
Logged In: YES
user_id=1171838
I'd be happy to fix this *right this minute* by adding a
couple of optional params to add support for treating vtabs
and non-breaking spaces as white space.
However: in our cross-platform, multi-encoding world, I'm
not completely sure about the right way to handle this. The
vtab is easy: it's the same in almost all character sets
(encodings). Handling the non-breaking space isn't so easy.
It can't just test the platform, because I frequently have
text on my Mac which started in Windows, and vice versa.
So in other words, if string.trimWhiteSpace on the Mac
treated 202 as a non-breaking space, but the text I'm
working with actually came from an email or web-submission
that started on a PC, it's going to delete characters that
really are NOT white space.
Perhaps another optional param to specify the character set?
So the declaration line in the glue script would look like this:
on trimWhiteSpace( s, flVtab = false, charset = "" )
... so if you didn't specify the character set, it would
assume it was platform native.
How does that look? (Anybody?)
Seth
P.S. Incidentally, Karsten is right. The string type does
need some additional properties. If our strings specified
the encoding, this problem would be much more easily solved.
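
To make the proposal concrete, here is a minimal sketch in Python (not UserTalk, and not the actual glue script). The base whitespace set, the charset labels, and the NATIVE_CHARSET constant are all assumptions made up for the illustration; only the parameter names follow the declaration line above.

```python
# Illustrative only: names and defaults are assumptions, not Frontier's code.
BASE_WHITESPACE = b" \t\r\n"  # assumed set that is already trimmed today

# Byte value of the non-breaking space per (hypothetical) charset label.
NBSP_BY_CHARSET = {
    "macroman": b"\xca",      # char(202)
    "windows-1252": b"\xa0",  # char(160)
}

NATIVE_CHARSET = "macroman"  # stand-in for "whatever the platform uses"


def trim_whitespace(s: bytes, fl_vtab: bool = False, charset: str = "") -> bytes:
    """Trim whitespace bytes from both ends of s."""
    chars = BASE_WHITESPACE
    if fl_vtab:
        chars += b"\x0b"  # vtab, char(11)
    # An empty charset falls back to the platform-native one.
    chars += NBSP_BY_CHARSET[charset or NATIVE_CHARSET]
    return s.strip(chars)
```

With text that started on Windows, the caller would pass charset = "windows-1252", so a trailing char(202) is left alone instead of being trimmed as if it were a Mac non-breaking space.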
Logged In: YES
user_id=1413973
Did you happen to test Perl with vtab? (I haven't done Perl
in ages or I would try to figure out the one-liner.)
Assuming Perl omits it, I think an optional param (default
to false) is fine for Frontier.
As for nonBreakingSpace, I agree that a param would be good
-- and defaulting to the native platform is of course important.
Just for completeness: I think ff should be added without
any params -- or with the default set to true. Then the
defaults will match Perl.
Logged In: YES
user_id=201017
Hi Scott,
just out of curiosity: which version of FileMaker do you use?
Hi Seth,
for string.trimWhiteSpace() I would consider a second argument that lists the
needed whitespace or some predefined class of whitespace.
string.trimWhitespace(s, "\r\n\t")
or something like
string.trimWhitespace(s, system.blah.asciiWhitespace +
system.blah.macRomanWhitespace)
I'm not sure it should trim the non-breaking space at all. It's a kind of
meta-character (it carries layout information). Anyone who wants it removed
should explicitly run a regex on the start & end of the string. My 2 cents.
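
A rough sketch of that idea, in Python rather than UserTalk. The constant names are hypothetical stand-ins for the "system.blah" entries above, and the default set is an assumption about what the verb trims today.

```python
# Hypothetical predefined classes; the caller combines whichever it needs.
ASCII_WHITESPACE = " \t\r\n"    # assumed default set
EXTRA_WHITESPACE = "\x0b\x0c"   # vtab char(11) and ff char(12)
NBSP = "\u00a0"                 # non-breaking space, only trimmed on request


def trim_whitespace(s: str, chars: str = ASCII_WHITESPACE) -> str:
    """Trim any of the characters listed in chars from both ends of s."""
    return s.strip(chars)


if __name__ == "__main__":
    raw = "\x0b FileMaker field \n"
    print(repr(trim_whitespace(raw)))
    print(repr(trim_whitespace(raw, ASCII_WHITESPACE + EXTRA_WHITESPACE)))
```

The default call behaves as before, so existing scripts are not affected, and the non-breaking space stays in place unless the caller lists it explicitly.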
Logged In: YES
user_id=1137587
Also see:
http://groups.yahoo.com/group/frontierkernel/message/2533
Logged In: YES
user_id=1137587
I prefer Karsten's solution, i.e. passing a string that contains the chars to be
trimmed as a second optional parameter.