I've found a problematic test case for
string.convertCharset. This is on Mac OS X 10.4.5,
Xcode 2.2.1, 10.2.8 SDK.
Here is the script:
local {s};
s = string.convertCharset ("UTF-8", "macintosh", "∑");
dialog.alert ("hot here!")
The character I'm trying to convert displays on the Mac
as the sum character (Option-W). I think this is a
bogus conversion, and debugging tells me that the
conversion failed, which is fine.
The problem is that I don't get a dialog! The script fails
silently. If I run the conversion in Quick Script, I do
get back false.
A little more about the text I'm trying to convert:
the sum character comes from a story in the news
aggregator. The story's channel claims to be in UTF-8,
so my script attempts to convert it to macintosh. I
believe the sum character is bogus and the conversion
should fail, but I should be able to trap the error in
some way.
Thanks for the report, Thomas.
Two points:
1. This is a known bug. I actually thought I had mentioned
it in one of my recent posts about this verb being read, but
now I don't see it so I must have left it out.
2. This is the same bug that I reported on the list a
couple/few weeks ago with the other verbs that use TEC. That
is, if the text has a character which is invalid for the
specified input encoding (or, more precisely, a byte is found
in the string which does not correspond to the character map
for the encoding), then TEC returns an error and the verbs
all return false. Returning false halts script execution
immediately.
What we need, simply, are some error messages. That's where
all of this started, but I suddenly needed to expand the
feature (for professional reasons) and lost track of my
original motivation. :-)
I'll come up with something, even if we just start out with
something generic that at least allows you to trap it.
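The failure, and the kind of trappable error being asked for, can be sketched outside Frontier. This is a minimal Python sketch, not Frontier's implementation: `convert_charset` and `ConversionError` are hypothetical names, and Python's `mac_roman` codec stands in for the macintosh encoding. It assumes the "∑" literal in the script above is stored as the single Mac Roman byte 0xB7, which is not a valid UTF-8 sequence, so decoding it as UTF-8 fails. Instead of returning false, the function raises an error with a message the caller can catch.

```python
class ConversionError(Exception):
    """Raised when input cannot be converted between the two encodings."""


def convert_charset(from_enc: str, to_enc: str, data: bytes) -> bytes:
    # Hypothetical stand-in for a conversion verb: decode with the source
    # encoding, re-encode with the target, and raise a descriptive,
    # catchable error instead of silently returning False.
    try:
        text = data.decode(from_enc)
    except UnicodeDecodeError as e:
        raise ConversionError(
            f"byte 0x{data[e.start]:02X} at offset {e.start} is not valid {from_enc}"
        ) from e
    try:
        return text.encode(to_enc)
    except UnicodeEncodeError as e:
        raise ConversionError(
            f"character {text[e.start]!r} has no equivalent in {to_enc}"
        ) from e


# Assumption: "∑" typed on a Mac is the single Mac Roman byte 0xB7,
# which is a stray continuation byte when read as UTF-8.
mac_sum = "∑".encode("mac_roman")

try:
    convert_charset("utf-8", "mac_roman", mac_sum)
except ConversionError as err:
    # The script keeps running and can report or recover.
    print("conversion failed:", err)

# Genuine UTF-8 input converts to the single Mac Roman byte.
print(convert_charset("utf-8", "mac_roman", "∑".encode("utf-8")))
```

The design choice here is simply that a bad byte produces an exception carrying the offending byte and its offset, which the caller can catch, rather than a bare false that stops the script.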
Incidentally... text encoding issues are a total pain in the
butt. Or actually, in the head. Look at your example: you
might actually be talking about my own feed, which is in
UTF-8 and recently used the sum character.
The thing is, if you're seeing it, then you're *already*
seeing it in the macintosh encoding, regardless of what the
original encoding was. That's because your aggregator, your
browser, or the OS itself (depending on how you got to it in
the first place) translated the UTF-8 *bytes* (plural) into
the single macintosh byte that represents the sum character.
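To make the byte counts concrete, here is a small Python sketch, with Python's `mac_roman` codec standing in for the macintosh encoding. It shows that the sum character, U+2211, is three bytes in UTF-8 but one byte in Mac Roman, and that the translation described above maps one onto the other.

```python
# U+2211 N-ARY SUMMATION, the character Option-W types on a Mac.
SUM = "\u2211"

utf8_bytes = SUM.encode("utf-8")     # what a UTF-8 feed actually sends
mac_bytes = SUM.encode("mac_roman")  # the single byte the Mac displays from

print(utf8_bytes.hex(" "))  # e2 88 91
print(mac_bytes.hex(" "))   # b7

# The translation an aggregator, browser, or the OS performs:
# decode the UTF-8 bytes, then re-encode them as Mac Roman.
translated = utf8_bytes.decode("utf-8").encode("mac_roman")
assert translated == mac_bytes
```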
Even viewing the source of a web page or RSS feed doesn't help,
since that view has been translated as well.
If you want to see the real UTF-8 source, use something like
Frontier to download it, don't run it through
convertCharset. Look at it in a hex viewer, or something
like that.
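If no hex viewer is handy, a few lines of Python will do. This is a minimal sketch: `hexdump` is a hypothetical helper, and `raw` stands in for bytes read from a downloaded file with `open(path, "rb").read()`, that is, with no charset conversion applied. In genuine UTF-8 source, the sum character shows up as the three bytes e2 88 91.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Return a minimal hex view: offset, hex bytes, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable and non-ASCII bytes are shown as dots.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3}}  {ascii_part}")
    return "\n".join(lines)


# Hypothetical feed fragment as raw UTF-8 bytes.
raw = "<title>a \u2211 b</title>".encode("utf-8")
print(hexdump(raw))
```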
Seth