Seth Dillingham - 2006-04-29

Logged In: YES
user_id=1171838

Thanks for the report, Thomas.

Two points:

1. This is a known bug. I actually thought I had mentioned
it in one of my recent posts about this verb being ready, but
now I don't see it, so I must have left it out.

2. This is the same bug that I reported on the list a
couple/few weeks ago with the other verbs that use TEC. That
is, if the text has a character which is invalid for the
specified input encoding (or more precisely, a byte is found
in the string which does not correspond to the character map
for the encoding), then TEC returns an error and the verbs
all return false. Returning false halts script execution
immediately.
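
As an illustration of the failure mode described above (not Frontier or
TEC code), here is a minimal Python sketch. The function name
convert_charset is a hypothetical stand-in, and Python's built-in codecs
play the role of TEC. Instead of returning false, it raises an error that
names the offending byte, so a caller can trap it.

```python
def convert_charset(data: bytes, src: str, dst: str) -> bytes:
    """Hypothetical stand-in for a conversion verb; not Frontier's API."""
    try:
        # Strict decoding: any byte sequence that is invalid for the
        # source encoding raises UnicodeDecodeError.
        text = data.decode(src)
    except UnicodeDecodeError as exc:
        # Report which byte failed and where, rather than a bare "false".
        raise ValueError(
            f"byte 0x{data[exc.start]:02x} at offset {exc.start} "
            f"is not valid {src}"
        ) from exc
    return text.encode(dst)


if __name__ == "__main__":
    # 0xB7 is the sum character in Mac OS Roman, but on its own it is
    # not a valid UTF-8 sequence.
    try:
        convert_charset(b"sum: \xb7", "utf-8", "mac_roman")
    except ValueError as err:
        print("trapped:", err)
```

The point of the sketch is only that a descriptive, trappable error lets
the calling script decide what to do, where a bare false just stops it.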

What we need, simply, are some error messages. That's where
all of this started, but I suddenly needed to expand the
feature (for professional reasons) and lost track of my
original motivation. :-)

I'll come up with something, even if we just start out with
something generic that at least allows you to trap it.

Incidentally... text encoding issues are a total pain in the
butt. Or actually, in the head. Look at your example: you
might actually be talking about my own feed, which is in
UTF-8 and recently used the sum character.

The thing is, if you're seeing it, then you're *already*
seeing it in the Macintosh encoding, regardless of what the
original encoding was. That's because your aggregator, your
browser, or the OS itself (depending on how you got to it in
the first place) translated the UTF-8 byteS (plural) into
the single Macintosh byte that represents the sum character
to the Macintosh.
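
To make the "bytes (plural) versus single byte" point concrete, here is a
small Python sketch. It assumes the sum character is U+2211 (N-ARY
SUMMATION) and that "the Macintosh encoding" means Mac OS Roman, which
Python calls mac_roman.

```python
# Assumption: the "sum character" is U+2211 (N-ARY SUMMATION).
SUM = "\u2211"

utf8_bytes = SUM.encode("utf-8")      # three bytes in UTF-8
mac_bytes = SUM.encode("mac_roman")   # one byte in Mac OS Roman

print(utf8_bytes.hex(" "))  # e2 88 91
print(mac_bytes.hex(" "))   # b7

# Both byte strings decode to the same character, which is why the two
# look identical once something has rendered them on screen.
print(utf8_bytes.decode("utf-8") == mac_bytes.decode("mac_roman"))  # True
```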

Even viewing the source of a web page or RSS feed doesn't help,
because the viewer has already translated the bytes for display.

If you want to see the real UTF-8 source, use something like
Frontier to download it, and don't run it through
convertCharset. Then look at it in a hex viewer, or something
like that.
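
If no hex viewer is handy, a rough Python equivalent might look like the
sketch below. The helper names fetch_raw and hex_dump are made up for
this example, and the sample bytes are hand-written rather than taken
from any real feed. The key detail is that the downloaded bytes are never
decoded before being dumped.

```python
import urllib.request


def fetch_raw(url: str) -> bytes:
    """Download a resource and return its bytes untouched (no decoding)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex column, and printable-ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable and non-ASCII bytes are shown as dots.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hand-written sample: "sum " followed by U+2211 encoded as UTF-8.
    # For a real feed, pass the result of fetch_raw(feed_url) instead.
    sample = b"sum \xe2\x88\x91"
    print(hex_dump(sample))
```

In the dump, a sum character that is still UTF-8 shows up as the three
bytes e2 88 91, while one that has already been converted to the
Macintosh encoding shows up as the single byte b7.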

Seth