From: James L. <bjl...@lo...> - 2007-01-29 06:07:01
Ethan Blanton wrote:
> James Lockie spake unto us the following wisdom:
>
>> Ethan Blanton wrote:
>>
>>> It most certainly can. B, j, r, and k are represented by their ASCII
>>> values, and ö can be represented in several ways; this email contains
>>> ö in UTF-8 as U+00F6, which UTF-8 represents as 0xc3 0xb6.
>>>
>> Ah, the string is probably something other than UTF-8.
>> I get it, thanks.
>>
>> I think I did this right :-)
>> '426c656564696e6720576f726473206279204d6f62696c65' = 'Bleeding Words by Mobile'
>>
>
> Yes, this string is all ASCII -- I assume it works.
>
>
>> '69742773206f6820736f20717569657420627920426af6726b' = 'it's oh so quiet by Björk'
>                                               ^^ ö in ISO-8859-{1,15}
>>
>
> So, if Amarok isn't telling you what encoding these strings are (and I
> suspect it is, when it can tell itself, but some annotations, such as
> id3v1, do not have encoding tags), your best bet is probably to simply
> try whatever encoding you expect to be most common, and fall back on
> replacement of the invalid characters if that doesn't work. Something
> like:
>
> if (g_utf8_validate passes)
>     use the string as is
> else if (g_convert from ISO-8859-1 works)
>     use the conversion
> else
>     gaim_utf8_salvage it
>
> We use this sort of tactic in several places in Gaim, where strings
> come in that are of some unknown encoding.
>
> Ethan
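The byte-level points in the quoted exchange can be checked with a short sketch. It assumes Python 3 `bytes` semantics (the snippet later in this message is Python 2), decodes the two hex strings from the thread, and prints the UTF-8 and ISO-8859-1 encodings of ö:

```python
# Hex strings quoted in the thread above.
ascii_hex = "426c656564696e6720576f726473206279204d6f62696c65"
bjork_hex = "69742773206f6820736f20717569657420627920426af6726b"

ascii_bytes = bytes.fromhex(ascii_hex)
bjork_bytes = bytes.fromhex(bjork_hex)

# Pure ASCII bytes are also valid UTF-8, so this decodes cleanly.
print(ascii_bytes.decode("utf-8"))  # Bleeding Words by Mobile

# The lone 0xf6 byte is not a valid UTF-8 sequence, so strict decoding fails...
try:
    bjork_bytes.decode("utf-8")
    bjork_is_utf8 = True
except UnicodeDecodeError:
    bjork_is_utf8 = False
print("valid UTF-8?", bjork_is_utf8)  # valid UTF-8? False

# ...but in ISO-8859-1 the same byte is simply U+00F6.
print(bjork_bytes.decode("iso-8859-1"))  # it's oh so quiet by Björk

# One character, two encodings:
print("ö".encode("utf-8").hex())       # c3b6
print("ö".encode("iso-8859-1").hex())  # f6
```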
Thank you so much.
I did
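# Try to decode the message: UTF-8 first, then ISO-8859-1
try:
    msg = message.decode('utf8')
except UnicodeDecodeError:
    # (the rest of this log line was cut off when the code was pasted)
    self.log("DecodeError: Could not decode utf8 '%s', trying ISO8…
    try:
        msg = message.decode('iso-8859-1')
    except UnicodeDecodeError:
        self.log("DecodeError: Could not decode '%s'" % message)
        return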
which is what you suggested and it works now.
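For comparison with Ethan's three-step outline, here is a minimal Python 3 sketch of the same strategy, assuming the raw tag arrives as `bytes`; `decode_tag` and its `fallbacks` parameter are hypothetical names. One thing it makes visible: decoding as ISO-8859-1 never raises, because every byte value from 0x00 to 0xff maps to a character in that encoding, so with Latin-1 as the only fallback (as in the code above) the final error branch is never reached. The salvage step uses Python's `errors="replace"` as a stand-in for Gaim's C helper `gaim_utf8_salvage`, whose exact replacement behaviour may differ.

```python
def decode_tag(raw: bytes, fallbacks=("iso-8859-1",)) -> str:
    """Hypothetical helper: decode a tag string of unknown encoding.

    1. Use it as-is if it is valid UTF-8.
    2. Otherwise try each fallback encoding in turn.
    3. Otherwise salvage it, replacing invalid bytes with U+FFFD.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    for enc in fallbacks:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Stand-in for gaim_utf8_salvage(): keep valid UTF-8, mark the rest.
    return raw.decode("utf-8", errors="replace")


print(decode_tag(b"Bleeding Words by Mobile"))
print(decode_tag(bytes.fromhex("69742773206f6820736f20717569657420627920426af6726b")))
# With a fallback that can actually fail, step 3 becomes reachable:
print(decode_tag(b"Bj\xf6rk", fallbacks=("ascii",)))
```

Trying UTF-8 first is the safer order: text that is really ISO-8859-1 and contains accented characters is rarely valid UTF-8 by accident, whereas any byte string at all decodes as ISO-8859-1.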