James Lockie spake unto us the following wisdom:
> Ethan Blanton wrote:
> > It most certainly can. B, j, r, and k are represented by their ASCII
> > values, and ö can be represented in several ways; this email contains
> > ö in UTF-8 as U+00F6, which UTF-8 represents as 0xc3 0xb6.
>
> Ah, the string is probably something other than UTF8.
> I get it thanks.
>
> I think I did this right :-)
> '426c656564696e6720576f726473206279204d6f62696c65' = 'Bleeding Words by Mobile'
Yes, this string is all ASCII -- I assume it works.
> '69742773206f6820736f20717569657420627920426af6726b' = 'it's oh so quiet by Björk'
                                               ^^ 0xf6 is ö in ISO-8859-{1,15}
So, if Amarok isn't telling you what encoding these strings are in (and I
suspect it does, when it knows itself, but some tag formats, such as
ID3v1, carry no encoding information), your best bet is probably to
try whatever encoding you expect to be most common, and fall back on
replacing the invalid characters if that doesn't work. Something
like:
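    /* str holds the raw tag bytes; utf8 ends up newly allocated */
    char *utf8;

    if (g_utf8_validate(str, -1, NULL))
        utf8 = g_strdup(str);           /* already valid UTF-8: use as is */
    else if ((utf8 = g_convert(str, -1, "UTF-8", "ISO-8859-1",
                               NULL, NULL, NULL)) == NULL)
        utf8 = gaim_utf8_salvage(str);  /* conversion failed: salvage */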
We use this sort of tactic in several places in Gaim, wherever strings
of unknown encoding come in.
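For anyone who wants to try the same fallback chain outside Gaim, here is
a rough, dependency-free C sketch. The function names (utf8_seq_len,
utf8_valid, latin1_to_utf8, utf8_salvage, to_utf8) are made up for
illustration and are not GLib or Gaim API; utf8_salvage only imitates the
idea behind gaim_utf8_salvage by replacing each offending byte with '?'.
One thing the sketch makes obvious: its Latin-1 step maps every byte
0x80-0xff to U+0080-U+00FF, so that step can never reject input and
to_utf8 stops there. The salvage helper only matters if you guess a
stricter fallback encoding, or skip the Latin-1 guess entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length of the well-formed UTF-8 sequence starting at s, or 0 if the
 * bytes there are invalid (stray continuation byte, truncated sequence,
 * overlong form, surrogate, or a code point above U+10FFFF). */
static int utf8_seq_len(const unsigned char *s)
{
    unsigned long cp;
    int len, i;

    if (s[0] < 0x80) return 1;
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
    else return 0;

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)   /* also stops at the terminating NUL */
            return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
        (len == 4 && cp < 0x10000) || cp > 0x10FFFF ||
        (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    return len;
}

/* Step 1: is the whole NUL-terminated string valid UTF-8? */
static int utf8_valid(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    int n;

    while (*s) {
        if ((n = utf8_seq_len(s)) == 0)
            return 0;
        s += n;
    }
    return 1;
}

/* Step 2: reinterpret the bytes as ISO-8859-1. Every byte maps to the
 * code point with the same value, so this step never rejects input. */
static char *latin1_to_utf8(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    char *out = malloc(2 * strlen(str) + 1), *p = out;

    if (!out) return NULL;
    for (; *s; s++) {
        if (*s < 0x80) {
            *p++ = (char)*s;
        } else {                     /* U+0080..U+00FF need two bytes */
            *p++ = (char)(0xC0 | (*s >> 6));
            *p++ = (char)(0x80 | (*s & 0x3F));
        }
    }
    *p = '\0';
    return out;
}

/* Step 3, for stricter fallbacks: keep valid sequences and replace each
 * offending byte with '?'. */
static char *utf8_salvage(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    char *out = malloc(strlen(str) + 1), *p = out;
    int n;

    if (!out) return NULL;
    while (*s) {
        if ((n = utf8_seq_len(s)) == 0) { *p++ = '?'; s++; }
        else { memcpy(p, s, (size_t)n); p += n; s += n; }
    }
    *p = '\0';
    return out;
}

/* Steps 1 and 2 chained; returns a malloc'd UTF-8 string the caller frees. */
static char *to_utf8(const char *raw)
{
    assert(raw != NULL);
    if (utf8_valid(raw)) {           /* already UTF-8: copy as is */
        size_t n = strlen(raw) + 1;
        char *copy = malloc(n);
        if (copy) memcpy(copy, raw, n);
        return copy;
    }
    return latin1_to_utf8(raw);      /* otherwise assume ISO-8859-1 */
}
```

For example, to_utf8 turns the lone 0xf6 byte in the Björk title into
0xc3 0xb6, while utf8_salvage on the same raw bytes gives "Bj?rk".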
Ethan
-- 
The laws that forbid the carrying of arms are laws [that have no remedy
for evils]. They disarm only those who are neither inclined nor
determined to commit crimes.
-- Cesare Beccaria, "On Crimes and Punishments", 1764