From: Ethan B. <ebl...@cs...> - 2007-01-28 17:22:48
James Lockie spake unto us the following wisdom:
> James Lockie wrote:
> > Sean Egan wrote:
> >> I'm sure Amarok provides a mechanism for knowing the
> >> encoding of the strings it gives you
> >
> > I have asked on the Amarok mailing list what string format Amarok is
> > returning. :-)
> >
> I took out the decode and it works for strings without special
> characters but I get this error for "Björk":
> message.append(signature=introspect_sig, *args)
> UnicodeError: String parameters to be sent over D-Bus must be valid UTF-8
>
> My guess is Amarok is NOT sending utf8 but decode works for some of the
> strings but utf8 for other strings. :-(
> I'll figure it out. :-)
I'm guessing Amarok is not sending UTF-8 for *any* of its strings, but
those strings which happen to contain only characters in the ASCII
range are validating as UTF-8. (ASCII strings are valid UTF-8, and
will display correctly.) Any strings which contain extended ISO-Latin
characters (e.g., ö) are probably in some locale-specific charset
(most likely ISO-8859-1 or ISO-8859-15) and are thus failing
validation.
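
A rough way to see this effect, sketched in Python 3 (which postdates this thread): pure-ASCII bytes pass a strict UTF-8 decode, while the ISO-8859-1 bytes for the same name with an umlaut do not. The helper name `is_valid_utf8` and the sample byte strings are made up for illustration and were not captured from Amarok.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Sample byte strings, assumed for illustration only.
ascii_only = b"Bjork"        # every byte is below 0x80
latin1_name = b"Bj\xf6rk"    # 0xf6 is o-umlaut in ISO-8859-1 and ISO-8859-15

print(is_valid_utf8(ascii_only))
print(is_valid_utf8(latin1_name))
```

The second check fails because 0xf6 can never start a valid UTF-8 sequence, which would match the validation error quoted above.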
Try taking a byte dump of the string you're trying to use, and see if
it doesn't say {0x42, 0x6a, 0xf6, 0x72, 0x6b}. This is the
representation of "Björk" in ISO-8859-{1,15}.
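
One possible way to take such a byte dump, as a Python 3 sketch. The helper `byte_dump` is hypothetical, and `raw` is only a stand-in for whatever Amarok actually hands back, built here on the assumption that it is Latin-1 encoded.

```python
def byte_dump(raw: bytes) -> str:
    """Format raw bytes as {0x.., 0x.., ...} (hypothetical helper)."""
    return "{" + ", ".join(f"0x{b:02x}" for b in raw) + "}"


# Stand-in for the string under test; assumed to be ISO-8859-1 bytes.
raw = "Bj\u00f6rk".encode("iso-8859-1")

print(byte_dump(raw))
```

If the real string dumps with a lone 0xf6 in the middle, it is almost certainly not UTF-8.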
> I was told "Björk" can be represented by utf8 but that doesn't seem to
> be the case.
It most certainly can. B, j, r, and k are represented by their ASCII
values, and ö can be represented in several ways; this email contains
ö as the code point U+00F6, which UTF-8 encodes as the two bytes
0xc3 0xb6.
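
A minimal Python 3 sketch of the conversion, assuming the incoming bytes really are ISO-8859-1 (that is a guess about Amarok, not something confirmed in this thread). The variable names are illustrative only.

```python
# Assumed input: the Latin-1 bytes for the artist name.
raw = b"Bj\xf6rk"

# Decode with the charset the bytes are actually in, then encode as UTF-8.
text = raw.decode("iso-8859-1")
utf8 = text.encode("utf-8")

print(hex(ord(text[2])))   # code point of the third character
print(utf8)                # bytes that should pass UTF-8 validation
```

The resulting `utf8` value carries 0xc3 0xb6 in place of the single 0xf6 byte, so it should be acceptable as a D-Bus string parameter.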
Ethan
-- 
The laws that forbid the carrying of arms are laws [that have no remedy
for evils]. They disarm only those who are neither inclined nor
determined to commit crimes.
-- Cesare Beccaria, "On Crimes and Punishments", 1764