Hello, the biggest value of YAML is good readability. A document full of
"\xDCber" cannot be read by humans.
Let us read the specification:
p. 5.1 Character Set:
YAML streams use the printable subset of the Unicode character set. <...> On
output, a YAML processor must only produce these acceptable characters, and
should also escape all non-printable Unicode characters.
(German, Russian and Greek characters are printable)
p. 5.2 Character Encoding:
A YAML processor must support the UTF-16 and UTF-8 character encodings. If a
character stream does not begin with a byte order mark (#FEFF), the
character encoding shall be UTF-8.
(it does not specify that the output should be ASCII)
Supporting old terminals by emitting ASCII is no problem, but escaping
Unicode should not be the default behavior.
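
To make the difference concrete, here is a minimal sketch of the two emitter behaviors discussed in this thread. It assumes a PyYAML version whose dump functions accept the allow_unicode keyword; the sample data is made up for illustration.

```python
import yaml  # PyYAML (third-party package)

# Made-up sample data, for illustration only.
data = {"name": "Über"}

# Default behavior: non-ASCII characters are escaped inside a
# double-quoted scalar, so the output is pure ASCII.
escaped = yaml.safe_dump(data, default_flow_style=False)

# Explicit opt-in: non-ASCII characters are written as they are.
readable = yaml.safe_dump(data, default_flow_style=False, allow_unicode=True)

print(escaped, end="")
print(readable, end="")

# Both documents load back to the same text.
assert yaml.safe_load(escaped) == yaml.safe_load(readable) == data
```

The first document contains the \xDC escape, the second contains the Ü itself. Both are valid YAML and carry the same data; only the second is comfortable to read.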
P.S. This is where PyYAML and SnakeYAML differ: SnakeYAML only emits ASCII
when it is explicitly requested.
(Java has been very Unicode-friendly from the very beginning.)
On Mon, Feb 23, 2009 at 5:42 PM, Kirill Simonov <xi@...> wrote:
> Andrey Somov wrote:
>> By default PyYAML outputs an ASCII character stream escaping Unicode
>> 'Über' -> '\xDCber'
> Technically, it's "\xDCber", not '\xDCber'. The former is a representation
> of the text:
>     Über
> while the latter is a representation of the text:
>     \xDCber
> >>> print yaml.load('Über')
> >>> print yaml.load(r''' "\xDCber" ''')
> >>> print yaml.load(r''' '\xDCber' ''')
> PyYAML is completely in compliance with the YAML specs here. There are two
> choices for the emitter when it encounters a non-ASCII character: either
> emit the scalar in the UTF-8 encoding or use the double-quote style and
> escape non-ASCII characters. Both are correct and supported by the PyYAML
> emitter. By default, the emitter uses the conservative approach: it escapes
> non-ASCII characters since it ensures that the document is always readable
> on ASCII terminals. While it may produce a less readable result, it's