Hi,

By default, PyYAML emits an ASCII-only character stream and escapes all non-ASCII characters:

'Über' -> '\xDCber'


The YAML 1.1 specification (Chapter 5, Characters) says:

YAML streams use the printable subset of the Unicode character set.

If a character stream does not begin with a byte order mark (#FEFF), the character encoding shall be UTF-8.

 

It looks like PyYAML does not follow the specification in this case.

Issue 11 (http://pyyaml.org/ticket/11) gives an explanation:

The default is to escape non-ASCII characters because they will produce garbage in non-utf8 terminals.

 

Having an option to escape non-ASCII characters is fine, but escaping should not be the default because it makes the output far less readable.
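
To make the difference concrete, here is a minimal sketch of the two modes. It assumes Python 3 and a PyYAML version whose dump() accepts the allow_unicode keyword; the sample mapping is made up for illustration.

```python
import yaml  # PyYAML, a third-party package

# Hypothetical sample data containing one non-ASCII character (U+00DC).
data = {"name": "Über"}

# Default behaviour: non-ASCII characters are written as escape sequences.
escaped = yaml.dump(data)

# Opt-in behaviour: allow_unicode=True keeps printable non-ASCII characters.
readable = yaml.dump(data, allow_unicode=True)

print(escaped, end="")
print(readable, end="")
```

Both outputs load back to the same data, so the only difference is how readable the emitted text is.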

 

P.S.

As a maintainer of SnakeYAML, I try to stay as close as possible to PyYAML so that developers can reuse their knowledge of the API and save some brain cycles when they work with YAML in both Python and Java.

I would like to keep the list of deviations as short as possible...


Andrey