First of all, thank you for developing a YAML library for C#. I've been
putting off doing so myself (as I want to use YAML for object serialization
in my own projects). So hopefully your work will save me the trouble (I
haven't tried your library yet).
Regarding line breaks, I would not favor having the YAML processor
normalize line breaks to a system-dependent value by default. That would
cause data that was encoded on one system to be transformed when decoded on
a different system. Although this would be a useful option for a YAML
processor to support, doing so by default could lead to unexpected
results. The problem becomes even more pronounced when using a
cross-platform language.
For an example of the language problem, let's suppose that we have an
application that writes SMTP messages to a YAML file for later transmission
by another program. We'll suppose that the body of the message is written as
a literal scalar. Now let's suppose that we also write a Python program that
is scheduled to run periodically, which reads all of the messages from the
YAML file and transmits them all at once through our SMTP server and which
is hosted on a Windows system. The Python programmer, knowing that the
application will run on Windows and so will already have the line feeds in
those scalars normalized to be \r\n by the YAML processor (or, simply
because when he reads the file, he sees that that is the case), decides
simply to transmit that body as-is (SMTP requires the \r\n encoding of its
body for plain text messages). Later, the Python program is moved to a Unix
system and, all of a sudden, it stops working (this could also be a
problem if the mail program was shared with someone else who was using Unix
instead of Windows or even a Mac, which uses \r for its line endings). And
why did it stop working? Because the line endings returned by the YAML
processor are no longer \r\n, but only \n. Consequently, the application has
to be rewritten to behave differently depending on which platform it is
running on (it has to normalize the line endings itself).
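To make that last point concrete, here is a minimal sketch of the
normalization the sending program would have to do itself. The helper name
to_crlf is my own invention for illustration; it is not part of any YAML or
SMTP library.

```python
import re

# Match CRLF first, then a lone CR or LF, so an existing "\r\n" pair is
# treated as one line break rather than two.
_ANY_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def to_crlf(body: str) -> str:
    """Rewrite every line break in body as CRLF, whatever form it arrived in.

    Hypothetical helper: the sending program would call this on each
    message body before handing it to the SMTP server.
    """
    return _ANY_LINE_BREAK.sub("\r\n", body)
```

With this in place the program no longer depends on what the YAML processor
returned, but every consumer of the file has to remember to do the same.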
This example also shows the potential problems for the originator of the
messages. Let's suppose that the programmer writing the first application
(the one that writes SMTP messages to a YAML file) knows the limitations of
the second program (the one that sends the messages). He knows that the
program simply takes whatever is put into the scalar value and sends it as
is to the destination without normalizing the line endings for SMTP. He,
however, understands that the application may move from Windows to Unix, and
so he wants to ensure that the body of the message will always have the
proper \r\n combination. But now he has a problem. He can't do that with a
literal scalar because the interpretation of that scalar would change
depending on which platform the sending application runs on. Instead, he
is always forced to use double-quoted scalars.
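As a rough illustration of what the writing application ends up doing, the
sketch below emits a message body as a double-quoted scalar with the line
breaks escaped. The function name yaml_double_quote is hypothetical, and
the sketch only escapes the handful of characters shown; a real emitter
would also have to escape other non-printable characters.

```python
def yaml_double_quote(text: str) -> str:
    """Return text as a YAML double-quoted scalar with escaped line breaks.

    Hypothetical helper for illustration. Only backslash, double quote,
    CR, LF and TAB are escaped here; other control characters are passed
    through unchanged, which a complete emitter must not do.
    """
    escapes = {
        "\\": "\\\\",
        '"': '\\"',
        "\r": "\\r",
        "\n": "\\n",
        "\t": "\\t",
    }
    return '"' + "".join(escapes.get(ch, ch) for ch in text) + '"'
```

Because the \r\n pairs are written as escape sequences rather than as raw
line breaks, they are not subject to line break normalization when the
document is parsed.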
The %LINEBREAK directive would handle many cases (most probably including my
example), but it does not fully address the issue, because it would apply
to the document as a whole. The writer of a document with mixed content
(say, one that embeds a variety of text files with different line endings)
would still have to decide which scalars need double quotes. Although this
is no worse than the current situation, it is incomplete. Oren mentioned to
me that the hope
(plan?) is to use schemas to resolve this kind of issue, and that would seem
a much better solution in this case as well.
In the meantime, it seems to me that clients of a YAML processing library
are better served if the library exposes an option (or options) that
controls how line breaks are normalized.
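Something along these lines is what I have in mind. This is only a sketch
in Python, not a proposal for your library's actual API: the function name
normalize_scalar and its line_break parameter are assumptions of mine.

```python
import os
import re
from typing import Optional

# Match CRLF first, then a lone CR or LF.
_ANY_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def normalize_scalar(text: str, line_break: Optional[str] = "\n") -> str:
    """Normalize line breaks in scalar content (hypothetical option).

    line_break="\n"   : the spec's behaviour, and the default here
    line_break="\r\n" : always CRLF, regardless of platform
    line_break=None   : leave the content exactly as it was read
    """
    if line_break is None:
        return text
    # A function is used as the replacement so that the chosen line break
    # is inserted literally, without any escape processing by re.sub.
    return _ANY_LINE_BREAK.sub(lambda _match: line_break, text)


# A caller who really wants the platform default has to opt in explicitly.
system_default = normalize_scalar("abc\n def\nghi\n", os.linesep)
```

The point of the design is that the platform-dependent behaviour is
something the client asks for, rather than something it gets by default.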
Thanks again for writing a .NET library. I look forward to trying it out.
The last one I found for .NET didn't work so well =P.
> -----Original Message-----
> From: Osamu TAKEUCHI [mailto:osamu@...]
> Sent: Saturday, September 12, 2009 1:22 AM
> To: yaml-core@...
> Subject: [Yaml-core] Line break normalization
> Recently, I implemented a YAML parser, which is going to be
> mainly used in Microsoft Windows environments, where the
> system default line breaks are "\r\n" instead of "\n".
> During that, I had to think about the way to store multi-line
> scalar nodes in YAML documents.
> At first, I thought the best way is to use the literal style:
> %YAML 1.2
> But I found a description that this document must always be
> parsed as "abc\n def\nghi\n" and not as "abc\r\n def\r\nghi\r\n".
> > Section 5.4. Line Break Characters
> > Line breaks inside scalar content must be normalized by the YAML
> > processor. Each such line break must be parsed into a single line feed
> > character. The original line break format is a presentation detail
> > and must not be used to convey content information.
> I understood the spec wanted to have the meaning of a YAML
> document be independent of the environments in which it is
> interpreted.
> Then, the only way I found to express "abc\r\n def\r\nghi\r\n"
> in YAML was:
> %YAML 1.2
> \ def\r\n\
> This is unacceptably ugly.
> But I'm afraid there is no better way to do it with the
> current specification. Is there any?
> If we discuss some improvement of the specification to solve this
> problem, I suggest two solutions.
> 1. Declaring a YAML processor should normalize line breaks to the
> system default line breaks, instead of a single line feed "\n".
> This causes the character stream expression of an unescaped text to
> vary depending on the environment where it is interpreted.
> At first glance, this is unacceptable.
> But we should remember that a YAML processor is allowed, or almost
> required, to normalize the character encoding of text data in a YAML
> document to the system default, "because the character encoding is a
> presentation detail and must not be used to convey content
> information (5.2. Character Encodings)".
> If line break format is also a presentation detail, the specification
> should require line break normalization not always to a single line feed
> but to the system default.
> 2. Introducing a new directive, something like %LINEBREAK.
> This would preserve the character stream expression of text data without
> explicitly escaping line breaks.
> %YAML 1.2
> %LINEBREAK "\r\n"
> I vote for the first option because it is consistent with the way of
> dealing with the character encoding, hmm, and the %LINEBREAK directive
> does not look good to me.
> Osamu Takeuchi