From: BlueGM <bl...@gm...> - 2009-09-13 23:19:16
Osamu TAKEUCHI,

First of all, thank you for developing a YAML library for C#. I've been putting off doing so myself (as I want to use YAML for object serialization in my own projects), so hopefully your work will save me the trouble (I haven't tried your library yet).

Regarding line breaks, I would not favor having the YAML processor, by default, normalize line breaks to a system-dependent value. That would cause data that was encoded on one system to be transformed when decoded on a different system. Although this would be a useful option for a YAML processor to support, having it do so by default could lead to unexpected results. The problem becomes even more pronounced when using a cross-platform language.

For an example of the language problem, let's suppose that we have an application that writes SMTP messages to a YAML file for later transmission by another program. We'll suppose that the body of each message is written as a literal scalar. Now let's suppose that we also write a Python program, hosted on a Windows system and scheduled to run periodically, which reads all of the messages from the YAML file and transmits them all at once through our SMTP server.

The Python programmer, knowing that the application will run on Windows and so will already have the line breaks in those scalars normalized to \r\n by the YAML processor (or simply because, when he reads the file, he sees that this is the case), decides to transmit the body as-is (SMTP requires \r\n line endings in the body of plain text messages). Later, the Python program is moved to a Unix system and, all of a sudden, it stops working. (This could also be a problem if the mail program were shared with someone else who was using Unix instead of Windows, or even a classic Mac OS system, which uses \r for its line endings.) And why did it stop working? Because the line endings returned by the YAML processor are no longer \r\n, but only \n.
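The failure is easy to reproduce in a few lines of Python. This is only a sketch: `load_body` is a hypothetical stand-in for a YAML processor that normalizes scalar line breaks to the host default (it is not the API of any real library), `platform_newline` plays the role of the host's line ending, and `to_smtp_body` is the extra step the sender would need once it can no longer rely on what the processor returns.

```python
import re

# Matches any of the three line break forms; CRLF is tried first so that
# it is treated as a single break rather than as CR followed by LF.
ANY_BREAK = re.compile(r"\r\n|\r|\n")


def load_body(raw_scalar: str, platform_newline: str) -> str:
    """Hypothetical processor behaviour: normalize to the host default."""
    return ANY_BREAK.sub(lambda _m: platform_newline, raw_scalar)


def to_smtp_body(body: str) -> str:
    """What the sender must do itself: force CRLF whatever it was handed."""
    return ANY_BREAK.sub(lambda _m: "\r\n", body)


raw = "Hello,\nsee you tomorrow.\n"

on_windows = load_body(raw, "\r\n")  # same file, read on Windows
on_unix = load_body(raw, "\n")       # same file, read on Unix

# The same document yields two different strings...
print(repr(on_windows))
print(repr(on_unix))

# ...and only explicit normalization in the application hides the difference.
print(to_smtp_body(on_windows) == to_smtp_body(on_unix))
```

The point of the sketch is that the scalar's value depends on an argument the document author never sees, so code that sends `on_windows` unchanged is correct on one host and wrong on the other.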
Consequently, the application has to be rewritten to behave differently depending on which platform it is running on (it has to normalize the line endings itself).

This example also shows the potential problems for the originator of the messages. Let's suppose that the programmer writing the first application (the one that writes SMTP messages to a YAML file) knows the limitations of the second program (the one that sends the messages). He knows that the program simply takes whatever is put into the scalar value and sends it as-is to the destination without normalizing the line endings for SMTP. He understands, however, that the application may move from Windows to Unix, and so he wants to ensure that the body of the message will always have the proper \r\n combination. But now he has a problem. He can't do that with a literal scalar, because the interpretation of that scalar would change depending on the platform the sending application runs on. Instead, he is forced to always use double-quoted scalars.

The %LINEBREAK directive would handle many cases (most probably including my example), but it does not fully address the issues involved, as it would apply to the document as a whole. The writer of a document with mixed content (say, one that embeds a variety of text files with different line endings) would still have to decide which scalars need double quotes. Although this isn't worse than the current situation, it is incomplete.

Oren mentioned to me that the hope (plan?) is to use schemas to resolve this kind of issue, and that would seem a much better solution in this case as well. In the meantime, it seems to me that clients of a YAML processing library are better served if the library exposes an option (or options) that controls how line breaks are normalized.

Thanks again for writing a .NET library. I look forward to trying it out. The last one I found for .NET didn't work so well =P.
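As a rough sketch of the kind of option meant above, written in Python for brevity: the `LineBreaks` enum and the `normalize_scalar` function are hypothetical names used for illustration only, not part of any existing YAML library. The default stays at a single line feed, as the specification requires, and anything else has to be requested explicitly by the caller.

```python
import re
from enum import Enum


class LineBreaks(Enum):
    """Hypothetical loader option: how scalar line breaks are returned."""
    LF = "\n"        # the behaviour the YAML specification requires
    CRLF = "\r\n"    # e.g. for content that is sent over SMTP unchanged
    CR = "\r"
    PRESERVE = None  # hand back the breaks exactly as found in the file


# CRLF is listed first so it is matched as one break, not as CR then LF.
_ANY_BREAK = re.compile(r"\r\n|\r|\n")


def normalize_scalar(text: str, mode: LineBreaks = LineBreaks.LF) -> str:
    """Apply the chosen line break policy to one scalar's content."""
    if mode is LineBreaks.PRESERVE:
        return text
    return _ANY_BREAK.sub(lambda _m: mode.value, text)


mixed = "abc\r\n def\rghi\n"
print(repr(normalize_scalar(mixed)))
print(repr(normalize_scalar(mixed, LineBreaks.CRLF)))
print(repr(normalize_scalar(mixed, LineBreaks.PRESERVE)))
```

Because the policy is an explicit argument rather than something read from the host, the same call returns the same string on Windows and on Unix.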
BlueG

> -----Original Message-----
> From: Osamu TAKEUCHI [mailto:os...@bi...]
> Sent: Saturday, September 12, 2009 1:22 AM
> To: yam...@li...
> Subject: [Yaml-core] Line break normalization
>
> Hi,
>
> Recently, I implemented a YAML parser, which is going to be
> mainly used in Microsoft Windows environments, where the
> system default line breaks are "\r\n" instead of "\n".
>
> During that, I had to think about the way to store multi-line
> scalar nodes in YAML documents.
>
> At first, I thought the best way is to use the literal style:
>
> %YAML 1.2
> ---
> |2+
>   abc
>    def
>   ghi
> ...
>
> But I found a description that this document must always be
> parsed as "abc\n def\nghi\n" and not as "abc\r\n def\r\nghi\r\n".
>
> > Section 5.4. Line Break Characters
> >
> > Line breaks inside scalar content must be normalized by the YAML
> > processor. Each such line break must be parsed into a single line
> > feed character. The original line break format is a presentation
> > detail and must not be used to convey content information.
>
> I understood the spec wanted to have the meaning of a YAML
> document be independent of the environments in which it is
> interpreted.
>
> Then, the only way I found to express "abc\r\n def\r\nghi\r\n"
> in YAML was:
>
> %YAML 1.2
> ---
> "abc\r\n\
> \ def\r\n\
> ghi\r\n"
> ...
>
> This is unacceptably ugly.
>
> But I'm afraid there is no better way to do it with the
> current specification. Is there any?
>
> If we discuss some improvement of the specification to solve this
> problem, I suggest two solutions.
>
> 1. Declaring a YAML processor should normalize line breaks to the
> system default line breaks, instead of a single line feed "\n".
>
> This causes the character stream expression of an unescaped text
> to vary depending on the environment where it is interpreted.
>
> At first glance, this is unacceptable.
>
> But we should remember that a YAML processor is allowed to, or almost
> required to, normalize the character encoding of text data in a YAML
> document to the system default, "because the character encoding is a
> presentation detail and must not be used to convey content
> information. (5.2. Character Encodings)"
>
> If line break format is also a presentation detail, the specification
> should require line break normalization not always to a single line
> feed but to the system default.
>
> 2. Introducing a new directive, something like %LINEBREAK.
> This will preserve the character stream expression of text data
> without explicitly escaping line breaks.
>
> %YAML 1.2
> %LINEBREAK "\r\n"
> ---
> |2+
>   abc
>    def
>   ghi
> ...
>
> I vote for the first option because it is consistent with the way of
> dealing with the character encoding, fmm, and the %LINEBREAK directive
> does not look good to me.
>
> Best,
> Osamu Takeuchi