Clark C. Evans [mailto:cce@...] wrote:
> On Thu, May 31, 2001 at 01:46:01PM +0200, Oren Ben-Kiki wrote:
> | - The "character set" attribute explains how to convert the value
> |   of the 'content' entry to bytes to write. It should be one of
> |   ascii/utf-8/utf-16be/utf-16le. If the content is a binary blob,
> |   either there is no "character set" (the content isn't textual),
> |   or it could just say "binary" (I know that's not an
> |   IANA-recognized character set...)
> I was thinking a bit differently. Perhaps either
> we should use "class" to indicate character set
> or introduce another indicator. Your thoughts.
> Consider the current "built-in" classes,
> "_int", "_real", etc. Are they not encodings?
It seems I'm making a distinction here that you aren't.
Issue 1: How to convert YAML text to an in-memory object and vice versa.
Issue 2: How to convert YAR file content represented in a YAML scalar to a
file on the disk and vice versa (YAR issue).
Character set, as I described it, belongs to issue 2. It seems to me
completely outside the scope of YAML to include, as part of the object
model, the answer to the question "if this Unicode string were to be written
as the sole content of an on-disk file, which of the many possible encodings
should be used for it?".
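To make issue 2 concrete, here is a minimal sketch of what a YAR extractor might do with the "character set" attribute. The function name and the way the attribute is passed in are hypothetical; the only things taken from this thread are the four text encodings, the "binary" marker, and the base64 representation of binary content.

```python
import base64
from typing import Optional

# The four text encodings proposed for the "character set" attribute.
TEXT_CHARSETS = {"ascii", "utf-8", "utf-16be", "utf-16le"}


def content_to_bytes(content: str, charset: Optional[str]) -> bytes:
    """Turn a YAR entry's scalar content into the bytes to write to disk.

    Hypothetical helper: a missing charset or the value "binary" is taken
    to mean the scalar holds a base64 blob.
    """
    if charset is None or charset == "binary":
        return base64.b64decode(content)
    if charset not in TEXT_CHARSETS:
        raise ValueError("unsupported character set: %r" % charset)
    # The content is already a Unicode string in memory; the attribute
    # only selects the on-disk encoding.
    return content.encode(charset)
```

Note that nothing here touches how the scalar was parsed out of the YAML text; the attribute only matters at the moment the file is written.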
Your questions about the built-in types, however, relate to issue 1: when
reading the YAML file into memory in, say, Java, should I create a String
in-memory object, a Float in-memory object (or maybe a Double), or an
Integer in-memory object (or maybe a Long)? Great question, to which my
answer is: "determine how YAML will support color, and do it that way".
> | Question: in a block, does YAML preserve whether a line
> | ended with a \n or a \r\n?
> No. This is one of the nice problems XML cleared up, let's
> not roll back the clock. I've yet to hear anyone complain
> that XML's folding of \r\n and \r into just \n was anything
> but helpful.
> The only place it gets us into problems is for the YAR
> use case... where the serializer doesn't know if a given
> value is character or binary. In this case, I think
> the behavior should be dependent upon the platform.
> On unix platforms, files with \r\n are treated as binary
> (to preserve the line endings), and on DOS boxes, files
> ending with \n (without \r) are treated as binary.
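The platform-dependent rule quoted above can be sketched roughly as follows. This is only one reading of it: the function name and the "unix"/"dos" labels are hypothetical, and "files ending with \n (without \r)" is interpreted as "files containing a bare \n line ending".

```python
def treat_as_binary(data: bytes, platform: str) -> bool:
    """Decide whether a file's line endings force it to be stored as binary.

    Hypothetical sketch: `platform` is "unix" or "dos". A file whose line
    endings are foreign to the platform is kept as binary so that the
    endings survive a round trip.
    """
    has_crlf = b"\r\n" in data
    # A bare "\n" is any "\n" left over once every "\r\n" pair is removed.
    has_bare_lf = b"\n" in data.replace(b"\r\n", b"")
    if platform == "unix":
        return has_crlf
    if platform == "dos":
        return has_bare_lf
    raise ValueError("unknown platform: %r" % platform)
```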
A "binary" file will be represented as a base64 blob in the YAR file. You
don't have to go that far to handle an occasional \r\n or \n in a file; you
can just use a quoted string and escape them properly. That's still "mostly
> This gives the expected behavior when moving files
> between platforms. By the way, with the current
> treatment, any YAML saved on a Windows box will
> have \r\n line endings. And likewise, on a
> Unix box it will have \n line endings. Since the
> parser doesn't care, one can move the files back
> and forth without affecting the canonical form.
> This may not give exactly the same behavior
> as "tar", since the exact line endings won't
> be preserved when switching platforms... but this
> is always a sore spot anyway! As for "diff",
> there are (or should be) flags to not report
> line ending differences.
So, YAR is more like "shar" than "tar". That makes a lot of sense.
> Therefore, I'm pretty certain that I'd like to keep
> the line ending folding as it is in the YAML spec.
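For reference, the folding being discussed (the XML-style treatment of \r\n and \r as \n) amounts to something like the sketch below. The function name is hypothetical, and this is not taken from any YAML parser; it only illustrates why the parser sees the same content regardless of which platform saved the file.

```python
def fold_line_endings(text: str) -> str:
    """Fold "\r\n" and lone "\r" into "\n", in the style of XML parsers."""
    # Replace the two-character sequence first so that a "\r\n" pair
    # becomes one "\n" rather than two.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# The same block saved on a DOS box and on a Unix box folds to one form.
dos_text = "first line\r\nsecond line\r\n"
unix_text = "first line\nsecond line\n"
same_after_folding = fold_line_endings(dos_text) == fold_line_endings(unix_text)
```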