Thread: FW: [Yaml-core] newlines

Sorry - hit the wrong button :-(

Clark C . Evans [mailto:cc...@cl...] wrote:
> Ok.  So it stays as is.  That said, I think I'd like
> to add NEL (x83) and bare CR (x0D) to the line 
> normalization rules; per the Blueberry conversation
> a few months ago...[1]  I've posted a query to the 
> XML list and I'll be interested in the results.

Bare CR already is.

About the rest (NEL, PS and LS) - there's a lovely little document at
http://www.unicode.org/unicode/reports/tr13/ which describes what we (or
anyone else) should be doing:

"Even if you know which characters represents NLF on your particular
platform, on input and in interpretation, treat CR, LF, CRLF, and NEL the
same. Only on output do you need to distinguish between them."

We do that - we convert them all to LF. That's allowed by the rules:

"1. If you do know the exact usage of any NLF, then convert it to LS or PS.
2. If you don't know the exact usage of any NLF, remap it to your platform
NLF. (This doesn't really help you in interpreting Unicode text unless you
are the only source of that text, since someone else may have left in LF,
CR, CRLF, or NEL.)".

We legitimately use #2 and declare our platform's (YAML's) NLF to be LF. So
we are covered as far as CR/LF/NEL are concerned.
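
A minimal sketch of that input normalization, in Python. The function name
normalize_nlf is hypothetical and not taken from any YAML implementation; it
only illustrates mapping CR, CRLF and NEL (U+0085) to LF while leaving LS and
PS alone.

```python
import re

# CRLF is listed first so that it maps to a single LF rather than two.
_NLF = re.compile("\r\n|\r|\x85")


def normalize_nlf(text: str) -> str:
    """Map every NLF form (CRLF, bare CR, NEL) to LF.

    LS (U+2028) and PS (U+2029) are deliberately left untouched.
    """
    return _NLF.sub("\n", text)


if __name__ == "__main__":
    sample = "one\r\ntwo\rthree\x85four\nfive"
    print(normalize_nlf(sample).split("\n"))
```

Putting CRLF first in the alternation is the one detail that matters here:
otherwise the CR and the LF would each be seen as a separate break.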

We don't allow an unescaped FF in a YAML file, so we don't have to break our
heads on that one.

That leaves PS and LS (Paragraph and Line Separators). These are a pain.

The rules are:

"A readline function should stop at NLF, LS, FF, or PS."

So, we must treat LS and PS as valid line breaks at least within text
scalars - the simplest is to allow them as a line break everywhere.
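
To make the "line break everywhere" option concrete, here is a rough Python
sketch of a line splitter that stops at any NLF, LS or PS. The name
split_lines is made up for illustration, and FF is left out on the
assumption, stated above, that an unescaped FF is not allowed in a YAML file.

```python
import re

# Every sequence treated as a line break: CRLF (tried first), CR, LF,
# NEL (U+0085), LS (U+2028) and PS (U+2029).
_BREAK = re.compile("\r\n|[\r\n\x85\u2028\u2029]")


def split_lines(text: str) -> list[tuple[str, str]]:
    """Return (content, break) pairs, keeping the break that ended each line."""
    lines = []
    pos = 0
    for match in _BREAK.finditer(text):
        lines.append((text[pos:match.start()], match.group()))
        pos = match.end()
    if pos < len(text):
        # Trailing text with no terminating break.
        lines.append((text[pos:], ""))
    return lines
```

Keeping the original break next to each line lets a later stage decide what
to do with LS and PS instead of losing that information at read time.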

As for what we do with them afterward:

"1. Always interpret PS as paragraph separator and LS as line separator.
2. In word processing, interpret any NLF the same as PS.
3. In simple text editors, interpret any NLF the same as LS.
4. In parsing, choose the safest interpretation."

Outside scalars we throw away the line break characters anyway so there's no
issue of what PS/LS should map to.

Inside text scalars I suggest that we never convert PS/LS into LF or fold
them into a space. If someone is using them, presumably he has a good reason
to, and he's aware that Notepad wouldn't handle it well.
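
As an illustration of that suggestion, here is a deliberately simplified
Python sketch of folding that touches only LF. It is not the actual YAML
folding rule: fold_line_breaks is a hypothetical name, and the behaviour (a
single LF becomes a space, a run of n LFs becomes n-1 LFs) is an assumption
made for the example. The point is only that LS and PS pass through
unchanged.

```python
import re


def fold_line_breaks(text: str) -> str:
    """Fold LF-only line breaks; leave LS (U+2028) and PS (U+2029) as they are.

    Assumed, simplified rule: one LF folds into a space, and a run of
    n >= 2 LFs is kept as n - 1 LFs.
    """
    def _fold(match: re.Match) -> str:
        run = len(match.group())
        return " " if run == 1 else "\n" * (run - 1)

    return re.sub(r"\n+", _fold, text)
```

Input is assumed to have already had CR, CRLF and NEL mapped to LF, so LF is
the only break this function needs to recognise.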

Thoughts?

Have fun,

	Oren Ben-Kiki
