From: Oren Ben-K. <or...@ri...> - 2001-12-02 15:30:39
Clark C . Evans [mailto:cc...@cl...] wrote:
> | About the rest (NEL, PS and LS) - there's a lovely little document in
> | http://www.unicode.org/unicode/reports/tr13/ which describes what we
> | (or anyone else) should be doing
>
> Great. So CR, LF, CRLF, and NEL are all normalized to LF.

Yes.

> | Outside scalars we throw away the line break characters anyway,
> | so there's no issue of what PS/LS should map to.
> |
> | Inside text scalars I suggest that we never convert PS/LS into LF
> | or fold them into a space. If someone is using them, presumably he
> | has a good reason to, and he's aware that Notepad wouldn't handle
> | it well.
>
> Off hand, I think we should normalize PS/LS just like
> the others, unless, of course, they are escaped.

I think that would be an unnecessary incompatibility with Unicode.

> The problem with the scalar treatment is the edge
> case. (PS = line ended using PS instead of CR/LF/CRLF/NEL)

I didn't understand the problem:

> one: bing PS

The value is "bing".

> two: \\PS
> bop PS
> foo PS

The value is "bop<PS>foo". The first and last PS-es aren't part of the
escaped scalar value. If you want "<PS>bop<PS>foo<PS>" you have to write:

two: \\<eol>
<PS>
bop<PS>
foo<PS>
<eol>

Where "eol" is any line break form (PS included).

> three: bar PS

The value is "bar".

I don't see the problem...

Have fun,

Oren Ben-Kiki
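The line-break rule agreed in the message above can be sketched in Python. This is a minimal illustration, assuming it runs on already-decoded text before any other parsing: CR, CRLF and NEL (U+0085) become LF, while LS (U+2028) and PS (U+2029) are deliberately left alone. The function name `normalize_line_breaks` is hypothetical and not taken from any YAML implementation.

```python
import re

# Ordinary line-break forms that collapse to LF. CRLF is listed before
# lone CR so that a CRLF pair becomes one LF, not two.
_BREAKS = re.compile("\r\n|\r|\u0085")

LS = "\u2028"  # LINE SEPARATOR: preserved, never mapped to LF
PS = "\u2029"  # PARAGRAPH SEPARATOR: preserved, never mapped to LF


def normalize_line_breaks(text: str) -> str:
    """Hypothetical helper: map CR, CRLF and NEL to LF; keep LF, LS and PS."""
    return _BREAKS.sub("\n", text)


if __name__ == "__main__":
    sample = "a\r\nb\rc\u0085d" + LS + "e" + PS
    print(repr(normalize_line_breaks(sample)))
```

Keeping LS and PS out of the pattern is the whole point of the choice discussed here: they carry line/paragraph meaning that a plain LF would lose.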
From: Oren Ben-K. <or...@ri...> - 2001-12-02 21:15:59
Clark C . Evans wrote:
> | > two: \\PS
> | > bop PS
> | > foo PS
> |
> | The value is "bop<PS>foo". The first and last PS-es aren't part of the
> | escaped scalar value.
>
> Kinda magical, isn't it?

Nope. No different than if it were LF characters there. The line break
immediately following the indicator is never taken to be a part of the leaf
value, in any of the styles. Likewise, a leaf value does not include the final
line break unless it is a block, and even there we allow for stripping the
final line break using ||. No magic; just standard YAML.

> Can I fold the escaped scalar?
>
> two: \\PS
> bop
> fooPS

That's valid but is a different value ("bop foo").

> If not, then this kinda breaks the semantics of our
> language, no?

No. Line folding is defined to preserve LS and PS, that's all.

It makes sense - somebody went to a great deal of trouble to say "a
*paragraph/line* ends here". Preserving LS and PS allows you to preserve the
line/paragraph structure of text and *still* fold it:

--- \
This long line belongs
in the same paragraph<LS>
as this one.<PS>
This line belongs
to another paragraph.<PS>

Besides being stronger, this also happens to conform to the Unicode
guidelines :-)

Of course a Unicode system should know how to display them in some way that
makes them look different from a normal LF, so imagine <LS> and <PS> above to
be appropriate hieroglyphs... Word already knows how to display paragraph
markers, for example.

Have fun,

Oren Ben-Kiki
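The folding behaviour described in the message above can be sketched in a few lines of Python. This is a hypothetical helper (`fold_scalar` is not from any YAML implementation), and it assumes its input is the text right after the scalar's indicator, already normalized so ordinary breaks are LF and with indentation removed. It covers only the rules stated in this thread: the break right after the indicator is dropped, the final break is dropped (non-block styles), LF folds to a space, and LS/PS are preserved. Blank-line handling and the `||` block style are left out.

```python
LS = "\u2028"  # LINE SEPARATOR
PS = "\u2029"  # PARAGRAPH SEPARATOR
BREAKS = ("\n", LS, PS)


def fold_scalar(raw: str) -> str:
    """Hypothetical sketch of line folding that preserves LS and PS."""
    # The line break immediately after the indicator is never content.
    if raw[:1] in BREAKS:
        raw = raw[1:]
    # Nor is the final line break of a non-block leaf value.
    if raw[-1:] in BREAKS:
        raw = raw[:-1]
    # Ordinary LF breaks fold into spaces; LS and PS survive folding,
    # so the line/paragraph structure of the text is kept.
    return raw.replace("\n", " ")


if __name__ == "__main__":
    # The paragraph example from the message, after the "--- \" indicator.
    text = ("\nThis long line belongs\nin the same paragraph" + LS
            + "as this one." + PS + "This line belongs\nto another paragraph." + PS)
    print(repr(fold_scalar(text)))
```

Applied to the cases in the thread, `"\u2029bop\u2029foo\u2029"` yields `bop<PS>foo`, and the folded variant with a plain LF between `bop` and `foo` yields `"bop foo"`, matching the values given in the replies.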
From: Clark C . E. <cc...@cl...> - 2001-12-02 22:13:38
| No. Line folding is defined to preserve LS and PS, that's all.
|
| It makes sense - somebody went to a great deal of trouble to say "a
| *paragraph/line* ends here". Preserving LS and PS allows you to preserve the
| line/paragraph structure of text and *still* fold it:
|
| --- \
| This long line belongs
| in the same paragraph<LS>
| as this one.<PS>
| This line belongs
| to another paragraph.<PS>
|
| Besides being stronger, this also happens to conform to the Unicode
| guidelines :-)

Ok. Let's make it so. ;)

Clark
From: Brian I. <in...@tt...> - 2001-12-03 09:02:43
On 02/12/01 17:25 -0500, Clark C . Evans wrote:
> | No. Line folding is defined to preserve LS and PS, that's all.
> |
> | It makes sense - somebody went to a great deal of trouble to say "a
> | *paragraph/line* ends here". Preserving LS and PS allows you to preserve the
> | line/paragraph structure of text and *still* fold it:
> |
> | --- \
> | This long line belongs
> | in the same paragraph<LS>
> | as this one.<PS>
> | This line belongs
> | to another paragraph.<PS>
> |
> | Besides being stronger, this also happens to conform to the Unicode
> | guidelines :-)
>
> Ok. Let's make it so.

This whole discussion basically went over my head. I'll wait for the spec. :)

Actually, I won't. I'll finish my implementation and wait for the bug reports ;)

Cheers, Brian
From: Clark C . E. <cc...@cl...> - 2001-12-02 16:47:50
| > Off hand, I think we should normalize PS/LS just like
| > the others, unless, of course, they are escaped.
|
| I think that would be an unnecessary incompatibility with Unicode.
|
| > The problem with the scalar treatment is the edge
| > case. (PS = line ended using PS instead of CR/LF/CRLF/NEL)
|
| I didn't understand the problem:
|
| > one: bing PS
|
| The value is "bing".
|
| > two: \\PS
| > bop PS
| > foo PS
|
| The value is "bop<PS>foo". The first and last PS-es aren't part of the
| escaped scalar value.

Kinda magical, isn't it?

Can I fold the escaped scalar?

two: \\PS
bop
fooPS

If not, then this kinda breaks the semantics of our
language, no?

Clark