Re: [Yaml-core] Oren's Take

On Thu, Oct 25, 2001 at 11:15:06AM +0200, Oren Ben-Kiki wrote:
| I'm going to rephrase it as independent modifications to my proposal:
| 
| 1. Use ':' instead of ' ' to separate "descriptors".
| 
| I must say I'm not enamored of this. I find:
| 
| 	this:!class:&001 value
| 
| To be much less human-friendly than:
| 
| 	this: !class &001 value

Hmm. After seeing the two examples, I think the space 
separator is more readable.  That said, I'm willing
to go either way.  An advantage of the "single token"
approach is that the value can start with & or !
(although it'd be confusing and error prone).
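
To make the trade-off concrete, here is a minimal sketch of reading
space-separated descriptors. The function name and the rule that at most
one '!' and one '&' token may precede the value are assumptions made for
illustration, not anything settled on this list.

```python
def split_descriptors(text):
    """Split 'descriptor descriptor ... value' into (type, anchor, value).

    Assumption: descriptors are space-separated tokens that start with
    '!' (type) or '&' (anchor) and come before the value.
    """
    type_name = None
    anchor = None
    rest = text.lstrip()
    while rest:
        token, _, remainder = rest.partition(" ")
        if token.startswith("!") and type_name is None and len(token) > 1:
            type_name = token[1:]
        elif token.startswith("&") and anchor is None and len(token) > 1:
            anchor = token[1:]
        else:
            break  # first token that is not a descriptor starts the value
        rest = remainder.lstrip()
    return type_name, anchor, rest


print(split_descriptors("!class &001 value"))  # ('class', '001', 'value')
print(split_descriptors("!wow"))               # ('wow', None, '')
```

The second call shows the cost of the space separator: a value that
starts with '!' is swallowed as a descriptor, which is the case the
"single token" form would avoid.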

I would prefer:

    unquoted: \
        This is an unquoted string
    block: |
        This is a block with a 
        terminal carriage return.
    no-line-end: |-
        This is a block without
        the terminal carriage return.

But can live with Brian's proposal 
using :, | and \ respectively.
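
The only difference between the two block forms is the final line end.
A rough sketch, assuming the parser has already collected the de-indented
lines of the block (the function name and the "chomp" flag are
hypothetical):

```python
def block_scalar(lines, chomp=False):
    """Join de-indented block lines into one scalar.

    chomp=False models '|'  (keep the terminal line end).
    chomp=True  models '|-' (drop the terminal line end).
    """
    body = "\n".join(lines)
    return body if chomp else body + "\n"


lines = ["This is a block", "of two lines."]
print(repr(block_scalar(lines)))              # 'This is a block\nof two lines.\n'
print(repr(block_scalar(lines, chomp=True)))  # 'This is a block\nof two lines.'
```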

| 3. Making '~' a descriptor instead of an implicit type. I don't see why we
| need to do that. Do you want to make references into descriptors as well?
| Why?

Ok.  So we would write: !null!
instead of: ~

| 4. In-line block values:

I think you mis-read what I wrote, as it was "meta".  No biggie.

| 5. Using '-' instead of ':' for list entries.
| 
| As Brian pointed out, this is safe:
| 
| inline scalar: -1
| next line scalar: \
|     -1
| list entry:
|     - 1
| 
| The '-' is a tad more readable, but it means another
| special character. I've no strong opinion about this.
| At a pinch I'd go for using '-'.

I can go either way.  In some ways it's nice to have
lists use a different indicator.  However, it is another
character...  Worse, for some items, a dash looks better
than a colon (for long text entries) and for other items,
a colon looks better.  I'll defer this one to you all.
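
For reference, the "safe" case above seems to rest on one rule: a dash
counts as a list indicator only when a space (or the end of the line)
follows it. A small sketch under that assumption, with a hypothetical
function name:

```python
def classify(line):
    """Classify an indented line as a list entry or a plain scalar.

    Assumption: '-' is a list indicator only when followed by a space
    or when it stands alone on the line; '-1' is an ordinary scalar.
    """
    stripped = line.strip()
    if stripped == "-" or stripped.startswith("- "):
        return "list entry"
    return "scalar"


print(classify("    - 1"))  # list entry
print(classify("    -1"))   # scalar
```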

| 6. Having explicit type names available for text as 
|    well as map and list.
| 
| Good idea.

Hmm.  I still have to understand this better.

| 7. Using the names '.<type>' instead of '<type>!' for the basic types.
| 
| I'm afraid we'll be using a mechanism which may prove useful
| for other purposes: e.g., "relative type names".
| 
| (Relative type names mean:
| 
| data-island: !org.someone.root-type
|     data-element: !.another-type
|     ...

Yes.  This is from our SML-DEV talks.  It's very nice...
most data comes in islands and having to repeat the prefix
each time is damn tedious.  For now, let's not worry about 
giving the YAML types an abbreviation.
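
One possible reading of relative type names, sketched below: a name that
starts with '.' borrows the prefix of the enclosing node's type. The
resolution rule (drop the last component of the enclosing name, then
append the relative name) is an assumption; nothing here has been agreed.

```python
def resolve_type(name, enclosing):
    """Resolve a possibly relative type name against the enclosing type.

    Assumption: '.x' means "same prefix as the enclosing type", so
    '.another-type' inside 'org.someone.root-type' becomes
    'org.someone.another-type'.
    """
    if not name.startswith("."):
        return name  # already absolute
    prefix, _, _ = enclosing.rpartition(".")
    if not prefix:
        raise ValueError("enclosing type has no prefix to inherit")
    return prefix + name


print(resolve_type(".another-type", "org.someone.root-type"))
# org.someone.another-type
```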

| 8. '#<index>' descriptor for sparse lists.
| 
| I like the idea of sparse maps, but I don't feel strongly about the need to
| support them. They can always be emulated by having explicit null entries. I
| also dislike making '#' an indicator, because that would rule out using it
| as a key for "comment" values. Likewise making '[' or '(' be indicators
| isn't a good idea. How about '^' instead?
| 
| sparse-list:
|     : ^12 Indexed entry.

Now that we aren't using @ for lists, the at sign can be used...

sparse-list:
    : @12 Indexed entry at entry number 12.
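
As Oren notes, a sparse list can be emulated with explicit null entries.
A sketch of that expansion, assuming zero-based indices and assuming an
entry without an index simply takes the next free slot (both are guesses
for illustration):

```python
def expand_sparse(entries):
    """Expand (index, value) pairs into a dense list padded with None.

    index=None means "next position"; an explicit index must not point
    at a slot that is already filled.
    """
    result = []
    for index, value in entries:
        if index is not None:
            if index < len(result):
                raise ValueError("index %d is already filled" % index)
            result.extend([None] * (index - len(result)))
        result.append(value)
    return result


print(expand_sparse([(None, "a"), (3, "d"), (None, "e")]))
# ['a', None, None, 'd', 'e']
```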

| Clark's comments:
| > >   A.  Strengthen the packing requirement, making the
| > >       indicator mandatory follow the colon.
| > 
| > Not sure I get you here. The indicator *is* mandatory. It 
| > just happens that one possible indicator is ''.
| 
| I agree.

Withdrawn.

| > >   B.  We strengthen the : indicator to specifically
| > >       mean an un-quoted, un-escaped multi-line scalar.
| > 
| > What about its next-line property? 
| > I don't see the need for this.
| 
| I agree - and therefore I think '\' is better. Its intuitive
| meaning rules out this kind of confusion.

My point was rather simple.  Regardless of the indicator,
if you want it possible for the start of a quoted string
to begin on the second line, then you disallow an unquoted
string to begin with a quote.  Note that our current 
production does not have this restriction.

That said, are we allowing...  I don't see a reason why not.

   key: This is a multi-line scalar 
       that continues on the next line.

| > >   C.  For multi-line keys, we strictly use the 
| > >       quoted form.  (for readability only)
| > 
| > OK by me. (I think :) Oren?
| 
| There's no technical reason for this restriction,
| but I can live with it.

No, there isn't.  But I think it is cleaner.  We
can always loosen the restriction later if we 
find that it is problematic, where the opposite
isn't true.

| > > If I use the same keys, they are meant as the proper
| > > replacement.
| 
| So repeating a key in a map means overwriting the value? I'd rather not.
| Some errors will go undetected, and round-tripping will lose data. Any good
| use case for this?

This was a meta-comment.  Not YAML behavior.  I was
trying to explain that I was providing replacement
suggestions to Brian's syntax by using the same
key name.  

| Other points you raised:
| 
| > I still would like to be able to:
| > nextline2::
| >     "Multi-line quoted scalars are rather easy to do.  It \
| >     is simply a quoted \"inline\" scalar which happens \
| >     to extend multiple lines.  It needs no special treatment. \
| >     And it removes the exceptional case from the multi-line \
| >     unquoted, unescaped, : scalar.\n"
| 
| I agree.

Ok.  But this restricts unquoted multi-line scalars
to not begin with a quote.  This is a new restriction.
However, I seem to be out-numbered here, so I'll
withdraw the suggestion.

| > So if we take this away, how do we do empty lists and maps? Oren?
| 
| I agree with Clark that kind != type, and should be kept strictly
| independent of it, that is that !<type> does not reflect on the kind of
| node, just what it is being de-serialized into. I also think that my
| original proposal is consistent with both statements :-)
| 
| That's because talking about an empty map or empty list is talking
| about its type, not its kind! The semantics of:
| 
|     empty map : !map!
| 
| In my proposal is:
| 
| - This is a *scalar node*.
| - Its value is the empty string.
| - Its explicit type is "map!".
| - The only valid value for this type is the empty string.
| - This value is deserialized into an empty map of the same
|   native data type as what untyped maps are deserialized into.

Ok.  So this is just a way to express empty maps, via the
type system.  Kinda arcane, but then again, so would be
re-introducing % and @.  If we are going to limit it to empty
map/list then perhaps the type should be !org.yaml.empty.map
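
In loader terms, Oren's five points could look roughly like this. The
function name is hypothetical, and Python's dict and list stand in for
"the native data type untyped maps and lists are deserialized into".

```python
def deserialize_scalar(value, explicit_type=None):
    """Deserialize a scalar node, honoring the 'map!' and 'list!' types.

    For these two types the only valid value is the empty string, and
    the result is an empty native container.
    """
    empty_kinds = {"map!": dict, "list!": list}
    if explicit_type in empty_kinds:
        if value != "":
            raise ValueError("%s allows only the empty string" % explicit_type)
        return empty_kinds[explicit_type]()
    return value  # any other scalar is returned unchanged in this sketch


print(deserialize_scalar("", "map!"))   # {}
print(deserialize_scalar("", "list!"))  # []
```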

| There's no magic at all involved, and I don't see that we need to add any. I
| learned that lesson well from my reference syntax fiasco.
| 
| That said, I agree with Brian that we should have a "text!" type along with
| "map!", "list!" and "ref!". I disagree that null, binary, block and (gasp)
| chomped-block belong to that set (I think Clark shares my view in
| this).

Yes.

| Clark also wrote:
| > > For implicit types ::
| > > 
| > >     My initial thought is that this only works
| > >     with one-line, non-quoted scalars.  There should
| > >     be a registry with the REGEX for the implicit
| > >     type and if it matches, the type is set.  
| > > 
| > >     The open issue is if this REGEX list is a singleton
| > >     (on the yaml.org site) or if it can be local.  If
| > >     it is local, then portability suffers as two vendors
| > >     may have two different types for the same regex;
| > >     or worse, overlapping regex.  In this case, yaml
| > >     fragments from both vendors can't be mixed.  This
| > >     would greatly hurt YAML.  Thus, IMHO, the REGEX
| > >     list should be a singleton and published in a spec.
| > 
| > I agree with all of this. Oren?
| 
| I don't see how having incompatible implicit types is any better or worse
| than having incompatible explicit types. 

With the explicit type mechanism, we have global tags tld.domain.*
so people can use explicit tags and not conflict.   This isn't
true for implicit types.  Without a global list, implicit types
are significantly weakened.
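
A sketch of the registry idea and of why a local list hurts. The type
names and patterns below are invented for the example; no such list has
been published.

```python
import re

# Hypothetical global registry: first full match wins.
IMPLICIT_TYPES = [
    ("null", re.compile(r"~")),
    ("int", re.compile(r"[-+]?[0-9]+")),
    ("float", re.compile(r"[-+]?[0-9]+\.[0-9]+")),
]

# Hypothetical vendor-local registry whose pattern overlaps 'float'.
VENDOR_TYPES = [
    ("version", re.compile(r"[0-9]+\.[0-9]+")),
]


def implicit_type(scalar, registry=IMPLICIT_TYPES):
    """Return the implicit type of a one-line, unquoted scalar."""
    for name, pattern in registry:
        if pattern.fullmatch(scalar):
            return name
    return "text"


print(implicit_type("12"))                 # int
print(implicit_type("1.0"))                # float
print(implicit_type("1.0", VENDOR_TYPES))  # version
```

The same scalar "1.0" gets two different types depending on which list
the reader happens to have, and there is no prefix such as tld.domain.*
to tell them apart.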

| > >     : starting productions
| 
| Here there are actually two issues:
| 
| - Using a sequence of blank lines to separate top-level entries.
| 
| This does save you from using the '-' prefixes. Is it that important? It
| seems to me that typically you'll know that a document will contain multiple
| entries and you can therefore have a list top-level production. I can live
| with it either way, though.

I'm wary of using blank lines as a separator for three
reasons.  Consider "\n\n" vs "\n    \n" or "\n\t\n".  
First, I can't visually distinguish between them, 
it requires me to go into an editor which shows
me explicit whitespace characters.  Second, many 
editors strip trailing whitespace.   Third, I 
don't think my grandmother would be able to 
understand the subtle distinction here.
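
The editor problem is easy to reproduce. In this sketch the helper mimics
an editor that strips trailing whitespace on save; the names are made up
for the example.

```python
SEPARATORS = ["\n\n", "\n    \n", "\n\t\n"]


def strip_trailing_whitespace(text):
    """Mimic an editor that removes trailing whitespace from every line."""
    return "\n".join(line.rstrip() for line in text.split("\n"))


documents = ["a" + sep + "b" for sep in SEPARATORS]
saved = {strip_trailing_whitespace(doc) for doc in documents}

print(len(set(documents)))  # 3 distinct inputs
print(len(saved))           # 1 distinct result after the editor runs
```

Three inputs that a parser could treat differently become one once the
file has been through such an editor.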

Right.  

| > >     : indentation characters (tabs anybody?)
| > 
| > Use one tab, instead of four spaces.
| > 
| > Someone else suggested this and it makes a lot of sense in many ways.
| > Why not just have one tab for each indentation level?
| 
| That was me, and Clark killed it by quoting the guy who wrote Make saying
| "this was the biggest mistake he made" or some such. Perhaps we should
| revisit this - as you say, a single tab character for indentation has a lot
| of advantages.

I give up.  TAB character it is.
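
With one tab per level, finding the nesting depth is just counting. A
sketch, assuming that a space right after the leading tabs is treated as
an error rather than as part of the content:

```python
def indent_level(line):
    """Return the nesting depth of a line: one leading tab per level."""
    stripped = line.lstrip("\t")
    if stripped.startswith(" "):
        raise ValueError("spaces are not allowed in indentation")
    return len(line) - len(stripped)


print(indent_level("key: value"))      # 0
print(indent_level("\t\tkey: value"))  # 2
```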

| > >     : round-tripping comments
| > 
| > Something like:
| > 
| > this:
| >     #: a comment
| > 
| > same as:
| > 
| > this:# a comment
| >
| > Thoughts? I probably care least about this point, but it 
| > still is something we should agree on.

Hmm.  I guess I'd rather have...

  this:# a comment

be a synonym for...

  this: !org.yaml.comment a comment

| I'd rather not have an in-line form of keys in a map... 
| Typically a comment would be multi-line anyway.
| 
| > >     : throw-away comments 
| > 
| > Unquoted lines beginning with '#' in column 1 should be ignored by the
| > parser. This will be so helpful, I consider it a slam-dunk. 
| > (People are already requesting this)
| > 
| > Could be implemented as a standard filter.
| 
| I guess we can't avoid this. The problem is that it ruins YAML-pretty-print
| as a standard YAML application... Sigh. Perhaps it could be written using
| the standard streaming API...

I'm not all that enthusiastic about comments
that don't round trip.  I think that the
stripping of comments could be an application
level choice...

  myyaml = yaml_load("myfile.yaml",strip_comments=1)
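
The filter itself would be tiny. A sketch of what such an option might do
before parsing, following Brian's rule that only lines with '#' in
column 1 are dropped (the function name is hypothetical, and this is not
an existing loader API):

```python
def strip_comments(text):
    """Drop every line that has '#' in column 1; keep all other lines."""
    kept = [line for line in text.split("\n") if not line.startswith("#")]
    return "\n".join(kept)


source = "# throw-away note\nkey: value\n    #: kept, it is indented\n"
print(repr(strip_comments(source)))
# 'key: value\n    #: kept, it is indented\n'
```

An indented '#:' entry survives, so comments meant to round-trip stay in
the data while throw-away ones are removed.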

| > >     : DWIM modes
| > 
| > Maybe we don't need them with all this great new stuff.
| 
| Good. :-)

;)

| > >     : preprocessors/filters
| > 
| > : space to tab indentation fixer
| > : comment stripper
| > : validator
| > : blank line ignorer
| 
| Ideally, I'd consider these to be out of scope for the YAML draft itself.
| Otherwise, there would be no guarantee that my YAML processor will read your
| output YAML file, because I don't use the right filter.

Yes.  These things get problematic.  Unless we have
a declaration section at the start of the YAML.  

#!/my/program/that/accepts/yaml/input
#%
    YAML-Version: 1.0
    Whitespace: 4 spaces
    Strip-Comments: yes

Just an idea...
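
To show that such a header is cheap to read, here is a sketch that peels
it off before the real parse. The layout (optional '#!' line, a '#%'
line, then indented "Key: value" lines) is taken from the example above;
everything else, including the function name, is an assumption.

```python
def read_declarations(text):
    """Split a document into (declarations dict, remaining body)."""
    lines = text.split("\n")
    i = 0
    if lines and lines[0].startswith("#!"):
        i = 1  # skip the interpreter line
    declarations = {}
    if i < len(lines) and lines[i].strip() == "#%":
        i += 1
        # Indented, non-blank lines belong to the declaration section.
        while i < len(lines) and lines[i][:1] in (" ", "\t") and lines[i].strip():
            key, _, value = lines[i].strip().partition(":")
            declarations[key.strip()] = value.strip()
            i += 1
    return declarations, "\n".join(lines[i:])


header = (
    "#!/my/program/that/accepts/yaml/input\n"
    "#%\n"
    "    YAML-Version: 1.0\n"
    "    Strip-Comments: yes\n"
    "key: value\n"
)
print(read_declarations(header))
# ({'YAML-Version': '1.0', 'Strip-Comments': 'yes'}, 'key: value\n')
```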

| > >     : YAML standard library (MIME etc)
| > 
| > I guess this is a library of preprocessors, filters and 
| > postprocessors that come standard. This lets you have 
| > a bunch of standard usage options that aren't in the core 
| > info model, but that people like/need nonetheless.
| 
| Again, I see a danger in me not being able to read your 
| file because I don't have the right filter.

I guess this is why Brian was denoting it as a 
standard library?

Best,

Clark