Re: [Yaml-core] Simple Scalar Styles


On 18/06/02 15:29 -0400, Clark C . Evans wrote:
> On Tue, Jun 18, 2002 at 02:21:34PM -0400, Oren Ben-Kiki wrote:
> | Let's just say for shorthand that:
> | 
> | *CORE ::= *SPC + *IND + *SEQ + *COM
> | 
> | That is, all things that really can't/shouldn't appear in *any* such value,
> | no matter what. The proposal becomes:
> | 
> |   [ top-value,
> |     top-seq ]: [ *CORE, *KEY ]
> | # seq requires *KEY so we just ban it everywhere.
> | 
> |   [ inline-value,
> |     inline-seq ]: [ *CORE, *KEY, *SEP ]
> | # Inline bans *SEP as well.
> | 
> |   [ top-key,
> |     inline-key ]: [ *CORE, *KEY, *SEP, *SPN ]
> | # Keys can't span.
> | 
> | I think we all agree on this?
> 
> Ok.  I don't think that multi-line scalars within an in-line 
> collection is good practice; but if it will help acceptance
> more than hurt it then I'm game.   A further clarification,
> this limits quoted variants to a single line as well?

I concur that using multi-line scalars within an in-line collection is
not good practice. Especially when you want to allow comments on any line.
I'd really like to avoid comments in the middle of a token. I'd also like to
avoid inline-collection continuation/spanning mid-token.

    ---
    bad practices: #should be illegal IMO
      - { foo: figuring out     # first line
               scalars is hard, # second line
        }
      - { foo: "even double quotes    # what am I?
       should be single line"         # I yaml what I yaml!

I think that both Clark and I want to preserve the original intent of inline
collections (small groupings of small things) while allowing them to use
multiple lines. But I think we all agree that scalar spanning in inline
collections is possible.

> 
> | We need to agree on three things:
> | - The scalar styles (Clark's latest looks basically OK)
> 
> I talked to Brian and he doesn't like the unification and
> thinks that the current path is good.  So, unless there is
> an outcry of support for the proposed unification; I guess
> it will just fade away as an unfulfilled strawman. ;)

We are very close with our current spec, and I really like the multiple
forms to boot. A little bit of redundancy is a good thing (paraphrasing
Larry Wall :)

I didn't like that we threw out double quoting on a whim, just because
we might be able to get by without it. Double quoting is a well known
idiom that YAML is currently capitalizing on. I also like '>' nested
folding. I'm not sure that you could do folding right without it in
all cases, because sometimes you can't autodetect the indentation
level. Thus our explicit number indicator: '>4'. Wiki also benefits
visually from the nested style folding for scalars that begin with
indented wiki content.

We've been down these treacherous roads before. We're on the right track.

> | - The restrictions on the unquoted in-line style (seems 
> |   the above is acceptable to all)
> 
> It's acceptable, given the reservations above.
> 
> Brian and I discussed how the top-* items can have
> the semantics of the folded type so that we don't
> introduce yet another set of folding rules.  The
> indentation for the multi-line top-* scalars is
> determined from the first non-blank line past the
> opening line.  If you want any other indentation 
> setting, use the folded variant.   This would allow
> for Wiki stuff within this scalar form and keep the
> logic a bit more uniform between the two scalar styles.

I agreed with Clark that this was possible. I don't really desire it though.

I think the nested-folded form should use wiki style folding, and the
spanning top-value should use collapse style folding, where all leading
and trailing whitespace collapses to a single space. This actually gives
us more power. The added bonus is that collapse style doesn't care about
indentation width, so no explicit indicator is needed.
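A minimal sketch in Python of the collapse style folding described above, to show why indentation width stops mattering. The name collapse_fold is hypothetical, and dropping blank lines entirely is an assumption; the thread does not say how blank lines should be treated.

```python
def collapse_fold(raw: str) -> str:
    """Fold a spanning scalar: the whitespace around every line break
    collapses to a single space (hypothetical helper, not from the spec)."""
    # Strip leading and trailing whitespace from each physical line.
    pieces = [line.strip() for line in raw.splitlines()]
    # Assumption: blank lines contribute nothing to the folded value.
    return " ".join(piece for piece in pieces if piece)


# The indentation of the second line has no effect on the result.
print(collapse_fold("figuring out\n           scalars is hard"))
```

Because each line is stripped before joining, the folded value is the same no matter how far the continuation lines are indented, which is why no explicit indentation indicator is needed for this form.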

> | - The regexp for strings (we have some reasonable 
> |   options, Clark will make a call).
> 
> This comes down to what direction we want to take 
> implicit types.  We have a few choices:
> 
> - Drop them altogether; I include using the parenthesis
>   mechanism (even if it is auto-detected within parens)
>   as making them explicit.
> 
>   This is simple since it has a specific indicator for
>   auto-type detection.   The parenthesis is just a 
>   nicer looking, stand-alone ! indicator.  Nothing more
>   nothing less; and it isn't implicit any more.
> 
>   This isn't pretty because it leaves out integers,
>   timestamps, floats, null, and other very common types
>   which are not strings.
> 
> - Fix them at the current level, possibly with a few
>   quick modifications.   This would not prevent us from
>   defining the above parenthesis mechanism which could be
>   more flexible in the future.

+1

>   This is nice in that commonly used types of today are
>   supported well; and we can use the parenthesis marker
>   for types of tomorrow.   If we go this route, we have
>   to choose if stuff like tuples (ip addresses) and
>   URIs should be implicit types.   An argument could be
>   made that these are application level objects and that
>   typing beyond the core types should be done at the
>   application level; either explicitly or via a 
>   schema which has the expected data types.
> 
> - Allow us to add them in the future; be quite restrictive
>   up front.   This comes at the price of having to quote
>   things, perhaps things that don't make sense that they
>   should be quoted.
> 
> Before my suggestion, let me get two changes to implicit
> types that I think we should implement:
> 
> - We clean up the date/time productions to eliminate
>   time.  Time just doesn't make sense without date as
>   there aren't any good native "time-only" types.  We 
>   only keep the timestamp type with an optional date-only
>   shorthand used only when the time is exactly 
>   midnight GMT.  We further limit the regex to be 
>   exactly one token by only using a T to separate 
>   the date from the optional time component.

To clarify, we drop !date and !time from the spec, and just support
!timestamp. (Perhaps we can rename !timestamp to !date for brevity.) The
new date could be written in either of the following formats:

    ---
    - 2001-06-18     # Implies midnight for time.
    - 2001-06-18T19:22:45.5Z

If we enforce the 'T' then we don't have any implicits with
whitespace in them, except string. This makes it easier to have a
relaxed string implicit.
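A rough Python sketch of what a single-token timestamp implicit could look like once the 'T' is enforced. The pattern and the name parse_timestamp are illustrative assumptions, not the spec's production; it only accepts the two shapes shown above (date only, or date, 'T', time with an optional fraction, and 'Z').

```python
import re
from datetime import datetime, timezone

# Assumed pattern: YYYY-MM-DD, optionally followed by T, hh:mm:ss,
# an optional fractional part, and a literal Z. No whitespace anywhere.
TIMESTAMP = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})"
    r"(?:T(\d{2}):(\d{2}):(\d{2})(?:\.(\d+))?Z)?"
)


def parse_timestamp(text):
    """Return a UTC datetime for a matching token, else None."""
    m = TIMESTAMP.fullmatch(text)
    if m is None:
        return None
    year, month, day = int(m.group(1)), int(m.group(2)), int(m.group(3))
    try:
        if m.group(4) is None:
            # Date-only shorthand implies midnight GMT.
            return datetime(year, month, day, tzinfo=timezone.utc)
        fraction = m.group(7) or ""
        # Keep at most microsecond precision, right-padding short fractions.
        micro = int(fraction[:6].ljust(6, "0")) if fraction else 0
        return datetime(year, month, day,
                        int(m.group(4)), int(m.group(5)), int(m.group(6)),
                        micro, tzinfo=timezone.utc)
    except ValueError:
        # Well-formed digits but not a real date or time (e.g. month 13).
        return None
```

A value with a space between date and time does not match at all, so it would fall through to whatever the string implicit decides.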

> 
> - We add a very cute format for floating point values
>   which happen to have two decimal places.  34.3% is
>   treated as 0.343 ; this would make a lot of business
>   YAML representations look great.  Yes? (nothing 
>   like pure unadulterated sugar)

I don't care. I don't care. (quoting Jon "not John" Wayne)
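For reference, the percent shorthand Clark describes amounts to something like the Python sketch below. The pattern and the name parse_percent are assumptions for illustration; Decimal is used so that 34.3% comes out as exactly 0.343 rather than a binary floating point approximation.

```python
import re
from decimal import Decimal

# Assumed shape: optional sign, digits, optional decimal part, then '%'.
PERCENT = re.compile(r"[-+]?\d+(?:\.\d+)?%")


def parse_percent(text):
    """Return the value as a fraction of one, or None if not a percent."""
    if PERCENT.fullmatch(text) is None:
        return None
    # Drop the trailing '%' and divide by one hundred.
    return Decimal(text[:-1]) / Decimal(100)


print(parse_percent("34.3%"))
```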

> The compromise I'm now leaning to is a fixed approach with
> a few escape hatches so we can be creative later if the 
> need emerges.  Unfortunately, this means that it will
> be a bit complicated...
> 
>    1.  The regexp for matching text content should exclude
>        a few characters that are not commonly used for
>        punctuation that we can use later for indicators or
>        other implicit types or for other purposes.  
> 
>                |~#%^*+=<>$  
> 
>        We really can't restrict @ due to e-mail addresses,
>        \ due to DOS pathnames, : due to URLs.

-1 # Parens and leading alphanums cover all the bases.

>    2.  Anything starting with alpha-numeric value that
>        is more than one token (contains a space) is a 
>        text via the implicit rule.

+1 

>    3.  Anything starting with alphabetic character is
>        text via the implicit rule.

+1 

> 
> This rules out implicit typing of URLs, pathnames, and
> email.  This isn't so bad.  I don't know what a URL type
> would give me anyway... Hmm.

+1
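Rules 2 and 3 above can be sketched in a few lines of Python. The name is_implicit_text is hypothetical, and using Python's Unicode-aware isalpha/isalnum for "alphabetic" and "alpha-numeric" is an assumption; the thread does not pin down the character classes.

```python
def is_implicit_text(value: str) -> bool:
    """Apply rules 2 and 3: decide whether an unquoted value is text."""
    if not value:
        return False
    first = value[0]
    # Rule 3: anything starting with an alphabetic character is text.
    if first.isalpha():
        return True
    # Rule 2: anything starting with an alphanumeric character that is
    # more than one token (contains a space) is text.
    return first.isalnum() and " " in value


for sample in ["hello", "12 monkeys", "12", "http://yaml.org", "(12)"]:
    print(sample, is_implicit_text(sample))
```

Under these rules a URL or an email address starting with a letter is simply text, which matches the point that URLs, pathnames, and email get no implicit type of their own.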

Cheers, Brian