Thread: [Yaml-core] minutes 17-jun-2002

Summary: >
 Oren, Brian and I met for a short while on IRC today and had
 a brief chat about the "simple scalar".   Let me first give a
 bit of background on the issues we grappled with.

Background: 
- >
 First, this scalar style satisfies the need for simple string
 values, such as enumerations, etc.   It is not folded, escaped,
 or quoted.  I'd like to note that originally this scalar kind was
 called "folded" and could span multiple lines.  Later on we 
 divided the scalar type into a one-line variety and a multi-line 
 folded variety to reduce many complications.
- >
 Second, with the introduction of a type system in YAML, we
 wanted a way to express common types in an implicit manner 
 that didn't hurt readability.  Integers, floating point
 values, and dates/times are the largest use cases.   We generalized
 from these core use cases to many others by using a regular
 expression mechanism.  Basically, the content of this scalar style
 is compared to each regular expression in the type library; if
 a regular expression matches, then the scalar is typed accordingly.
 If none of the expressions match, then it is an implicit type error.
- >
 Third, there are a few complications.  The scalar style can be used
 as a key, and as such cannot contain ": " or it would signify the end
 of the key.  Also, when used as a value, it cannot use the ": " due
 to the "- key: value" short-hand trick we have.   Further, it can
 also be used in the context of inline mappings and values.  In this
 case a value cannot contain '[', ']', '{', '}', ': ' and ', '.
- >
 About two weeks ago, we loosened up things in the spec to allow
 in-line styles to span multiple lines.  This is a change that we
 have yet to fully grapple with, as it was a complicating factor that
 we set aside a long time ago in the interest of simplicity but that,
 after getting quite a bit of user feedback, we now feel is needed
 for general acceptance.
- >
 Furthermore, around the same time (or a bit thereafter) we tightened
 up the regular expression for "text" matching to include only word
 characters instead of matching any string starting with an alpha char.
 This is what prompted the current meeting.  This restriction was 
 passed on a short voice vote without enough consideration and we
 are very happy that the YAML community has questioned this decision.
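
The regular-expression mechanism described in the second background
item can be sketched roughly as follows.  This is only an illustration:
the type names, the patterns, and the helper resolve_implicit_type are
assumptions made for the example, not the expressions from the spec.
The "text" pattern mirrors the tightened rule (word characters and
spaces only).

```python
import re

# Hypothetical type library. Names and patterns are illustrative
# assumptions, not the regular expressions used by the YAML spec.
TYPE_LIBRARY = [
    ("null", re.compile(r"~")),
    ("int", re.compile(r"[-+]?[0-9]+")),
    ("float", re.compile(r"[-+]?[0-9]+\.[0-9]+")),
    ("date", re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")),
    ("text", re.compile(r"[\w ]+")),  # word characters and spaces only
]


class ImplicitTypeError(ValueError):
    """Raised when no expression in the type library matches."""


def resolve_implicit_type(content):
    """Return the name of the first type whose pattern matches all of content."""
    for name, pattern in TYPE_LIBRARY:
        if pattern.fullmatch(content):
            return name
    raise ImplicitTypeError("no implicit type matches %r" % content)
```

With these assumed patterns, "42" resolves to int and "red" to text,
while "hello, world" raises an implicit type error because of the
comma, which is the kind of restriction that prompted the meeting.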

Goals:
- > 
 We'd like to maintain an implicit type system so that it is easy
 to express and read integers, dates, null, and floating point values.
- >
 Ideally, we'd like to keep it open so that other sorts of implicit
 types can be added in the future, for instance, a URL or a YPATH
 expression.
 Some of us feel that we could fix the implicit types at the above
 set and make all other types explicit.
- >
 We'd like to keep the simple scalar simple for common one-line strings.
 This is especially important for enumerated values, etc.
- >
 A few of us would like to use the simple scalar again for multi-line 
 folded-like strings (although there is not universal agreement here)
 especially the usage for human-written paragraphs:

    example:  Some would like, very much so, to be able to use
              this style for documentation (and other stuff...) 
              that spans multiple lines in the original folded
              manner -- with full, access to punctuation!
- >
 We'd like to keep it intuitive and easy to implement, although
 implementation can be harder if that is required to make it easier
 to use.

Note: >
 A few people are confusing what the simple scalar production
 allows with what the text regular expression matches.
 In general, the scalar production applies to all simple scalars,
 whether they have implied types (including text) or not.  The
 text regular expression, by contrast, is applied to content allowed
 by the production to determine whether it is "text" according to
 the implicit rules.   Note that the restrictions of the regular
 expression do not apply if you simply use !text before the content...

Implicit Typing Options:
- >
 The first option, currently in the spec, is that the implicit
 expression only allows word characters and the space.  This is deemed
 problematic since people want to use punctuation.  Note that this
 keeps many implicit types (such as a URI) open for definition in the
 future.
- > 
 The second option is to further restrict the expression so that a 
 space is not allowed.  This solves the problem above by making it 
 impossible to use the simple scalar for paragraphs without an 
 explicit !text marker.
- >
 The third option is to separate the implicit type detection from
 the simple scalar altogether and make another indicator out of it.
 The stand-alone ! was proposed for this purpose.   Another proposal
 would use parentheses, so that implicitly typed things are always
 enclosed in parentheses; (mailto:cc...@cl...), for example, would
 be implicitly typed as a URI.  In this extreme,
 lacking the indicator would mean that the type is text, so
 null values, dates, integers, etc. would all require an extra
 indicator, either parentheses or ! or something else.
- >
 The fourth option is to roll back a bit.  In this option we say
 that all simple scalars starting with an alpha character are
 implicitly typed as text.  This means that mailto:cc...@cl...
 cannot be implicitly typed as a URI, and that '32 Walker Drive'
 will not be implicitly typed as a string.  This option would allow
 for most paragraphs, while keeping the ability to introduce new
 implicit types in the future as long as they don't start with
 an alphabetic character.  By convention we could use parentheses
 for implicit types which would otherwise start with an alphabetic
 character, such as a URI.
- >
 The fifth option is to fix the implicit types; at least for this
 version.  In this option anything not covered by an existing
 regular expression is a string.  This option makes version numbers
 very important as the type of a node can depend upon it and
 getting the version number wrong can cause information model 
 problems that do not show up as parse errors.
- >
 The sixth option is to eliminate implicit types altogether,
 and make everything a string.
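
To make the difference between the first, second and fourth options
concrete, here is a small sketch of what each would accept as implicit
"text".  The patterns and the helper is_implicit_text are assumptions
made for illustration (in particular, "alpha character" is taken to
mean an ASCII letter); they are not the expressions from the spec, and
the sample address in the usage note is a made-up placeholder.

```python
import re

# Illustrative "text" rules for three of the options listed above.
# These patterns are assumptions, not taken from the YAML spec.
TEXT_RULES = {
    "option 1": re.compile(r"[\w ]+"),      # word characters and spaces
    "option 2": re.compile(r"\w+"),         # word characters, no spaces
    "option 4": re.compile(r"[A-Za-z].*"),  # anything starting with a letter
}


def is_implicit_text(option, content):
    """True if content would be implicitly typed as text under the option."""
    return TEXT_RULES[option].fullmatch(content) is not None
```

Under these assumed rules, "Hello, world" is text only for option 4,
'32 Walker Drive' is text only for option 1, and a string such as
"mailto:user@example.com" is swallowed as text by option 4, which is
why a URI could not be added there as an implicit type without
something like the parentheses convention.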

Production Options:
- >
 Have one simple scalar production which is used for both
 keys and values, whether nested within an in-line map/scalar
 or not.   This option prevents ", " from being used
 in top level scalars.
- >
 Have two simple scalar productions.  One for non-nested 
 cases where ", " is useable; and another for the nested 
 case where ", " isn't useable.  Note that ": " is not 
 useable in any circumstance.
- >
 Go with three or four productions that reflect both necessary
 and sufficient restrictions but with added complexity.
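
The two-production option can be pictured as two sets of forbidden
sequences, one per context.  The sketch below is a simplification
built from the restrictions listed in the background section; the
names TOP_LEVEL_FORBIDDEN, NESTED_FORBIDDEN and
allowed_as_simple_scalar are hypothetical, and a real production
would also have to deal with leading indicators, line breaks and so on.

```python
# Sequences a simple scalar may not contain, per context. This is an
# illustrative reading of the minutes, not the actual productions.
TOP_LEVEL_FORBIDDEN = (": ",)
NESTED_FORBIDDEN = (": ", ", ", "[", "]", "{", "}")


def allowed_as_simple_scalar(content, nested):
    """Check content against the forbidden sequences for its context."""
    forbidden = NESTED_FORBIDDEN if nested else TOP_LEVEL_FORBIDDEN
    return not any(token in content for token in forbidden)
```

Here "one, two" is accepted at the top level but rejected inside an
in-line collection, while "key: value" is rejected in both contexts.
With a single production, the nested set would apply everywhere.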

Choices:
- >
 Thus far, we've chosen to go with two simple scalar
 productions.  We chose this since we'd like to be
 able to write paragraphs using the simple form and
 since four productions are more complicated without
 any clear value.
- >
 Thus far, we've chosen to go with the fourth implicit typing
 option.   There are several factors.  First, this is how
 we had it; it works fairly well, and current data/parsers
 already use this setup.   The other options either didn't give
 us the ability to introduce other implicit types, such
 as currency, or were too restrictive on the text expression.
 In some ways it is reassuring to go back to a previous choice.

Thoughts: >
 After the meeting, I had a few questions/reservations about
 how simple scalars would be used in a multi-line (not in-line)
 key.  I'm wondering if the third production option wouldn't
 be more prudent from a readability perspective, that is,
 restricting in-line keys to a single line instead of
 allowing them to be multi-line.  Remember, we have the ? : syntax
 for this sort of thing... this would probably enhance readability;
 otherwise I can see the whole thing getting ugly.

Closing: >
 I hope that this accurately reflects the ideas discussed and 
 informs the YAML user community about the factors involved and
 why the current solution emerged.  Oren won't implement the
 spec changes until this coming weekend... so you have till then
 to make a compelling case for your favorite option.
