From: Oren Ben-K. <or...@be...> - 2004-04-17 13:15:45
OK, I'm almost at the end of the productions. I think you'll find them more comprehensible than the previous set. At any rate...

I'm now considering the top-top-level document productions. Currently we forbid documents without an explicit "---" header from being flow scalars. This requires me to separate the generic node production into two variants. No big deal, except it got me thinking (dangerous, I know)...

What with our new restrictions/relaxations for simple keys and so on, this rule seems to be overly strict. Technically speaking, we could just say that a document is simply a node, in any style, regardless of whether a header line is specified. There's no ambiguity involved, since the situation is the same as when parsing something following the ':' in a block mapping.

For example (<SOF> starts file, <EOF> ends file):

    <SOF>
    This would be a plain
    scalar document.
    ---
    Another one.
    <EOF>

    <SOF>
    !foo This would be a 'foo' plain scalar.
    <EOF>

    <SOF>
    !#/bin/foo
    Hmmm. This is a '#/bin/foo' plain scalar.
    If we were to say that !#<something> scalars
    are always literal, we could just load every
    UNIX script file! It is way too much of a
    hack, though :-(
    <EOF>

    <SOF>
    "This would be a double quoted scalar."
    ---
    'This is single quoted'.
    <EOF>

    <SOF>
    |
    This would be literal.
    ---
    This is plain.
    <EOF>

    <SOF>
    >
    This would be folded.
    ---
    This is plain.
    <EOF>

Remember, we already forbid a non-indented line to start with '---' or '...' anyway.

Reasons to allow the above are:

- Simpler rules (one less special case).

- Existing text files (README, COPYING etc.) give a reasonable result when interpreted as YAML plain scalars. It isn't clear how useful this is...

Reasons I can think of _not_ to allow the above are:

- The current restriction makes it easier to identify YAML files as such. The modified rules allow almost anything to be considered a YAML file. This is relevant both for computer programs and for people.

- As the !#/bin/foo example shows, treating non-YAML files as YAML doesn't always do what's expected; for example, script files get their lines joined to one big line (with the occasional line break where an empty line was used). It is impossible to reconstruct the original script from this.

Note that both these objections only hold for plain scalars. Quoted or block scalars are denoted by indicators (" ' | >) and using them without a "---" is much less confusing. Perhaps we should only ban header-less *plain* scalars...

Hmmm. This would remove the most common "mistaken identity" case - "!#/bin/foo" files - and would require an explicit annotation of README files as being plain or folded, preventing them being silently mangled. Not too bad...

So, options:

- No-way: Forbid all header-less top-level scalars (as today).

- Half-way: Allow header-less indicated top-level scalars (| > " ').

- All-way: Allow all header-less top-level scalar styles.

Thoughts?

Have fun,

    Oren Ben-Kiki
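
As a rough illustration of how the three options differ, here is a minimal Python sketch. It is not taken from any YAML implementation: the names `Policy`, `scalar_style` and `headerless_scalar_allowed` are hypothetical, and it assumes the document is already known to be a header-less top-level scalar (not a mapping or sequence), with at most one leading "!tag" token before the content.

```python
from enum import Enum


class Policy(Enum):
    """The three options from the message (names are hypothetical)."""
    NO_WAY = "no-way"      # forbid all header-less top-level scalars
    HALF_WAY = "half-way"  # allow only indicated styles (| > " ')
    ALL_WAY = "all-way"    # allow every scalar style


# Indicator character -> scalar style it introduces.
INDICATORS = {
    '"': "double-quoted",
    "'": "single-quoted",
    "|": "literal",
    ">": "folded",
}


def scalar_style(document: str) -> str:
    """Guess the style of a header-less top-level scalar.

    Assumption: the text is a scalar, optionally preceded by one
    '!tag' token. Anything that does not start with an indicator
    is treated as a plain scalar.
    """
    rest = document.lstrip()
    if rest.startswith("!"):
        # Drop the tag token and look at whatever follows it.
        parts = rest.split(None, 1)
        rest = parts[1] if len(parts) > 1 else ""
    return INDICATORS.get(rest[:1], "plain")


def headerless_scalar_allowed(document: str, policy: Policy) -> bool:
    """Would this header-less scalar document be accepted?"""
    if policy is Policy.NO_WAY:
        return False
    if policy is Policy.HALF_WAY:
        return scalar_style(document) != "plain"
    return True  # Policy.ALL_WAY
```

Under this sketch a "!#/bin/foo" file and an unannotated README are both classified as plain, so Half-way rejects them, while a file starting with | or > is accepted.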