From: Brian I. <in...@tt...> - 2001-08-08 11:59:27
On 08/08/01 13:10 +0200, Oren Ben-Kiki wrote:
> Brian Ingerson wrote:
> > OK. I buy it. No shortcuts for now. I think all funny chars should be
> > reserved for future use. To use them verbatim, you simply double quote
> > them. Win/Win.
>
> Not quite... I agree we should reserve some keys, I don't see how
> quoting them helps. This is related to:

I thought we had agreed that:

    key:
        !: classname
        "!": just another key

would be parsed into an application's map as (possibly):

    key:
        __!__: classname
        "!": just another key

You just can't use "__!__" for a key, or whatever the implementor decides.

>
> > > To layer this you need that the information model will
> > > distinguish somehow between:
> > >
> > >     int: 12
> > >
> > > And:
> > >
> > >     text: "12"
> > >
> > > This means that the information model is no longer a simple
> > > map/list/scalar/null. No, I'm afraid that this transformation
> > > (12 -> int) must be done at the lowest level, where the
> > > formatting information is still available.
> >
> > I disagree. If the application can control how the text is
> > emitted (simple vs quoted) and can also tell which form the
> > parser encountered, then the heuristics for numbers could be
> > put at the application level. There would be a suggested
> > implementation (like yours below) but it would not be mandatory.
> > The default would always be string. That means that some data
> > might not round trip between Python and Foo. But how could it
> > anyway? Foo does not support integers. :)
>
> I'm unaware of any language which doesn't have the concept
> of integers and floats :-)

A) I think that there *are* languages that only have strings. What about
shell scripts? I think Tcl at least used to be all strings. There may be
others in the future.

B) It really doesn't matter, because ints and floats will have different
implementations in different languages. Strings are the only reliable
implementation.
For instance, what is this:

    key: 652346462364876476246264648964774623646462646492134654769634763

In Perl this would have to be a string or a float. In Ruby, it could easily
be an int.

>
> How would the application tell what form the parser has
> encountered, if the information model only allows it to return
> one of int/float/string/null? Surely not by some meta-information
> attached to the data - that would bring us right back to the
> 'class' magical attribute. Same problem exists for quoting "!" as
> a key, etc.

No, the low level API needs to return what type of quoting was used. The mid
level API will decide what data type to encode it in for that particular
language. We will of course have guidelines that should be followed, similar
to the patterns you posted. The end user API will not care. It will just
expect that the correct thing was done if the languages are homogeneous, and
that the best possible thing was done if the languages are heterogeneous.

> I think the deciding factor should be practicality, and
> again you have convinced me it is much more practical
> to have all these types - null, reference, int, float
> and string - in every YAML implementation "out of the
> box". This would greatly improve YAML's acceptance.

Practicality is by far my leading motivation as well.

>
> > > Also, it seems that keys must be strings, even if we
> > > have more scalar types in the model. Right?
> >
> > Again, that would be the default, but the application could
> > decide. Therefore, a language implementation can be tuned for
> > maximum usability within its own realm, while still doing an
> > admirable job of playing well with others.
>
> I agree; however if your system distinguishes between 12, 12.0
> and "12" as map keys, you still can't assume other YAML systems
> will be able to cope with it. So YAML documents you create
> should be careful with using this. Perhaps it should be
> added as a warning instead of as a requirement?

Exactly.
Perl only allows strings for map keys. Python allows for many things
including references. So make the core engine support 3 types of strings,
and let the next layer (the language layer) decide what is best. You'll
always round trip within the same language, and come as close as possible
with multiple languages.

>
> > > I'd be happy to hear your counter proposal. Hopefully
> > > it would be even better!
> >
> > OK, here goes.
> >
> > Let me preface by saying that I really like the current three scalar
> > forms. We are very close, and I really like what we have.
>
> I'm uneasy about '|-' - I always thought it is a hack. Otherwise I
> agree, the three types we have are great and their syntax is pretty
> good. Though we could do better :-)

It's a little hacky, but by far the best I've seen.

> > So here is the problem to solve. Is this:
> >
> >     key :
> >         Verdict: "Guilty as charged!"
> >
> > a map or a simple scalar? It cannot be a block, list, or
> > quoted. I think that's all we need to disambiguate.
>
> No. All three forms:
>
>     key :
>         One : value
>         "Two" : value
>         'Three' : value
>
> Are potentially ambiguous. In the latter two cases lookahead
> may be required just to detect the problem. This lookahead
> is potentially large - one quoted multi-line scalar's worth.
> Am I right in assuming that you propose to use lookahead to
> resolve these cases? (Or do you intend to disallow line
> breaks in keys?)

We will have to bite the lookahead bullet, yes. But as I've always said,
"don't burden the user just to save a little complexity from the
implementation side". Once we write the code, we'll get to use an excellent
syntax. We won't even remember whether or not it was hard to do.

> > Let's assume this is supposed to be a simple scalar. I
> > used to think we'd need something like:
> >
> >     key : $
> >         Verdict: "Guilty as charged!"
> >
> > But your quoting ideas gave me a better idea.
> > Instead of using a sigil for "simple scalars starting on next line",
> > let's just use single quotes to mean the same as simple
> > scalar, except they can be used to disambiguate strings
> > from maps. Very similar to Perl.
> >
> >     key :
> >         'Verdict: "Guilty as charged!"'
>
> Yes, it requires one scalar's worth of lookahead:
>
>     key :
>         'Verdict:
>         "Guilty as charged!"' : Not so!

Yes. It does. No big deal.

> > The one caveat is that we must escape single quotes and
> > backslashes. But this will be incredibly rare, since using
> > the single quotes in the first place will be rare.
>
> I was rather pleased with myself when I hit on the notion
> that there's no real need to escape quotes. Since we detect
> the end of the scalar using indentation, then the only place
> there's a potential ambiguity is if the quote character is
> the very first or the very last character. We already have
> a solution to handle indicators in the first character -
> we just add a quote before them. It turns out this works
> for the quote character itself, and it also works at the
> end of the scalar:
>
>     key: ''This value 'starts \and/ ends' with a single ''

Yes. This is a good thing. Data::Denter has actually always worked that way.
It was just a momentary lapse :)

>
> > I really won't consider messing with the current form of
> > block. Anything that requires an extra line at the end (like
> > your suggestion) or any end marking at all is a real
> > detraction from the simplicity we now have.
>
> Perhaps I wasn't clear. My proposal does *not* require an
> extra line at the end. In my proposal, writing:
>
>     block: '
>         block here
>         ...
>
> Is exactly equivalent to the current spec's:
>
>     block: |
>         block here
>         ...
>
> So if that's what's troubling you, problem solved.

I really don't care what the sigil is. I just won't allow swapping the '-'ish
thing for a trailing indicator. It gives me the feeling that the ending quote
is part of the data.
And what if the data does end with a tick? The '-' thing isn't beautiful, but
it's the best we've got.

> > Since I have eliminated the need for a sigil after the : for
> > simple and quoted, the three quotes idea isn't really needed
> > at all, although part of it lives on in the single-for-simple
> > proposal.
>
> You haven't eliminated the sigil, you have just moved it to
> the next line. And I dislike the lookahead aspect. To summarize
> the differences between your and my proposals:

OK. Maybe I should have said "indicator" instead of "sigil". I have
eliminated the indicator. The single quote is not an indicator like '%' IMO.
We could have:

    key: $
        'simple string'

In a strict implementation. The '$' is the indicator.

What I really don't like is the quote starting on a different line than the
data. That will just confuse the heck out of people. That is never done in
programming. Why do it in YAML?

> - The indicators used are different (yours are the current | ' "
>   and mine are the new set ' ` ").

Even.

> - In both proposals they are optional except for when needed
>   for disambiguation.

Even.

> - In both proposals @ and % are optional except for when needed
>   for disambiguation.

Even.

> - My proposal requires the start quote to appear at the first
>   line of a multi-line scalar value. Yours doesn't. Point for
>   you.
> - In your proposal, | is a prefix and must appear in the first
>   line; likewise for '|-'; the others surround the value.
>   In my proposal, in all cases they surround the value and the
>   terminating quote is optional unless needed for disambiguation.
>   There's no need for two block forms; one does it all. Mine
>   seems simpler and more consistent. Point for me?

I don't expect to need quotes on a block. A block is a visual rectangle. The
current form gives that perfectly.

> - In your proposal, it is impossible to write a multi-line
>   unescaped simple scalar that starts at the next line. In mine
>   it is possible. Point for me?

Not true AFAIK.
    key:
        simple scalar

As long as there is no colon.

> - Your proposal requires large lookahead in certain cases.
>   Or disallow line break in quoted keys. Or something.
>   Mine doesn't require any of these. Point for me?

Lookaheads are an implementation issue. Computer programs do lookaheads all
the time. These lookaheads are not huge, compared to things like language
parsing. Moving quotes away from the data to save a lookahead is a big loss
in aesthetic acceptance for YAML. I would prefer a non-quote scalar indicator
sigil like '$' to eliminate lookaheads.

>
> > Here's my fairly exhaustive examples:
>
> The following would be identical in mine:
>
> > a: simple value
> > c:
> >     this : is a map
> > h: "a quoted value\n"
> > probably an int: 123
> > probably a float: -12.34
> > always a string: "54321"
> > k:
> >     : list value 1
> >     : list value 2
> > empty list: @
> > empty map: %
>
> These would be different. Note that these are all
> cases used rarely or for large amount of text:
>
> > b:
> >     simple value
>
> mine: b: `
>         simple value

Confusing to user. '=' or '$' would be easier to understand. It leaves me
looking for a matching quote.

>
> > d:
> >     'this : is a simple string'
>
> mine: d: `
>         this : is a simple string

You'll have a hard time explaining why any person familiar with modern text
would understand the second more clearly.

> > e:
> >     a 'simple' value
>
> mine: e: `
>         a 'simple' value
>
> > f:
> >     'this : would be a \'rare\' case'
>
> mine: f: `
>         this : would be a rare case \'"
> and wouldn't require any escaping!
>
> > g:
> >     'I just realized: 'single quotes' do
> >     not need to be escaped at all. Inde\
> >     nting does the quoting. Single quotes
> >     simply 'disambiguate''
>
> mine: g: "
>         Yes, I have said so "long ago" - at le\
>         ast several days ago :-) You only need
>         need them to "disambiguate""
>
> > i:
> >     "a quoted value\n"
>
> mine: i: "
>         a quoted value\n"

Aaaack. I'm sorry, but do you actually like this? (Lookaheads aside.)
>
> (The terminating " is allowed).
>
> > j: |
> >     This : is a block
> >     for sure!
>
> mine: j: '
>         This : is a block
>         for sure!

Fine by me.

>
> Also:
>
> > jj: |-
> >     This : is a block
> >     for sure!
>
> mine: jj: '
>         This : is a block
>         for sure!'

Not OK by me. It isn't even deterministic. What if the ending tick is part of
the data? And please don't give me a solution that involves adding an extra
line to the end.

> No need for a second block form. The terminating '
> indicates there's no trailing newline.
>
> Does this work for you?

Sorry if I seem one-sided. I just want to get coding. I (and others) need
YAML for real life projects three weeks ago. I think the 3 quotes idea is
going in the wrong direction.

Please consider my proposals, and offer me a counter. I'm going out for lunch
for 2-3 hours. I'll be back after that. Let's get this thing locked :)

Brian
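
Note: the layering argued for in this message (a low-level API that reports
which kind of quoting was used, and a language layer that decides the native
type, defaulting to string) can be sketched roughly as below. This is only an
illustration in Python, not code from this thread or from any YAML
implementation. The names Style, RawScalar, resolve and max_int_bits are all
hypothetical, and the int/float patterns are a simplified stand-in for
whatever guidelines would eventually be agreed on.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Style(Enum):
    """How a scalar appeared in the text (hypothetical names)."""
    SIMPLE = "simple"   # unquoted
    QUOTED = "quoted"   # "double quoted"
    BLOCK = "block"     # introduced by |


@dataclass(frozen=True)
class RawScalar:
    """What a low-level parser might return: the text plus its style."""
    text: str
    style: Style


# Simplified patterns, assumed for illustration only.
INT_RE = re.compile(r"[-+]?\d+")
FLOAT_RE = re.compile(r"[-+]?\d+\.\d+")


def resolve(raw, max_int_bits=None):
    """Language layer: pick a native type, falling back to string.

    max_int_bits models a language with fixed-width integers; None means
    integers are unbounded (as in Python or Ruby).
    """
    if raw.style is not Style.SIMPLE:
        return raw.text                  # quoted/block text stays a string
    if INT_RE.fullmatch(raw.text):
        value = int(raw.text)
        # Conservative check: keep the text if it may not fit in a
        # signed integer of the given width.
        if max_int_bits is not None and value.bit_length() >= max_int_bits:
            return raw.text
        return value
    if FLOAT_RE.fullmatch(raw.text):
        return float(raw.text)
    return raw.text                      # the default is always string


if __name__ == "__main__":
    print(repr(resolve(RawScalar("123", Style.SIMPLE))))
    print(repr(resolve(RawScalar("54321", Style.QUOTED))))
```

With this split, the same document can resolve a very long run of digits to
an integer in one language layer and leave it as a string in another, while
the low-level result (text plus style) stays identical in both.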