From: Matthew W. <mat...@gm...> - 2010-06-08 10:22:42
|
Dear all

Thought I would pop up on this list to sound out people's feelings about the following proposal.

Motivation:

* YAML syntax is quite big and complex to fully support
* No full YAML parser appears to exist for browser Javascript clients
* Web browsers have fast, native JSON parsers which it's nice to use where possible
* Plain JSON lacks custom data types and graph-based serialization features
* People are reinventing the wheel trying to add these things on top of JSON in assorted poorly-standardised ways
* YAML and JSON are already friendly, this would make them friendlier still :)

Proposed solution is an embedding of YAML inside JSON, which could be implemented as an alternative presentation phase for a YAML library. The goal is that:

* Data which would happily serialize as plain JSON maps directly to its plain JSON representation (with the exception of some reserved key names required for the embedding)
* Data which is not pure-JSON-compatible (uses types other than map, seq, string, int, float, null, or does not conform to a simple tree structure, containing cyclic references for example) maps to a subset of JSON from which the original YAML node tree can be recovered.

Available cheesy name: Yayson. Sorted.

I'm sure I'm missing some subtleties, but here's an outline of how this could be achieved.

First tags:

    {
      "$tag": "!!timestamp",
      "$value": "2001-12-15T02:59:43.1Z"
    }

    {
      "$tag": "!my-app-specific-type",
      "$value": {"is constructed from": "a map"}
    }

Or a more concise alternative syntax for this:

    {
      "$tag": "!my-app-specific-type",
      "is constructed from": "a map"
    }

(I would suggest that things like tag namespace aliases via the %TAG directive not be supported here, to keep it simple; ! and !! could have their usual meanings as tag prefixes, or you could use a full tag URI)

Now anchors:

    {
      "$anchor": "foo",
      "$value": "some string"
    }

    {
      "$anchor": "foo",
      "$tag": "!!timestamp",
      "$value": "2001-12-15T02:59:43.1Z"
    }

    {
      "$anchor": "foo",
      "$value": {"anchor is for": "a map"}
    }

or the alternative in the case of a map:

    {
      "$anchor": "foo",
      "anchor is for": "a map"
    }

Now aliases:

    {"$alias": "foo"}

Some caveats:

This introduces 4 'reserved' key names for maps. Plain JSON data containing these keys would not actually be able to serialize in the usual way; they'd need to be escaped, via eg:

    {"$$anchor": "the key unescapes to $anchor"}
    {"$$$anchor": "the key unescapes to $$anchor"}
    ..

Some more special forms may be needed to express some other YAML features (like arbitrary objects as keys of maps) not expressible in JSON, eg:

    {
      "$tag": "!!map",
      "$pairs": [
        [{"key": "object"}, {"value": "object"}],
        [{"other key": "object"}, {"value": "object"}]
      ]
    }

Anyway let me know what you think!

-Matt
|
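The key-escaping caveat in the message above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `escape_key` and `unescape_key` are hypothetical, and the sketch assumes that only the four reserved names listed in the message (`$tag`, `$value`, `$anchor`, `$alias`) take part in escaping.

```python
# The four reserved key names from the proposal, without their leading "$".
# (Assumption: "$pairs" and any later additions are left out of this sketch.)
RESERVED_NAMES = {"tag", "value", "anchor", "alias"}


def _looks_reserved(key: str) -> bool:
    """True for "$tag", "$$tag", "$$$anchor", ...: one or more "$" then a reserved name."""
    return key.startswith("$") and key.lstrip("$") in RESERVED_NAMES


def escape_key(key: str) -> str:
    """Prefix one extra "$" so that ordinary data never collides with a reserved key."""
    return "$" + key if _looks_reserved(key) else key


def unescape_key(key: str) -> str:
    """Inverse of escape_key: "$$anchor" -> "$anchor", "$$$anchor" -> "$$anchor"."""
    if key.startswith("$$") and _looks_reserved(key):
        return key[1:]
    return key
```

A key with a single leading "$" and a reserved name is left alone by `unescape_key`; a decoder would treat it as one of the special forms rather than as data.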
From: Ingy d. N. <in...@in...> - 2010-06-08 22:23:32
|
Matthew,

Have you seen the JSYNC proposal I made last month? http://www.jsync.org

I think it covers what you are trying to accomplish here, and even resolves your caveats. It provides everything that you can do in YAML, using standard JSON encoding.

Want to help me with it?

Cheers, Ingy
|
From: Matthew W. <mat...@gm...> - 2010-06-08 22:05:45
|
> Have you seen the JSYNC proposal I made last month? http://www.jsync.org

Aha, yeah that's exactly what I wanted! and a more concise syntax than mine.

I did google for something along these lines ("yaml inside json", etc) but nothing showed up, apologies for the dup :)

> I think it covers what you are trying to accomplish here, and even resolves your caveats. It provides everything that you can do in YAML, using standard JSON encoding.

Great!

There's a few cases which the webpage doesn't include in its examples:

* A tagged custom type which is constructed from (say) a string rather than from a map - my example was {"$tag": "!!timestamp", "$value": "2001-12-15T02:59:43.1Z"}
* An anchor attached to a string/array/some non-map type - my example was {"$anchor": "foo", "$value": "some string"}
* Use of one of the special reserved keys "!" or "&" as a normal key of a mapping (would need to be escaped somehow)
* A map with non-string keys
* A string value which starts with "*" which you don't want treated as a reference

On that last point, perhaps it would be easier to do references as {"*": "001"} rather than "*001". That means that you then only need to reserve one other map key, "*", rather than have to reserve (and escape normal uses of) all strings starting with *, noting that this requires every string value to be checked when mapping a parsed JSON tree to a YAML graph.

> Want to help me with it?

Sure - see the above :)

I guess the ultimate awkward corner case would be:

A tagged custom type, which is also given an anchor, and which is created from a map with some non-string keys and some (suitably-escaped) reserved keys.

Any thoughts on a media type for this? eg application/yaml+json ?

Also, I guess an alternative approach (or perhaps one which is undertaken in parallel?) would be to propose an extension to json with new syntax for references and tags, to match the YAML model, but only the minimal extra syntax required to do so, with the goal of keeping it as easy to parse as possible.

-Matt
|
From: Ingy d. N. <in...@in...> - 2010-06-09 06:48:40
|
On Tue, Jun 8, 2010 at 3:05 PM, Matthew Willson <mat...@gm...> wrote:

> > Have you seen the JSYNC proposal I made last month? http://www.jsync.org
>
> Aha, yeah that's exactly what I wanted! and a more concise syntax than mine.
>
> I did google for something along these lines ("yaml inside json", etc) but nothing showed up, apologies for the dup :)
>
> > I think it covers what you are trying to accomplish here, and even resolves your caveats. It provides everything that you can do in YAML, using standard JSON encoding.
>
> Great!
>
> There's a few cases which the webpage doesn't include in its examples:

That page was just to give a rough idea of what was going on. I didn't want to scare people away with all the niggly details, on first glimpse. When I write the spec, it will cover everything. I've figured out all the edge cases, though. There's not really much to it, syntax-wise. But implementing it won't be trivial. We have to do tag resolution, implicit and explicit, prewalk to detect duplicate refs on dumps, provide a nice API for custom tagging etc. Monkey-hacking this into existing implementations will be interesting. Still, I think it is all very doable.

I want to be sure to provide a nice and consistent API across all implementations. ie There needs to be an API spec, as well. I think that's one area where YAML has fallen down. I think if JSYNC is a success, then the API can be retrofitted to YAML implementations.

I'll address your points below...

> * A tagged custom type which is constructed from (say) a string rather than from a map - my example was {"$tag": "!!timestamp", "$value": "2001-12-15T02:59:43.1Z"}

    "!!timestamp 2001-12-15T02:59:43.1Z"

> * An anchor attached to a string/array/some non-map type - my example was {"$anchor": "foo", "$value": "some string"}

    [ "&anchor1", "x", "y", "&anchor2 z"]

    --- &anchor1
    - x
    - y
    - &anchor2 z

> * Use of one of the special reserved keys "!" or "&" as a normal key of a mapping (would need to be escaped somehow)

    { ".!oh": "my!", ".>": ">" }

    ---
    '!oh': my!
    '>': '>'

> * A map with non-string keys

    {"!!int 4": "four"}

    ---
    4: four

> * A string value which starts with "*" which you don't want treated as a reference

    [".*anchor"]

    ---
    - '*anchor'

> On that last point, perhaps it would be easier to do references as {"*": "001"} rather than "*001". That means that you then only need to reserve one other map key, "*", rather than have to reserve (and escape normal uses of) all strings starting with *, noting that this requires every string value to be checked when mapping a parsed JSON tree to a YAML graph.

I think that's too complicated visually. You need to check every string node anyway, to see if it has a type or anchor, so no real penalty doing "*001".

> > Want to help me with it?
>
> Sure - see the above :)

Great. Join #jsync on irc.freenode.net if you do IRC.

> I guess the ultimate awkward corner case would be:
>
> A tagged custom type, which is also given an anchor, and which is created from a map with some non-string keys and some (suitably-escaped) reserved keys.

    {
      "!": "custom",
      "&": "anchor",
      "!!int 55": "I can't drive",
      "awesome": ".!!!"
    }

    --- !custom &anchor
    55: I can't drive
    awesome: '!!!'

> Any thoughts on a media type for this? eg application/yaml+json ?

application/jsync ?

> Also, I guess an alternative approach (or perhaps one which is undertaken in parallel?) would be to propose an extension to json with new syntax for references and tags, to match the YAML model, but only the minimal extra syntax required to do so, with the goal of keeping it as easy to parse as possible.

Explain this more. I don't really see what you are proposing...

Cheers, Ingy

> -Matt
|
From: William S. <sp...@rh...> - 2010-06-09 19:40:18
|
Ingy dot Net wrote:

> "!!timestamp 2001-12-15T02:59:43.1Z"

I do not like this at all, because now you have to make up rules for how the value is quoted and you need to implement a nested parser for this. Leading spaces? Quotes in it? Backslashes in it? In fact you might as well just put the entire YAML file into one quoted string and call that a solution. I think any real solution has to split things into the final units at the JSON level, with the only post-processing the removal of single bytes from the starts of returned strings.

MY PROPOSAL FOR JYAML:

A value with an anchor:

    ["&anchor", <value>]

A value with a tag (the %XX syntax is decoded so !!a%20b is written as "!!a b"):

    ["!tag", <value>]

A value with both an anchor and tag:

    ["&anchor", "!tag", <value>]
  or
    ["!tag", "&anchor", <value>]

A value that is a reference:

    "*anchor"

A map entry with a key that is a string with no tag or anchor:

    "key": <value>

A map entry with a key that is a reference:

    "*anchor": <value>

A map entry where the key is any object, including numbers, null, true, false, arrays, maps, and values with anchors or tags. Here 'A' is a string generated by the converter that does not conflict with any other anchors (note that parser can distinguish these anchors because they are declared as keys, not as part of an array value):

    # YAML: <key>: <value>
    "&A": <key>,
    "*A": <value>

All string values that start with 1 or more '.' characters followed by any one of '&*!@' have a single '.' removed to get the actual value:

    ".!foo" -> "!foo"
    "...!foo" -> "..!foo"
    ".1" -> ".1"      # NOTE NO CHANGE!
    "..." -> "..."    # AGAIN NO CHANGE!

Errors: Strings starting with '&' or '!' at unexpected places are an error. Strings starting with '@' are always an error. Arrays containing strings starting with '&' or '!' must conform to one of the patterns above, otherwise they are an error.

> I guess the ultimate awkward corner case would be:
>
> A tagged custom type, which is also given an anchor, and which is
> created from a map with some non-string keys and some
> (suitably-escaped) reserved keys.
>
>     --- !custom &anchor
>     55: I can't drive
>     awesome: '!!!'

    ["!custom", "&anchor",
      {"&1": 55,
       "*1": "I can't drive",
       "awesome": ".!!!"
      }
    ]

> Also, I guess an alternative approach (or perhaps one which is
> undertaken in parallel?) would be to propose an extension to json
> with new syntax for references and tags, to match the YAML model,
> but only the minimal extra syntax required to do so, with the goal
> of keeping it as easy to parse as possible.

This *IS* YAML. It's just that only the "flow" subset is being used. There is no need to define a different type.

I do agree this is useful and am using it myself. Unfortunately YAML's syntax has not lived up to expectations, in particular it is NOT hand-editable as soon as complex nested data, and text with newlines in it, are introduced. Mistakes in editing produce cryptic error messages or complete mangling of the data. I switched to "flow" all the time and modified libyaml to add newlines after the commas.

I also had to change the quoting rules to require that strings containing a '\' or '"' be quoted. This is because code assumed it could take YAML output, throw away the optional surrounding quotes, and add them again later. This made YAML a useful presentation form, but only if these characters were always shown as "\\" and "\"".
|
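The leading-'.' escaping rule in the message above can be sketched as follows. This is a minimal illustration in Python, not code from either proposal; the names `escape` and `unescape` are hypothetical, and the sigil set is the '&*!@' given in the message.

```python
import re

# One or more '.' followed by a sigil: the string carries one escape dot.
_ESCAPED = re.compile(r"^\.+[&*!@]")
# Zero or more '.' followed by a sigil: the string would be misread unless escaped.
_NEEDS_ESCAPE = re.compile(r"^\.*[&*!@]")


def unescape(s: str) -> str:
    """Remove a single leading '.' when the dots are followed by one of & * ! @."""
    return s[1:] if _ESCAPED.match(s) else s


def escape(s: str) -> str:
    """Add a single leading '.' to any string that unescape would otherwise alter or misread."""
    return "." + s if _NEEDS_ESCAPE.match(s) else s
```

Strings such as ".1", ".cshrc" and "..." match neither pattern, so they pass through both functions unchanged, which is the property the message calls out with "NO CHANGE".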
From: Ingy d. N. <in...@in...> - 2010-06-12 22:35:38
|
On Wed, Jun 9, 2010 at 12:19 PM, William Spitzak <sp...@rh...> wrote:

William,

Thank you for your time and ideas.

> Ingy dot Net wrote:
>
>> "!!timestamp 2001-12-15T02:59:43.1Z"
>
> I do not like this at all, because now you have to make up rules for how the value is quoted and you need to implement a nested parser for this. Leading spaces? Quotes in it? Backslashes in it? In fact you might as well just put the entire YAML file into one quoted string and call that a solution. I think any real solution has to split things into the final units at the JSON level, with the only post-processing the removal of single bytes from the starts of returned strings.

The rules of JSYNC are quite simple, and the parser you speak of is a couple of simple regexes. Adding a wrapper structure to every annotated node was exactly the solution I wanted to avoid. I prefer to annotate the three kinds of nodes, using their native facilities. But to be honest, the hard work is not in either syntax method, but in figuring out how to add the YAML stack over current JSON parser/emitters in the wild. You could easily try both syntaxes and see how they compare.

Where you have simplified in having one way to do it for everything, you have changed the overall memory structure of the graph. Where I have 3 rules instead of one, I have preserved the original structure. Either way is worth a try though, because they are effectively the same.

To be clear:

1) For mappings: look for 2 special keys: "&" and "!".
2) For arrays: inspect the first entry for a string beginning with "&" or "!".
3) For a string: look for a "&" and/or "!". Since neither tags nor anchors can contain a space char, you can parse them out by matching from the start to the next space. Trivial.

More below...

> MY PROPOSAL FOR JYAML:
>
> A value with an anchor:
>
>     ["&anchor", <value>]

    {"&": "anchor", ...}
    ["&anchor", ...]
    "&anchor ..."

> A value with a tag (the %XX syntax is decoded so !!a%20b is written as "!!a b"):
>
>     ["!tag", <value>]
>
> A value with both an anchor and tag:
>
>     ["&anchor", "!tag", <value>]
>   or
>     ["!tag", "&anchor", <value>]

    {"!": "tag", "&": "anchor", ...}
    ["!tag &anchor", ...]
    "!tag &anchor ..."

> A value that is a reference:
>
>     "*anchor"
>
> A map entry with a key that is a string with no tag or anchor:
>
>     "key": <value>
>
> A map entry with a key that is a reference:
>
>     "*anchor": <value>
>
> A map entry where the key is any object, including numbers, null, true, false, arrays, maps, and values with anchors or tags. Here 'A' is a string generated by the converter that does not conflict with any other anchors (note that parser can distinguish these anchors because they are declared as keys, not as part of an array value):
>
>     # YAML: <key>: <value>
>     "&A": <key>,
>     "*A": <value>

This is a fantastic method. Way better than my proposal. I will fully adopt it for the forthcoming JSYNC specification. Thank you.

> All string values that start with 1 or more '.' characters followed by any one of '&*!@' have a single '.' removed to get the actual value:
>
>     ".!foo" -> "!foo"
>     "...!foo" -> "..!foo"
>     ".1" -> ".1"      # NOTE NO CHANGE!
>     "..." -> "..."    # AGAIN NO CHANGE!

I have agreed with this logic, since you mentioned it. My only worry is that it doesn't future-proof well. You might want to escape other leading sigils.

I have updated the http://jsync.org example to reflect this.

> Errors: Strings starting with '&' or '!' at unexpected places are an error. Strings starting with '@' are always an error. Arrays containing strings starting with '&' or '!' must conform to one of the patterns above, otherwise they are an error.
>
>> I guess the ultimate awkward corner case would be:
>>
>> A tagged custom type, which is also given an anchor, and which is
>> created from a map with some non-string keys and some
>> (suitably-escaped) reserved keys.
>>
>>     --- !custom &anchor
>>     55: I can't drive
>>     awesome: '!!!'
>
>     ["!custom", "&anchor",
>       {"&1": 55,
>        "*1": "I can't drive",
>        "awesome": ".!!!"
>       }
>     ]

    {
      "!": "custom",
      "&": "anchor",
      "&1": 55,
      "*1": "I can't drive",
      "awesome": ".!!!"
    }

Cheers, Ingy
|
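Rules 1 and 3 from the message above (special keys on mappings, space-delimited prefixes on strings) can be sketched roughly as below. This is a hedged Python illustration rather than the JSYNC implementation: the names `split_scalar` and `split_mapping` are hypothetical, tags are returned with their leading "!" restored, and the array rule is left out because the thread does not pin down its exact form.

```python
import re

# A "!" or "&" sigil, then non-space characters, ended by one space or end of string.
_ANNOTATION = re.compile(r"([!&])(\S*)(?: |$)")


def split_scalar(s):
    """Split "!tag &anchor value" into (tag, anchor, value); missing parts are None."""
    tag = anchor = None
    while True:
        m = _ANNOTATION.match(s)
        if not m:
            break
        sigil, name = m.groups()
        if sigil == "!" and tag is None:
            tag = "!" + name
        elif sigil == "&" and anchor is None:
            anchor = name
        else:
            break  # a second tag or anchor is treated as part of the value
        s = s[m.end():]  # exactly one space is consumed after each annotation
    return tag, anchor, s


def split_mapping(node):
    """Pull the "!" and "&" keys out of a mapping; returns (tag, anchor, remaining entries)."""
    tag = node.get("!")
    anchor = node.get("&")
    body = {k: v for k, v in node.items() if k not in ("!", "&")}
    return (None if tag is None else "!" + tag), anchor, body
```

For instance, `split_scalar("!!timestamp 2001-12-15T02:59:43.1Z")` yields the tag `!!timestamp` and the remaining text as the value. Dot-unescaping of ordinary strings and keys would be a separate pass.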
From: William S. <sp...@rh...> - 2010-06-14 20:52:07
|
Ingy dot Net wrote:

> Since neither tags nor anchors can contain a space char, you can parse
> them out by matching from the start to the next space. Trivial.

My concern was if the value started with a space, '&', or '!'. I guess the rule can be that exactly one space is used up. And/or that one of the alternative syntaxes must be used in these cases.

Also this does mean that the %XX encoding is left as-is in the tags & anchors, although libyaml and perhaps other libraries remove this before the string is returned.

>     {"&": "anchor", ...}

This is equivalent to YAML '&anchor {...}', not '{&anchor ...}', correct?

>     ["&anchor", ...]

I think this is what influenced my proposal, though I read it wrong. This is I believe equivalent to YAML '&anchor [...]', right? (not '&anchor ...' which is what I thought).

It is not clear how you add a tag or anchor to a non-quoted object such as a number, however. Perhaps it is ok if quotes are required, or maybe I'm missing something.

>     # YAML: <key>: <value>
>     "&A": <key>,
>     "*A": <value>
>
> This is a fantastic method. Way better than my proposal. I will fully
> adopt it for the forthcoming JSYNC specification. Thank you.

Thanks! Yes I was pretty happy with this, after messing with a bunch of nested lists and things it suddenly occurred to me.

>> All string values that start with 1 or more '.' characters followed
>> by any one of '&*!@' have a single '.' removed to get the actual value:
>>
>>     ".!foo" -> "!foo"
>>     "...!foo" -> "..!foo"
>>     ".1" -> ".1"      # NOTE NO CHANGE!
>>     "..." -> "..."    # AGAIN NO CHANGE!
>
> I have agreed with this logic, since you mentioned it. My only worry is
> that it doesn't future-proof well. You might want to escape other
> leading sigils.

I agree, though I only added '@' because that is the only punctuation mark that YAML reserves for future use (unless I read the spec wrong). I think it would make sense to add more punctuation. Some strings that I think should work and not be mangled are ".1", ".cshrc", "./blah", "..\\blah", ".", "..", and "...".

> I have updated the http://jsync.org example to reflect this.

This is great and if this question comes up again people should be directed to this web site. Name & syntax are chosen.
|
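The "&A" / "*A" trick for non-string keys, discussed above, could be decoded along these lines. This Python sketch is an assumption-laden illustration: `decode_pairs` is a hypothetical name, the result is a list of (key, value) pairs because decoded keys may be unhashable, and bare "!" and "&" keys are skipped on the assumption that they are the mapping's own JSYNC-style tag and anchor.

```python
def decode_pairs(obj):
    """Turn a parsed JSON object into (key, value) pairs, pairing "&A" with "*A"."""
    pairs = []
    for name, value in obj.items():
        if name in ("!", "&"):
            continue  # annotations of the mapping itself, handled elsewhere
        if name.startswith("&"):
            # "&A" holds the real key; the matching "*A" entry holds its value.
            partner = "*" + name[1:]
            if partner not in obj:
                raise ValueError(f"key {name!r} has no matching {partner!r}")
            pairs.append((value, obj[partner]))
        elif name.startswith("*") and "&" + name[1:] in obj:
            continue  # value half of a pair, already consumed with its "&A" partner
        else:
            pairs.append((name, value))  # ordinary string key (or a reference key)
    return pairs
```

Applied to the corner-case mapping from the thread, `{"&1": 55, "*1": "I can't drive", "awesome": ".!!!"}`, the key 55 is paired with "I can't drive" and the "awesome" entry passes through as-is.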
From: Ingy d. N. <in...@in...> - 2010-06-14 23:27:33
|
On Mon, Jun 14, 2010 at 1:51 PM, William Spitzak <sp...@rh...> wrote:

> Ingy dot Net wrote:
>
>> Since neither tags nor anchors can contain a space char, you can parse
>> them out by matching from the start to the next space. Trivial.
>
> My concern was if the value started with a space, '&', or '!'. I guess the rule can be that exactly one space is used up. And/or that one of the alternative syntaxes must be used in these cases.

I like that exact solution best. The other option is escaping:

    "!foo . s p a c y "

> Also this does mean that the %XX encoding is left as-is in the tags & anchors, although libyaml and perhaps other libraries remove this before the string is returned.

Yep. Spaces in tags is pretty insane anyway.

>> {"&": "anchor", ...}
>
> This is equivalent to YAML '&anchor {...}', not '{&anchor ...}', correct?

Yes.

>> ["&anchor", ...]
>
> I think this is what influenced my proposal, though I read it wrong. This is I believe equivalent to YAML '&anchor [...]', right? (not '&anchor ...' which is what I thought).

    [
      ["&anchor1", "foo"],
      ["&anchor2 foo"],
      ["&anchor3 ", "foo"]
    ]

    ---
    - &anchor1
      - foo
    - - &anchor2 foo
    - - &anchor3 ''   #anchor on empty string
      - foo

> It is not clear how you add a tag or anchor to a non-quoted object such as a number, however. Perhaps it is ok if quotes are required, or maybe I'm missing something.

This is an interesting question. My take is that tagging is only applied to strings. Since each and every tag is resolved by a function, it seems a fair assumption to make.

I realize that there is a distinction in YAML betwixt:

    ---
    - !foo "3"
    - !foo 3

But I don't see a constructor making use of that distinction.

What do you think?

>> # YAML: <key>: <value>
>> "&A": <key>,
>> "*A": <value>
>>
>> This is a fantastic method. Way better than my proposal. I will fully
>> adopt it for the forthcoming JSYNC specification. Thank you.
>
> Thanks! Yes I was pretty happy with this, after messing with a bunch of nested lists and things it suddenly occurred to me.

It took Clark, Oren and me about 4 years of "occurring" and arguing over ideas, to come up with YAML. Hopefully this will be much quicker.

Speaking of this, there is still an important missing piece in JSYNC. Maybe you can have another stroke of brilliance...

We need to support a top level '%TAG' and '%JSYNC' syntax:

    %YAML 1.2
    %TAG ! tag:xyz.com,2010:
    %TAG !abc! tag:abc.net,2010:
    --- !thing
    this: !abc!widget [2, 4]

Here is my strawman...

There are two kinds of top level in JSYNC: map and sequence. Map seems easier:

    {
      "%": "JSYNC 1.0\nTAG ! tag:xyz.com,2010:\nTAG !abc! tag:abc.net,2010:",
      "!": "thing",
      "this": ["!abc!widget", 2, 4]
    }

Here is a similar doc as a top level sequence:

    [
      "%JSYNC 1.0\n%TAG ! tag:xyz.com,2010:\n%TAG !abc! tag:abc.net,2010:\n!thing",
      ["!abc!widget", 2, 4]
    ]

Whereas ! and & parse to a space or EOS, % parses to newline or EOS.

Another concern is whether to support a top level scalar in JSYNC?

>> All string values that start with 1 or more '.' characters followed
>> by any one of '&*!@' have a single '.' removed to get the actual value:
>>
>>     ".!foo" -> "!foo"
>>     "...!foo" -> "..!foo"
>>     ".1" -> ".1"      # NOTE NO CHANGE!
>>     "..." -> "..."    # AGAIN NO CHANGE!
>>
>> I have agreed with this logic, since you mentioned it. My only worry is
>> that it doesn't future-proof well. You might want to escape other leading
>> sigils.
>
> I agree, though I only added '@' because that is the only punctuation mark that YAML reserves for future use (unless I read the spec wrong). I think it would make sense to add more punctuation. Some strings that I think should work and not be mangled are ".1", ".cshrc", "./blah", "..\\blah", ".", "..", and "...".

Good starting list. I would want to escape <SPACE>, !, #, $, %, & because then you could tell a JSON encoder to sort mapping keys (if the option is available) and always get the JSYNC fields at the top of the mapping. Not only is this better visually; it might be _necessary_ if we wanted to do streaming JSYNC libraries.

As an aside, I think there is about a year of time where you can declare new technology like JSYNC to be alpha, and subject to non-backwards-compatible change. I personally think it is healthy, and avoids things like mandatory tabs in Makefiles!

>> I have updated the http://jsync.org example to reflect this.
>
> This is great and if this question comes up again people should be directed to this web site. Name & syntax are chosen.

Thanks!

The next steps are to write up a syntax spec, then an API spec, then create an implementation or two, then write an implementation guide.

Cheers, Ingy
|
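The map-form strawman above packs the directives into one string under the "%" key, one directive per line. A rough Python sketch of reading that string is given below; it is illustrative only, since the strawman was never finalised in this thread. The names `parse_directives` and `resolve` are hypothetical, and leaving "!!" tags untouched when no "!!" handle is declared is an assumption of this sketch.

```python
def parse_directives(text):
    """Parse "JSYNC 1.0\\nTAG ! prefix\\n..." into (version, {handle: prefix})."""
    version = None
    handles = {}
    for line in text.split("\n"):
        name, _, rest = line.partition(" ")
        if name == "JSYNC":
            version = rest
        elif name == "TAG":
            handle, _, prefix = rest.partition(" ")
            handles[handle] = prefix
    return version, handles


def resolve(tag, handles):
    """Expand a tag shorthand such as "!abc!widget" using the declared handles."""
    if tag.startswith("!!") and "!!" not in handles:
        return tag  # assumption: the default "!!" handle is resolved elsewhere
    # Try the longest handle first so "!abc!" wins over the bare "!".
    for handle in sorted(handles, key=len, reverse=True):
        if tag.startswith(handle):
            return handles[handle] + tag[len(handle):]
    return tag
```

With the directive string from the example, "!abc!widget" expands to "tag:abc.net,2010:widget" and "!thing" to "tag:xyz.com,2010:thing", mirroring how YAML's %TAG handles are substituted.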
From: William S. <sp...@rh...> - 2010-06-15 02:42:48
|
Ingy dot Net wrote:

>> It is not clear how you add a tag or anchor to a non-quoted object
>> such as a number, however. Perhaps it is ok if quotes are required,
>> or maybe I'm missing something.
>
> This is an interesting question. My take is that tagging is only applied
> to strings. Since each and every tag is resolved by a function, it seems
> a fair assumption to make.
>
> I realize that there is a distinction in YAML betwixt:
>
>     ---
>     - !foo "3"
>     - !foo 3
>
> But I don't see a constructor making use of that distinction.
>
> What do you think?

It seems ok. Certainly whether a string is quoted or not is used by YAML parsers to determine type, but that is only when the tag is missing. I think if there is a tag then it is ok if the value is always quoted. It does mean that you cannot add more keywords than JSON supports as unquoted constants to YAML but I don't think that is being allowed anyway?

> We need to support a top level '%TAG' and '%JSYNC' syntax:
>
>     %YAML 1.2
>     %TAG ! tag:xyz.com,2010:
>     %TAG !abc! tag:abc.net,2010:
>     --- !thing
>     this: !abc!widget [2, 4]
>
> Here is my strawman...
>
> There are two kinds of top level in JSYNC: map and sequence. Map seems
> easier:
>
>     {
>       "%": "JSYNC 1.0\nTAG ! tag:xyz.com,2010:\nTAG !abc! tag:abc.net,2010:",
>       "!": "thing",
>       "this": ["!abc!widget", 2, 4]
>     }
>
> Here is a similar doc as a top level sequence:
>
>     [
>       "%JSYNC 1.0\n%TAG ! tag:xyz.com,2010:\n%TAG !abc! tag:abc.net,2010:\n!thing",
>       ["!abc!widget", 2, 4]
>     ]

I would put each '%' into its own item.

I think of YAML, if you include the "---" dividers, as having a top-level list naturally. So perhaps the conversion of YAML to JSON is to turn it always into a list. Back-conversion will turn that top-level list into --- divided items.

> Good starting list. I would want to escape <SPACE>, !, #, $, %, &
> because then you could tell a JSON encoder to sort mapping keys (if the
> option is available) and always get the JSYNC fields at the top of the
> mapping. Not only is this better visually; it might be _necessary_ if we
> wanted to do streaming JSYNC libraries.

Okay we have the following list, imho:

DEFINITELY NEED QUOTING:
    !, #, %, &, *

MAY NEED QUOTING:
    @ (reserved for future use by YAML)
    space (I'm not sure why)
    " and ' (also used by YAML at the start of values)

NOT QUOTED:
    +, -, 0-9 (for numbers)
    A-Z, a-z, _ (valid identifiers in most languages)
    All Unicode > 0x7f (match the letters, no UTF-8 decoding needed)
    / and \ (for . and .. filenames)
    "nothing": a string of only periods is unchanged (for . and .. filenames)

UNKNOWN (but I would say not quoted):
    C0 control characters such as newline
    $, (, ), comma, -, :, ;, <, =, >, ?, [, ], ^, `, {, |, }, ~
|
From: Ingy d. N. <in...@in...> - 2010-06-15 03:19:37
|
On Mon, Jun 14, 2010 at 7:42 PM, William Spitzak <sp...@rh...> wrote:

> Ingy dot Net wrote:
>
>> It is not clear how you add a tag or anchor to a non-quoted object
>> such as a number, however. Perhaps it is ok if quotes are required,
>> or maybe I'm missing something.
>>
>> This is an interesting question. My take is that tagging is only applied
>> to strings. Since each and every tag is resolved by a function, it seems a
>> fair assumption to make.
>>
>> I realize that there is a distinction in YAML betwixt:
>>
>> ---
>> - !foo "3"
>> - !foo 3
>>
>> But I don't see a constructor making use of that distinction.
>>
>> What do you think?
>
> It seems ok. Certainly whether a string is quoted or not is used by YAML
> parsers to determine type, but that is only when the tag is missing. I think
> if there is a tag then it is ok if the value is always quoted. It does mean
> that you cannot add more keywords than JSON supports as unquoted constants
> to YAML but I don't think that is being allowed anyway?

Right. The serialization needs to be 100% JSON.

>> We need to support a top level '%TAG' and '%JSYNC' syntax:
>>
>> %YAML 1.2
>> %TAG ! tag:xyz.com,2010:
>> %TAG !abc! tag:abc.net,2010:
>>
>> --- !thing
>> this: !abc!widget [2, 4]
>>
>> Here is my strawman...
>>
>> There are two kinds of top level in JSYNC: map and sequence. Map seems
>> easier:
>>
>> {
>>   "%": "JSYNC 1.0\nTAG ! tag:xyz.com,2010:\nTAG !abc! tag:abc.net,2010:",
>>   "!": "thing",
>>   "this": ["!abc!widget", 2, 4]
>> }
>>
>> Here is a similar doc as a top level sequence:
>>
>> [
>>   "%JSYNC 1.0\n%TAG ! tag:xyz.com,2010:\n%TAG !abc! tag:abc.net,2010:\n!thing",
>>   ["!abc!widget", 2, 4]
>> ]
>
> I would put each '%' into its own item.
>
> I think of YAML, if you include the "---" dividers, as having a top-level
> list naturally. So perhaps the conversion of YAML to JSON is to turn it
> always into a list. Back-conversion will turn that top-level list into ---
> divided items.

Interesting thoughts. Let's let it brew for a while.

>> Good starting list. I would want to escape <SPACE>, !, #, $, %, & because
>> then you could tell a JSON encoder to sort mapping keys (if the option is
>> available) and always get the JSYNC fields at the top of the mapping. Not
>> only is this better visually; it might be _necessary_ if we wanted to do
>> streaming JSYNC libraries.
>
> Okay we have the following list, imho:
>
> DEFINITELY NEED QUOTING:
>   !, #, %, &, *
>
> MAY NEED QUOTING:
>   @ (reserved for future use by YAML)
>   space (I'm not sure why)

because space sorts before '!'. The first printable chars are:

    <space> ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @

I was thinking it would be nice if the JSYNCisms key sorted above the
normal keys.

>   " and ' (also used by YAML at the start of values)

" will always be \". It is required to be valid JSON.

> NOT QUOTED:
>   +, -, 0-9 (for numbers)
>   A-Z, a-z, _ (valid identifiers in most languages)
>   All Unicode > 0x7f (match the letters, no UTF-8 decoding needed)
>   / and \ (for . and .. filenames)
>   "nothing": a string of only periods is unchanged (for . and .. filenames)

Agreed.

> UNKNOWN (but I would say not quoted):
>   C0 control characters such as newline
>   $, (, ), comma, -, :, ;, <, =, >, ?, [, ], ^, `, {, |, }, ~
>
>> As an aside, I think there is about a year of time where you can declare
>> new technology like JSYNC to be alpha, and subject to
>> non-backwards-compatible change. I personally think it is healthy, and
>> avoids things like mandatory tabs in Makefiles!
>>
>> > I have updated the http://jsync.org example to reflect this.
>>
>> This is great and if this question comes up again people should be
>> directed to this web site. Name & syntax are chosen.
>>
>> Thanks!
>>
>> The next steps are to write up a syntax spec, then an API spec, then
>> create an implementation or two, then write an implementation guide.
>>
>> Cheers, Ingy
|
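The sort-order argument in the message above can be checked with any JSON encoder that offers a key-sorting option. The following is only a sketch using Python's standard `json` module; the sample keys are made up for illustration and are not part of any JSYNC spec.

```python
import json

# Hypothetical sample keys: two "JSYNCism" style keys ("!" and "%"),
# one key starting with a space, and some ordinary keys.
keys = ["this", "!", "%", " x", "Name", "+1"]

# Plain string sorting compares code points, so space (0x20) comes
# before '!' (0x21), which comes before '%' (0x25), then '+', then letters.
ordered = sorted(keys)

# The same ordering is what a key-sorting JSON encoder produces.
doc = {"this": ["!abc!widget", 2, 4], "!": "thing", "%": "JSYNC 1.0"}
encoded = json.dumps(doc, sort_keys=True)

print(ordered)
print(encoded)
```

With sorted keys, the special keys end up ahead of the ordinary ones, which is the property a streaming reader would rely on.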
From: Matthew W. <mat...@gm...> - 2010-06-09 12:03:56
|
> * A tagged custom type which is constructed from (say) a string rather than
> from a map - my example was
> {"$tag": "!!timestamp", "$value": "2001-12-15T02:59:43.1Z"}

My apologies, I just spotted this in your example on a second look
("!!date March 2, 1962") and from that I could have guessed what you
intended for an anchored string too.

I guess there's a trade-off to be made here between conciseness and
speed/ease of parsing. One of my goals was to get as much mileage as
possible out of fast native JSON parsers, so I tended towards letting the
JSON parser do as much of the work as possible, rather than encoding things
in strings which I would then need to parse myself in JavaScript. The only
overhead which it adds in the common case is traversing the parsed JSON
tree for objects and checking for at most 3 special key names on each object
encountered, whereas yours additionally needs to visit each string with a
regexp.

But then again my proposal is less concise when it comes to serializing.
Using "!", "&", "*" keys (as per your proposal) rather than "$tag",
"$anchor", "$alias" etc would help a bit with that though.

> "!!int 55": "I can't drive",

Presume you also have a plan for the case where the key is an object too? :)

>> Also, I guess an alternative approach (or perhaps one which is undertaken
>> in parallel?) would be to propose an extension to json with new syntax for
>> references and tags, to match the YAML model, but only the minimal extra
>> syntax required to do so, with the goal of keeping it as easy to parse as
>> possible.
>
> Explain this more. I don't really see what you are proposing...

I was wondering about defining (as a parallel effort) some minimal syntax
extensions to JSON which add the features (tags, anchors etc) required for
the YAML model, but without going as far as full yaml syntax, eg something
like:

{
  "foo": *anchor,
  "bar": !date &anchor "2010-01-01",
  123: 456,
  ...
}

It would require an extended JSON parser to deal with it of course, but
still a fair bit easier to parse than full YAML syntax. Perhaps a bit of a
half-baked idea, but it might complement the goals of JSYNC somewhat,
defining something which is easier to parse but supports the full YAML model
with tags/aliases, and also offering a version of it which is embedded
within plain JSON as an alternative for cases where you want to be able to
use a plain JSON parser.

-Matt
|
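The "let the native JSON parser do the work, then walk the tree" approach described in the message above might look roughly like the sketch below. It uses the `$tag`, `$value`, `$anchor` and `$alias` key names from the original proposal; the `resolve` function, the `constructors` table and the sample document are hypothetical, and the sketch ignores `$$` key escaping and cyclic references.

```python
import json

def resolve(node, anchors, constructors):
    """Rebuild values from the '$'-keyed embedding after json.loads()."""
    if isinstance(node, list):
        return [resolve(item, anchors, constructors) for item in node]
    if not isinstance(node, dict):
        return node  # strings, numbers, booleans and null pass through
    if "$alias" in node:
        # Assumes the anchor appeared earlier in document order.
        return anchors[node["$alias"]]
    tag = node.get("$tag")
    anchor = node.get("$anchor")
    if "$value" in node:
        value = resolve(node["$value"], anchors, constructors)
    else:
        # Concise form: the remaining keys are the map itself.
        value = {k: resolve(v, anchors, constructors)
                 for k, v in node.items() if k not in ("$tag", "$anchor")}
    if tag in constructors:
        value = constructors[tag](value)
    if anchor is not None:
        anchors[anchor] = value
    return value

text = """
{
  "a": {"$anchor": "foo", "$value": "some string"},
  "b": {"$alias": "foo"},
  "t": {"$tag": "!!timestamp", "$value": "2001-12-15T02:59:43.1Z"}
}
"""
# Hypothetical constructor: wrap the raw string instead of parsing a date.
constructors = {"!!timestamp": lambda s: ("timestamp", s)}
result = resolve(json.loads(text), {}, constructors)
print(result)
```

Only objects are inspected for the special keys; plain strings are never scanned, which is the trade-off being argued for.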
From: Ingy d. N. <in...@in...> - 2010-06-09 17:32:22
|
On Wed, Jun 9, 2010 at 5:03 AM, Matthew Willson <mat...@gm...> wrote:

> * A tagged custom type which is constructed from (say) a string rather than
> from a map - my example was {"$tag": "!!timestamp", "$value":
> "2001-12-15T02:59:43.1Z"}
>
> My apologies, I just spotted this in your example on a second look ("!!date
> March 2, 1962") and from that I could have guessed what you intended for an
> anchored string too.
>
> I guess there's a trade-off to be made here between conciseness
> and speed/ease of parsing. One of my goals was to get as much mileage as
> possible out of fast native JSON parsers, so I tended towards letting the
> JSON parser do as much of the work as possible, rather than encoding things
> in strings which I would then need to parse myself in javascript. The only
> overhead which it adds in the common case, is traversing the parsed JSON
> tree for objects and checking for at most 3 special key names on each object
> encountered, whereas yours additionally needs to visit each string with a
> regexp.

I wouldn't design a language based on a premature optimization of an
implementation detail. With all the code needed to make a JSYNC
implementation, I doubt that detail would make any noticeable difference. On
the other hand, making the language overly complex surely would be noticed.

A couple more points, if you aren't convinced. First, anchored regexes are
very fast operations. Second, you haven't avoided traversing the entire
tree. In your example, every leaf node would need to be examined to see if
it was one of your special objects. And speaking of objects, in your example
you use an object with 2 pairs of 4 strings, where I use a single string. So
not only is your serialization longer, requiring more memory, but your in
memory representation is huge compared to a simple string. If this causes
more mallocs, you can throw out any savings you _might_ get avoiding a
regexp.

To me, none of that matters. The real issue is creating a nice, adoptable
language.

> But then again my proposal is less concise when it comes to serializing.
> Using "!", "&", "*" keys (as per your proposal) rather than "$tag",
> "$anchor", "$alias" etc would help a bit with that though.
>
> > "!!int 55": "I can't drive",
>
> Presume you also have a plan for the case where the key is an object too?
> :)

I do. It was talked about in an earlier discussion on this list. It's not
pretty, but it works. And objects as keys are not very common so that's OK
with me.

{
  " ": {
    "JSYNC": "1.0",
    "TAG": { "!": "tag:ingy.net,2005:" },
    "keys": { "&001": [2,5], "&002": [6,6] }
  },
  "!": "diceRolls",
  "*001": 14,
  "*002": 21
}

%TAG ! tag:ingy.net,2005:
--- !diceRolls
[2, 5]: 14
[6, 6]: 21
...

Basically a key containing a single space points to an object that can
contain extra YAML information. It can also contain objects to be used as
keys later, by reference. That's the only way I can think of to do complex
keys in JSON.

>> Also, I guess an alternative approach (or perhaps one which is undertaken in
>> parallel?) would be to propose an extension to json with new syntax for
>> references and tags, to match the YAML model, but only the minimal extra
>> syntax required to do so, with the goal of keeping it as easy to parse as
>> possible.
>
> > Explain this more. I don't really see what you are proposing...
>
> I was wondering about defining (as a parallel effort) some minimal syntax
> extensions to JSON which add the features (tags, anchors etc) required for
> the YAML model, but without going as far as full yaml syntax, eg something
> like:
>
> {
>   "foo": *anchor,
>   "bar": !date &anchor "2010-01-01",
>   123: 456,
>   ...
> }
>
> It would require an extended JSON parser to deal with it of course, but
> still a fair bit easier to parse than full YAML syntax. Perhaps a bit of a
> half-baked idea, but it might complement the goals of JSYNC somewhat,
> defining something which is easier to parse but supports the full YAML model
> with tags/aliases, and also offering a version of it which is embedded
> within plain JSON as an alternative for cases where you want to be able to
> use a plain JSON parser.

The above is valid YAML[1], so effectively you are defining another subset
of YAML. I would advise against this. Other people have defined non standard
subsets of YAML, in the name of simplicity. (YAML::Tiny in Perl). I think
this just muddies the waters, and confuses people. It would be better to get
all the YAML implementations working properly and in harmony, with similar
APIs. I think JSYNC would facilitate that.

Cheers, Ingy

[1] It is actually invalid YAML because the alias precedes the anchor.

> -Matt
|
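To make the complex-key idea above concrete, here is a rough sketch of how a reader might turn the single-space header and the `*001`-style keys back into a mapping with sequence keys. The `decode_complex_keys` helper is hypothetical, it handles only this one example shape, and it uses tuples because Python lists cannot be dictionary keys.

```python
import json

text = """
{
  " ": {
    "JSYNC": "1.0",
    "TAG": {"!": "tag:ingy.net,2005:"},
    "keys": {"&001": [2, 5], "&002": [6, 6]}
  },
  "!": "diceRolls",
  "*001": 14,
  "*002": 21
}
"""

def decode_complex_keys(mapping):
    """Replace '*name' keys with the objects anchored as '&name' in the header."""
    header = mapping.get(" ", {})
    anchored = {name[1:]: tuple(value)
                for name, value in header.get("keys", {}).items()}
    out = {}
    for key, value in mapping.items():
        if key in (" ", "!"):
            continue  # header and tag entries are metadata, not data pairs
        if key.startswith("*"):
            key = anchored[key[1:]]
        out[key] = value
    return out

rolls = decode_complex_keys(json.loads(text))
print(rolls)
```

The tag under `"!"` is simply skipped here; a fuller reader would hand the decoded mapping to whatever constructor is registered for it.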
From: Matthew W. <mat...@gm...> - 2010-06-09 18:07:11
|
> I wouldn't design a language based on a premature optimization of an
> implementation detail. With all the code needed to make a JSYNC
> implementation, I doubt that detail would make any noticeable difference.
> ...
> If this causes more mallocs, you can throw out any savings you _might_ get
> avoiding a regexp.

On the other hand, with languages like JavaScript with immutable strings,
you have to malloc more strings as a result of any regexp matches. But yes
you may have a point there. Would have to measure to tell for sure and would
probably depend on the shape of the data.

I don't think trying to optimise this (whether for parsing speed, or for
size in memory, or size on the wire) is premature though. At least for me,
the only reason I'm considering a somewhat hacky embedding like this is
because I'm running under performance constraints in a somewhat limiting
environment, and want to leverage the fast native JSON parser available in
that environment to the maximum extent possible when it comes to
deserializing YAML data.

Arguments about elegance seem a bit beside the point---while neither of
them look awful, this is going to be a somewhat clunky syntax hack at best
IMO, that's just the nature of it. Personally I felt mine was slightly more
straightforward on the syntax front if more verbose, but I'm not too
bothered either way.

> The above is valid YAML[1], so effectively you are defining another subset
> of YAML. I would advise against this. Other people have defined non standard
> subsets of YAML, in the name of simplicity. (YAML::Tiny in Perl). I think
> this just muddies the waters, and confuses people. It would be better to get
> all the YAML implementations working properly and in harmony, with similar
> APIs. I think JSYNC, would facilitate that.

Fair enough. Although I think the complexity of YAML syntax is a problem (of
which JSYNC is one symptom), and so some standardised subset of the syntax
which enables the full semantics would be nice. If others have already
managed to 'muddy the waters' when it comes to standardising this, that's
too bad.

Anyway will be interested to see the spec for JSYNC when it arrives; perhaps
at that point I'll have a go at benchmarking a JavaScript implementation
against the simpler (but less general) approach which we're using at the
moment.

-Matt
|
From: William S. <sp...@rh...> - 2010-06-15 16:48:40
|
Ingy dot Net wrote:

> Right. The serialization needs to be 100% JSON.

This means that any unquoted value in YAML, if the lack of quotes is used to
determine the default tag, must be tagged in the conversion. What I am
unclear on is when this is allowed; the YAML spec seems to skim over this,
but it certainly uses the lack of quotes to distinguish the 'null' from the
4-letter string "null".

> because space sorts before '!'. The first printable chars are:
>
> <space> ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @
>
> I was thinking it would be nice if the JSYNCisms key sorted above the
> normal keys.

I think I see but that would require all bytes < '.' to be dot-prefixed and
none above to require this. This would require all control characters and
double-quote to be dot-prefixed, despite the fact that they also need
backslash escape for JSON. And '+' and '-' despite the fact that they are
allowed in unquoted JSON. Also '(' would require it. And YAML has reserved
'@' for future use so it seems that should require a dot.

So my feeling is that the short, but unfortunately non-contiguous set of
"!%&*@" are the only special prefix characters.
|
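For the handful of literals JSON itself leaves unquoted, the quoted/unquoted distinction mentioned above already survives a standard JSON parse, so only YAML values outside that set would need an explicit tag. A small check with Python's `json` module (the sample array is made up for illustration):

```python
import json

# null vs. the 4-letter string "null", and a number vs. a quoted number.
values = json.loads('[null, "null", 42, "42"]')
kinds = [type(v).__name__ for v in values]
print(kinds)
```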
From: Ingy d. N. <in...@in...> - 2010-06-15 18:00:08
|
On Tue, Jun 15, 2010 at 9:48 AM, William Spitzak <sp...@rh...> wrote:

> Ingy dot Net wrote:
>
>> Right. The serialization needs to be 100% JSON.
>
> This means that any unquoted value in YAML, if the lack of quotes is used
> to determine the default tag, must be tagged in the conversion. What I am
> unclear on is when this is allowed, the YAML spec seems to skim over this
> but it certainly is using unquoted to distinguish the 'null' from the
> 4-letter string "null".

Right, this is called "implicit typing" in YAML. Implicit typing on quoted
strings may be forced in YAML by using a single !.

---
number: ! "42"

I think we can do the same for JSYNC.

>> because space sorts before '!'. The first printable chars are:
>>
>> <space> ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @
>>
>> I was thinking it would be nice if the JSYNCisms key sorted above the
>> normal keys.
>
> I think I see but that would require all bytes < '.' to be dot-prefixed and
> none above to require this. This would require all control characters and
> double-quote to be dot-prefixed, despite the fact that they also need
> backslash escape for JSON. And '+' and '-' despite the fact that they are
> allowed in unquoted JSON. Also '(' would require it. And YAML has reserved
> '@' for future use so it seems that should require a dot.
>
> So my feeling is that the short, but unfortunately non-contiguous set of
> "!%&*@" are the only special prefix characters.

You are probably right. Let's start with that set.

Cheers, Ingy
|
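One way the dot-prefix escaping for the agreed "!%&*@" set could work is sketched below. The exact rule is an assumption, since the thread never spells it out: a string gets one extra leading dot when its first non-dot character is special, and a string of only periods is left unchanged. The function names `escape` and `unescape` are hypothetical.

```python
import re

# Assumed rule: zero or more leading dots followed by one special character.
# Pattern.match() is anchored at the start of the string.
_NEEDS_DOT = re.compile(r"\.*[!%&*@]")
_HAS_DOT = re.compile(r"\.+[!%&*@]")

def escape(s):
    """Add one leading dot if the string could be mistaken for a special form."""
    return "." + s if _NEEDS_DOT.match(s) else s

def unescape(s):
    """Remove the one leading dot that escape() would have added."""
    return s[1:] if _HAS_DOT.match(s) else s

samples = ["!foo", ".!foo", "..", ".", "plain", ".hidden", "@later", "a!b"]
escaped = [escape(s) for s in samples]
print(escaped)
```

Under this rule `.` and `..` pass through untouched, which matches the filename case mentioned earlier in the thread, and escaping followed by unescaping returns the original string.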
From: Brad B. <bm...@ma...> - 2010-06-15 18:36:37
|
On Tue, Jun 15, 2010 at 2:00 PM, Ingy dot Net <in...@in...> wrote: > On Tue, Jun 15, 2010 at 9:48 AM, William Spitzak <sp...@rh...> wrote: >> So my feeling is that the short, but unfortunately non-contiguous set of >> "!%&*@" are the only special prefix characters. > > You are probably right. Let's start with that set. > <bikeshed>You can call them the cursing characters.</bikeshed> |
From: Ingy d. N. <in...@in...> - 2010-06-15 18:59:44
|
On Tue, Jun 15, 2010 at 11:36 AM, Brad Baxter <bm...@ma...> wrote: > On Tue, Jun 15, 2010 at 2:00 PM, Ingy dot Net <in...@in...> wrote: >> On Tue, Jun 15, 2010 at 9:48 AM, William Spitzak <sp...@rh...> wrote: >>> So my feeling is that the short, but unfortunately non-contiguous set of >>> "!%&*@" are the only special prefix characters. >> >> You are probably right. Let's start with that set. >> > > <bikeshed>You can call them the cursing characters.</bikeshed> <lol>:-D</lol> |