From: Oren Ben-K. <or...@be...> - 2012-01-01 07:10:48
On Sun, Jan 1, 2012 at 5:12 AM, Jake Buurma <ja...@si...> wrote:
> On 12/31/2011 5:16 PM, Geoff Adams wrote:
>
> Hello,
>
> I'm trying to understand the process of tag resolution in YAML 1.2, and some of the language in the spec seems unclear to me. In particular, I am trying to understand the implications of this sentence from section 3.3.2:
>
> "Resolving the tag of a node must only depend on the following three parameters: (1) the non-specific tag of the node, (2) the path leading from the root to the node, and (3) the content (and hence the kind) of the node."
>
> To understand rule (2), I need to know the definition of a path to a node, which does not seem to be given anywhere in the spec. From the rest of the section, it seems that the key node associated with a value node in a mapping is part of the value node's path.

Yes, this probably needs additional wording to be absolutely correct.

> My guess is that a path is then a list of nodes, constructed as follows:
>
> For the root node:
> - Nothing, or a root path
> For a value node in a mapping:
> - The path to the corresponding key node
> - The key node
> For a key node in a mapping:
> - The path to the mapping node
> - The mapping node
> For a node in a sequence:
> - The path to the sequence node
> - The sequence node
>
> Is this correct?

Pretty much.

> I am also unclear about the interpretation of the rule, "resolution must not consider the content of any other node, except for the content of the key nodes directly along the path leading from the root to the resolved node." Is the tag of a node considered part of its content?

That is a tricky question; I was just thinking about it the other day and you are correct in saying:

> Section 3.2.1.1 suggests that it is not.
> However, if the tag of a collection may be considered as part of the resolution process of its contents (rule 2), then attempting to resolve the tag of such a collection by rule (3) could potentially lead to circularity: the tag of the collection depends on the tags of nodes in its contents, but those tags may depend on the tag of the collection.

Right. Take the following example: a simple "point" tag, where a point is a mapping with an "x" key and a "y" key. We can reasonably say: well, if the mapping has a "point" tag, and we are at the "x" key, then the value is expected to be a number - call this "top-down tag resolution". At the same time, we can reasonably say: well, if a mapping node has an "x" key with a number value and a "y" key with a number value, it is a "point" - call this "bottom-up tag resolution". People intuitively expect both directions to "work", but as you point out, this makes the order of resolving the tags a tricky issue.

Hmmm. YAML was designed to allow, "as much as possible", for streaming processing. The ideal was that a node would be able to be resolved "as soon as it was read". Now, both top-down and bottom-up tag resolution orders allow this, so it doesn't help us choose between them, and anyway as I said above, people "expect" both directions to work. Double hmmm.

> Of course the current core schema does not have this problem, as it never applies rule (2), nor does it apply rule (3) to collections. But it seems that any schema that did both of these would need to be written carefully to avoid potentially subtle circular dependencies. Is this considered a problem for the schema authors rather than for the YAML spec itself?

My intuition is that a schema specifies both top-down rules and bottom-up rules, and that top-down rules trump bottom-up rules. However, this requires clarification.
A schema specifies that a collection node with some tag C has some internal structure - e.g., that it contains sub-nodes with tags S_i. This can be viewed either as a top-down rule (if you see an explicit tag C, automatically assign tags S_i to the sub-nodes), or as a bottom-up rule (if you see sub-nodes with tags S_i, assign the tag C to the collection).

During tag resolution, if the collection node has an explicit tag C, then we can apply the top-down rule and automatically assign the tags S_i to the sub-nodes. This should trump any bottom-up tag resolution for these nodes. In this case, if the sub-nodes have incompatible explicit tags, then we have invalid input (along the same lines as having incompatible physical structure).

However, if a collection node has no explicit tag, then we must rely on bottom-up rules to resolve the tags of its sub-nodes. In this case, if the sub-nodes have an explicit tag, we just use it. Having done all that, we now employ the bottom-up rule and resolve the collection to have the tag C if it contains the expected sub-nodes with the correct tags S_i.

This raises the question of what to do if the collection has the expected structure (e.g., the right set of keys in a mapping - "x" and "y" for a "point") but has the wrong tags (e.g., a string value instead of a number value for the "x" key). Should we make this an error, or do we simply resolve the tag to be something else (e.g., a regular mapping)? My intuition is that we should do the latter; an application that relies on the collection having a specific tag ("point") would barf at getting a general hash table if/when it matters to it, and if it doesn't care, why should we?

The bottom line is, if we explicitly tag the root node of a document, we can (mostly) simply derive all the sub-nodes' tags and need hardly ever rely on pattern matching. This is the most efficient and most type-safe option.
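As a rough illustration of the ordering described so far, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Node` class, the `resolve` function, and the tag names "point", "number", "string" and "map" are hypothetical and come neither from the YAML spec nor from any YAML library. It only covers the "point" case: an explicit "point" tag pushes "number" down onto the "x" and "y" values (conflicting explicit tags are rejected as invalid input), while an untagged mapping has its sub-nodes resolved first and is then matched against the expected structure, falling back to a regular mapping.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union

# Hypothetical schema: a "point" is a mapping whose "x" and "y" values are numbers.
POINT_FIELDS = {"x": "number", "y": "number"}


@dataclass
class Node:
    content: Union[str, Dict[str, "Node"]]  # scalar text, or key -> value node
    tag: Optional[str] = None               # explicit tag, or None if untagged


def resolve_scalar(node: Node) -> str:
    """Bottom-up rule for scalars: keep an explicit tag, else match the text."""
    if node.tag is not None:
        return node.tag
    try:
        float(node.content)
        return "number"
    except ValueError:
        return "string"


def resolve(node: Node) -> str:
    """Resolve the tag of a node and its sub-nodes, storing results in .tag."""
    if not isinstance(node.content, dict):
        node.tag = resolve_scalar(node)
        return node.tag

    if node.tag == "point":
        # Top-down: the explicit collection tag dictates the sub-node tags.
        # (This sketch assigns the tag without validating the scalar text.)
        if set(node.content) != set(POINT_FIELDS):
            raise ValueError("invalid input: wrong keys for a point")
        for key, expected in POINT_FIELDS.items():
            child = node.content[key]
            if child.tag is not None and child.tag != expected:
                raise ValueError(f"invalid input: {key!r} is tagged {child.tag!r}")
            child.tag = expected
        return node.tag

    # Bottom-up: resolve the sub-nodes first, then pattern-match the collection.
    child_tags = {key: resolve(child) for key, child in node.content.items()}
    if node.tag is None:
        # Right structure but wrong tags is not an error: fall back to "map".
        node.tag = "point" if child_tags == POINT_FIELDS else "map"
    return node.tag
```

With this sketch, an untagged mapping holding "x: 1" and "y: 2" resolves to "point", while the same mapping with "x: abc" resolves to "map" rather than raising an error.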
However, if we haven't explicitly tagged the root node, or if the schema contains areas where "it might be either X or Y or Z", then we would need to rely on pattern matching (I'm using this term to cover both regexps matched to scalars and "expected structure" matched to collections). This is the most flexible but less type-safe option.

I guess 3.3.2 needs to be expanded to include the above clarification. I think it is the only way this "could ever work" in practice given the current wording but I admit that as it stands it is a rather opaque statement.

This was a great spec clarification question - thanks!

Have fun,

Oren Ben-Kiki