From: Geoff A. <geo...@gm...> - 2012-01-01 01:17:05
Hello,

I'm trying to understand the process of tag resolution in YAML 1.2, and some of the language in the spec seems unclear to me. In particular, I am trying to understand the implications of this sentence from section 3.3.2:

"Resolving the tag of a node must only depend on the following three parameters: (1) the non-specific tag of the node, (2) the path leading from the root to the node, and (3) the content (and hence the kind) of the node."

To understand rule (2), I need to know the definition of a path to a node, which does not seem to be given anywhere in the spec. From the rest of the section, it seems that the key node associated with a value node in a mapping is part of the value node's path. My guess is that a path is then a list of nodes, constructed as follows:

For the root node:
  - Nothing, or a root path
For a value node in a mapping:
  - The path to the corresponding key node
  - The key node
For a key node in a mapping:
  - The path to the mapping node
  - The mapping node
For a node in a sequence:
  - The path to the sequence node
  - The sequence node

Is this correct?

I am also unclear about the interpretation of the rule, "resolution must not consider the content of any other node, except for the content of the key nodes directly along the path leading from the root to the resolved node." Is the tag of a node considered part of its content? Section 3.2.1.1 suggests that it is not. However, if the tag of a collection may be considered as part of the resolution process of its contents (rule 2), then attempting to resolve the tag of such a collection by rule (3) could potentially lead to circularity: the tag of the collection depends on the tags of nodes in its contents, but those tags may depend on the tag of the collection.

Of course the current core schema does not have this problem, as it never applies rule (2), nor does it apply rule (3) to collections. But it seems that any schema that did both of these would need to be written carefully to avoid potentially subtle circular dependencies. Is this considered a problem for the schema authors rather than for the YAML spec itself?

Thanks for any clarifications.

Regards and happy new year,
Geoff Adams
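Geoff's proposed path construction can be sketched in Python as a recursive walk. This is a minimal illustration, assuming a hypothetical in-memory node model (`ScalarNode`, `SeqNode`, `MapNode`) that is not any real YAML library's API, and assuming the document is a tree: an alias cycle would make the hypothetical `walk` helper recurse forever. Tags are deliberately left out of the model.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple, Union


@dataclass
class ScalarNode:
    value: str


@dataclass
class SeqNode:
    items: List["Node"] = field(default_factory=list)


@dataclass
class MapNode:
    # Key/value pairs kept as a list, since keys may themselves be collections.
    pairs: List[Tuple["Node", "Node"]] = field(default_factory=list)


Node = Union[ScalarNode, SeqNode, MapNode]
Path = Tuple[Node, ...]


def walk(node: Node, path: Path = ()) -> Iterator[Tuple[Node, Path]]:
    """Yield (node, path) for every node, using the proposed path rules."""
    yield node, path  # the root node gets the empty path
    if isinstance(node, SeqNode):
        for item in node.items:
            # Sequence entry: path to the sequence, then the sequence node.
            yield from walk(item, path + (node,))
    elif isinstance(node, MapNode):
        for key, value in node.pairs:
            # Key node: path to the mapping, then the mapping node.
            key_path = path + (node,)
            yield from walk(key, key_path)
            # Value node: path to its key node, then the key node itself.
            yield from walk(value, key_path + (key,))
```

Under this reading, the value `1` in `{point: {x: 1}}` has the path (outer mapping, key `point`, inner mapping, key `x`), so a resolver applying rule (2) would see the contents of both key nodes as well as the enclosing collections.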
From: Jake B. <ja...@si...> - 2012-01-01 03:44:15
On 12/31/2011 5:16 PM, Geoff Adams wrote:
> Hello,
>
> I'm trying to understand the process of tag resolution in YAML 1.2, and some of the language in the spec seems unclear to me. In particular, I am trying to understand the implications of this sentence from section 3.3.2:
>
> "Resolving the tag of a node must only depend on the following three parameters: (1) the non-specific tag of the node, (2) the path leading from the root to the node, and (3) the content (and hence the kind) of the node."
>
> To understand rule (2), I need to know the definition of a path to a node, which does not seem to be given anywhere in the spec. From the rest of the section, it seems that the key node associated with a value node in a mapping is part of the value node's path. My guess is that a path is then a list of nodes, constructed as follows:
>
> For the root node:
>   - Nothing, or a root path
> For a value node in a mapping:
>   - The path to the corresponding key node
>   - The key node
> For a key node in a mapping:
>   - The path to the mapping node
>   - The mapping node
> For a node in a sequence:
>   - The path to the sequence node
>   - The sequence node
>
> Is this correct?
>
> I am also unclear about the interpretation of the rule, "resolution must not consider the content of any other node, except for the content of the key nodes directly along the path leading from the root to the resolved node." Is the tag of a node considered part of its content? Section 3.2.1.1 suggests that it is not. However, if the tag of a collection may be considered as part of the resolution process of its contents (rule 2), then attempting to resolve the tag of such a collection by rule (3) could potentially lead to circularity: the tag of the collection depends on the tags of nodes in its contents, but those tags may depend on the tag of the collection.
> Of course the current core schema does not have this problem, as it never applies rule (2), nor does it apply rule (3) to collections. But it seems that any schema that did both of these would need to be written carefully to avoid potentially subtle circular dependencies. Is this considered a problem for the schema authors rather than for the YAML spec itself?
>
> Thanks for any clarifications.
>
> Regards and happy new year,
> Geoff Adams
From: Oren Ben-K. <or...@be...> - 2012-01-01 07:10:48
On Sun, Jan 1, 2012 at 5:12 AM, Jake Buurma <ja...@si...> wrote:
> On 12/31/2011 5:16 PM, Geoff Adams wrote:
>
> Hello,
>
> I'm trying to understand the process of tag resolution in YAML 1.2, and some of the language in the spec seems unclear to me. In particular, I am trying to understand the implications of this sentence from section 3.3.2:
>
> "Resolving the tag of a node must only depend on the following three parameters: (1) the non-specific tag of the node, (2) the path leading from the root to the node, and (3) the content (and hence the kind) of the node."
>
> To understand rule (2), I need to know the definition of a path to a node, which does not seem to be given anywhere in the spec. From the rest of the section, it seems that the key node associated with a value node in a mapping is part of the value node's path.

Yes, this probably needs additional wording to be absolutely correct.

> My guess is that a path is then a list of nodes, constructed as follows:
>
> For the root node:
>   - Nothing, or a root path
> For a value node in a mapping:
>   - The path to the corresponding key node
>   - The key node
> For a key node in a mapping:
>   - The path to the mapping node
>   - The mapping node
> For a node in a sequence:
>   - The path to the sequence node
>   - The sequence node
>
> Is this correct?

Pretty much.

> I am also unclear about the interpretation of the rule, "resolution must not consider the content of any other node, except for the content of the key nodes directly along the path leading from the root to the resolved node." Is the tag of a node considered part of its content?

That is a tricky question; I was just thinking about it the other day and you are correct in saying:

> Section 3.2.1.1 suggests that it is not.
> However, if the tag of a collection may be considered as part of the resolution process of its contents (rule 2), then attempting to resolve the tag of such a collection by rule (3) could potentially lead to circularity: the tag of the collection depends on the tags of nodes in its contents, but those tags may depend on the tag of the collection.

Right. Take the following example: a simple "point" tag, where a point is a mapping with an "x" key and a "y" key. We can reasonably say: well, if the mapping has a "point" tag, and we are at the "x" key, then the value is expected to be a number - call this "top-down tag resolution". At the same time, we can reasonably say: well, if a mapping node has an "x" key with a number value and a "y" key with a number value, it is a "point" - call this "bottom-up tag resolution".

People intuitively expect both directions to "work", but as you point out, this makes the order of resolving the tags a tricky issue. Hmmm.

YAML was designed to allow, "as much as possible", for streaming processing. The ideal was that a node would be able to be resolved "as soon as it was read". Now, both top-down and bottom-up tag resolution orders allow this, so it doesn't help us choose between them, and anyway, as I said above, people "expect" both directions to work.

Double hmmm.

> Of course the current core schema does not have this problem, as it never applies rule (2), nor does it apply rule (3) to collections. But it seems that any schema that did both of these would need to be written carefully to avoid potentially subtle circular dependencies. Is this considered a problem for the schema authors rather than for the YAML spec itself?

My intuition is that a schema specifies both top-down rules and bottom-up rules, and that top-down rules trump bottom-up rules. However, this requires clarification.
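The two directions in the "point" example can be written as two small Python functions. This is a sketch only: the `!point` tag, `POINT_FIELDS`, `top_down`, and `bottom_up` are hypothetical names, and the integer pattern is a simplified subset of the core schema's integer forms (no octal or hex).

```python
import re
from typing import Dict, Optional

INT = "tag:yaml.org,2002:int"
STR = "tag:yaml.org,2002:str"
INT_RE = re.compile(r"[-+]?[0-9]+")  # simplified subset of the core schema's int forms

# Hypothetical schema entry: a "!point" mapping has integer "x" and "y" values.
POINT_FIELDS: Dict[str, str] = {"x": INT, "y": INT}


def resolve_scalar(text: str) -> str:
    """Content-only scalar resolution (rule 3): no context is consulted."""
    return INT if INT_RE.fullmatch(text) else STR


def top_down(parent_tag: Optional[str], key: str) -> Optional[str]:
    """Top-down: the parent's tag and the key on the path decide the value's tag."""
    if parent_tag == "!point":
        return POINT_FIELDS.get(key)
    return None  # no top-down information; a caller would fall back to resolve_scalar


def bottom_up(pairs: Dict[str, str]) -> Optional[str]:
    """Bottom-up: the tags resolved from the contents decide the mapping's tag."""
    if set(pairs) == set(POINT_FIELDS) and all(
        resolve_scalar(text) == POINT_FIELDS[key] for key, text in pairs.items()
    ):
        return "!point"
    return None
```

The circularity Geoff raised appears as soon as `bottom_up` is made to use `top_down` for its values while `top_down` needs the parent tag that `bottom_up` is still trying to decide; without a precedence rule, neither call can go first.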
A schema specifies that a collection node with some tag C has some internal structure - e.g., that it contains sub-nodes with tags S_i. This can be viewed both as a top-down rule (if you see an explicit tag C, automatically assign tags S_i to the sub-nodes), or as a bottom-up rule (if you see sub-nodes with tags S_i, assign the tag C to the collection).

During tag resolution, if the collection node has an explicit tag C, then we can apply the top-down rule and automatically assign the tags S_i to the sub-nodes. This should trump any bottom-up tag resolution for these nodes. In this case, if the sub-nodes have incompatible explicit tags, then we have invalid input (along the same lines as having incompatible physical structure).

However, if a collection node has no explicit tag, then we must rely on bottom-up rules to resolve the tags of its sub-nodes. In this case, if the sub-nodes have an explicit tag, we just use it. Having done all that, we now employ the bottom-up rule and resolve the collection to have the tag C if it contains the expected sub-nodes with the correct tags S_i.

This raises the question of what to do if the collection has the expected structure (e.g., the right set of keys in a mapping - "x" and "y" for a "point") but has the wrong tags (e.g., a string value instead of a number value for the "x" key). Should we make this an error, or do we simply resolve the tag to be something else (e.g., a regular mapping)? My intuition is that we should do the latter; an application that relies on the collection having a specific tag ("point") would barf at getting a general hash table if/when it matters to it, and if it doesn't care, why should we?

The bottom line is, if we explicitly tag the root node of a document, we can (mostly) simply derive all the sub-nodes' tags and need hardly ever rely on pattern matching. This is the most efficient and most type-safe option.
However, if we haven't explicitly tagged the root node, or if the schema contains areas where "it might be either X or Y or Z", then we would need to rely on pattern matching (I'm using this term to cover both regexps matched to scalars and "expected structure" matched to collections). This is the most flexible but less type-safe option.

I guess 3.3.2 needs to be expanded to include the above clarification. I think it is the only way this "could ever work" in practice given the current wording, but I admit that as it stands it is a rather opaque statement.

This was a great spec clarification question - thanks!

Have fun,

Oren Ben-Kiki
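The precedence described above (an explicit collection tag drives its children top-down; otherwise the children are resolved first and the collection is pattern-matched, falling back to a plain mapping) might look like the following Python sketch. `SCHEMA`, `resolve_mapping`, `InvalidInput`, and the `(explicit tag, text)` representation of scalar values are all hypothetical, chosen only to illustrate the rule; only the `tag:yaml.org,2002:` tag URIs come from the YAML spec, and the integer pattern is a simplified subset of the core schema's.

```python
import re
from typing import Dict, Optional, Tuple

INT = "tag:yaml.org,2002:int"
STR = "tag:yaml.org,2002:str"
MAP = "tag:yaml.org,2002:map"
INT_RE = re.compile(r"[-+]?[0-9]+")  # simplified subset of the core schema's int forms

# Hypothetical schema: collection tag C -> expected tags S_i of its values.
SCHEMA: Dict[str, Dict[str, str]] = {"!point": {"x": INT, "y": INT}}

# A scalar value is modeled as (explicit tag or None, text).
Value = Tuple[Optional[str], str]


class InvalidInput(ValueError):
    """An explicit sub-node tag conflicts with what the explicit parent tag implies."""


def resolve_scalar(text: str) -> str:
    return INT if INT_RE.fullmatch(text) else STR


def resolve_mapping(explicit: Optional[str], pairs: Dict[str, Value]) -> Tuple[str, Dict[str, str]]:
    """Return (mapping tag, {key: value tag}); top-down rules trump bottom-up ones."""
    if explicit is not None and explicit in SCHEMA:
        # Top-down: the explicit tag C assigns the tags S_i to the sub-nodes.
        expected = SCHEMA[explicit]
        child_tags: Dict[str, str] = {}
        for key, (tag, text) in pairs.items():
            want = expected.get(key)
            if tag is not None and want is not None and tag != want:
                raise InvalidInput(f"{key!r}: explicit {tag} conflicts with {want}")
            child_tags[key] = want or tag or resolve_scalar(text)
        return explicit, child_tags
    # Bottom-up: explicit child tags are used as-is, the rest are pattern-matched.
    child_tags = {k: tag or resolve_scalar(text) for k, (tag, text) in pairs.items()}
    if explicit is not None:
        return explicit, child_tags  # explicit tag with no schema entry: keep it
    for name, expected in SCHEMA.items():
        if child_tags == expected:  # same keys and same value tags
            return name, child_tags
    # Right keys but wrong tags (or wrong keys): fall back to a generic mapping.
    return MAP, child_tags
```

With this sketch, a mapping with the right keys but a non-numeric `x` resolves to `tag:yaml.org,2002:map` rather than raising, in line with the preference for leaving such failures to the application.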
From: Geoff A. <geo...@gm...> - 2012-01-03 03:06:04
On Sun, Jan 1, 2012 at 1:46 AM, Oren Ben-Kiki <or...@be...> wrote:
> Right. Take the following example: a simple "point" tag, where a point is a mapping with an "x" key and a "y" key. We can reasonably say: well, if the mapping has a "point" tag, and we are at the "x" key, then the value is expected to be a number - call this "top-down tag resolution". At the same time, we can reasonably say: well, if a mapping node has an "x" key with a number value and a "y" key with a number value, it is a "point" - call this "bottom up tag resolution".
>
> People intuitively expect both directions to "work", but as you point out, this makes the order of resolving the tags a tricky issue.
> Hmmm.
>
> YAML was designed to allow, "as much as possible", for streaming processing. The ideal was that a node would be able to be resolved "as soon as it was read". Now, both top-down and bottom-up tag resolution orders allow this, so it doesn't help us choose between them, and anyway as I said above, people "expect" both directions to work.
>
> Double hmmm.
>
>> Of course the current core schema does not have this problem, as it never applies rule (2), nor does it apply rule (3) to collections. But it seems that any schema that did both of these would need to be written carefully to avoid potentially subtle circular dependencies. Is this considered a problem for the schema authors rather than for the YAML spec itself?
>
> My intuition is that a schema specifies both top-down rules and bottom-up rules, and that top-down rules trump bottom-up rules. However, this requires clarification.
>
> A schema specifies that a collection node with some tag C has some internal structure - e.g., that it contains sub-nodes with tags S_i. This can be viewed both as a top-down rule (if you see an explicit tag C, automatically assign tags S_i to the sub-nodes), or as a bottom-up rule (if you see sub-nodes with tags S_i, assign the tag C to the collection).
>
> During tag resolution, if the collection node has an explicit tag C, then we can apply the top-down rule and automatically assign the tags S_i to the sub-nodes. This should trump any bottom-up tag resolution for these nodes. In this case, if the sub-nodes have incompatible explicit tags, then we have invalid input (along the same lines as having incompatible physical structure).
>
> However, if a collection node has no explicit tag, then we must rely on bottom-up rules to resolve the tags of its sub-nodes. In this case, if the sub-nodes have an explicit tag, we just use it. Having done all that, we now employ the bottom-up rule and resolve the collection to have the tag C if it contains the expected sub-nodes with the correct tags S_i.
>
> This raises the question of what to do if the collection has the expected structure (e.g., the right set of keys in a mapping - "x" and "y" for a "point") but has the wrong tags (e.g., a string value instead of a number value for the "x" key). Should we make this an error, or do we simply resolve the tag to be something else (e.g., a regular mapping)? My intuition is that we should do the latter; an application that relies on the collection having a specific tag ("point") would barf at getting a general hash table if/when it matters to it, and if it doesn't care, why should we?

This makes a lot of sense to me, although I can also imagine some rather perverse cases. Suppose there's a "point" tag like you describe, with an "x" key and a "y" key, both of which are numbers. But suppose there's also a "male-genotype" tag, with an "x" key and a "y" key, both of which are strings identifying particular genotypes for the X and Y sex chromosomes. Obviously I'm stretching things a bit here, but how would this be resolved without explicit tags? Consider:

    ---
    case A:
      x: Genotype-1
      y: Genotype-2
    case B:
      x: 1
      y: 2
    ...
Now, the value of case A obviously can't be a point, but for case B, the values 1 and 2 could theoretically be interpreted as strings or as numbers. So in this case the normal order of resolution applies? That is, in the absence of any top-down information, we fall back on our usual bottom-up rule for scalars, which would resolve 1 and 2 as numbers, and thus case B is a "point" rather than a "male-genotype".

Obviously, anyone dealing with an application that requires two data structures that could look that similar to each other should ultimately be responsible for ensuring that they're disambiguated by explicitly tagging them. But I like that this approach at least gives a generic schema-based tag resolver an unambiguous strategy for resolution.

Just to be even more perverse, though, suppose the application also has a "chromosome-type" tag, which applies to some specially formatted string that the application can decode. A string that doesn't match this format is resolved just as a standard !!str. Now suppose the "male-genotype" is a mapping with keys "x" and "y", with both values being "chromosome-types". But suppose further that chromosome-types occur only within "male-genotypes"; a string elsewhere in the document that would otherwise match the chromosome-type pattern is still interpreted as a string. If, in the absence of any explicit tags, we rely ONLY on bottom-up rules for resolution, it seems like we could never resolve these strings as "chromosome-types".

On the other hand, we could pursue a strategy of testing various possible resolutions iteratively. That is, first, could this be a male-genotype? If that fails, could this be a point? If that fails, then it is a generic mapping. That would lead to the opposite outcome for the previous example, though.

Maybe this usage is really too arcane to even worry about. I'm mostly just curious to hear your take on it.
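The two strategies contrasted above can be compared in a small Python sketch. Everything here is hypothetical: the `!male-genotype` and `!point` tags come from the example, while `CANDIDATES`, `bottom_up_only`, `try_candidates`, `can_be`, and the simplified integer pattern are illustrative assumptions, not part of any YAML schema or library. The chromosome-type refinement is left out; male-genotype values are modeled as plain strings, as in the first version of the example.

```python
import re
from typing import Dict, List, Tuple

INT = "tag:yaml.org,2002:int"
STR = "tag:yaml.org,2002:str"
MAP = "tag:yaml.org,2002:map"
INT_RE = re.compile(r"[-+]?[0-9]+")  # simplified subset of the core schema's int forms

# Hypothetical schema from the example; order matters for try_candidates.
CANDIDATES: List[Tuple[str, Dict[str, str]]] = [
    ("!male-genotype", {"x": STR, "y": STR}),
    ("!point", {"x": INT, "y": INT}),
]


def resolve_scalar(text: str) -> str:
    """Context-free scalar resolution."""
    return INT if INT_RE.fullmatch(text) else STR


def can_be(text: str, tag: str) -> bool:
    """Could this plain scalar be read as `tag`? Any text can be a string."""
    return tag == STR or resolve_scalar(text) == tag


def bottom_up_only(pairs: Dict[str, str]) -> str:
    # Resolve the scalars first, then match their tags against each candidate.
    tags = {k: resolve_scalar(v) for k, v in pairs.items()}
    for name, expected in CANDIDATES:
        if tags == expected:
            return name
    return MAP


def try_candidates(pairs: Dict[str, str]) -> str:
    # Hypothesize each collection tag in turn and ask whether the contents fit it.
    for name, expected in CANDIDATES:
        if set(pairs) == set(expected) and all(can_be(v, expected[k]) for k, v in pairs.items()):
            return name
    return MAP
```

With these definitions, case B (`{"x": "1", "y": "2"}`) comes out as `!point` under bottom-up-only resolution but as `!male-genotype` when candidates are tried in order, which is the opposite outcome mentioned above. The result of `try_candidates` also depends on the order of `CANDIDATES`, which a schema would have to specify.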
> The bottom line is, if we explicitly tag the root node of a document, we can (mostly) simply derive all the sub-nodes' tags and need hardly ever rely on pattern matching. This is the most efficient and most type-safe option.
> However, if we haven't explicitly tagged the root node, or if the schema contains areas where "it might be either X or Y or Z", then we would need to rely on pattern matching (I'm using this term to cover both regexps matched to scalars and "expected structure" matched to collections). This is the most flexible but less type-safe option.
>
> I guess 3.3.2 needs to be expanded to include the above clarification. I think it is the only way this "could ever work" in practice given the current wording, but I admit that as it stands it is a rather opaque statement.

Thanks for the explanation! It was great to hear your thoughts on the topic.

Regards,
Geoff Adams