From: Oren Ben-K. <or...@be...> - 2004-09-07 19:58:21
|
Here's the summary of the conclusion Clark and I reached in an IRC session. The following is a slight modification to the current spec that seems to fix most of the warts. Brian didn't see this yet, but both Clark and I feel very good about this.

- Scalar nodes that have no explicit tag, and that are written in any style except the "plain" style, are reported by the parser as if they were associated with the tag "tag:yaml.org,2002:str".

- All other nodes that have no explicit tag (both plain scalars and untagged collection nodes) are reported by the parser as "having no tag" (having a NULL tag).

- A node tagged with an explicit "null tag" is also reported as "having no tag":

    ---
    this has no tag: ! "23"
    just like this: 23
    ...

  This allows non-plain scalar types to be "untagged". Note that the semantics of "!" cannot be overridden, and that using a double "!!" is invalid:

    ---
    this is invalid: !! 23
    ...

- If a YAML document is loaded "all the way" to what is called a "complete representation", each node is converted to an appropriate native object (what Brian likes to call a "cave drawing").

- For each node, this inevitably requires two steps: (1) deciding on the type of the "cave drawing" to use; (2) converting the node to the "cave drawing" of that type.

- For nodes with tags, deciding on the type of "cave drawing" is based on the tag. For nodes without tags, deciding on the type of "cave drawing" is based on the value of the node.

  Specifically, the decision must not depend on any syntactical details, and must not depend on the node's location in the document or the value of any other node other than the one being "typed". Note the node may be a collection, in which case all its content is available as input. (But see below)

- It is required that each type of "cave drawing" used by the application have a name (== tag). Note this name (tag) may be private ("!foo") or global (a URI). It just needs to have _some_ sort of name.
- Therefore, deciding on the type of "cave drawing" for an untagged node can be expressed as "deciding on the tag of the node". Hence, this process is called "tag resolution".

That's it. Implications:

- An implementation is free to go directly from an untagged node to a "cave drawing". An implementation may go directly from syntax to "cave drawing", for that matter. The spec places no limitations on the specific APIs or implementation details. "What is not explicitly forbidden is allowed".

- An implementation is free to use any type of "cave drawing" for any node. For example, it can load all scalar nodes into the integer 12. However, if the "cave drawing" chosen does not obey the semantics set down by the node's tag, the application is said to have "transformed" the document rather than merely "loaded" it. Note this is perfectly OK in some contexts.

- In contrast, the action of "filling in the blanks" done by the (possibly implicit, hidden) "tag resolution" step does not transform the document. The "complete representation" of the document is _not_ modified by this step. Naturally, the representation is changed from being a "partial" one to a "complete" one. Round-tripping may/should/not reverse this process as appropriate (this isn't different from the rest of the round-tripping issues, including indentation, comments, anchors, tag prefixes etc.)

- Why restrict "tag resolution" to considering the value of a node?

  For the purpose of comparing tags for equality, it must be that 'NULL' == 'NULL'. For example, the following must be invalid due to a duplicate key:

    ---
    a : foo
    a : bar
    ...

  If the tag resolution is restricted to examining the value of each node, then the normal rule of comparing nodes (tag == tag && value == value) just keeps on working (where NULL == NULL).

  An implementation MAY use a more complex way to decide on the type of "cave drawing" to load each untagged node into.
  However, using a more complex way is considered "transforming" the document rather than "loading" it. Again, this isn't forbidden, and it makes sense in certain contexts. It is just a different operation.

  Of course, this means that:

    ---
    "a" : foo
    a : foo
    ...

  would not be caught as a duplicate key prior to tag resolution. There's really no helping it... This is the same problem as:

    ---
    !!int 10 : foo
    !!int 012 : bar
    ...

  Some duplicate keys can only be caught in the process of constructing the "cave drawings", regardless of the issue of implicit tags.

Have fun,

Oren Ben-Kiki |
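The reporting and comparison rules in this message can be sketched in Python. This is only an illustration under stated assumptions: the function names, the `(tag, value)` tuple layout, the style strings, and the deliberately simplistic integer pattern are all hypothetical and come from no YAML library or from the spec.

```python
import re

STR_TAG = "tag:yaml.org,2002:str"
INT_TAG = "tag:yaml.org,2002:int"


def reported_tag(kind, style, explicit_tag):
    """Tag the parser would report under the proposal; None stands for NULL."""
    if explicit_tag == "!":
        # An explicit "null tag" is reported as "having no tag".
        return None
    if explicit_tag is not None:
        return explicit_tag
    if kind == "scalar" and style != "plain":
        # Untagged non-plain scalars are reported as strings.
        return STR_TAG
    # Plain scalars and untagged collections have no tag.
    return None


def resolve_scalar(value):
    """Toy resolution: looks at nothing but the node's own value."""
    if re.fullmatch(r"[-+]?[0-9]+", value):
        return INT_TAG
    return STR_TAG


def duplicate_keys(keys):
    """Return keys seen more than once; keys are (tag, value) pairs.

    Tuple equality gives tag == tag and value == value, with None == None.
    """
    seen = set()
    duplicates = []
    for key in keys:
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```

With these helpers, the keys `"a"` and `a` compare as different before resolution (one carries the string tag, the other NULL) and as duplicates once the NULL tag has been filled in from the value alone.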
From: Clark C. E. <cc...@cl...> - 2004-09-07 20:37:17
|
On Tue, Sep 07, 2004 at 10:58:14PM +0300, Oren Ben-Kiki wrote:
| - Scalar nodes that have no explicit tag, and that are written in any
| style except the "plain" style, are reported by the parser as if they
| were associated with the tag "tag:yaml.org,2002:str".
|
| - All other nodes that have no explicit tag (both plain scalars and
| untagged collection nodes) are reported by the parser as "having no
| tag" (having a NULL tag).
|
| - A node tagged with an explicit "null tag" is also reported as "having
| no tag":
|
|   ---
|   this has no tag: ! "23"
|   just like this: 23
|   ...

    ---
    - "string"
    - "null"
    - []
    - {}
    ...

is equivalent to

    ---
    - !!str 'string'
    - ! 'null'
    - ! []
    - ! {}

| This allows non-plain scalar types to be "untagged". Note that the
| semantics of "!" cannot be overridden, and that using a double "!!" is
| invalid:
|
|   ---
|   this is invalid: !! 23
|   ...
|
| - If a YAML document is loaded "all the way" to what is called a
| "complete representation", each node is converted to an appropriate
| native object (what Brian likes to call a "cave drawing").
|
| - For each node, this inevitably requires two steps: (1) deciding on the
| type of the "cave drawing" to use; (2) converting the node to the "cave
| drawing" of that type.

A complete representation happens if and only if:

  - there is a partial representation
  - every NULL tag is provided, and
  - each tag is recognized (i.e. a canonical form is possible)

|
| - For nodes with tags, deciding on the type of "cave drawing" is based
| on the tag. For nodes without tags, deciding on the type of "cave
| drawing" is based on the value of the node.
|
| Specifically, the decision must not depend on any syntactical details,
| and must not depend on the node's location in the document or the value
| of any other node other than the one being "typed". Note the node may
| be a collection, in which case all its content is available as input.
| (But see below)
|
| - It is required that each type of "cave drawing" used by the
| application have a name (== tag). Note this name (tag) may be private
| ("!foo") or global (a URI). It just needs to have _some_ sort of name.
|
| - Therefore, deciding on the type of "cave drawing" for an untagged node
| can be expressed as "deciding on the tag of the node". Hence, this
| process is called "tag resolution".

In short, a tag resolution is a simple transformation where only NULL tags are replaced using only the content of the given node.

This word is nice; it gives us a special fuzzy feeling that we haven't really changed the intent of the document (which a transform would imply) but instead are simply filling in the missing details.

| That's it. Implications:
|
| - An implementation is free to go directly from an untagged node to a
| "cave drawing". An implementation may go directly from syntax to "cave
| drawing", for that matter. The spec places no limitations on the
| specific APIs or implementation details. "What is not explicitly
| forbidden is allowed".

With the simple resolution rule:

    (SCALAR, NULL)   -> '!my-special-variant-tag'
    (MAPPING, NULL)  -> 'tag:yaml.org,2002:map'
    (SEQUENCE, NULL) -> 'tag:yaml.org,2002:seq'

You can consider resolution "completely optional", or at least a simple process that can be skipped. ;)

|
| - An implementation is free to use any type of "cave drawing" for any
| node. For example, it can load all scalar nodes into the integer 12.
| However, if the "cave drawing" chosen does not obey the semantics set
| down by the node's tag, the application is said to have "transformed"
| the document rather than merely "loaded" it. Note this is perfectly OK
| in some contexts.
|
| - In contrast, the action of "filling in the blanks" done by the
| (possibly implicit, hidden) "tag resolution" step does not transform
| the document. The "complete representation" of the document is _not_
| modified by this step.
| Naturally, the representation is changed from
| being a "partial" one to a "complete" one. Round-tripping
| may/should/not reverse this process as appropriate (this isn't
| different from the rest of the round-tripping issues, including
| indentation, comments, anchors, tag prefixes etc.)

The complete representation is not modified by this step, because until all NULL tags are provided, a complete representation does not exist.

| - Why restrict "tag resolution" to considering the value of a node?
|
| For the purpose of comparing tags for equality, it must be that 'NULL'
| == 'NULL'. For example, the following must be invalid due to a
| duplicate key:
|
|   ---
|   a : foo
|   a : bar
|   ...

This prevents the unexpected resolution,

    ---
    !foo a: foo
    !bar a: bar

| If the tag resolution is restricted to examining the value of each node,
| then the normal rule of comparing nodes (tag == tag && value == value)
| just keeps on working (where NULL == NULL).

Note that NULL semantics here are equivalent to an empty string, ''; therefore an implementation can use an empty string for this purpose. Or, really, any special string that does not occur in the wild.

| An implementation MAY use a more complex way to decide on the type of
| "cave drawing" to load each untagged into. However, using a more
| complex way is considered "transforming" the document rather than
| "loading" it. Again, this isn't forbidden, and it makes sense in
| certain contexts. It is just a different operation.
|
| Of course, this means that:
|
|   ---
|   "a" : foo
|   a : foo
|   ...
|
| Would not be caught as a duplicate key prior to tag resolution. There's
| really no helping it... This is the same problem as:
|
|   ---
|   !!int 10 : foo
|   !!int 012 : bar
|   ...

Similar, not the same. In the former case, after resolution the document could be equivalent to,

    ---
    !!str 'a': foo
    !!str 'a': foo
    ...

which would not be well formed. Thus, tag recognition is not even needed here, whereas it is needed in the !!int case.
|
| Some duplicate keys can only be caught in the process of constructing
| the "cave drawings", regardless of the issue of implicit tags.

Exactly. Sometimes only applications know what duplicates are.

...

This proposal works for both groups:

  a) those that want to consider missing tags as just a !implicit-mapping, without special significance

  b) those who want to consider a missing tag as something very special with its own resolution mechanism; a limited one that doesn't change intent, but does have an effect

I fit into the former, Oren into the latter.

Clark |
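The "simple resolution rule" and the three conditions for a complete representation given in this reply can be written as a lookup table plus a predicate. This is a hedged Python sketch: `SIMPLE_RULE`, `resolve` and `is_complete` are hypothetical names, nodes are reduced to `(kind, tag)` pairs, and `None` stands in for the NULL tag.

```python
MAP_TAG = "tag:yaml.org,2002:map"
SEQ_TAG = "tag:yaml.org,2002:seq"
STR_TAG = "tag:yaml.org,2002:str"

# The simple rule from the message: one fixed tag per node kind.
SIMPLE_RULE = {
    "scalar": "!my-special-variant-tag",
    "mapping": MAP_TAG,
    "sequence": SEQ_TAG,
}


def resolve(kind, tag):
    """Replace only NULL tags; explicit tags are left untouched."""
    return SIMPLE_RULE[kind] if tag is None else tag


def is_complete(nodes, recognized):
    """True if no NULL tag remains and every tag is recognized.

    nodes is an iterable of (kind, tag) pairs taken from a partial
    representation; recognized is the set of tags the application knows.
    """
    return all(tag is not None and tag in recognized for _, tag in nodes)
```

Because the rule ignores content entirely, applying it is a trivial pass, which is one way to read the remark that resolution can be treated as optional.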
From: Clark C. E. <cc...@cl...> - 2004-09-07 23:34:25
|
I think the important concept that Oren is concerned about is the word 'resolution', which he'd like to keep in the YAML spec so that users can talk about documents where only implicit tags have been filled in, specifically, a way to make a partial-representation complete without implying an arbitrary re-structuring of the document. In particular, he would like it to consider the loading of,

    ---
    integer: 23
    float: 3.4
    ...

into a document equivalent to,

    --- !!map
    !!str "integer": !!int "23"
    !!str "float": !!float "3.4"
    ...

to not be an "arbitrary transform". He wants to give a very restricted sort of transform, which only provides implicit values and in a manner based solely on content, a special name -- "resolution". I agree.

...

However, I disagree with Oren that a special NULL tag (which is not specific to a given kind) is required, especially when it does not have the semantics that people expect with NULL; namely that NULL != NULL. I think the original set of tags introduced by David (given my next pass at renaming them), would suffice:

    !implicit-mapping
    !implicit-sequence
    !implicit-variant-scalar     # untagged plain scalar
    !implicit-ordinary-scalar    # any other untagged scalar

The rules for these tags would be simple; if a tag is missing, the parser provides one of the four tags above, as if it had been explicitly provided. Since all of these tags are 'private', the YAML Processor by default will not recognize them unless additional 'resolution' information is provided by the application. The tags, by themselves, carry no particular significance other than they are private, and not defined by the YAML specification.

When emitting a document (pretty printing), if the tag is a !implicit-variant-scalar and if the scalar's content matches the plain scalar production, then the tag can be omitted and printed using the plain scalar style. For the !implicit-ordinary-scalar, one of the non-plain styles can be used, and the tag can be omitted.
For !implicit-mapping and !implicit-sequence, the tag can be omitted. In other words, this is a simple syntax-sugar operation -- it is not a magical process.

...

I do sympathize with Oren's need to express the usage of this special kind of transform, not so much so that people would be restricted by it, but rather so that it can be given a name and used in discussions as needed. But, this concept of 'resolution' can be described quite simply using this definition:

  A YAML node is considered 'resolved' if its '!implicit-' tag has been replaced with either a global tag, or a private tag that does not start with '!implicit-'; and, when this replacement was done only based on the content of the given node. In particular, scalar nodes can only use their character content to make this determination, and collection nodes can use any previously resolved child node.

  A YAML graph is considered 'resolved', if all of the nodes in the graph have been resolved, and no other changes to the graph have been made. There is no provision for 'unresolving', although a reverse transform before emitting is certainly desirable for readability.

Within a graph, it is possible that the !implicit-variant-scalar be resolved to different tags depending on the node's content; for example, content of 3.4 may be resolved to a !!float, while in the same document 'a' may be resolved to !!str.

Also note that for the resolution to be valid, the new document after replacement should not make the graph invalid. In other words,

    ---
    'a': one
    a: two
    ...

would be reported as,

    ---
    !implicit-ordinary-scalar "a": !implicit-ordinary-scalar "one"
    !implicit-variant-scalar "a": !implicit-ordinary-scalar "two"
    ...

which while perfectly valid, would become invalid if a resolution process rewrote the graph to be equivalent to:

    ---
    !!str "a": !!str "one"
    !!str "a": !!str "two"
    ...
I believe that this is an equivalent formulation to Oren's proposal, only a bit more straightforward and not burdened with the special nature of an empty or NULL tag. I think it is logically equivalent (other than he uses !!str directly for !implicit-ordinary-scalar). By and large, it's a matter of having longer ugly tags, or having a special empty tag. That's it. There is not a huge difference between these two syntax/API alternatives.

Your feedback would be much appreciated.

Clark

P.S. I've considered that these would be special 'tag:yaml.org,2002:' tags; however, an important part of implicit tags is that they are not 'recognized', and thus it seems private tags best have this characteristic.

--
Clark C. Evans Prometheus Research, LLC. http://www.prometheusresearch.com/ |
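The four `!implicit-` tags and the definition of a 'resolved' node given in this message can be sketched as follows. This is a rough Python illustration, not an implementation of any YAML processor: `implicit_tag`, `is_resolved` and `resolve_scalar` are hypothetical names, the style strings are assumptions, and the float pattern is intentionally minimal.

```python
import re

STR_TAG = "tag:yaml.org,2002:str"
FLOAT_TAG = "tag:yaml.org,2002:float"


def implicit_tag(kind, style=None):
    """Tag the parser supplies when a node has no explicit tag."""
    if kind == "mapping":
        return "!implicit-mapping"
    if kind == "sequence":
        return "!implicit-sequence"
    if kind == "scalar":
        if style == "plain":
            return "!implicit-variant-scalar"
        return "!implicit-ordinary-scalar"
    raise ValueError(f"unknown node kind: {kind!r}")


def is_resolved(tag):
    """A node counts as resolved once its tag no longer starts with '!implicit-'."""
    return not tag.startswith("!implicit-")


def resolve_scalar(tag, content):
    """Toy scalar resolution that uses only the scalar's character content."""
    if tag == "!implicit-ordinary-scalar":
        return STR_TAG
    if tag == "!implicit-variant-scalar":
        if re.fullmatch(r"[-+]?[0-9]+\.[0-9]+", content):
            return FLOAT_TAG
        return STR_TAG
    # Explicit tags are never replaced.
    return tag
```

Under this sketch, a plain `3.4` and a plain `a` in the same document start with the same `!implicit-variant-scalar` tag and end up with different tags, as the message describes.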
From: T. O. <tra...@ru...> - 2004-09-08 03:31:15
|
I'm trying to understand the debate here. Could you outline the "warts"? A couple of things I'm picking out are:

  A) How implicit tags are handled
  B) Resolution defined as a limited transform that preserves some sort of identity/equality
  C) Ambiguity in hash key equality

The issues seem to center around identity/equality. Probably I'm missing something important about all this. Nonetheless let me make a few observations.

An implicit tag is not "NULL". It can only be expressly left out because I know precisely what it _will be_ in doing so. Hence I expect:

    ---               #=> !!seq
    - "this"          #=> - !!str

    ---               #=> !!map
    "here" : there    #=> !!str : !!imp

And by expectation only can I leave it out. So what matters most is that it is something specific.

So then we are talking about this potential "resolution" that goes on (may go on), to "mildly" transform these unspecified tags --in particular how to handle the !!imp. (Speaking of which, I was thinking myself that 'variant' might be a better word. I see that someone else thought of this too. So !!var might be a better tag.)

The problem you face here is that, try as you may, any attempt to limit the transforms is going to be hackish. And that's simply because prior to loading, a YAML doc has ZERO semantic value. So trying to limit the transform to keep "something" the same will be impossible, b/c you have no idea what that something is.

Now, I will retract my statement that a YAML document has ZERO semantic value. That's not exactly true. It does have some structural meaning. The YAML spec makes a big deal about what is _usable_ vs. _unusable_ structural meaning. Hence using > or | does not make two different types of strings. So the _usable_ structural meaning is what I'll call semantic value level ONE. At that level YAML docs have !!map, !!seq, !!str and !!imp (or !!var if you prefer). Each of these has some inherent semantic value --!!str is a scalar conforming to certain encoding restrictions; !!seq is an ordered list; etc.
So there are some definite semantics here that affect a YAML doc. And where that is most notable is with !!imp and !!map, b/c the former has an inherent semantic ambiguity, and the latter has a restriction dependent on equality --which is an inherent semantic evaluation. Put the two together and there are bound to be problems.

So one might want to blame the !!imp (after all that is what an imp is for ;), but actually the !!imp is pointing out the truth about everything in the doc: _You have no idea what it will become!_ The only thing you know is that it is structured in a certain way to help get it to wherever it is it's going. The fact that we distinguish between !!imp and !!str is only b/c !!imp will quite possibly become a type of scalar other than a string. In other words it's a convenience to tell the loader to LOOK AT THE SYNTAX OF THE VALUE to decide what type of scalar this should be. So, no matter how you word it, this act is a _transformational_ one. Normally that's all up to the application, but this particular transformation is so common that it falls back on to YAML to deal with.

But why is it so common? Mainly b/c YAML gives no core value to the common types it defines in its type repository: !!int, !!bool, !!float, etc. These are the very types !!imp is transformed into _100%_ of the time. YAML's attempt at being a lowest common denominator might be too low --and that's really the source of the trouble with !!imp --it is a categorization between string scalars and value-dependent scalars.

Nonetheless, there's little we can do, because ultimately it's up to the application to decide what type something will become. Even when we specifically state something is a !!str, it might not end up that way. So we end up right back where we started on this. A YAML doc really has no semantic value, only structural value.

So what's up with structural value?
YAML has chosen the !!seq and !!map as its two fundamental collection types b/c they correspond closely to collections of common languages like Python, Perl, Ruby, and so on. _But_ these languages have semantic distinctions too --precisely around the trouble spot of hash key equality! This should tell us something. YAML intends to be foremost a serialization format and a cross-platform tool. So right here lies its most troublesome problem. The fact is, when it comes to hash key equality, there is no fixing the bugger --it's bringing semantic value into something that simply has none. So you can't fully enforce it. The only way to get rid of the wart is to drop the !!map as a prime type.

DON'T PANIC! First, I'm not saying this _must_ be done. We could just accept the wart. On the other hand, it doesn't hurt to look at things from the new perspective we are now developing.

Now, a mapping is a function, mapping a key value to a result value, thus any of the result values can be equal, but not the key values. But a mapping can also be seen as a more general thing called a "relation" _with_ an additional limitation of equality on the "key" value. I think Clark tends to call these pairs, although that sort of loses sight of the fact that it is a collection. More specifically it is an unordered list --a set, of pairs; I will label it !!rel.

Now here's the interesting thing. An !!imp is to a !!str as a !!map is to a !!rel. Just as !!imp may or may not become a !!str dependent on what the application determines about the value, so too a !!map may or may not have to be a !!rel based on what the application determines about the "key" value.

So what does this mean? Well, we now realize we should consider !!map in a category with !!imp --both are somewhat ambiguous types. Of course, if we wanted to break this down still further we would end up with just two collection types: !!set and !!seq. A !!rel is then just a !!set of 2-element !!seq.
A !!map is the same with the equality restriction on the 1st elements of the !!seq. These are truly the _fundamental_ collection types. Yet a !!seq can be used for a !!set, if the order is ignored, just as a !!rel can be used for a !!map, if the "key" values are kept unequal. In other words, fundamentally everything can be accomplished with a !!seq. But just b/c everything can be done with a !!seq and a !!str (just two types!) doesn't make it very convenient --another good reason perhaps more types should just be considered central to the YAML spec (!!int, !!float, etc.)

Anyway, some things to think about. I'll close with this final thought: If YAML intends to stay out of the business of semantics, then it must give up trying to tame transformation.

-- T. |
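The relationship between !!rel and !!map described in this message can be illustrated with a few lines of Python. This is only a sketch: `is_map` and `to_map` are hypothetical names, a relation is modelled as a frozenset of 2-tuples, and Python's own `==` on keys stands in for whatever key equality the application would actually decide on.

```python
def is_map(relation):
    """True if no two pairs in the relation share a key (first element)."""
    keys = [key for key, _ in relation]
    return len(keys) == len(set(keys))


def to_map(relation):
    """Turn a relation into a dict, refusing if the key restriction fails."""
    if not is_map(relation):
        raise ValueError("relation has repeated keys, so it is not a mapping")
    return dict(relation)


# A relation may repeat a key; a mapping may not.
relation = frozenset({("a", 1), ("a", 2), ("b", 1)})
mapping = frozenset({("a", 1), ("b", 1)})
```

Here `to_map(mapping)` gives a dict with keys `a` and `b`, while `to_map(relation)` raises, which is the extra restriction that separates a !!map from a !!rel in the message's terms.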
From: Clark C. E. <cc...@cl...> - 2004-09-08 04:46:13
|
On Tue, Sep 07, 2004 at 11:31:05PM -0400, T. Onoma wrote:
| An implicit tag is not "NULL". It can only be expressly left out because I
| know precisely what it _will be_ in doing so. Hence I expect:
|
|   ---               #=> !!seq
|   - "this"          #=> - !!str
|
|   ---               #=> !!map
|   "here" : there    #=> !!str : !!imp

That works for me. With Oren's proposal, this is equivalent to:

|   ---               #=> !
|   - "this"          #=> - !!str
|
|   ---               #=> !
|   "here" : there    #=> !!str : !

If you only used the ! (empty) tag for !!imp, then you get:

|
|   ---               #=> !!seq
|   - "this"          #=> - !!str
|
|   ---               #=> !!map
|   "here" : there    #=> !!str : !

All of them are quite equivalent (as far as the model is concerned); it is more a matter of what conveys the intent the best. The latter one probably best matches "intent", that is, only the plain scalar is truly subject to 'resolution' as Oren likes to put it.

| So then we are talking about this potential "resolution" that goes on
| (may go on), to "mildly" transform these unspecified tags --in
| particular how to handle the !!imp. (Speaking of which, I was thinking
| myself that 'variant' might be a better word. I see that someone else
| thought of this too. So !!var might be better tag.)

I kinda like !imp since an imp is an evil creature; but this is not germane to the conversation. ;)

| The problem you face here is that, try as you may, any attempt to limit
| the transforms is going to be hackish.

I agree; which is why 'resolution' was just a descriptive word, to describe a class of transformations which are particularly well behaved with respect to omitted tags.

| The fact that we distinguish between
| !!imp and !!str is only b/c !!imp will quite possibly become a type of
| scalar other than a string. In other words it's a convenience to tell
| the loader to LOOK AT THE SYNTAX OF THE VALUE to decide what type of
| scalar this should be. So, no matter how you word it, this act is a
| _transformational_ one.

That is the key observation.

| But why is it so common?
Mainly b/c YAML gives no core value to the | common types it defines in it's type repository: !!int, !!bool, !!float, | etc. These are the very types !!imp is transformed into _100%_ of the | time. YAML's attempt at being a lowest common denominator might be too | low --and that's really the source of the trouble with !!imp --it is a | categorization between string scalars and value-dependent scalars. Right. | So what's up with structural value? YAML has chosen the !!seq and !!map | as its two fundamental collection types b/c they correspond closely to | collections of common languages like Python, Perl, Ruby, and so on. | _But_ these languages have semantic distinctions too --percisely around | the trouble spot of hash key equality! This should tell us something. | YAML intends to be foremost a serialization format and a cross-platform | tool. So right here lies its most troublesome problem. The fact is, when | it comes to hash key equality, there is no fixing the bugger --it's | bringing semantic value into something that simply has none. So you | can't fully enforce it. This doesn't have to be considered a wart. One defines an !!imp, as the fourth core type in the YAML language, equality between !!imps are done via string equality, and since !!imp 1 and !!str 1 have a different tag, they are not equal. Thus, there need be no ambiguity if unspecified tags are mapped directly to these four tags. If a program wishes to _transform_ an !!imp 1 to an !!int 1, the result is a _different_ document. If Oren wants to give a transform from !!imp to another tag a special pass for good behavior, that's fine, but it's still a transform. And, being a transform, it's entirely possible that while the original document may be valid, the result isn't. | in other words, fundementally everything | can be accomplished with a !!seq. But just b/c everything can be donw | with a !!seq and a !!str (just two types!) dosen't make it very | convienient Yes, YAML is _not_ S-Expressions. 
;) | another good reason perhaps more types should just be | considered cental to the YAML spec (!!int, !!float, etc.) Well, we stared, back in the day by explicitly listing this 'resolution' stage as a function encoded into the YAML specific. It had no ambiguity. Alas, Brian and I spared over representations of !!date and this led us down the 'implicit typing is application defined' road. If somone wants to use implicit typing, I see no problem with loading their untagged plain scalars as a !!imp; if they don't like it, they can run a transform on the data before they load it. But this transform is in the application's domain and not defined by YAML. The 'resolution' thing and the NULL tag is a compromise position, it's OK with me as long as NULL == NULL (aka, it is semantically equivalent to !!imp). I can always label it _IMP in my code and be done with it. | I'll close with this final thought: If YAML intends to stay out of the | business of semantics, then it must give up trying to tame | transformation. Well, it should say only one thing: A transformation should only use informatoin in the representation model (it should not use key order, comments, the prologue, yada yada, or any other information from the presentation or serial model). That's in big print; but that's all it says. Any transform, including mapping everything to an empty document is perfectly OK if that's what the application wants to do. Thanks for reading the discussion and chiming in. Cheers! Clark |
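Editorial sketch of the equality rule Clark describes above: if every untagged plain scalar is reported with a fourth core tag, node equality can be defined on the (tag, content) pair, so `!!imp 1` and `!!str 1` never compare equal. The `imp` tag URI and the `parsed_scalar` helper are hypothetical names used only for illustration; neither is part of any YAML specification.

```python
from dataclasses import dataclass

STR = "tag:yaml.org,2002:str"
IMP = "tag:yaml.org,2002:imp"  # hypothetical tag discussed in this thread


@dataclass(frozen=True)
class Scalar:
    tag: str
    value: str  # dataclass equality compares (tag, value) field by field


def parsed_scalar(value, plain, explicit_tag=None):
    """Report a scalar the way this proposal would: no node is ever untagged."""
    if explicit_tag is not None:
        return Scalar(explicit_tag, value)
    # Plain untagged scalars get the implicit tag; quoted ones are strings.
    return Scalar(IMP if plain else STR, value)


plain_one = parsed_scalar("1", plain=True)    # written as: 1
quoted_one = parsed_scalar("1", plain=False)  # written as: "1"
```

Under these assumptions `plain_one != quoted_one`, even though both carry the text `1`, while two plain `1` scalars compare equal by plain string comparison.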
From: Sean O'D. <se...@ce...> - 2004-09-08 00:22:56
|
On Tuesday 07 September 2004 16:34, Clark C. Evans wrote:
> I think the important concept that Oren is concerned about is the word
> 'resolution', which he'd like to keep in the YAML spec so that users can
> talk about documents where only implicit tags have been filled in,
> specifically, a way to make a partial-representation complete without
> implying an arbitrary re-structuring of the document. In particular,
> he would like it to consider the loading of,
>
>   ---
>   integer: 23
>   float: 3.4
>   ...
>
> into a document equivalent to,
>
>   --- !!map
>   !!str "integer": !!int "23"
>   !!str "float": !!float "3.4"

<snip>

With schemas doing validation and typing, it's not a long step for a schema to do implicit typing, so maybe keep that option open for the future. It's fine to have a default implicit typing phase, but I'm not sure those new rules and the whole !implicit thing map well to something like that. Why not just keep the phase simple? Perform the yaml.org implicit typing *if* it's the default namespace (prefix?) and *after* all other transformation schemas have been run, and then apply the "if it's untagged, it might become implicit" ruleset, but even then, just make it !yaml.org!map, !yaml.org!seq, etc.

Sean O'Dell
|
From: Clark C. E. <cc...@cl...> - 2004-09-08 01:30:28
|
Sean,

This proposal does two things:

- It specifies that omitted tags are simply a syntax-shorthand for well known 'tags'. In my proposal, each tag is a private '!implicit-*' per David's suggestion. In Oren's syntax non-plain scalars are always 'tag:yaml.org,2002:str' and all other tags are NULL (which has the exact semantics as the empty string as far as comparison goes). This keeps the information model for 'post-parse/pre-load' the same as the YAML Graph Representation, instead of a custom model with a plain-scalar flag. Ick.

- It also defines, for informational purposes only, a specific word 'resolution' to refer to the act of filling in these omitted tags using only the content of the node, and not doing anything else. In short, it defines a very limited transformation, nothing more, nothing less. In particular, it doesn't say applications couldn't just leave the tags as-is or do an arbitrary transform that re-structures the entire input graph. (I think...)

The proposal certainly doesn't require or describe any intermediate 'default implicit typing phase', although implementations are welcome to provide such a thing.

Does this help?

Clark

... (older draft of this response)

This is _not_ proposing any sort of default resolution rules. On the contrary, it is just defining what 'resolution' means, for the sole purpose of human discussion; namely to differentiate it from arbitrary 'transforms' which actually do change content. It also gives a specific syntax rule, but that's an entirely different issue (my !implicit-tags or Oren's ! empty tag, let's choose one, it doesn't matter which).

On Tuesday 07 September 2004 16:34, Clark C. Evans wrote:
> In particular, he would like it to consider the loading of,
>
>   ---
>   integer: 23
>   float: 3.4
>   ...
> into a document,
>
>   --- !!map
>   !!str "integer": !!int "23"
>   !!str "float": !!float "3.4"

to be an example of 'resolving' the tags. It would be just as OK to resolve the first document to:

  --- !my-map
  !my-integer "integer": !my-23 "23"
  !my-float "float": !my-3.4 "3.4"

The idea that tags resolve, without changing the content, and only according to the tag's value is all that is important. What they resolve to is not the issue; it's that nodes aren't added and scalar values aren't changed or dropped; in short, that the transform was limited to only filling in omitted tags.

On Tue, Sep 07, 2004 at 05:22:15PM -0700, Sean O'Dell wrote:
| With schemas doing validation and typing, it's not a long step for a
| schema to do implicit typing, so maybe keep that option open for the
| future. It's fine to have a default implicit typing phase,

This is not talking about a default implicit typing; it's just saying that some sort of typing happens to implicit nodes, and that such typing doesn't completely re-arrange the YAML graph.

| but I'm not
| sure those new rules and the whole !implicit thing maps well to
| something like that. Why not just keep the phase simple? Perform the
| yaml.org implicit typing *if* its the default namespace (prefix?) and
| *after* all other transformation schemas have been run, and then apply
| the "if it's untagged, it might become implicit" rulset but even then,
| just make it !yaml.org!map, !yaml.org!seq, etc.

This is an example of an application-specific "resolution" rule, and not one that should be explained by the specification. If you want to do it this way, great.

;) Clark

--
Clark C. Evans Prometheus Research, LLC. http://www.prometheusresearch.com/ o office: +1.203.777.2550 ~/ , mobile: +1.203.444.0557 // (( Prometheus Research: Transforming Data Into Knowledge \\ , \/ - Research Exchange Database /\ - Survey & Assessment Technologies ` \ - Software Tools for Researchers ~ *
|
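Editorial sketch of the distinction drawn above between 'resolution' and an arbitrary transform: a change counts as a resolution only if it fills in omitted tags and leaves explicit tags, structure and scalar text alone. The `Node` class, the helper `s`, and the pairwise, in-order comparison of mapping entries are assumptions made for this illustration, not anything defined by the spec.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Node:
    kind: str                   # "scalar", "seq" or "map"
    tag: Optional[str]          # None stands for an omitted tag
    content: Union[str, Tuple]  # text, tuple of nodes, or tuple of (key, value) pairs


def is_resolution(before: Node, after: Node) -> bool:
    """True if `after` differs from `before` only by filled-in omitted tags."""
    if before.kind != after.kind:
        return False
    if before.tag is not None and after.tag != before.tag:
        return False  # an explicit tag was rewritten: a general transform
    if before.kind == "scalar":
        return before.content == after.content
    if len(before.content) != len(after.content):
        return False  # nodes were added or dropped
    if before.kind == "seq":
        return all(is_resolution(b, a) for b, a in zip(before.content, after.content))
    # Mapping: pairs are compared in listed order, a simplification for this sketch.
    return all(
        is_resolution(bk, ak) and is_resolution(bv, av)
        for (bk, bv), (ak, av) in zip(before.content, after.content)
    )


def s(value, tag=None):
    return Node("scalar", tag, value)


untagged = Node("map", None, ((s("integer"), s("23")), (s("float"), s("3.4"))))
core = Node("map", "!!map", ((s("integer", "!!str"), s("23", "!!int")),
                             (s("float", "!!str"), s("3.4", "!!float"))))
private = Node("map", "!my-map", ((s("integer", "!my-integer"), s("23", "!my-23")),
                                  (s("float", "!my-float"), s("3.4", "!my-3.4"))))
altered = Node("map", "!!map", ((s("integer", "!!str"), s("24", "!!int")),
                                (s("float", "!!str"), s("3.4", "!!float"))))
```

Both `core` and `private` pass the check against `untagged`, matching the two examples in the message; `altered` fails because a scalar value changed.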
From: Sean O'D. <se...@ce...> - 2004-09-08 05:05:21
|
On Tuesday 07 September 2004 18:30, Clark C. Evans wrote: > > This proposal does two things: > > - It specifies that omitted tags are simply a syntax-shorthand for > well known 'tags'. In my proposal, each tag is a private > '!implicit-*' per David's suggestion. In Oren's syntax non-plain > scalars are always 'tag:yaml.org,2002:str' and all other tags are > NULL (which has the exact semantics as the empty string as far as > comparison goes). This keeps the information model for > 'post-parse/pre-load' the same as the YAML Graph Representation, > instead of a custom model with a plain-scalar flag. Ick. You are talking about the default tag for all scalars, right? Typing them as !!str if they have no explicit tag until the loader does something else with them, like apply built-in implicit typing, or running a transformation schema, right? What's a non-plain scalar? Why not make all untagged scalars !!str initially? > - It also defines, for informational purposes only, a specific word > 'resolution' to refer to the act of filling in these omitted tags > using only the content of the node, and not doing anything else. In > short, it defines a very limited transformation, nothing more > nothing less. In particular, it doesn't say applications couldn't > just leave the tags as-is or do an arbitrary transform that > re-structures the entire input graph. (I think...) Using the content of the node? How so? What criteria would you apply to a node's content to make it !!str tagged or not? > The proposal certainly doesn't require, or describe any intermediate > 'default implicit typing phase'. Although implementations are welcome > to provide such a thing. It sure seems like you're doing a default implicit typing phase. > Does this help? It would help a lot more if you guys spoke in plainer terms. Sean O'Dell |
From: Oren Ben-K. <or...@be...> - 2004-09-08 05:32:48
|
On Wednesday 08 September 2004 06:31, T. Onoma wrote:
> I'm trying to understand the debate here. Could you outline the
> "warts".

Well, Clark and I agree on the semantics in a basic level, but seem to disagree on the mechanism for explaining/describing them. Which is good, it means we are getting somewhere. IMVHO, Clark's proposal (using !implicit-*) is confusing and arbitrary, while using NULL is clearer and a better reflection of what is actually happening.

> The
> problem you face here is that, try as you may, any attempt to limit
> the transforms is going to be hackish.

Exactly. Once you present resolution as taking some tag and re-writing it to be a different tag, the word "transform" is used, and you open the door for a host of issues, like why can't I do other transforms, to my own private tags, and can't I explicitly write !implicit-foo and transform that, and why do I need to have a transform in the first place - can't I simply load the document as it is, please? Sure, there are possible answers. But this whole song-and-dance is completely unnecessary.

> So the _usable_ structural meaning is
> what I'll call semantic value level ONE. At that level YAML docs have
> !!map, !!seq, !!str and !!imp (or !!var if you prefer).

We call this the node's _kind_. It can be a mapping, a sequence, or a scalar. (!!map, !!seq and !!str are specific types; not all mappings are !!map, not all sequences are !!seq, and not all scalars are !!str). Also note that prior to loading, some nodes have a known (explicit) tag. E.g., a mapping might be a "!invoice". So "value level one" goes beyond the mere node kind - for _some_ nodes anyway.

> ... actually the !!imp is pointing out the truth about
> everything in the doc: _You have no idea what it will become!_

Exactly! This is why I feel that using NULL is the simplest, correct solution. I simply *do not know* the name of the type of "cave drawing" that will be used for the node ("what it will become"). This makes everything simple.

Obviously, if you don't know what "it will become", before it "becomes" anything, you need to _decide_ what it "will become". Right? Well, we give a name to this decision: "deciding-what-it-will-become" == "tag resolution". This isn't an optional step (again: you *must* "decide-what-it-will-become" before it "becomes"). This isn't a transform: you are NOT changing "what it will become" (the tag of the node).

In Clark's way of explaining things, since you *pretend* you know the tag of the node, this _is_ a transform, as you *are* changing the tag of some nodes. So you go into a whole song-and-dance about why it's OK to change some tags and not others, and so on. Ugh, Ick, and Oy Vey :-)

Here's an analogy. You have a hospital ("application"). Whenever someone comes in ("a node is loaded"), you check his name against your medical histories database ("construct a native object"). People who visit the hospital often are all listed in the database. You give them a plastic card with a barcode ("some nodes have tags"). However, most people who come in, come in for the first time, and have no card ("most nodes have no tag"). You don't know their name - you have to ask them (if unconscious, you have to look at their possessions) to discover their name ("you must give a tag to these nodes by looking at their content").

Clark's proposal is that we name all the people without a card "Joe" ("we give all untagged nodes the tag "!implicit""). So, if you ask the nurse in the waiting room who is there, she'll say: "I have Alice and Joe and Joe and Joe and Joe" ("!!float and !implicit and !implicit and !implicit and !implicit").

Now, for a person to be treated, we enter his name to the database to look up prior history and record new history ("for a node to be loaded, we need to decide on the native object type").

Under my approach, you say: First, find out their name. If they have a tag, fine. If not, FIND IT OUT - ask them, look at their pockets for a driving license, etc. Now that you have a name, use it. ("if a node has a tag, fine; otherwise, look in its content for regexp etc. and decide on one. Now, use the native object type associated with the tag").

Under Clark's approach, you say: If their name is "Joe", ask them what their name is, look in their pockets for a driving license, etc. Then RENAME each "Joe" to a new name, and then look it up in the database. ("transform all !implicit tags to something else based on the node content. Now, use the native object type associated with the tag").

Under my approach, it makes perfect sense to say: I failed to/didn't yet discover this person's name, I'll call him "John Doe" ("NULL"). However, I can't look the name "John Doe" up in the database for prior medical history. ("I failed to/didn't yet discover the node's tag, I'll just leave it NULL. However, I can't load it into a native object").

Under Clark's approach, it is like saying: This unconscious guy has a chart saying his name is "John Doe" ("!implicit"). Let's see if he has a history of prior admissions... ("use the tag !implicit as if it is a normal private tag to look up the native data type"). Oh wait, I can't. So his name really isn't "John Doe". Or is it? Can I look him up in the database or not? Nurse, were you here when this guy was admitted, did he have any ID? ("are !implicit 'resolved' or are they not? is this node !implicit or a !float waiting to be resolved? which callback gave you that node, again - a loader callback or a parser callback?").

In my approach, you _can_ treat a "John Doe" without using any medical history. This is a weaker form of treatment - you might misdiagnose, etc. ("You can 'load' a NULL-tagged node to a "generic YAML node". This is a weaker form of 'load' - you might get equality wrong, etc."). So you take extra care when you treat him ("applications are limited in what they can do with NULL-tagged nodes").

See? You do NOT look up in the database to see the prior history of "John Doe" (you do not say that "the native data type for NULL nodes is <some-type>"). Hence, "John Doe" means "I have no name", it is NOT a name ("a NULL tag means "I have no tag", it is NOT a tag called !implicit").

Really, Clark's approach baffles me. I can see that it's functionally equivalent - which is good, since this means we are getting somewhere. But for the life of me I can't see how it simplifies anything. It just complicates things.

Using NULL tags is the simplest, most direct way of representing the "physics" of the issue. Some nodes have no tags, period. Here, look at the document. See a tag? No? Well, there you have it. It has no tag. Now, if I need the node's tag, I must find it out first, right? It's not because I say so, it's just the way things are. There's no tag, I need it, I must find it out. If I *don't* need it (e.g., YAML pretty printer), I simply don't bother finding it out. Brain-dead simple, can't get it wrong, can explain it to a first grader. "What you see is what you get".

Magical !implicit-* tags that change what they mean for different nodes are complicated. See, this node has an invisible "!implicit" tag. It's invisible, that's why you can't see it, OK? Except it doesn't _really_ have this tag, because if I actually _need_ the tag, I first might want to change it to something else. If I don't actually need the tag (e.g., YAML pretty printer), I don't bother to change it. No, it is *not* allowed to change any other tag, that is very different and would be very bad. Because I say so, that's why! Brain-numbingly complicated, can't get it right, can't explain it to people in this list without a zillion posts (and even then, I bet some got it wrong). "What you see is what I mean".

And the worst of it all... it's *unnecessary*. The semantics is _the same_ as the much simpler alternative (using NULLs).
> I'll close with this final thought: If YAML intends to stay out of > the business of semantics, then it must give up trying to tame > transformation. +100. Transformations can and will do anything at all, it is OK to apply them, there's nothing in the spec to forbid them. It is just that they are out of the scope of the YAML core spec. We must ensure that nobody even *thinks* of the word "transformation" when he simply loads his document. "Transform it? I dumped my object, I loaded it back - what transformation? Magical invisible tags that disappear just before I use them? What have you been ingesting?" I dumped, I loaded, it is the *SAME*. Have fun, Oren Ben-Kiki |
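Editorial sketch of the "find out the name, then use it" flow from the hospital analogy: a node with no tag keeps `None` in its tag slot, and the tag is only worked out at the moment a native object is needed. The regular expressions, the fallback to `str`, and the names `resolve` and `load` are assumptions standing in for whatever schema an application actually uses.

```python
import re
from dataclasses import dataclass
from typing import Optional

STR = "tag:yaml.org,2002:str"
INT = "tag:yaml.org,2002:int"
FLOAT = "tag:yaml.org,2002:float"


@dataclass
class Scalar:
    tag: Optional[str]  # None means "has no tag"; it is not itself a tag
    value: str


# Hypothetical application schema: content patterns and native types.
PATTERNS = [(re.compile(r"[0-9]+"), INT), (re.compile(r"[0-9]+\.[0-9]+"), FLOAT)]
NATIVE = {STR: str, INT: int, FLOAT: float}


def resolve(node: Scalar) -> str:
    """Decide what the node will become, looking only at its own content."""
    if node.tag is not None:
        return node.tag  # the patient already carries a card
    for pattern, tag in PATTERNS:
        if pattern.fullmatch(node.value):
            return tag
    return STR  # assumed fallback for this sketch


def load(node: Scalar):
    """Construct the native object for the resolved tag."""
    return NATIVE[resolve(node)](node.value)
```

A tool that never needs the tag, such as a pretty printer, would simply never call `resolve`; the node's `tag` field stays `None` throughout because `resolve` returns a decision rather than rewriting the node.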
From: Oren Ben-K. <or...@be...> - 2004-09-08 06:27:19
|
Aside from the philosophical mumbo-jumbo, here's another simple operational reason why "!implicit" doesn't cut it: Under the "!implicit" proposal, it's impossible to look at a node and know whether it is "really" an "!implicit" or a "!!float" that just waits to be resolved. In contrast, if a node's tag is NULL, you know it hasn't been resolved yet.

Finally, for everyone who simply loves "!implicit" and hates NULL: just resolve every scalar NULL tag to "!implicit" (a better name, like Onoma correctly pointed out, would be "!variant"). Resolve NULL collection tags according to their kind (to !!map and !!seq).

I think this sort of argument (plus the fact that NULL == NULL) was what got Clark thinking NULL is OK in the first place, and the fact that it _is_ NULL and is "resolved" to !variant, !!map and !!seq is what got me to agree it makes sense. This is a delicate compromise where all of us get what we want - trying to tweak it is bound to break something. So don't :-)

"Needless to say", in a given implementation you can collapse the resolution step and have your system report untagged nodes as "!variant", "!!seq" and "!!map" IMMEDIATELY. Or you could use someone's generic implementation that does use NULL and ask it nicely to perform this step for you without you ever noticing. Whatever. Either way, your code doesn't have to think about it if you don't want it to.

The NULL -> resolution -> tag step is for (1) making things easy to explain; (2) a very general guide for "generic" implementations. It does NOT require your code to work that way (with explicit steps).

It is exactly like the "serial model" in this respect. There's no requirement that an implementation allow accessing the YAML file as serialized events. Even if some implementation provides "GetNext" in addition to "LoadItAll", you can simply ignore "GetNext". This doesn't mean we should throw out the whole serial model section; it is there for the same two excellent reasons.

Again: as the spec says (and people keep forgetting), an implementation may go directly from syntax to native data, without ANY intermediate steps. So, if you dislike NULLs, your code can never have any NULL in sight.

Please avoid the argument "this is horrible, I don't want to test for NULL in my code, I just don't need this mechanism, I can live without it". That's "true, but mostly irrelevant". You do NOT have to deal with NULLs in your implementation if you don't want to, any more than you have to deal with "Serialized events" in your code if you don't want to.

Sorry to keep harping on that, but it seems to be people's main objection :-)

Have fun,

Oren Ben-Kiki
|
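Editorial sketch of the default rule suggested above for those who dislike NULL: resolve an absent scalar tag to a variant tag and absent collection tags according to the node kind. `!variant` is the private tag name floated in this thread, and the function names are hypothetical.

```python
from typing import Optional

VARIANT = "!variant"  # hypothetical private tag suggested in the thread
DEFAULTS = {
    "scalar": VARIANT,
    "seq": "tag:yaml.org,2002:seq",
    "map": "tag:yaml.org,2002:map",
}


def is_resolved(tag: Optional[str]) -> bool:
    """With NULL tags, 'not yet resolved' is visible on the node itself."""
    return tag is not None


def resolve_default(kind: str, tag: Optional[str]) -> str:
    """Keep an explicit tag; otherwise pick the default for the node kind."""
    return tag if tag is not None else DEFAULTS[kind]
```

An implementation that collapses the step would call `resolve_default` inside its parser and report the result immediately, so application code never sees `None`.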
From: T. O. <tra...@ru...> - 2004-09-08 07:11:39
|
On Wednesday 08 September 2004 02:27 am, Oren Ben-Kiki wrote:
> Sorry to keep harping on that, but it seems to be people's main
> objection :-)

Okay, I see. So there is this confusion of sorts over this resolution stage --a very minor one, b/c it would seem that the two approaches are equivalent.

First off, I assume that can only be so b/c at resolution time the kind of node is recognized despite any tag. In other words YAML has a mapping, sequence and scalar and really these have nothing to do with tags. So if they are each marked NULL it's fine. At resolution time the difference is known. Cool?

Not really. Again, resolution is trying to be a tamer of transformation. Somewhere from the application side I have to specify what to do about the resolution, i.e. what to do about those NULLs, just as I specify any transformation. An implementation that made me deal with them in two totally different ways would suck (IMHO). I assign a tag to type. Period.

Even so, whether you call it resolution time or transformation time, if we're using NULLs, implementors will have to give us a way to specify "if NULL mapping -> type." and so on. The easiest way to do this then is to just give it a name. Then I can use the regular old type transform methods I already have. No problemo.

As far as explaining it to someone, how hard is that, really? Do you think it's really easier to explain "that mapping has no tag so its tag is NULL" vs. "that mapping has no tag so it has an implied tag of !null-map." I think it's about equal. But having an actual name makes the application's work easier. So I think we should go that way. NULL does make sense --but that's not the problem.

The question I have now is: Are the "!implicit-*" the same as "!!*"? It depends on this: should the application be able to "resolve" tags differently based on whether the tag was explicitly given or not? In other words, is this part of the BIG disclaimer Clark mentions or not?

Also what about default %TAG prefixes? NULL gets out of those, but what if we use !implicit-* or !!*. How should default %TAG play a role in that case?

And lastly, as usual what am I missing?

BTW: Sorry Oren, but the hospital analogy kind of stunk. Hospitals _DO_ give unknown patients the names John and Jane Doe.

--
T.
|
From: Clark C. E. <cc...@cl...> - 2004-09-08 13:11:14
|
On Wed, Sep 08, 2004 at 03:11:33AM -0400, T. Onoma wrote:
| implementors will have to give us a way to specify "if NULL
| mapping -> type." and so on. The easiest way to do this then is
| to just give it a name. Then I can use the regular old type
| transform methods I already have. No problemo.

You've pointed out another difference I didn't think of earlier. In the current information model, each tag is associated with exactly one kind (scalar, mapping, sequence). Assuming Oren's NULL would be implemented as an empty string '', this special tag would violate that rule. This makes sense in Oren's world because it isn't a tag, but it would nonetheless cause problems. Two choices:

- loosen the rule, to allow !date { month: 23, day: 3, year: 2004 } and then define equality to be of the tuple (kind,tag,content)

- find three separate 'values' for NULL

In API terms, if you loosen the rule, we'd have to use a (kind,tag) pair when looking up the appropriate handler. Otherwise, we need three _distinct_ tags.

| As far as explaining it to someone, how hard is that, really? Do you
| think its really easier to explain "that mapping has no tag so it's
| tag is NULL" vs. "that mapping has no tag so it has an implied tag
| of !null-map." I think its about equal. But having an actual name
| makes the application's work easier. So I think we should go that way.
| NULL does make sense --but that's not the problem.

Nods.

| The question I have now is: Are the "!implicit-*" the same as "!!*"? It
| depends on this: should the application be able to "resolve" tags
| differently based on whether the tag was explicitly given or not? In
| other words, is this part of the BIG disclaimer Clark mentions or not?

I'm not sure if this is the question, but assuming that an unspecified plain scalar is reported !unspecified-implicit then, both

  - !unspecified-implicit "x"
  - x

should be reported by the parser identically.

| Also what about default %TAG prefixes? NULL gets out of those, but what
| if we use !implicit-* or !!*. How should default %TAG play a role?

I was thinking it'd be better to 'fix' these default tags to be a very specific value (regardless of %TAG directive). I think my favorite option is to use !!map, !!seq, !!str, and !!imp where !! is tag:yaml.org,2002: While not as flexible as the rest, it avoids any possible ambiguity. The mechanism is already quite clever, no point in making it more so by having its prefixes shift.

Cheers,

Clark

--
Clark C. Evans Prometheus Research, LLC. http://www.prometheusresearch.com/ o office: +1.203.777.2550 ~/ , mobile: +1.203.444.0557 // (( Prometheus Research: Transforming Data Into Knowledge \\ , \/ - Research Exchange Database /\ - Survey & Assessment Technologies ` \ - Software Tools for Researchers ~ *
|
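Editorial sketch of the two API choices Clark lists above for finding a handler when a tag is missing: key the registry on a (kind, tag) pair, or first replace the missing tag by one of three distinct per-kind tags and key on the tag alone. The registries, the `imp` tag and both function names are hypothetical; Python built-ins stand in for real constructors.

```python
from typing import Optional

YAML = "tag:yaml.org,2002:"
INT = YAML + "int"

# Choice 1: loosen the "one kind per tag" rule and look up by (kind, tag).
HANDLERS_BY_PAIR = {
    ("scalar", None): str,
    ("seq", None): list,
    ("map", None): dict,
    ("scalar", INT): int,
}


def find_handler_by_pair(kind: str, tag: Optional[str]):
    return HANDLERS_BY_PAIR[(kind, tag)]


# Choice 2: three distinct implicit tags, so the tag alone is a sufficient key.
IMPLICIT_TAG = {"scalar": YAML + "imp", "seq": YAML + "seq", "map": YAML + "map"}
HANDLERS_BY_TAG = {
    YAML + "imp": str,  # "imp" is the hypothetical tag from this thread
    YAML + "seq": list,
    YAML + "map": dict,
    INT: int,
}


def find_handler_by_tag(kind: str, tag: Optional[str]):
    effective = tag if tag is not None else IMPLICIT_TAG[kind]
    return HANDLERS_BY_TAG[effective]
```

Both lookups return the same handler for every untagged node here; the difference is only whether the node kind is part of the key or is folded into the tag beforehand.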
From: T. O. <tra...@ru...> - 2004-09-08 13:48:54
|
On Wednesday 08 September 2004 09:11 am, you wrote: > On Wed, Sep 08, 2004 at 03:11:33AM -0400, T. Onoma wrote: > | implementors will have to give us a way to specify "if NULL > | mapping -> type." and so on. The easiest way to do this then is > | to just give it a name. Then I can use the regular old type > | transform methods I already have. No problemo. > > You've pointed out another difference I didn't think of earlier. In the > current information model, each tag is associated with exactly one kind > (sclar, mapping, sequence). Assuming Oren's NULL would be implemented > as a empty string '', this special tag would violate that rule. This > makes sense in Oren's world beacuse it isn't a tag, but it would, > nonetheless cause problems. Two choices: > > - losen the rule, to allow !date { month: 23, day: 3, year: 2004 } > and then define equality to be of the tuple (kind,tag,content) > > - find three separate 'values' for NULL > > In API terms, if you loosen the rule, we'd have to use a (kind,tag) > pair when looking up the appropriate handler. Otherwise, we need > three _distinct_ tags. All things being equal, the later seems easier. But their might good reasons for the form that I am unawares. > | The question I have now is: Are the "!implict-*" the same as "!!*"? It > | depends on this: should the application be able to "resolve" tags > | differently based on whether the tag was explicitly given or not? In > | other words, is this part of the BIG disclamer Clark menton's or not? > > I'm not sure if this is the question, but assuming that an unspecified > plain scalar is reported !unspecified-implicit then, both > > - !unspecified-implicit "x" > - x > > should be reported by the parser identically. Sorry, I should have been more clear: #ex 1 -- !!map a: 1 b: 2 #ex 2 -- a: 1 b: 2 So is it acceptable/desirable to allow _different_ transforms for the above two examples? > | Also what about default %TAG prefixes? 
NULL gets out of those, but what > | if we use !implict-* or !!*. How should default %TAG play a role? > > I was thinking it'd be better 'fix' these default tags to be a very > specfic value (regardless of %TAG directive). I think my favorite > option is to use !!map, !!seq, !!str, and !!imp where !! is > tag:yaml.org,2002: While not as flexible as the rest, it avoids any > possible ambiguity. The mechanism is already quite clever, no point in > making it more so by having its prefixes shift. Well, assuming !implicit-* == !!*, then %TAG !! my.org,2004 will or will not alter the !!* implicits? If not, that's an exception. Is it worth an exception? -- T. |
From: Clark C. E. <cc...@cl...> - 2004-09-08 12:36:05
|
On Wed, Sep 08, 2004 at 08:32:42AM +0300, Oren Ben-Kiki wrote:
| On Wednesday 08 September 2004 06:31, T. Onoma wrote:
| > I'm trying to understand the debate here. Could you outline the
| > "warts".
|
| Well, Clark and I agree on the semantics in a basic level, but seem to
| disagree on the mechanism for explaining/describing them. Which is
| good, it means we are getting somewhere.

Yesterday I thought we were fencing over syntax, now I'm not sure.

| Well, we give a name to this decision: "deciding-what-it-will-become" ==
| "tag resolution". This isn't an optional step (again: you *must*
| "decide-what-it-will-become" before it "becomes"). This isn't a
| transform: you are NOT changing "what it will become" (the tag of
| the node).

Ok. So you really _don't_ think that this is a transform, albeit a limited one; if so, this is a substantive difference. Imagine,

  --- 23

Two 'processes' look at this document. A REXX one, who loads the document and says "ohh, 23 is a string"; and a Python one, which loads it and says, "ooh, 23 is an integer". So, neither of these two programs has 'changed' the document above when they simply "loaded" it. No transform. All is good. Both are right? No change in information?

Now both of these processes "save" the data; the REXX one writes "--- !!str 23" and the Python one writes "--- !!str 23", right? And this, obviously, doesn't 'change' the document. So now, does this mean !!str is !!int ? I hope not. Where am I misunderstanding?

Or are you suggesting that both of these processes would purposefully 'discard' information (that already existed when they save the document)? If so, it seems you need a new flag in your model, "was I resolved?", so you know which tags to discard on the way out. Is this 'implicit' flag per-type or per node?

...

I view this as a simple transform with an inverse transform to put the toy back the way you found it. Perhaps the word 'resolution' doesn't accurately frame the intent; I think "isomorphism" is better, that is, a reversible transform. If your process is "playing nice", and you need to change the incoming document (by filling in the missing tags, for example), then you probably need to provide a reverse transform when you want to write the document on the way out. If you fill in tags (say according to a regex), you need to strip them on the way out, according to that same regular expression.

Really, such a concept need not be limited to just the NULL tags to operate in an "expected" way. I might, for example, transform all Perl::BigInt to python/long, and then reverse the transform on the way out. How is this a problem? In any case, you _are_ changing the document's content, right?

Cheers!

Clark

--
Clark C. Evans Prometheus Research, LLC. http://www.prometheusresearch.com/ o office: +1.203.777.2550 ~/ , mobile: +1.203.444.0557 // (( Prometheus Research: Transforming Data Into Knowledge \\ , \/ - Research Exchange Database /\ - Survey & Assessment Technologies ` \ - Software Tools for Researchers ~ *
|
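Editorial sketch of the reversible transform described above: tags are filled in according to a regular expression on the way in and stripped according to that same expression on the way out. The single integer rule and the names `fill_in` and `strip` are assumptions for illustration only.

```python
import re
from typing import Optional

INT = "tag:yaml.org,2002:int"
INT_RE = re.compile(r"[0-9]+")  # hypothetical implicit-typing rule


def fill_in(tag: Optional[str], value: str) -> Optional[str]:
    """On load: give an untagged scalar a tag based only on its content."""
    if tag is None and INT_RE.fullmatch(value):
        return INT
    return tag


def strip(tag: Optional[str], value: str) -> Optional[str]:
    """On save: drop a tag that the same rule would fill in again."""
    if tag == INT and INT_RE.fullmatch(value):
        return None
    return tag
```

Note that `strip` also removes an explicitly written `!!int` from a matching scalar. Under this one rule the document resolves back to the same tag, but recovering exactly which tags the author typed would need the per-node "was I resolved?" flag Clark asks about.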
From: Oren Ben-K. <or...@be...> - 2004-09-08 21:40:11
|
On Wednesday 08 September 2004 15:36, Clark C. Evans wrote:
> | Well, Clark and I agree on the semantics in a basic level, but seem
> | to disagree on the mechanism for explaining/describing them. Which
> | is good, it means we are getting somewhere.
>
> Yesterday I thought we were fencing over syntax, now I'm not sure.

Seems that way, 'cause the second I turned my back... :-)

> | Well, we give a name to this decision:
> | "deciding-what-it-will-become" == "tag resolution". This isn't an
> | optional step (again: you *must* "decide-what-it-will-become"
> | before it "becomes"). This isn't a transform: you are NOT changing
> | "what it will become" (the tag of the node).
>
> Ok. So you really _don't_ think that this is a transform,

Aha! If you say something a dozen times, eventually people will hit on
the notion you mean it! :-)

> Imagine,
>
>    --- 23
>
> Two 'processes' look at this document. A REXX one, who loads the
> document and says "ohh, 23 is a string"; and a Python one, which
> loads it and says, "ooh, 23 is an integer".

Imagine:

    %TAG !foo! tag:domain.com,2003:
    --- !foo!bar abc

Two 'processes' look at this document. A REXX one, who loads the
document and says "ohh, abc is an invoice id"; and a Python one, which
loads it and says, "ooh, abc is a hex color code".

> Now both of these processes "save" the data; the REXX one writes "---
> !!str 23" and the Python one writes "--- !!str 23", right?

You mean the Python one writes !!int 23. At any rate...

> And this,
> obviously, doesn't 'change' the document. So now, does this mean
> !!str is !!int ? I hope not. Where am I mis-understanding?

The fact that whoever wrote the document had an intent. We loosely call
this intent "the document schema". This schema can be expressed in many
ways - as lines of code spread through an application, for example. Or
in a README.txt file. Whatever.

When the REXX process looked at "--- 23" and said "this is a string", it
applied a schema. Likewise, when the Python code looked at "--- 23" and
said "this is an integer", it applied a schema.

Alas, obviously they didn't both use the same schema. _That_ is the
problem.

Now, REGARDLESS of the issue of implicit tags, if one application
decides 'tag:foo.com,2003:bar' means hex colors and another decides
'tag:foo.com,2003:bar' means invoice id, you have a problem. Right?
That's why we accompany each tag with a description of its semantics. We
say each application "should" obey these semantics. It MAY choose not
to, which is OK!!! but in that case, it "transformed" the document.

In a similar way, the author/designer of the "--- 23" document's schema
has stated (intended) what the semantics of implicit tags are in his
documents. "I hereby decide that implicit '[0-9]+' scalars mean
integers in all Apache configuration files". Sure, your REXX
application MAY choose to ignore this and treat them as strings. It may
also choose to load "!!str" nodes as bit vectors and "!!map" nodes as
strangely-coded audio. In all these cases, it transforms the document.
I really don't see the problem. Let it - it might be that treating
Apache config files as encoded audio yields sweet music - this is
completely beside the point.

> In the
> current information model, each tag is associated with exactly one
> kind (scalar, mapping, sequence). Assuming Oren's NULL would be
> implemented as an empty string '', this special tag would violate that
> rule.

We are going nowhere at all very fast until you and Onoma understand
that I absolutely, truly, really really mean it when I say that a node
with no tag, well, HAS NO TAG. Again: THE NULL TAG IS NOT A TAG.

Saying something like "the NULL tag is magical because it applies to
several kinds" indicates you completely misunderstood what I mean. THE
NULL TAG IS NOT A TAG. Again, and again, and again: a sequence node may
have no tag. A mapping node may have no tag. A scalar node may have no
tag. My cat has no tag. That does not make "no tag" into some sort of
magical tag that applies to all kinds of node and to my cat. These
nodes - and my cat - simply HAVE NO TAG.

Your problem is that you keep on viewing "having no tag" as "having a
magical tag". You keep trying to _use_ this "magical no-tag tag". This
keeps leading to contradictions, inconsistencies, complications,
special cases, restrictions, page-long sets of rules etc. You cleverly
manage to push the problem around - from one phase of the processing to
another, from one set of complex rules to another (I admit I have lost
track of the flow of proposals). The inherent contradiction
keeps on popping up - you simply can't treat "having no tag" as a
special sort of a tag.

Give it up. The simplest, most straightforward way of thinking about
having no tag is as having no tag.

Onoma wrote:
> BTW: Sorry Oren, but the hospital analogy kind of stunk. Hospitals
> _DO_ give unknown patients the names John and Jane Doe.

He's making the same mistake. Sure they do. That's their way to encode
NULL. I claim that hospitals treat the special name "John Doe" as a
NULL. Specifically, a doctor will NEVER EVER search the hospital's
admission records to see whether "John Doe" has checked in the last
year and reported any medical condition or drug allergies. The doctor
is completely aware at each point that "John Doe" is NOT the patient's
name.

In ALL your proposals (and Clark's), it is ALLOWED for an application to
make this mistake - to treat "whatever-we-call-the-no-tag-tag-today" as
a normal, run-of-the-mill tag. This is the root cause for all the
problems you have getting a simple set of rules to work.

It just _can't_ work. Treating "having no tag" as if it is a special
sort of tag is like trying to square a circle. Give it up...

Have fun,

    Oren Ben-Kiki
|
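The distinction Oren draws above, between a node that simply has no tag and the schema-dependent step of deciding what it becomes, can be sketched in Python. This is only an illustration under stated assumptions, not code from any YAML implementation: the functions `resolve`, `rexx_like_schema` and `python_like_schema` are hypothetical names, "no tag" is modelled as `None`, and `tag:yaml.org,2002:int` is assumed as the expansion of `!!int`.

```python
import re
from typing import Callable, Optional

# A "schema" here is just a function from a scalar's content to a tag.
Schema = Callable[[str], str]


def rexx_like_schema(value: str) -> str:
    # Hypothetical schema: every untagged scalar is a string.
    return "tag:yaml.org,2002:str"


def python_like_schema(value: str) -> str:
    # Hypothetical schema: untagged '[0-9]+' scalars are integers.
    if re.fullmatch(r"[0-9]+", value):
        return "tag:yaml.org,2002:int"
    return "tag:yaml.org,2002:str"


def resolve(tag: Optional[str], value: str, schema: Schema) -> str:
    # An explicit tag is never changed by resolution.
    if tag is not None:
        return tag
    # None means "has no tag"; the schema decides from the value alone.
    return schema(value)
```

With the same untagged scalar `23`, the two schemas yield different tags, which is the mismatch the message describes; an explicitly tagged node is left alone by both.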
From: Clark C. E. <cc...@cl...> - 2004-09-08 23:07:53
|
On Thu, Sep 09, 2004 at 12:40:04AM +0300, Oren Ben-Kiki wrote:
| Alas, obviously they didn't both use the same schema.
| _That_ is the problem.

Ok. So your methodology _requires_ a schema then? If this is true,
there is nothing wrong with your schema having an 'implicit transform'
sending !implicit-* to whatever tags you wish.

The only thing that your NULL idea ensures is that the Representation
model isn't sufficient to describe the 'resolution' process; and that
means we need another model. One that says what these NULL thingies
are, and how they behave.

| Your problem is that you keep on viewing "having no tag" as "having a
| magical tag". You keep trying to _use_ this "magical no-tag tag". This
| keeps leading to contradictions, inconsistencies, complications,
| special cases, restrictions, page-long sets of rules etc.

Not at all. If the tag is missing, it should be '!implicit-xxx' or
whatever we say it should be. I don't care, but a missing tag
should signify a tag. What the tag means is _completely_ up to the
application. No special rules, no such thing as 'resolution': you omit
the tag in the syntax, you know exactly what you get when you parse.

Best,

Clark
|
From: Clark C. E. <cc...@cl...> - 2004-09-08 14:08:51
|
On Wed, Sep 08, 2004 at 09:48:12AM -0400, T. Onoma wrote:
| #ex 1
| --- !!map
| a: 1
| b: 2
|
| #ex 2
| ---
| a: 1
| b: 2
|
| So is it acceptable/desirable to allow _different_ transforms
| for the above two examples?

I think it is entirely acceptable to allow as many transforms
as one wants for both of the examples above. *wink*

But, if you are saying, should they be reported by the Parser
identically -- if we define an omitted tag on a mapping to be
!!map or tag:yaml.org,2002:map then the answer is yes.

Is this desirable? Sure, why not. A tag:yaml.org,2002:map can be yer
generic mapping. If an application knows that it is something more
specific, they can transform it according to the schema of their desire.

| > | Also what about default %TAG prefixes? NULL gets out of those, but what
| > | if we use !implicit-* or !!*. How should default %TAG play a role?
| >
| > I was thinking it'd be better to 'fix' these default tags to be a very
| > specific value (regardless of %TAG directive). I think my favorite
| > option is to use !!map, !!seq, !!str, and !!imp where !! is
| > tag:yaml.org,2002: While not as flexible as the rest, it avoids any
| > possible ambiguity. The mechanism is already quite clever, no point in
| > making it more so by having its prefixes shift.
|
| Well, assuming !implicit-* == !!*, then
|
|   %TAG !! my.org,2004
|
| will or will not alter the !!* implicits? If not, that's an
| exception. Is it worth an exception?

Oh, no, it wouldn't be an exceptional rule, if that's what you're
asking. The rule could be:

    If a tag is missing, the parser will report, depending on
    the kind and style of node, the following 'cooked' tags:

        mapping:        tag:yaml.org,2002:map
        sequence:       tag:yaml.org,2002:seq
        plain scalar:   tag:yaml.org,2002:imp
        other scalars:  tag:yaml.org,2002:str

Simple enough; the rule would not be defined by how it would appear
in an 'equivalent serialization' (although that's handy for
giving examples). For example,

    %TAG !! tag:my.org,2004:
    ---
    - !!bing mybing
    - "strval"
    ...

Would then have an equivalent serialization:

    %TAG !! tag:my.org,2004:
    %TAG !yaml! tag:yaml.org,2002:
    ---
    - !!bing mybing
    - !yaml!str strval
    ...

Does this help?

Clark

--
Clark C. Evans
Prometheus Research, LLC.
http://www.prometheusresearch.com/
office: +1.203.777.2550
mobile: +1.203.444.0557
Prometheus Research: Transforming Data Into Knowledge
  - Research Exchange Database
  - Survey & Assessment Technologies
  - Software Tools for Researchers
|
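The 'cooked' tag rule Clark proposes above is small enough to write down directly. The following Python sketch is an illustration only: the `Node` class and `cooked_tag` function are hypothetical names, and the style strings are an assumed encoding of scalar styles rather than anything taken from a real parser.

```python
from dataclasses import dataclass
from typing import Optional

YAML_PREFIX = "tag:yaml.org,2002:"


@dataclass
class Node:
    kind: str                   # "mapping", "sequence" or "scalar"
    style: str = "plain"        # for scalars: "plain", "double", "single", ...
    tag: Optional[str] = None   # explicit tag from the document, if any


def cooked_tag(node: Node) -> str:
    """Return the tag a parser would report under the proposed rule."""
    if node.tag is not None:
        # An explicit tag is reported unchanged.
        return node.tag
    if node.kind == "mapping":
        return YAML_PREFIX + "map"
    if node.kind == "sequence":
        return YAML_PREFIX + "seq"
    if node.kind == "scalar":
        # Only plain scalars get the implicit tag; quoted and block
        # scalars are reported as strings.
        return YAML_PREFIX + ("imp" if node.style == "plain" else "str")
    raise ValueError(f"unknown node kind: {node.kind!r}")
```

Because the defaults are fixed full tags, a `%TAG !!` directive in the document does not affect them, which matches the example where `"strval"` is reported as `tag:yaml.org,2002:str` even though `!!` has been redefined.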
From: T. O. <tra...@ru...> - 2004-09-08 15:12:06
|
On Wednesday 08 September 2004 10:08 am, Clark C. Evans wrote:
> On Wed, Sep 08, 2004 at 09:48:12AM -0400, T. Onoma wrote:
> | #ex 1
> | --- !!map
> | a: 1
> | b: 2
> |
> | #ex 2
> | ---
> | a: 1
> | b: 2
> |
> | So is it acceptable/desirable to allow _different_ transforms
> | for the above two examples?
>
> I think it is entirely acceptable to allow as many transforms
> as one wants for both of the examples above. *wink*
>
> But, if you are saying, should they be reported by the Parser
> identically -- if we define an omitted tag on a mapping to be
> !!map or tag:yaml.org,2002:map then the answer is yes.
>
> Is this desirable? Sure, why not. A tag:yaml.org,2002:map can be yer
> generic mapping. If an application knows that it is something more
> specific, they can transform it according to the schema of their desire.

Hmm? Sorry, it seems almost as if you're saying that they both should be
reported the same by the parser _and_ the transform should be able to make
a distinction. I'm really wondering whether making the distinction violates
the BIG notice:

    A transformation should only use information in the representation
    model (it should not use key order, comments, the prologue, yada yada,
    or any other information from the presentation or serial model).

Hmmm....

Okay, I think I'm ready to make my recommendation. I don't think they
should be !!*. Here's why.

!!imp is a plain scalar, but really it is a variant-scalar. The name
conveys a big difference in how one thinks about it. !!imp is really a
fiction of sorts, to tell the resolver to consider this scalar for a
"basic transformation"; if nothing applies then make it a regular scalar.
My point being that the tag !!imp is not a description for an actual type,
like !!str, !!map and !!seq are.

But likewise one could think the same about untagged mappings and
sequences too. They could be variant-mapping and variant-seq. And actually
a variant-mapping makes a lot of sense, b/c if the keys fail equality
after transformations, then it can fall back to a !!rel. (An actual !!map
on the other hand would error.) Of course other transforms may be
applicable according to the application --but this is certainly the
sensible default.

So then, we really need different tags for these: variant-map, variant-seq
and variant-str. But is it one ! or two !!? Or maybe something completely
different? Perhaps, like I suggested before:

    ?map
    ?seq
    ?str

At any rate. I think the variants, in whatever form, will work nicely.

> Oh, no, it wouldn't be an exceptional rule, if that's what you're
> asking. The rule could be:
>
>     If a tag is missing, the parser will report, depending on
>     the kind and style of node, the following 'cooked' tags:
>
>         mapping:        tag:yaml.org,2002:map
>         sequence:       tag:yaml.org,2002:seq
>         plain scalar:   tag:yaml.org,2002:imp
>         other scalars:  tag:yaml.org,2002:str
>
> Simple enough; the rule would not be defined by how it would appear
> in an 'equivalent serialization' (although that's handy for
> giving examples). For example,
>
>     %TAG !! tag:my.org,2004:
>     ---
>     - !!bing mybing
>     - "strval"
>     ...
>
> Would then have an equivalent serialization:
>
>     %TAG !! tag:my.org,2004:
>     %TAG !yaml! tag:yaml.org,2002:
>     ---
>     - !!bing mybing
>     - !yaml!str strval
>     ...
>
> Does this help?

Yep.

> Clark

P.S. I would like to add !rel to the repository, it is like !pairs but
unordered.

--
T.
|
From: Clark C. E. <cc...@cl...> - 2004-09-08 15:46:03
|
On Wed, Sep 08, 2004 at 11:12:01AM -0400, T. Onoma wrote:
| Hmm? Sorry, it seems almost as if you're saying that they both should be
| reported the same by the parser _and_ the transform should be able to make
| a distinction.

Well, if it does so without using presentation / serialization
information, that's fine. For example, it could use the time of day
the message arrived to distinguish between them, or any other external
information. We just don't want people using non-informational parts
of the YAML document (so that emitters are free to choose among the
forms without accidentally changing the meaning of the document).

| Okay, I think I'm ready to make my recommendation. I don't think they
| should be !!*. Here's why.
|
| !!imp is a plain scalar, but really it is a variant-scalar. The name
| conveys a big difference in how one thinks about it. !!imp is really a
| fiction of sorts, to tell the resolver to consider this scalar for a
| "basic transformation"; if nothing applies then make it a regular
| scalar. My point being that the tag !!imp is not a description for an
| actual type, like !!str, !!map and !!seq are.

Ok. Oren will like your line of reasoning here. ;)

| But likewise one could think the same about untagged mappings and
| sequences too. They could be variant-mapping and variant-seq. And
| actually a variant-mapping makes a lot of sense, b/c if the keys fail
| equality after transformations, then it can fall back to a !!rel.

If the two keys fail the equality test after transformation, the answer
is simple -- don't do this transform. It is an error, game over.

| (An actual !!map on the other hand would error)

Oh dear. If any two keys in _any_ mapping (regardless of the
mapping's tag) are equal, it violates the YAML Model, and the
document is "invalid". If you decide to change the container
to a 'set'-like object, fine, but at that point, your implementation
is no longer compliant with YAML's Representation Model.
[see below on definition of 'rel']

| Of course other transforms may be applicable according to the
| application --but this is certainly the sensible default.

I doubt it. A primary assumption that higher-level tools, like a path
or schema-validating language, make will no longer be valid; those
tools are useless to you, and interoperability is shot.

If you want a _sensible_ default, you'd downgrade one of the
offending keys to have a !tag and value that no longer conflict;
in your suggestion below, keep the keys as 'variant-str' as they
were parsed, for example. In any case, the transform failed; it
is in error.

| So then, we really need different tags for these: variant-map,
| variant-seq and variant-str. But is it one ! or two !!?
| Or maybe something completely different? Perhaps, like I
| suggested before:
|
|   ?map
|   ?seq
|   ?str
|
| At any rate. I think the variants, in whatever form, will work nicely.

In this case, I'd suggest that the !variant-* forms (and David called
them !unspecified-* which is just as good) be private types. So,
these rules would be:

    untagged mapping:   !unspecified-mapping
    untagged sequence:  !unspecified-sequence
    plain scalar:       !unspecified-scalar
    other scalars:      tag:yaml.org,2002:str

Yes?

| P.S. I would like to add !rel to the repository, it is
| like !pairs but unordered.

Could you spell out what the semantics of !rel would be and
how you'd serialize it?

Cheers!

Clark
|
From: Clark C. E. <cc...@cl...> - 2004-09-08 16:36:53
|
| But likewise one could think the same about untagged mappings and
| sequences too. They could be variant-mapping and variant-seq. And
| actually a variant-mapping makes a lot of sense, b/c if the keys fail
| equality after transformations, then it can fall back to a !!rel.
| (An actual !!map on the other hand would error)

Let me try to explain how I would handle duplicate key issues
with !unspecified-scalar. Suppose you had a document,

    ---
    x : >
        unspecified
    "x": 'string'

Assume the omitted tag 'expansion' rules make this document report
exactly as if the following were parsed instead:

    %TAG !yaml! tag:yaml.org,2002:
    --- !unspecified-mapping
    !unspecified-scalar 'x': !yaml!str "unspecified"
    !yaml!str 'x': !yaml!str "string"
    ...

In this case, a transform (or as Oren says, resolution) which only
converts (!unspecified-scalar, 'x') into ('tag:yaml.org,2002:str', 'x')
would violate YAML's model, since the key "x" is duplicated. This
violation is completely independent of what !unspecified-mapping turns
out to be, a !!map or a !wibble. You've got a few options:

  a) Report, at the time of transformation, that the source
     document is invalid for the schema you have intended
     (that is, one that maps unspecified scalars to strings).

     This is the most straightforward and obvious thing to do.

  b) Just don't transform that particular key; load it as
     a generic 'YAML Scalar' object (which isn't equal to "x")
     and let the application cope.

  c) Either change the offending key's tag to something that isn't
     'tag:yaml.org,2002:str' (keeping it the same does this), or change
     the content. Note, this would be a rather odd transformation as it
     really alters the intent, but alas, if the application is fine
     with that sort of thing, great.

  d) Transform the container into something that does have
     unique keys for this case, possibilities:

       # a sequence of sequences, where the transform imposes
       # an order, say alphabetical by mapping value
       ---
       - ["x", 'string']
       - [x, "unspecified"]

       # or, a sequence of mappings (pairs), also where
       # an order is imposed
       ---
       - {"x": 'string' }
       - { x : "unspecified" }

       # a mapping of sequences (as keys) empty character
       # (which only works if key + value is unique)
       ---
       ? ["x", 'string']
       ? [x, "unspecified"]

In any case, only changing the mapping's tag to a '!!rel'
is not sufficient.

Yea?

Clark
|
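Option (a) above, reporting an error when resolution makes two keys equal, can be sketched in a few lines of Python. This is an illustrative sketch, not an implementation from the thread: keys are modelled as hypothetical `(tag, content)` tuples, and `resolve_key` and `resolve_keys` are made-up names.

```python
from typing import List, Tuple

STR = "tag:yaml.org,2002:str"
UNSPECIFIED = "!unspecified-scalar"

# A key is modelled as (tag, content); two keys are equal only when
# both parts match.
Key = Tuple[str, str]


def resolve_key(key: Key) -> Key:
    # The simple transform from the example: unspecified scalars
    # become strings, everything else is left alone.
    tag, content = key
    return (STR, content) if tag == UNSPECIFIED else key


def resolve_keys(keys: List[Key]) -> List[Key]:
    """Resolve every key and fail if two keys become equal (option a)."""
    seen = set()
    resolved = []
    for key in keys:
        new_key = resolve_key(key)
        if new_key in seen:
            raise ValueError(f"duplicate key after resolution: {new_key!r}")
        seen.add(new_key)
        resolved.append(new_key)
    return resolved
```

For the document in the message, the keys `(!unspecified-scalar, 'x')` and `(tag:yaml.org,2002:str, 'x')` are distinct as parsed but collide once the first is resolved to a string, so `resolve_keys` raises regardless of what the containing mapping's tag turns out to be.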
From: T. O. <tra...@ru...> - 2004-09-08 17:43:35
|
On Wednesday 08 September 2004 12:36 pm, Clark C. Evans wrote:
> Let me try to explain how I would handle duplicate key issues
> with !unspecified-scalar. Suppose you had a document,
>
>     ---
>     x : >
>         unspecified
>     "x": 'string'
>
> Assume the omitted tag 'expansion' rules make this document report
> exactly as if the following were parsed instead:
>
>     %TAG !yaml! tag:yaml.org,2002:
>     --- !unspecified-mapping
>     !unspecified-scalar 'x': !yaml!str "unspecified"
>     !yaml!str 'x': !yaml!str "string"
>     ...
>
> In this case, a transform (or as Oren says, resolution) which only
> converts (!unspecified-scalar, 'x') into ('tag:yaml.org,2002:str', 'x')
> would violate YAML's model, since the key "x" is duplicated. This
> violation is completely independent of what !unspecified-mapping turns
> out to be, a !!map or a !wibble. You've got a few options:
>
>   a) Report, at the time of transformation, that the source
>      document is invalid for the schema you have intended
>      (that is, one that maps unspecified scalars to strings).
>
>      This is the most straightforward and obvious thing to do.

Not really.

>   b) Just don't transform that particular key; load it as
>      a generic 'YAML Scalar' object (which isn't equal to "x")
>      and let the application cope.

A "disregarding" of what I asked a scalar to be. There are other
considerations here than just those that arise with ?str (!variant-str),
like 23 vs. 23.0 -- So no.

>   c) Either change the offending key's tag to something that isn't
>      'tag:yaml.org,2002:str' (keeping it the same does this), or change
>      the content. Note, this would be a rather odd transformation as it
>      really alters the intent, but alas, if the application is fine
>      with that sort of thing, great.

Indeed, the application would need to specify such behavior. And "keeping
it the same" is keeping it not "real". That's my problem with using ! or
!!. It's acting like it's somehow real. It's not. No, it has to be made a
native type in the end of some sort. You can't get out of that.

>   d) Transform the container into something that does have
>      unique keys for this case, possibilities:
>
>        # a sequence of sequences, where the transform imposes
>        # an order, say alphabetical by mapping value
>        ---
>        - ["x", 'string']
>        - [x, "unspecified"]
>
>        # or, a sequence of mappings (pairs), also where
>        # an order is imposed
>        ---
>        - {"x": 'string' }
>        - { x : "unspecified" }
>
>        # a mapping of sequences (as keys) empty character
>        # (which only works if key + value is unique)
>        ---
>        ? ["x", 'string']
>        ? [x, "unspecified"]

Or a bucket of bolts:

    --- !bucket
    - !bolt
      - "x"
      - 'string'
    - !bolt
      - x
      - "unspecified"

It's a transformation after all, so the app rules the roost. The question
is not what it could become but what is the proper _default behavior_.

> In any case, only changing the mapping's tag to a '!!rel'
> is not sufficient.
>
> Yea?

So, Nay. Because, just as when I ask for a ?str (i.e. !variant-str or
!unspecified-str, whatever) and it doesn't match on "resolution", I expect
it to become a !!str b/c that's the best serialization it can give me
based on the circumstances. Likewise if I ask for a ?map and, b/c of key
equality issues on resolution, it can't be one, I also expect the best
approximation it can give me, and with the ?map, that is a !!rel. In
either case I can always override this behavior, say, for instance, cough
up an error instead. But by default I'd rather it do its best.

Now, you can say this "violates the YAML model" all you want, but that
doesn't do the end-user a lick of good. We need to take it into account.
Doing so deals seamlessly with the problems we have with Python's
Dictionary not being the same as Ruby's Hash. In other words certain valid
Hashes in Ruby can not be represented as a Dictionary in Python. So we
need something to catch those cases. While, yes, there are all the options
suggested above, we want the most proper and reasonable default --the
closest type possible, to count on. In this case it is certainly a !!rel,
since a !!map does not have order.

Also, I point out that these "alternate" types aren't a big deal. _why
already has the others implemented, like !omap. They are not hard to
implement. And let's face it, you can call it just a "recommendation" but
general implementors are going to take it as just a "requirement". That's
simply the way it is.

BTW the last example above of "a mapping of sequences (as keys) empty
character" emulates (albeit it is not exactly the same as) a !!rel.

--
T.
|
From: Clark C. E. <cc...@cl...> - 2004-09-08 18:36:21
|
On Wed, Sep 08, 2004 at 01:43:28PM -0400, T. Onoma wrote:
| > In this case, a transform (or as Oren says, resolution) which only
| > converts (!unspecified-scalar, 'x') into ('tag:yaml.org,2002:str', 'x')
| > would violate YAML's model, since the key "x" is duplicated. This
| > violation is completely independent of what !unspecified-mapping turns
| > out to be, a !!map or a !wibble. You've got a few options:
| >
| >   a) Report, at the time of transformation, that the source
| >      document is invalid for the schema you have intended
| >      (that is, one that maps unspecified scalars to strings).
| >
| >      This is the most straightforward and obvious thing to do.
|
| Not really.

As I've specified the scenario above, the result of applying the simple
transform to the original document produces an Invalid document. You can
verify this with a thought experiment; no implementation is required.
The only real answer here is an error.

Of course, if your application wanted to 'fudge' it to change the
structure of the container (to, say, a list of lists, or a mapping of
keys to a list of values); both of these are OK. And both of these
are application-specific decisions.

However, your other example, where you hint at Python's problem with
{ 1: 'integer', 1.0: 'float' }, is an exceptional case, and it is
implementation specific. This is a situation where the Python mapping
and its native types fail to behave how the YAML Model specifies. For
these sorts of icky cases where the native binding doesn't match YAML's
semantics, it is completely appropriate (with a warning) to use a
'generic' object instead of a native object. In this specific
circumstance, I'm not sure what the best solution is; perhaps the loader
uses a 'generic' mapping that implements YAML's definition of equality,
or perhaps the loader would bind both values to a more generic 'Number'
or 'Variant' that implements YAML's equality. This is completely an
implementation issue though. And they do not require a structural
change to the container nor a change in the semantics of the result.

| That's my problem with using ! or !!. It's acting like it's
| somehow real. It's not. No, it has to be made a native type
| in the end of some sort. You can't get out of that.

Well, !bingles isn't real either. Besides having a tag, your
application needs to 'recognize' it and then, if you are lucky, it may
be 'available' in your local environment. That it appeared as untagged
in the syntax doesn't change this fundamental reality about type
specifiers.

Best,

Clark

--
Clark C. Evans
Prometheus Research, LLC.
http://www.prometheusresearch.com/
office: +1.203.777.2550
mobile: +1.203.444.0557
Prometheus Research: Transforming Data Into Knowledge
  - Research Exchange Database
  - Survey & Assessment Technologies
  - Software Tools for Researchers
|
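The Python case mentioned above is easy to reproduce, and one of the suggested workarounds (a 'generic' key object carrying its own notion of equality) can be sketched alongside it. The `TaggedKey` class below is hypothetical, and it assumes that two scalar keys are equal only when both tag and value match, which is a simplification of whatever equality rule the spec finally adopts.

```python
from dataclasses import dataclass

INT = "tag:yaml.org,2002:int"
FLOAT = "tag:yaml.org,2002:float"

# In a native dict, 1 and 1.0 compare equal and hash alike, so the
# second entry overwrites the first and only one key survives.
native = {1: "integer", 1.0: "float"}


@dataclass(frozen=True)
class TaggedKey:
    """Generic key that is equal to another only if tag and value match."""
    tag: str
    value: object


# With the tag as part of the key, both entries are kept without any
# structural change to the container.
generic = {
    TaggedKey(INT, 1): "integer",
    TaggedKey(FLOAT, 1.0): "float",
}
```

A frozen dataclass derives both `__eq__` and `__hash__` from its fields, so the tag difference alone is enough to keep the two keys apart.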
From: T. O. <tra...@ru...> - 2004-09-08 19:33:45
|
On Wednesday 08 September 2004 02:36 pm, you wrote:
> On Wed, Sep 08, 2004 at 01:43:28PM -0400, T. Onoma wrote:
> | > In this case, a transform (or as Oren says, resolution) which only
> | > converts (!unspecified-scalar, 'x') into ('tag:yaml.org,2002:str', 'x')
> | > would violate YAML's model, since the key "x" is duplicated. This
> | > violation is completely independent of what !unspecified-mapping turns
> | > out to be, a !!map or a !wibble. You've got a few options:
> | >
> | >   a) Report, at the time of transformation, that the source
> | >      document is invalid for the schema you have intended
> | >      (that is, one that maps unspecified scalars to strings).
> | >
> | >      This is the most straightforward and obvious thing to do.
> |
> | Not really.
>
> As I've specified the scenario above, the result of applying the simple
> transform to the original document produces an Invalid document. You can
> verify this with a thought experiment; no implementation is required.
> The only real answer here is an error.

A thought experiment according to your way of thinking. (see below)

> Of course, if your application wanted to 'fudge' it to change the
> structure of the container (to, say, a list of lists, or a mapping of
> keys to a list of values); both of these are OK. And both of these
> are application-specific decisions.

Well, I wouldn't call it a "fudge" if that's what my app wants. That
devalues Brian's principle of who is in control. Know what I mean?

> However, your other example, where you hint at Python's problem with
> { 1: 'integer', 1.0: 'float' }, is an exceptional case, and it is
> implementation specific. This is a situation where the Python mapping
> and its native types fail to behave how the YAML Model specifies. For
> these sorts of icky cases where the native binding doesn't match YAML's
> semantics, it is completely appropriate (with a warning) to use a
> 'generic' object instead of a native object. In this specific
> circumstance, I'm not sure what the best solution is; perhaps the loader
> uses a 'generic' mapping that implements YAML's definition of equality,
> or perhaps the loader would bind both values to a more generic 'Number'
> or 'Variant' that implements YAML's equality. This is completely an
> implementation issue though. And they do not require a structural
> change to the container nor a change in the semantics of the result.

Ah, see, here's what I was talking about when I said:

    "If YAML intends to stay out of the business of semantics, then it
    must give up trying to tame transformation."

Now look at what you said:

    "This is a situation where the Python mapping and its native types
    fail to behave how the YAML Model specifies. For these sorts of icky
    cases where the native binding doesn't match YAML's _semantics_ ..."
    [my emphasis]

So YAML is doing exactly what it's supposed to keep its nose out of.
Further it's a bit audacious (IMHO) to think of Python failing YAML's
spec. YAML is a tool that Python programmers use, not the other way
around. The burden is on YAML, if it wants to support Python.

So then you go on to say:

    "I'm not sure what the best solution is..."

and you're not sure precisely because you've extricated the most plausible
one based on an abstract thought of a semantic you believe YAML to have.
(ref above)

And so... Force that semantic value and you're forced to tame an unruly
transformation --that's too bad, since there is quite a natural --even
proper, one.

> | That's my problem with using ! or !!. It's acting like it's
> | somehow real. It's not. No, it has to be made a native type
> | in the end of some sort. You can't get out of that.
>
> Well, !bingles isn't real either. Besides having a tag, your
> application needs to 'recognize' it and then, if you are lucky, it may
> be 'available' in your local environment. That it appeared as untagged
> in the syntax doesn't change this fundamental reality about type
> specifiers.

No, !bingles is very real. It's mine (or whomever's). So it's real to me
(or whomever). A "variant-whatchamacallit", on the other hand, is not real
--it is real to no one, and can never be real --it must be transformed.
The fact that it's untagged specifically puts the final burden of
"resolution" on the value. That's the reality. That's why we have the
notion of "!imp" to begin with. Likewise the same mechanism can be an
elegant solution for dealing with the _variants_ of implemented "maps".

To be very clear, it really just boils down to this:

  - Add !!rel to the repository

  - Add a statement along these lines:

    "If the _recommended_ type !!rel has been implemented, and upon
    resolution to native-type, a '!variant-map' does not meet the
    requirements of yaml's !!map (namely the inequality of keys), and
    also assuming no other transformation has been specified explicitly
    for this tag, then the _recommended_ behavior is to transform it into
    a !!rel. If !!rel is not implemented then the recommended behavior
    is, by necessity, to throw an error."

T.
|
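The recommended default in the closing statement above can be written as a short decision procedure. The Python sketch below is only one reading of that statement: `resolve_variant_map` is a hypothetical name, `!!rel` was a proposal in this thread rather than a registered type, so `tag:yaml.org,2002:rel` is an assumed expansion, and a list of pairs stands in for an unordered relation.

```python
from typing import Any, Hashable, List, Tuple

MAP = "tag:yaml.org,2002:map"
REL = "tag:yaml.org,2002:rel"   # assumed expansion of the proposed !!rel

Pair = Tuple[Hashable, Any]


def resolve_variant_map(pairs: List[Pair], rel_implemented: bool = True):
    """Resolve an untagged mapping whose keys are already native values."""
    keys = [key for key, _ in pairs]
    if len(set(keys)) == len(keys):
        # All keys are distinct: an ordinary mapping is fine.
        return MAP, dict(pairs)
    if rel_implemented:
        # Keys collide: fall back to a relation of key/value pairs.
        # The list order carries no meaning here.
        return REL, list(pairs)
    # No fallback type available, so the only option left is an error.
    raise ValueError("duplicate keys and no !!rel fallback available")
```

An application that prefers Clark's position (duplicate keys are always an error) would simply call this with `rel_implemented=False`, which shows that the two views differ only in the default.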
From: David H. <dav...@bl...> - 2004-09-08 21:16:13
|
T. Onoma wrote:
> An implicit tag is not "NULL". It can only be expressly left out because I
> know precisely what it _will be_ in doing so. Hence I expect:
>
>   ---                #=> !!seq
>   - "this"           #=> - !!str
>
>   ---                #=> !!map
>   "here" : there     #=> !!str : !!imp
>
> And by expectation only can I leave it out. So what matters most is that
> it is something specific.

Hmm. That is very different from how I interpreted !!seq, !!map and !!str.
My impression was that, e.g. !!seq was intended to say "this is *just* an
ordinary sequence; don't try to implicit-type it to something more
specific." This is even more important for !!str.

If !!seq is merged with the implicit sequence tag, etc., then there is no
way to say this.

--
David Hopwood <dav...@bl...>
|
From: Clark C. E. <cc...@cl...> - 2004-09-08 22:55:10
|
On Wed, Sep 08, 2004 at 10:15:59PM +0100, David Hopwood wrote:
| Hmm. That is very different from how I interpreted !!seq, !!map
| and !!str. My impression was that, e.g. !!seq was intended to
| say "this is *just* an ordinary sequence; don't try to
| implicit-type it to something more specific."

Hmm. Just to play a bit of devil's advocate here. There are two cases
I can think of:

  - Assume you have your own application 'scheme', say you are a
    timetracking program. When you run into a timeslip you are going
    to know which tags should be implicitly typed or not. In this case,
    it doesn't matter if your tag is !!seq or not, you're going to apply
    your types.

  - If your application is 'generic', and you aren't familiar with a
    timeslip, then you don't have enough information to type the tag;
    so, !!seq fits just fine.

So, I claim:

  - that there is no reason to have both !implicit-seq and
    tag:yaml.org,2002:seq

  - whichever way you want to call it, it is a generic-implicit

So, while I like your idea of !implicit-sequence, I'm not sure what
advantage it has over just plain !!seq, besides that Oren/T.Onoma
might agree to it and we can be done with this. ;)

Clark
|