From: Oren Ben-K. <or...@be...> - 2004-09-11 04:48:36
On Thursday 09 September 2004 23:17, David Hopwood wrote:

> It can do. The fact that a tag is associated with a single kind is
> simply a matter of convention; it isn't enforced, and can't be
> enforced for unrecognized tags.

Again: The spec can't "enforce" *anything*. The spec simply states what we call a "YAML schema", what we call a "conforming YAML processor" and so on. So, you are free to have tags which apply to more than one kind. Just don't call them YAML tags and don't call your schema a YAML schema. If you do, well, we'll complain loudly. Since YAML is a trademark (I think), we might even be able to force you to rename your stuff to "NotYaml"... But we can't and don't have the faintest motivation to make you stop doing whatever works for you :-)

> If implementations assume this
> property in other places where it isn't used redundantly, they will
> get into trouble.

-10. Must applications carry a load of redundant data and extra processing logic on the off-chance someone feeds them NotYaml pretending to be YAML?

> Nodes don't represent native objects; native objects represent nodes.

-10. This is a major point. YAML is a data serialization language. "In the beginning, there was the data". It is _then_ serialized. When you load a YAML document, you are *RE*constructing the "data" that was originally serialized into the document.

It doesn't matter if the document was written by hand. Whoever wrote the document had some "data" - whether in computer storage or just in his mind - which he serialized into YAML. Loading - either into computer storage or into the mind of a human reader - reverses the process. It _can't_ happen that a YAML document gets created spontaneously and _then_ someone extracts the data from it. The big bang wasn't a well-formed YAML document, and neither are virtual particle pairs :-)

So... Nodes _do_ represent native objects. That's why we call this a "representation" :-)

Now, the whole "identity" thing makes more sense.
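Before getting into the spec's rule, here's a tiny Python sketch of what "identity" means here - plain Python, nothing YAML-specific, just to pin down the difference between "equal" and "the same object":

```python
# Two separately-constructed lists: equal values, distinct identities.
a = [1, 2, 3]
b = [1, 2, 3]
assert a == b        # equal by value
assert a is not b    # but NOT the same object

# One object reachable through two paths: this is the situation that
# anchors/aliases on collections let you preserve through dump + load.
shared = {"key": "value"}
data = {"first": shared, "second": shared}
assert data["first"] is data["second"]   # same object, one identity
```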
It talks about being able to serialize data where the same object is reachable through different paths. Operationally, when loading the data, you want to say "the object _here_ is the same one as the one I already loaded from _there_".

The spec says that if you present your data object as a YAML scalar, then YAML will _not_ preserve the identity of your native objects; when someone else loads the document, in his "data", all equal loaded objects may be the same object - or they might each be a different one, even if you used anchors in the YAML document. The spec also says that if you present your data as YAML collections, then YAML _will_ preserve the identity; whoever loads the YAML document will have exactly one object per identical node, so by using anchors you can ensure he'll have exactly one data object for each of your data objects.

And again, if an application depends on this rule, and some schema says differently, "bad things will happen". Well, again: an implementation need not be burdened with defensive programming against someone trying to pass NotYaml off as YAML.

> ... a Python API ... cannot
> represent every valid element of the YAML node graph model
> distinctly...

If your original data is Python, you simply define a strict YAML schema that works for you, which is actually quite easy even if Python doesn't distinguish between !!int and !!float. Just say all your numbers are !!float, for example, even if they have no '.' in them. Seems simple enough to me.

However, if you load someone else's schema for serializing Java data into Python, you will face problems, because his data uses types that have no direct match in your type set. You have two options:

- Strictly adhere to the semantics he had in mind for his data, by defining new Python types, or using shadow objects to preserve non-Python information, or any other trick.
  Costly and difficult, but it is strict YAML and, more importantly, you are safe knowing that you did not mangle what the document means (the original semantics of the data).

- Take shortcuts, coerce types to the nearest not-quite-appropriate Python type, and so on.

Fine. As you point out, this is quite reasonable in many cases. Again, the spec doesn't forbid you from doing that. It just asks you nicely (OK, we'll sue you for every penny you've got if you don't :-) to clearly document: "if you use the parser in this mode, it isn't a simple loading of a YAML document. What you get is NOT what the original document writer intended. But it is very simple and very useful, so you will want to do this anyway".

> ... to handle YAML documents generically, without making
> assumptions that are only valid in the context of particular
> applications/formats, it's absolutely necessary to have an
> unambiguous representation that can distinctly represent every
> possible YAML node graph.

What is important is to agree that the spec can lay down rules in the first place! Whatever rules you come up with, someone will say "well, you can't enforce that, so my code also has to deal with the case where the document breaks this rule". You'll be getting nowhere fast.

I think the current spec does a good job of defining a way to handle YAML documents generically without assuming anything about the particular *valid YAML schema* (== "application"). If the document does not follow a valid YAML schema, it isn't YAML, and I can't say _anything_ about it. What if it uses the indentation level to encode information? What if comments contain meaningful data? What if the meaning depends on the time of day I read it at?

Have fun,

Oren Ben-Kiki
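P.S. A rough sketch of the "say all your numbers are !!float" trick, in Python. The function name and shape are mine, purely for illustration - this isn't the API of any particular YAML library, just the coercion step you'd run on your data before dumping it under such a strict schema:

```python
def floatify(obj):
    """Recursively coerce every int to float, so a strict schema can
    honestly declare that all its numbers are !!float."""
    if isinstance(obj, bool):        # bool is a subclass of int; leave it alone
        return obj
    if isinstance(obj, int):
        return float(obj)
    if isinstance(obj, dict):
        return {key: floatify(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [floatify(item) for item in obj]
    return obj

print(floatify({"count": 3, "ratio": 0.5, "flags": [1, True]}))
# {'count': 3.0, 'ratio': 0.5, 'flags': [1.0, True]}
```

Dump the result instead of the original, and every number in the document really is a !!float - no '.' required in your Python source.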