On Thursday 09 September 2004 23:17, David Hopwood wrote:
> It can do. The fact that a tag is associated with a single kind is
> simply a matter of convention; it isn't enforced, and can't be
> enforced for unrecognized tags.
Again: The spec can't "enforce" *anything*. The spec simply states what
we call a "YAML schema", what we call a "conforming YAML processor" and
so on. So, you are free to have tags which apply to more than one kind.
Just don't call them YAML tags and don't call your schema a YAML
schema. If you do, well, we'll complain loudly. Since YAML is a
trademark (I think), we might even be able to force you to rename your
stuff to "NotYaml"... But we can't and don't have the faintest
motivation to make you stop doing whatever works for you :-)
> If implementations assume this
> property in other places where it isn't used redundantly, they will
> get into trouble.
-10. Must applications carry a load of redundant data and extra
processing logic on the off-chance someone feeds them NotYaml
pretending to be YAML?
> Nodes don't represent native objects; native objects represent nodes.
-10. This is a major point. YAML is a data serialization language.
"In the beginning, there was the data".
It is _then_ serialized. When you load a YAML document, you are
*RE*constructing the "data" that was originally serialized into the
document.
It doesn't matter if the document was written by hand. Whoever wrote the
document had some "data" - whether in a computer storage or just in his
mind - which he serialized into YAML. Loading - either into computer
storage or into the mind of a human reader - reverses the process.
It _can't_ happen that a YAML document gets created spontaneously and
_then_ someone extracts the data from it. The big bang wasn't a
well-formed YAML document, and neither are virtual particle pairs :-)
So... Nodes _do_ represent native objects. That's why we call this a
Now, the whole "identity" thing makes more sense. It talks about being
able to serialize data where the same object is reachable through
different paths. Operationally, when loading the data, you want to say
"the object _here_ is the same one as the one I already loaded from
The spec says that if you present your data object as a YAML scalar,
then YAML will _not_ preserve the identity of your native objects; when
someone else loads the document, in his "data", all equal loaded
objects may be the same object - or they might each be a different one
even if you used anchors in the YAML document.
The spec also says that if you present your data as YAML collections, then
YAML _will_ preserve the identity; whenever anyone loads the YAML
document, he'll have exactly one object per identical node, so by using
anchors you can ensure he'll have exactly one data object for each of
your data objects.
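
To make the identity rule concrete, here is a minimal sketch of a loader
walking a toy node graph. The node classes and the construct() function are
names I made up for illustration; they are not taken from the spec or from
any existing YAML library. Collections are memoized per node, so an aliased
collection comes back as one object; scalars get no such promise.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class ScalarNode:
    # Hypothetical node type: a tag plus the scalar's text.
    tag: str
    value: str


@dataclass(eq=False)
class SequenceNode:
    # Hypothetical node type: a tag plus child nodes.
    tag: str
    items: list = field(default_factory=list)


def construct(node, memo=None):
    """Build native Python data from a node graph.

    Collection nodes are constructed once and reused, so two paths to the
    same node yield the same list object. Scalars are simply converted;
    nothing is promised about whether equal scalars share an object.
    """
    if memo is None:
        memo = {}
    if isinstance(node, ScalarNode):
        return node.value
    key = id(node)
    if key in memo:
        return memo[key]
    result = []
    memo[key] = result  # register before recursing so cycles resolve
    for child in node.items:
        result.append(construct(child, memo))
    return result


# Equivalent of "[ &x [a], *x ]": one sequence reachable by two paths.
shared = SequenceNode("!!seq", [ScalarNode("!!str", "a")])
root = SequenceNode("!!seq", [shared, shared])
loaded = construct(root)
print(loaded[0] is loaded[1])  # True
```

Appending to loaded[0] is then visible through loaded[1] as well, which is
exactly what the writer of the document asked for by using the anchor.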
And again, if an application depends on this rule, and some schema says
differently, "bad things will happen". Well, again: an implementation
need not be burdened with defensive programming against someone trying
to pass a NotYaml as if it were YAML.
> ... a Python API ... cannot
> represent every valid element of the YAML node graph model
If your original data is Python, you simply define a strict YAML schema
that works for you, which is actually quite easy even if Python doesn't
distinguish between !!int and !!float. Just say all your numbers
are !!float, for example, even if they have no '.' in them. Seems
simple enough to me.
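
A rough sketch of what such a rule could look like, assuming a hypothetical
resolve_plain_scalar() hook that decides the tag of an untagged plain
scalar. The regular expression is a simplification I wrote for this
example, not the spec's number syntax.

```python
import re

# Simplified number pattern (an assumption for this sketch): optional sign,
# digits with an optional fraction, optional exponent.
NUMBER = re.compile(r"^[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?$")


def resolve_plain_scalar(text):
    """Return (tag, value) for a plain scalar under an 'all numbers are
    !!float' schema. Anything that is not a number stays a string."""
    if NUMBER.match(text):
        return ("!!float", float(text))
    return ("!!str", text)


print(resolve_plain_scalar("42"))   # ('!!float', 42.0)
print(resolve_plain_scalar("abc"))  # ('!!str', 'abc')
```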
However, if you load someone's schema for serializing Java data into
Python, you will face problems, because his data uses types that have
no direct match in your types set. You have two options:
- Strictly adhere to the semantics he had in mind for his data, by
defining new Python types, or using shadow objects to preserve
non-Python information, or any other trick. Costly, difficult, but it is
strict YAML and, more importantly, you are safe knowing that you did
not mangle what the document means (the original semantic of the data).
- Take shortcuts, coerce types to the nearest not-quite-appropriate
Python type, and so on. Fine. As you point out, this is quite
reasonable in many cases. Again, the spec doesn't forbid you from doing
that. It just asks you nicely (OK, we'll sue you for every penny you
got if you don't :-) to clearly document "if you use the parser in this
mode, it isn't a simple loading of a YAML document. What you get is NOT
what the original document writer intended. But it is very simple
and very useful, so you will want to do this anyway".
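
The two options can be sketched as one constructor with an explicit,
documented "lossy" switch. Everything here is hypothetical: the !java/...
tag names, the tables and construct_scalar() are invented for this example
and do not come from any real schema or library.

```python
class UnknownTagError(ValueError):
    """Raised when a tag has no faithful Python counterpart."""


# Tags this (hypothetical) Python schema can represent faithfully.
STRICT_CONSTRUCTORS = {"!!int": int, "!!float": float, "!!str": str}

# Nearest not-quite-appropriate Python types for foreign tags.
# The tag names are made up for illustration.
LOSSY_FALLBACKS = {
    "!java/long": int,
    "!java/char": str,
    "!java/BigDecimal": float,
}


def construct_scalar(tag, text, lossy=False):
    """Construct a Python value from a tagged scalar.

    With lossy=False, only faithfully representable tags are accepted.
    With lossy=True, foreign tags are coerced to the nearest Python type;
    the result is NOT guaranteed to mean what the writer intended.
    """
    if tag in STRICT_CONSTRUCTORS:
        return STRICT_CONSTRUCTORS[tag](text)
    if lossy and tag in LOSSY_FALLBACKS:
        return LOSSY_FALLBACKS[tag](text)
    raise UnknownTagError(f"no constructor for tag {tag!r}")


print(construct_scalar("!java/long", "7", lossy=True))  # 7
```

The default is the strict behavior, so a caller has to ask for the shortcut
by name; that is the "clearly document it" part.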
> ... to handle YAML documents generically, without making
> assumptions that are only valid in the context of particular
> applications/formats, it's absolutely necessary to have an
> unambiguous representation that can distinctly represent every
> possible YAML node graph.
What is important is to agree that the spec can lay down rules in the
first place! Whatever rules you come up with, someone will say "well,
you can't enforce that, so my code also has to deal with the case where
the document breaks this rule". You'll be getting nowhere fast.
I think the current spec does a good job at defining a way to handle
YAML documents generically without assuming anything about the
particular *valid YAML schema* (== "application"). If the document does
not follow a "valid YAML schema", it isn't YAML, and I can't say
_anything_ about it. What if it uses the indentation level to encode
information? What if comments contain meaningful data? What if the
meaning depends on the time of day I read it at?