For a given processor, each tag used by a given document can be
either known or unknown. A tag is considered known if and only if
the canonical value for the scalar is available. If a tag is
unknown, then testing equality or peeking into the scalar's value
results in a warning (which can be turned off).
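To make the known/unknown rule concrete, here is a minimal sketch in
Python. Everything in it is an assumption made for illustration: the
Processor class, the UnknownTagWarning category, the (tag, text) tuple
used for scalars, and the rule that scalars with different tags never
compare equal. It is not an existing YAML API.

```python
import warnings


class UnknownTagWarning(UserWarning):
    """Issued when a scalar with an unknown tag is inspected or compared."""


class Processor:
    """Hypothetical processor: a tag is known iff it has a canonicalizer."""

    def __init__(self, canonicalizers, warn_unknown=True):
        # Maps each known tag to a function: text -> canonical value.
        self.canonicalizers = dict(canonicalizers)
        # The "can be turned off" switch.
        self.warn_unknown = warn_unknown

    def is_known(self, tag):
        return tag in self.canonicalizers

    def _warn(self, tag, action):
        if self.warn_unknown:
            warnings.warn(f"{action} scalar with unknown tag {tag}",
                          UnknownTagWarning, stacklevel=3)

    def value(self, scalar):
        """Peek into a (tag, text) scalar."""
        tag, text = scalar
        if self.is_known(tag):
            return self.canonicalizers[tag](text)
        self._warn(tag, "peeking into")
        return text  # best effort: the raw string

    def equal(self, a, b):
        """Compare two (tag, text) scalars."""
        if a[0] != b[0]:
            return False  # assumption: different tags never compare equal
        if self.is_known(a[0]):
            canon = self.canonicalizers[a[0]]
            return canon(a[1]) == canon(b[1])
        self._warn(a[0], "comparing")
        return a[1] == b[1]  # fall back to (tag, string) equality
```

With a canonicalizer registered for a hypothetical !int tag,
("!int", "0x10") and ("!int", "16") compare equal silently, while any
comparison of two !foo scalars goes through the string fallback and
warns unless warn_unknown is False.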
On Sat, Oct 11, 2003 at 11:20:12PM +0200, Oren Ben-Kiki wrote:
| The best answer I have is that the weak model is needed when there's
| _partial_ knowledge of a schema. I can load a YAML document to memory
| knowing only some of its types, verify only some of the nodes against
| some partial schema, perform some processing based on the types I do
| know, and emit the result by round-tripping the unknown types.
| For example, a YAML based messaging system may load a "message" as a
| single object where certain fields (the message headers) would be
| "known" and be validated, while others would be loaded as "unknown" and
| merely forwarded. Or a YSLT-like program could load a document, do some
| structural re-arranging based on fields that use just the core types,
| and emit the result, preserving all the exotic unknown types.
Right. In this case, the "unknown" nodes are available for
forwarding, but should not participate in conditional logic.
On Thu, Oct 16, 2003 at 10:14:37AM +0200, Oren Ben-Kiki wrote:
| > You could treat a unknown scalar as a string. However, this
| > does not allow a YAML processor to correctly identify
| > malformed documents.
| While this is an issue, I don't think it is a critical one. A document
| containing !foo is clearly intended to be consumed by some system that
| is aware of !foo; I see no harm if invalid format detection is
| delayed until the document reaches such a system. For example, it might
| pass through a generic YAML messaging system.
Ok. So, if a document subtree contains one or more "unknown"
nodes, then that subtree (and all parents) cannot be validated,
i.e., you can't say it's valid or invalid YAML.
| > That is, they are
| > asserting that the tag operates in a way which is isomorphic
| > (same effect in all cases) to string equality plus tag
| > equality. If this isn't true, you have garbage in, garbage
| > out... nothing YAML can do about this. Right?
| I think GIGO is the only realistic approach for a generic YAML tool
| (e.g. YSLT, the hypothetical YAML messaging system). Issues of validity
| must be deferred until the document reaches a system that (expects to)
| understand all its types.
Right. So, a YAML document can be valid, indeterminate, or invalid.
If two unknown scalars are used as keys within a mapping, and they
have the same tag and value, then the mapping (and thus the document)
is invalid. If a mapping contains two unknown scalar keys of the
same tag but different values, then the mapping (and thus the
document) is indeterminate. Therefore, the possibility of unknown
typing implies that a given YAML document in the context of a
given YAML processor may have three states; if the state is invalid
or valid, then this designation holds regardless of the processor used.
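The three states can be sketched as a key-uniqueness check. This is
only an illustration under stated assumptions: the State enum, the
mapping_key_state function, and the (tag, text) representation of
scalar keys are all hypothetical, and the check covers duplicate keys
only, not whether an unknown scalar is itself well formed.

```python
from enum import Enum
from itertools import combinations


class State(Enum):
    VALID = "valid"
    INDETERMINATE = "indeterminate"
    INVALID = "invalid"


def mapping_key_state(keys, canonicalizers):
    """Classify a mapping by the uniqueness of its scalar keys.

    keys: list of (tag, text) pairs.
    canonicalizers: dict of known tag -> function(text) -> canonical value.
    """
    state = State.VALID
    for (tag_a, text_a), (tag_b, text_b) in combinations(keys, 2):
        if tag_a != tag_b:
            continue  # assumption: keys with different tags are distinct
        if tag_a in canonicalizers:
            canon = canonicalizers[tag_a]
            if canon(text_a) == canon(text_b):
                return State.INVALID  # duplicate key of a known type
        elif text_a == text_b:
            # Same unknown tag, same text: a duplicate for every processor.
            return State.INVALID
        else:
            # Same unknown tag, different text: the two strings might
            # still canonicalize to the same value, so we cannot tell.
            state = State.INDETERMINATE
    return state
```

A processor that later learns the unknown tag would rerun the check
with a larger canonicalizers dict, and an indeterminate result would
then settle into valid or invalid.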
| Right. That would be perfect for something like a YAML-based messaging
| system. For YPATH functionality, you might want to allow such values to
| be compared as strings with an appropriate warning.
Ok. So if YPATH uses the 'value' of a scalar, and the scalar
is not known, then a warning should be given.
| A YSLT tool could emit a warning when a new mapping is created using
| keys with unknown types.
| If on the other hand you accept that YSLT _by itself_ makes no
| guarantees about the validity of input/output documents
This is the best approach... probably.
| > For the impact on the spec; I'd like the spec model section
| > to discuss this issue -- outline the possible problems and
| > then make the recommendation of requiring full tag
| > resolution. In other words, awareness and consistency are important.
| IMO the right solution is to distinguish between schema-aware/specific
| applications and schema-unaware/generic tools.
I'm not sure what you mean by schema aware/unaware. I see a schema
as a transformation that changes !str typing into !whatever typing.
The issue isn't at all related to schema as much as it is with
!whatever not being a type known by the YAML processing system.
| The former always work at the strong model and pose no problem.
| The latter must fall back to the weak model when unknown tags are
| encountered, and can only guarantee the validity of their output
| given that their input was valid in the first place.
I assume you mean this to be granular, i.e., a loader can create
both native nodes when it knows the !tag, and 'generic scalar'
nodes when it doesn't. As such, I see one model, with each
tag in the system having a flag saying whether it is known or
unknown. I am not comfortable with two models (a weak vs a strong),
as the problem is more complicated than this, and for the model
section to be valuable it must describe the interaction between
nodes of known and unknown types.
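A rough sketch of the single-model idea, where one node type carries a
per-processor known flag. The ScalarNode class, load_scalar, and
emit_scalar are hypothetical names, and the emitter ignores quoting and
escaping entirely; it only shows that an unknown scalar can be
forwarded without its text ever being interpreted.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ScalarNode:
    tag: str
    text: str            # the string as it appeared in the document
    known: bool          # per-processor flag, not a property of the document
    native: Any = None   # native value, only meaningful when known is True


def load_scalar(tag, text, constructors):
    """Build a native node for a known tag, a generic scalar otherwise.

    constructors: dict of known tag -> function(text) -> native value.
    """
    if tag in constructors:
        return ScalarNode(tag, text, True, constructors[tag](text))
    return ScalarNode(tag, text, False)


def emit_scalar(node):
    """Emit the node by literal copy of its tag and original text."""
    return f"{node.tag} {node.text}"
```

For example, with constructors = {"!int": int}, loading ("!int", "42")
gives a node whose native value is 42, while ("!foo", "bar") gives a
generic node that emit_scalar writes back as "!foo bar".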
| I can see adding a requirement that when a generic tool is asked
| to perform some operation that potentially _introduces_ invalidities to
| the output, it must emit a warning (unless the user explicitly disables
| such warnings). Specifically such warnings should include comparing
| unknown tags as (tag, string) tuples and creating/modifying keys with
| unknown tags.
Right. So, nodes with a tag which is unknown to a given processor
will have a set of warnings to be used when the node is used in an
equality setting (recursively defined) or when looking 'into' the
string value during a process. This way literal copying does not
issue a warning, but just about every other use of scalars with
an unknown tag does.
On Thu, Oct 16, 2003 at 09:49:08AM -0700, Brian Ingerson wrote:
| I totally agree with everything you've said here. I think we are
| converging rapidly.
Somewhat. I don't agree with Oren in that we have a weak and
a strong model; I think we have one graph model where the
!tag has a known/unknown flag dependent upon the processor.
Also, I think you and Oren are confusing unknown types with
the action of a schema processor. In particular, I see
a schema processor as working at the graph level (perhaps
on a serialization) *transforming* the data by changing
!str nodes into !whatever nodes.
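As an illustration of a schema processor as a graph transformation,
the sketch below retags !str scalars by pattern. The SCHEMA rules, the
!int and !date tags, and the graph encoding (lists for sequences, dicts
for mappings, (tag, text) tuples for scalars) are assumptions chosen
for the example only.

```python
import re

# Hypothetical schema: each rule retags a !str scalar whose text matches.
SCHEMA = [
    (re.compile(r"[-+]?\d+"), "!int"),
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "!date"),
]


def apply_schema(node, schema=SCHEMA):
    """Return a copy of the graph with matching !str scalars retagged."""
    if isinstance(node, list):   # sequence
        return [apply_schema(n, schema) for n in node]
    if isinstance(node, dict):   # mapping of scalar keys to nodes
        return {apply_schema(k, schema): apply_schema(v, schema)
                for k, v in node.items()}
    tag, text = node             # scalar as a (tag, text) tuple
    if tag == "!str":
        for pattern, new_tag in schema:
            if pattern.fullmatch(text):
                return (new_tag, text)
    return node                  # any other tag passes through untouched
```

Note that a scalar already tagged !foo is left alone whether or not the
processor knows !foo, which is the point: retagging by schema and
known/unknown status are separate questions.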
| Most everything boils down to:
| * ALWAYS CONSIDER: Schema Blind vs Schema Aware
| * Generic YAML processors should be as forgiving as possible
| * Specific applications should be as rigid as they need to be
Yes; although I'd also say that the default behavior for a
YAML processor should be to notify the user of a potential
problem due to an unknown type; and to let this notification
be turned off.
| I asserted to Clark that at the YPATH "level" you always know the
| tag by definition.
And this assertion is wrong, as you may use YPath to select a
node from a graph where the node selected has an "unknown tag".
| YPATH operations work on a set of "ynode" nodes, whose scalar nodes
| always contain the tuple ['canonical-tag', 'canonical-string-value'].
| Where 'canonical' means whatever the 'schema'
| (loader/schema-doc/type-adapter/application-code/viewer) says it means.
And this is also too restrictive. YPath operations work on
'ynode' generic nodes (perhaps as a view). Scalar ynodes
are a tuple, (tag, value), and within a given processor
the tag is either known or unknown. If the tag is known,
then the scalar's value is canonical.
| So YPATH always uses strong equality. It's just that the "strength"
| can vary widely from application to application.
Well, YPATH can always get at a scalar's value or do scalar equality;
but if the scalar's tag is unknown, then a warning should be issued
by the YAML processor (a warning that can be turned off).