From: Clark C. E. <cc...@cl...> - 2003-10-08 04:57:00
I think I have a few "insights" here, so please read. ;)
To review, there are two items on which we have agreement:

  1. We will remove format from the syntax specifications,
     simplifying the productions to allow both scalars and
     collections to use #fragments in their tag uri's. This
     will better match implementations and allow for version
     numbers to use #fragments.

  2. While we are editing, we will change 'taguri' to 'tag',
     to follow the underlying taguri spec we are relying upon.

Let me propose 'tag' for the name of this new !thingy, as we were having problems picking between "type family" and "transfer method". I like tag because it goes with the taguri spec, but is short, clear and has less baggage than the other two names (giving us the opportunity to define what it means more clearly, see below)... it is also just that, a 'tag' which labels a serialized node with typing information.

...

Onto the core debate... what this change did is re-open the "model" discussion that we never really successfully closed. Let me start with a primer. We have 4 core models, and the open issue is whether the fourth model belongs in the spec or not. Here are the first three models, and this should mostly be consensus (I hope):

syntax
This is the model that closely relates to the actual productions. A tokenizer would have an API equivalent to the syntax model. In this model, one is concerned about line breaks, comments, indentation, styles, and other issues relating to human presentation.

serial
This is the model of YAML as it is pushed through a thin pipe in chunks; it breaks scalars into chunks and imposes an order on mapping keys. This model explicitly lacks presentation items such as style, indentation, and comments. A parser would have an API substantially similar to the serial model.

graph
This is a YAML model where an entire document can be put into a random access storage. In this model, scalars are unified wholes, mapping keys are unordered.
However, the actual scalar data is still "strings" with a !type-marker. A generic YAML node interface, either having its own storage or acting as a wrapper over native data types, would closely match this model.

The big source of contention is the existence of the fourth model. It has been called 'typed', 'native', and several other names, none of which really capture the essential aspect of the model... that is, complete awareness of a given !tag by the system. Let me propose a fresh name for this model, 'target', since this is the term often used for platform-specific (aka type aware) assembler linkages. So, without further ado:

target
This is the capstone YAML model which is similar to the graph model, but implementations will have explicit knowledge of each tag used, and its interpretation within this system. This is often called the "native" model, but I am using "target" model here to include systems where the knowledge in the system exists to create a canonical form of a given data type (see below). Certainly native bindings do this, but other targets could exist; a random access, canonical form, generic YAML node could also match this model. The target model extends the graph model to provide strong-equality.

The primary rationale for having models is to let us flesh out our vocabulary so that we can better communicate about implementations, and potential usages (or bad uses) of YAML data. A secondary, but still very important rationale, is to provide a clear interpretation of YAML information so that generic tools, such as a path or schema language, can be developed for YAML without having huge intellectual weeds growing in our garden.

This latter issue, strangely enough, is best seen through the notion of node "equality", which is essential to define correctly if one is to talk about a ypath or schema (without node equality, a path is quite useless); or even the "obvious" constraint that a mapping may not have two keys which are equal.
For XML this is simple: nodes are equal if their string content is equal. However, this does not work as well for YAML, since YAML information is typed. And, to allow for human readability, YAML types are further complicated by necessarily allowing various ways to write a particular type (aka format).

I like to think of this as "weak" vs "strong" equality. For weak equality we can only say if two things are equal; we cannot say if they are unequal. For example:

  RHS        LHS         weak-equality   strong-equality
  !int 11    !int 11     equal           equal
  !int 11    !int 12     unknown         not equal
  !int 11    !int 0xB    unknown         equal

At best, what the type-ignorant "graph" model can give you is weak-equality, that is, one could have a mapping

  { !int 11: eleven, !int 0xB: eleven }

and there is no way to know that this is invalid YAML unless the implementation has detailed awareness of the !int tag and its formats. Specifically, it would have to know that 0xB is the hex format for the number 11, and therefore the two values above are actually equal, even though they are serialized with different characters.

Therefore, a critical constraint of YAML (mapping keys are not equal) requires strong-equality. While weak-equality is close, it is just not good enough. So, in this case, we must have a 'target' model in the YAML specification. That is, either a native binding or a canonical form for each scalar value is needed for YAML and any tools built on top of YAML; else we will have weeds and thus bugs in our garden.

...

So. This actually makes the decision straight-forward. We have two options:

  1. We allow for data types to have more than one format
     (even though the format is not in the !tag) and thus
     must have 4 models (graph != target in this case)

     Good: we have the flexibility of allowing integers to be
           written in different formats, 0xB and 11; dates as
           '10-JAN-2002' or '2002/01/10'.

     Bad:  this gives us one additional model, and the extra
           descriptive complexity that goes with it

OR,

  2. We state clearly that string comparison for scalar
     values having the same !tag is strong-equality. And,
     in this case, data types (via a single !tag) may only
     have one format. In this case we only need 3 models.

     Good: we only need 3 models, and we can validate a YAML
           document and say that it is compliant without needing
           external type-specific knowledge (the method to make
           a canonical form)

     Bad:  different 'styles' for writing scalars will have to be
           handled by the application programmer, ie !int/hex,
           !int/oct are different types as far as YAML is concerned.

Thus far, we have done a great job providing a framework where 'content' and 'style' can be cleanly separated. It would be very nice to continue this trend over into scalar formats; but I am not sure if it is worth the complexity -- perhaps this is the point where we should let the application programmer deal with the complexity?

The 'graph' vs 'target' model distinction will have a tangible implementation: each YAML system will have a 'type registry' where !tags will be put into the system, probably with either a method to make the string 'canonical' or to provide a native node with an equality operator. Also, if a given !tag used for mapping keys is not known by the implementation, it will be impossible to verify that the YAML document is indeed valid. However, given that YAML systems are _already_ doing this, the actual implementation is not much of an additional burden (it is part of what we already do). Thus, the hard part will mostly be explaining it in the model.

I am leaning towards 'four' models again. But not out of just being tipsy -- but rather since I now see a clear separation between the 'graph' and 'target' that is motivated by tangible "operational" needs; that is, if a YAML document is invalid, it will not be loadable by some systems (due to duplicate keys).
And this clarity comes only with bucketing canonicalization and building native nodes together as doing the same abstract operation -- providing strong equality.

Ok. I hope this helps to clarify things.

Best,

Clark
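[Editorial sketch] The weak vs strong equality table in the message above can be illustrated in a few lines of Python. This is only a sketch under stated assumptions: the `CANONICALIZERS` registry, both function names, and the rule that nodes with different tags are never strongly equal are hypothetical and are not taken from the YAML spec or any YAML library.

```python
from typing import Callable, Dict, Optional

# Hypothetical type registry: tag -> function producing a canonical value.
# Base 0 lets int() accept both "11" and "0xB".
CANONICALIZERS: Dict[str, Callable[[str], object]] = {
    "!int": lambda text: int(text, 0),
}


def weak_equal(tag_a: str, text_a: str, tag_b: str, text_b: str) -> Optional[bool]:
    """Type-ignorant comparison: True if provably equal, otherwise None (unknown).

    It can never answer False, because a different spelling may still
    denote the same value.
    """
    if tag_a == tag_b and text_a == text_b:
        return True
    return None


def strong_equal(tag_a: str, text_a: str, tag_b: str, text_b: str) -> bool:
    """Type-aware comparison via the canonical form of each scalar.

    Assumption: nodes with different tags are treated as unequal.
    Raises KeyError when the tag is not in the registry.
    """
    if tag_a != tag_b:
        return False
    canon = CANONICALIZERS[tag_a]
    return canon(text_a) == canon(text_b)
```

With these definitions, `!int 11` against `!int 0xB` is "unknown" under `weak_equal` but equal under `strong_equal`, which is the case that makes the duplicate-key constraint undecidable without tag knowledge.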
From: Oren Ben-K. <or...@be...> - 2003-10-08 05:11:51
Clark C. Evans [mailto:cc...@cl...] wrote:
> I think I have a few "insights" here, so please read. ;)
> To review, there are two items which we have agreement:
>
> 1. We will remove format from the syntax specifications,
>    simplifying the productions to allow both scalars and
>    collections to use #fragments in their tag uri's. This
>    will better match implementations and allow for version
>    numbers to use #fragments.
>
> 2. While we are editing, we will change 'taguri' to 'tag',
>    to follow the underlying taguri spec we are relying upon.

+1

> Let me propose 'tag' for the name of this new !thingy

+1. Nice.

> The big source of contention is the existence of the
> fourth model. It has been called 'typed', 'native', and
> several other names, none of which really capture the
> essential aspect of the model... that is, complete awareness
> of a given !tag by the system. Let me propose a fresh name
> for this model, 'target' since this is the term often used
> for platform-specific (aka type aware)
> assembler linkages.

Hmmm. It still lacks something, but I guess it's the best we can do.

> So. This actually makes the decision straight-forward.
> We have two options:
>
> 1. We allow for data types to have more than one format
>    (even though the format is not in the !tag) and thus
>    must have 4 models (graph != target in this case)
>    ...
> OR,
>
> 2. We state clearly that string comparison for scalar
>    values having the same !tag is strong-equality. And,
>    in this case, data types (via a single !tag) may only
>    have one format. In this case we only need 3 models.

It is not reasonable to require 12.5 to be written as "1.25e1". And no "!float/tricky-stuff" tag will help you there.

Case closed :-)

> I am leaning towards 'four' models again.

+1. Let's get on with it.

> Ok. I hope this helps to clarify things.

Wonderfully so. Nicely done!

Have fun,

Oren Ben-Kiki
From: Brian I. <in...@tt...> - 2003-10-08 12:28:06
On 08/10/03 07:11 +0200, Oren Ben-Kiki wrote:
> Clark C. Evans [mailto:cc...@cl...] wrote:
> > I think I have a few "insights" here, so please read. ;)
> > To review, there are two items which we have agreement:
> >
> > 1. We will remove format from the syntax specifications,
> >    simplifying the productions to allow both scalars and
> >    collections to use #fragments in their tag uri's. This
> >    will better match implementations and allow for version
> >    numbers to use #fragments.
> >
> > 2. While we are editing, we will change 'taguri' to 'tag',
> >    to follow the underlying taguri spec we are relying upon.
>
> +1

+1

> > Let me propose 'tag' for the name of this new !thingy
>
> +1. Nice.

+1

> > So. This actually makes the decision straight-forward.
> > We have two options:
> >
> > 1. We allow for data types to have more than one format
> >    (even though the format is not in the !tag) and thus
> >    must have 4 models (graph != target in this case)
> >    ...
> > OR,
> >
> > 2. We state clearly that string comparison for scalar
> >    values having the same !tag is strong-equality. And,
> >    in this case, data types (via a single !tag) may only
> >    have one format. In this case we only need 3 models.
>
> It is not reasonable to require 12.5 to be written as "1.25e1". And no
> "!float/tricky-stuff" tag will help you there.
>
> Case closed :-)
>
> > I am leaning towards 'four' models again.
>
> +1. Let's get on with it.
>
> > Ok. I hope this helps to clarify things.
>
> Wonderfully so. Nicely done!

Before I go for 4 models, I would like to have somebody *concisely* detail all the differences between the two graph binding variants.

I think one main problem I'm having is the vagueness of the term "Graph Model" or even "Generic Model". I would suggest the term "Canon Model".

Four models still leaves me with problems in the diagram.
I would suggest the following:

          ->(parser) ->      ->(loader)-> CANON <-
         /             \    /            \ /      \
        /               \  /          -> <-        \
  SYNTAX               SERIAL           X        (viewer)
        \               /  \          <- ->        /
         \             /    \            / \      /
          <-(emitter)<-      <-(dumper)<- TARGET ->

Which is as clear as I can do it in ASCII. Note that the 'X' is two lines crossing and *not* a new model :)

This diagram has (at least) the following implications:

* A loader/dumper works in conjunction with an application and its object classes. The code in the object classes is responsible for marshalling the object in and out of data.
* A loader/dumper (together with application classes) can marshall between any type of graph, including both target and canon graphs.
* A viewer can "upgrade" a target graph into a canon graph.
* There is no named process for going from canon to target.
* In fact, a viewer can be created by using a target dumper paired with a canon loader. So maybe it doesn't need to be in the diagram.

If we go with four models, I'd like these assertions to hold true:

* A target binding "is a" graph.
* A canon binding "is a" graph.
* We do not define what the general properties of a graph are.
* We do define what the minimum expected properties of a target binding are, i.e. the Least Common Denominator.
* We do define the exact expected properties of a canon binding.
* These two definitions are known as the "Target Model" and the "Canon Model" respectively.
* There is no known Target Binding found in nature that meets the exact criteria of a Canon Binding. The two concepts (models) are therefore completely separate.

Studying the diagram above, it seems to me that we have three "levels of processing" and four "models", where model means "something we have decided to make rules about". We have the:

* Syntax Level
* Serial Level
* Graph Level

Under the Syntax and Serial Levels we have decided to define one model each. Under the Graph Level we have decided (possibly) to define two models.

Hope that helps.

Cheers,

Brian
From: Oren Ben-K. <or...@be...> - 2003-10-08 13:45:12
Brian Ingerson [mailto:in...@tt...] wrote:
> Before I go for 4 models, I would like to have somebody
> *concisely* detail all the differences between the two graph
> binding variants.

Simple. In one model we have optional tags and weak equality. In the other we have mandatory tags and strong equality.

Hey, here's a thought: how about we call these models the "Weak" model and the "Strong" model? Syntax (text), Serial (tree), Weak (graph), Strong (graph). Each exactly conveys the most important aspect that distinguishes it from the rest. It extends easily to tools - Strong YPATH and Weak YPATH, Strong YSCHEMA and Weak YSCHEMA... I like this! How about it?

> I think one main problem I'm having is the vagueness of the
> term "Graph Model" or even "Generic Model". I would suggest
> the term "Canon Model".

Boom! :-) "Canon" is confusing. Especially since it does _not_ use the "canonical format" for the nodes (unless I misunderstood).

> Four models still leaves me with problems in the diagram. I
> would suggest the following:
>
>           ->(parser) ->      ->(loader)-> CANON <-
>          /             \    /            \ /      \
>         /               \  /          -> <-        \
>   SYNTAX               SERIAL           X        (viewer)
>         \               /  \          <- ->        /
>          \             /    \            / \      /
>           <-(emitter)<-      <-(dumper)<- TARGET ->
>
> Which is as clear as I can do it in ASCII. Note the 'X'
> is two lines crossing and *not* a new model :)

The models diagram is a PITA because it attempts to define _operational_ terminology. If we view the _semantical_ relationship between the models, we get (excuse the ASCII ART):

  +--------------+
  | SYNTAX       |
  | +----------+ |
  | | SERIAL   | |
+-+-+----------+-+--------+
| | | +------+ | |        |
| | | | WEAK | | | STRONG |
| | | +------+ | |        |
| | +----------+ |        |
| +--------------+        |
+-------------------------+

(This would look so much better as a Van diagram using color!)

The strong model is what we used to call the native model. It is strong because it allows for strong equality, etc. We need it to allow for strong (== useful) YPATH, YSCHEMA, etc.
The weak model is the strong model minus knowledge of the semantics of tags. Hence tags are optional there (they wouldn't be of much use).

The serial model is the weak model plus pair order, anchors, etc.

The syntax model is the serial model plus indentation, comments, style, etc.

These semantic relationships are always true. In a way, the spec is nothing except for defining these models and relationships (with extra loving care for details when defining the syntax model :-)

The problem with operational graphs is that they depend on the specific system. I can write a system where there is a direct translation from syntax to strong. I can write a system where syntax is translated to serial, then to weak, and only then to strong. Or any of another zillion combinations. All this has nothing to do with the definition of YAML.

I agree with you that your operational graph makes a lot of sense. It will match most of the systems that will be built. I like it a lot.

That said, I think we should get rid of it. We should go over the spec and purge the words "parser", "loader" etc. from it. Instead we should use "YAML processor" and refer to the _models_ instead of _modules_. I think you'll find this is easier than you expect.

If you think this is too extreme :-), I'll go with adding an isolated section that includes your (great) diagram, saying how _typically_ YAML systems are built along the lines you describe. I would still banish the terms "parser" and "loader" from the rest of the spec.

As for my Van diagram, I'm not certain it adds much to the text. Each section describing a model starts with a clear definition of how this model relates to its predecessor model. But we could include it if you insist :-)

> [various operational notes]

Sure, they are all good ways to implement a YAML system.

> * There is no named process for going from canon to target.

Fine. If someone writes one he'll get to name it :-) It all goes into the isolated section anyway (if that).
> If we go with four models, I'd like these assertions to hold true:
>
> * A target binding "is a" graph.

+1. "Strong" is a graph.

> * A canon binding "is a" graph.

+1. "Weak" is a graph.

> * We do not define what the general properties of a graph are.

I don't understand this. Elaborate?

> * We do define the minimum expected properties of a target
>   binding are.

+1. Specifically, the "Strong" model is defined by the "Native model" in the current spec.

> * We do define exact expected properties of canon binding.

+1. The "Weak" model is defined based on the "Graph model" in the current spec (minus formats, etc.).

> * These two definitions are known as the "Target Model" and the "Canon
>   Model" respectively.

-1. I hate "Canon". I like "Weak" and "Strong". I can live with "Graph" and "Target".

> * There is no known Target Binding found in nature that meets
>   the exact
>   criteria of a Canon Binding. The two concepts (models) are therefore
>   completely separate.

+1, trivially. Proof: Every strong/target binding must provide strong equality. Every weak/graph binding must provide weak equality. Hence no single binding can be both. QED.

> Studying the diagram above, it seems to me that we have three
> "levels of processing"

"Levels of processing" are outside the scope of the spec. You can have them in an isolated, "typical processing pipeline" section if you insist. A system may have zero, one, two, or three levels of processing and still be a valid YAML system.

> and four "models", where model means
> "something we have decided to make rules about".

+1

> We have the:
>
> * Syntax Level
> * Serial Level
> * Graph Level

-1. We have no "levels". We have the:

- Syntax model.
- Serial model.
- Weak model.
- Strong model.

> Under the Syntax and Serial Levels we have decided to define
> one model each. Under the Graph Level we have decided
> (possibly) to define two models.

-10. No "levels". Just models. We don't need a new concept.

Have fun,

Oren Ben-Kiki
From: Brian I. <in...@tt...> - 2003-10-08 14:45:43
On 08/10/03 15:44 +0200, Oren Ben-Kiki wrote:
> Brian Ingerson [mailto:in...@tt...] wrote:
> > Before I go for 4 models, I would like to have somebody
> > *concisely* detail all the differences between the two graph
> > binding variants.
>
> Simple. In one model we have optional tags and weak equality. In the
> other we have mandatory tags and strong equality.
>
> Hey, here's a thought: how about we call these models the "Weak" model
> and the "Strong" model? Syntax (text), Serial (tree), Weak (graph),
> Strong (graph). Each exactly conveys the most important aspect that
> distinguishes it from the rest. It extends easily to tools - Strong
> YPATH and Weak YPATH, Strong YSCHEMA and Weak YSCHEMA... I like this!
> How about it?

I like the terms Weak and Strong (at least off the cuff) but look at the picture you have painted:

  Syntax - Text
  Serial - Tree
  Weak   - Graph
  Strong - Graph

We know what the 4 things on the left are. They are "Models".

What of the *3* things on the right?? What are they *called*? I tried calling them levels below, but you didn't like it. Call them something else, but stop denying they exist.

Also I would like some clarification (by example please) of why Native->Strong and Generic->Weak. I would have guessed it opposite. The Native model is more weakly/loosely defined, since it is a least common denominator of language bindings. The Generic model is more strongly defined by us, as it has to be usable by YPATH.

Maybe the terms Strong and Weak aren't so great.

Assertion: YPATH cannot be applied at the Serial model.

You know, it really bothers me that we have a forked YPATH (strong vs weak). It is a smell that you are trying poorly to deodorize.

Assertion: YPATH can only be applied on a graph.
Assertion: A graph is not a model.

I ask again: What then is a "graph"?
>   +--------------+
>   | SYNTAX       |
>   | +----------+ |
>   | | SERIAL   | |
> +-+-+----------+-+--------+
> | | | +------+ | |        |
> | | | | WEAK | | | STRONG |
> | | | +------+ | |        |
> | | +----------+ |        |
> | +--------------+        |
> +-------------------------+
>
> (This would look so much better as a Van diagram using color!)

Venn?

Assertion: A van is a motorized vehicle.

> The strong model is what we used to call the native model. It is strong
> because it allows for strong equality, etc. We need it to allow for
> strong (== useful) YPATH, YSCHEMA, etc.
>
> The weak model is the strong model minus knowledge of the semantics of
> tags. Hence tags are optional there (they wouldn't be of much use).

If they are sooo close then obviously there is an abstraction that covers them both for goodness sakes!

Assertion: A car is a motorized vehicle.
Assertion: A van has more interior room than a car.

See? A Strong Graph and A Weak graph are both Graphs. The strong one just has more interior space. 3 models. 2 variations.

> The serial model is the weak model plus pair order, anchors, etc.
>
> The syntax model is the serial model plus indentation, comments, style,
> etc.
>
> These semantic relationships are always true. In a way, the spec is
> nothing except for defining these models and relationships (with extra
> loving care for details when defining the syntax model :-)
>
> The problem with operational graphs is that they depend on the specific
> system. I can write a system where there is a direct translation from
> syntax to strong. I can write a system where syntax is translated to
> serial, then to weak, and only then to strong. Or any of another zillion
> combinations. All this has nothing to do with the definition of YAML.
>
> I agree with you that your operational graph makes a lot of sense. It
> will match most of the systems that will be built. I like it a lot.
>
> That said, I think we should get rid of it. We should go over the spec
> and purge the words "parser", "loader" etc. from it. Instead we should
> use "YAML processor" and refer to the _models_ instead of _modules_. I
> think you'll find this is easier than you expect.

I can agree that implementation should go outside the spec. Of course the spec will be quite useless to anybody other than someone writing an implementation guide. Perhaps in that vein, we move the examples section to a new place as well. Also let's nix all the diagrams.

> > If we go with four models, I'd like these assertions to hold true:
> >
> > * A target binding "is a" graph.
>
> +1. "Strong" is a graph.
>
> > * A canon binding "is a" graph.
>
> +1. "Weak" is a graph.

What is a graph?

> > * We do not define what the general properties of a graph are.
>
> I don't understand this. Elaborate?

It was poorly worded, but what I meant is that

1) *if* there are two graph models, then they *don't* share a common set of "graph rules".

2) *if* there is a LCD graph model, then we *do* define it, and then also define the differences in the Strong and Weak variants.

> > * We do define the minimum expected properties of a target
> >   binding are.
>
> +1. Specifically, the "Strong" model is defined by the "Native model" in
> the current spec.
>
> > * We do define exact expected properties of canon binding.
>
> > Studying the diagram above, it seems to me that we have three
> > "levels of processing"
>
> "Levels of processing" are outside the scope of the spec. You can have
> them in an isolated, "typical processing pipeline" section if you
> insist. A system may have zero, one, two, or three levels of processing
> and still be a valid YAML system.

What is a Graph?

> > and four "models", where model means
> > "something we have decided to make rules about".
>
> +1
>
> > We have the:
> >
> > * Syntax Level
> > * Serial Level
> > * Graph Level
>
> -1. We have no "levels". We have the:
>
> - Syntax model.
> - Serial model.
> - Weak model.
> - Strong model.
>
> > Under the Syntax and Serial Levels we have decided to define
> > one model each. Under the Graph Level we have decided
> > (possibly) to define two models.
>
> -10. No "levels". Just models. We don't need a new concept.

What is a graph?

Cheers,

Brian
From: Clark C. E. <cc...@cl...> - 2003-10-08 15:04:15
Ok. We seem to be converging.

On Wed, Oct 08, 2003 at 07:45:35AM -0700, Brian Ingerson wrote:
| On 08/10/03 15:44 +0200, Oren Ben-Kiki wrote:
| > Simple. In one model we have optional tags and weak equality. In the
| > other we have mandatory tags and strong equality.
|
| I like the terms Weak and Strong (at least off the cuff) but look at the
| picture you have painted:
|
|   Syntax - Text
|   Serial - Tree
|   Weak   - Graph
|   Strong - Graph

Right.

| Maybe the terms Strong and Weak aren't so great.

It just talks about what sort of "equality" we have; weak equality happens when the particular YAML system does not know enough about the !tag to allow inequality to be tested.

| Assertion: YPATH cannot be applied at the Serial model.

Well, there are two things going on here. First, YPath will be defined at the Graph Model level, that is, it will not know about key ordering or aliases/anchors as these items are not in the graph model. Second, there is nothing stopping (at least a subset of) YPath from being applied on a serial API, an implementation of the serial model.

| You know, it really bothers me that we have a forked YPATH (strong vs
| weak). It is a smell that you are trying poorly to deodorize.

Yes, this is a tough issue; but I don't think there is a choice given that some implementations will not have knowledge about at least one !tag.

| Assertion: YPATH can only be applied on a graph.

YPath will be defined using the graph model; at least some of YPath can be executed on serialized YAML.

| Assertion: A graph is not a model.

We use a graph in our model. A model is an abstraction describing which parts of YAML are "informational" and thus significant, and which parts are "not-important". You can think of the three models as an abstract description (high enough to be useful, but not detailed enough to be precise) which provides guidelines to help interoperability.

| > The weak model is the strong model minus knowledge of the semantics of
| > tags. Hence tags are optional there (they wouldn't be of much use).
|
| If they are sooo close then obviously there is an abstraction that
| covers them both for goodness sakes!

Yes, that is a good idea.

| 1) *if* there are two graph models, then they *don't* share a common set
|    of "graph rules".
| 2) *if* there is a LCD graph model, then we *do* define it,
|    and then also define the differences in the Strong and Weak variants.

*nod* I like it, especially since strong/weak designation are not properties of the graph as a whole but rather properties of a "tag" within a YAML system.

Best,

Clark
From: Oren Ben-K. <or...@be...> - 2003-10-08 15:22:30
Brian Ingerson [mailto:in...@tt...] wrote:
> I like the terms Weak and Strong (at least off the cuff) but
> look at the picture you have painted:
>
>   Syntax - Text
>   Serial - Tree
>   Weak   - Graph
>   Strong - Graph
>
> We know what the 4 things on the left are. They are "Models".
>
> What of the *3* things on the right?? What are they *called*?

I'd call them "model types". You know - Graph, Tree, List, DAG, Matching, Expander, Clique, ... Text, byte stream, ... That sort of stuff. Any correlation with "levels of processing" is coincidental :-)

> Also I would like some clarification (by example please) of why
> Native->Strong and Generic->Weak. I would have guessed it
> opposite.

Because Native allows you Strong equality; YPATH for "Strong" model is stronger - provides more operations - than YPATH for a weak model.

> The
> Native model is more weakly/loosely defined, since it is a
> least common denominator of language bindings. The Generic
> model is more strongly defined by us, as it has to be usable by YPATH.

The question isn't how loosely/tightly the model is defined (or I'd call it "Loose" and "Tight"). It is how powerful the *model* is (what you can do with it), hence "Strong" and "Weak".

> Assertion: YPATH cannot be applied at the Serial model.

+1. Anchors will wreak havoc.

> You know, it really bothers me that we have a forked YPATH
> (strong vs. weak). It is a smell that you are trying poorly to
> deodorize.

First, this is Clark's idea :-)

Assertion: YPATH supports the queries "real_x > 12.5" and "int_x == 8". It follows YPATH must allow "strong" operations.

Assertion: YPATH should be usable without knowing all the tags. It follows YPATH must allow "weak" operations.

It follows there are two types of operations. I don't think there are two YPATHs, just two types of operations in a single YPATH.

> Assertion: YPATH can only be applied on a graph.

+1.

> Assertion: A graph is not a model.

"True, but mostly irrelevant". (Some) models are graphs.
> I ask again: What then is a "graph"?

A graph is a mathematical entity used to model stuff. It is defined as a tuple (V, E) where E is a subset of V x V ... :-)

> >   +--------------+
> >   | SYNTAX       |
> >   | +----------+ |
> >   | | SERIAL   | |
> > +-+-+----------+-+--------+
> > | | | +------+ | |        |
> > | | | | WEAK | | | STRONG |
> > | | | +------+ | |        |
> > | | +----------+ |        |
> > | +--------------+        |
> > +-------------------------+
> >
> > (This would look so much better as a Van diagram using color!)
>
> Venn?

My bad. Venn, of course.

> If they are sooo close then obviously there is an abstraction
> that covers them both for goodness sakes!

Sure there is. The question is how _useful_ this abstraction is vs. how useful are the two different "sooo close" abstractions I defined.

> Assertion: A car is a motorized vehicle.
> Assertion: A van has more interior room than a car.
>
> See? A Strong Graph and A Weak graph are both Graphs. The
> strong one just has more interior space.

Right. So? Do you suggest that Van makers will be banned from using the word "van" in their ads, because "car" is good enough? Shouldn't I be able to look for a "van" for a delivery service? Try to rent a van without ever saying the word. "A *car*. You know, the big kind? Squarish, with lots of room in the back, and big doors so I can load my cargo? No, not a 5-door family car... Something bigger, tallish, you know?".

It is all in the usage. We need "Weak" for type-unaware tools (drive my family around). We need "Strong" for type-aware tools (deliver UPS packages). These are two very different use cases. We need to be able to specify the requirements for both. We need two different words to call these two different sets of requirements.

> I can agree that implementation should go outside the spec.
> Of course the spec will be quite useless to anybody other than
> someone writing an implementation guide. Perhaps in that
> vein, we move the examples section to a new place as well.
> Also let's nix all the diagrams.

Hey, I _said_ we could keep the diagram and the section! :-) As a matter of fact, it _is_ rather useful. BUT. I disagree the spec would be useless without it.

> It was poorly worded, but what I meant is that
>
> 1) *if* there are two graph models, then they *don't*
>    share a common set
>    of "graph rules".

-1. Why shouldn't they? Why can the serial and the syntax model share the rule that each scalar node has a value, but the weak model and the strong model may not do the same?

> 2) *if* there is a LCD graph model, then we *do* define it,
>    and then also define the differences in the Strong and
>    Weak variants.

Sharing does not imply an LCD. What's the LCD between a sports car and an SUV? Does it have a name? It isn't a useful concept so it doesn't. This doesn't mean there aren't shared traits between them both (e.g., oversized engine that is mainly used to burn petrol faster).

Have fun,

Oren Ben-Kiki
From: Clark C. E. <cc...@cl...> - 2003-10-08 14:49:54
|
On Wed, Oct 08, 2003 at 03:44:54PM +0200, Oren Ben-Kiki wrote:
| > Before I go for 4 models, I would like to have somebody
| > *concisely* detail all the differences between the two graph
| > binding variants.
|
| Simple. In one model we have optional tags and weak equality. In the
| other we have mandatory tags and strong equality.

Right. You get weak equality when the YAML processor lacks knowledge about one or more tags that a YAML document has used; in this case one may need to have "generic" nodes. For the case of strong equality, the YAML processor knows enough about every type to form a native representation, or convert the node into a canonical form.

| Hey, here's a thought: how about we call these models the "Weak" model
| and the "Strong" model? Syntax (text), Serial (tree), Weak (graph),
| Strong (graph). Each exactly conveys the most important aspect that
| distinguishes it from the rest. It extends easily to tools - Strong
| YPATH and Weak YPATH, Strong YSCHEMA and Weak YSCHEMA...

I like this!

| How about it?

I think you are on the right path. In an operational setting, the distinction between weak/strong can vary by tag (and hence by all of those nodes which share the given tag). So, perhaps we actually have a single 'graph' model, but where each tag has a property "is known" within a given YAML system. When a tag is known, it will have a target which is either a native type or a generic node using canonical form, and then all nodes having that type are loaded using the given target (this is the strong case). If a tag is not known, then the node will be loaded into a generic node where strong equality is not supported; any comparisons using this node are done with weak equality, and a warning is issued. I like this approach, because the distinction really is on a tag-by-tag level, not on the entire graph.

So, in summary, we have a single "graph" model. When loading information into the node, if any tag is unknown, a YAML processor must issue a warning.
By unknown we mean that the YAML processor is unable to construct a native binding or convert the node to a canonical form. In this way, the user may get "unexpected" results when a given loader has unknown tags (due to weak equality); but the user is clearly warned about this issue.

So, with this, we are back to the "friendly" three model diagram, which better reflects the operational characteristics of our implementations:

       ->(parser) ->      ->(loader) ->
      /            \     /            \
     /              \   /              \
SYNTAX              SERIAL              GRAPH
     \              /   \              /
      \            /     \            /
       <-(emitter)<-      <-(dumper)<-

|   +--------------+
|   |    SYNTAX    |
|   | +----------+ |
|   | |  SERIAL  | |
| +-+-+----------+-+--------+
| | | | +------+ | |        |
| | | | | WEAK | | | STRONG |
| | | | +------+ | |        |
| | | +----------+ |        |
| | +--------------+        |
| +-------------------------+

This is also a good diagram; but I'm not sure it helps to describe the tag-by-tag "is known" property within a given YAML system. Hmm.

Best,

Clark
|
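[Editor's sketch] The tag-by-tag "is known" idea in the message above could be sketched roughly as follows. This is only an illustration under assumptions: the registry `KNOWN_TAGS`, the `GenericNode` class, the `load_scalar` function and the `!mystery` tag are hypothetical names invented here, not part of any YAML implementation discussed in the thread.

```python
import warnings
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class GenericNode:
    """Hypothetical fallback for a scalar whose tag the loader does not know."""
    tag: str
    text: str


# Hypothetical per-system registry: a tag "is known" if it has a target here.
KNOWN_TAGS: Dict[str, Callable[[str], Any]] = {
    "!int": int,
    "!float": float,
    "!str": str,
}


def load_scalar(tag: str, text: str) -> Any:
    """Load one scalar, choosing strong or weak treatment tag by tag."""
    constructor = KNOWN_TAGS.get(tag)
    if constructor is None:
        # Weak case: keep the raw text and warn, as the message proposes.
        warnings.warn(f"unknown tag {tag!r}: loaded as a generic node")
        return GenericNode(tag, text)
    # Strong case: the tag has a native target.
    return constructor(text)
```

With this sketch, `load_scalar("!int", "12")` yields a native integer, while `load_scalar("!mystery", "boo")` yields a `GenericNode` and emits a warning, so a single loaded graph can mix both kinds of node.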
From: Oren Ben-K. <or...@be...> - 2003-10-08 15:30:37
|
Clark C. Evans [mailto:cc...@cl...] wrote:
> In an operational setting, the distinction between
> weak/strong can vary by tag (and hence all of those nodes
> which share the given tag). So, perhaps we actually have a
> single 'graph' model, but where each tag has a property
> "is known" within a given YAML system.

Perhaps not. I can understand Brian guarding the usability of the spec for implementers - he's quite right to do so. But this doesn't mean that the spec is "YAML Howto for parser writers". If this is all that YAML was, we'd settle for the syntax spec and be done.

YAML is more than that. The fact that an implementation may mix together unparsed YAML, events, unknown (generic) tagged nodes, and native objects in one big happy mish-mash does not mean we need to make a similar mish-mash out of our information model. In fact, I claim that only by defining clear semantics - regardless of any implementation - can we allow people to jump between models this way, because we provide them with a clear "map" of where they are at any given point. If we don't provide distinct "named points on the map" and merely say "this whole area is called X", then people won't be able to distinguish where they are and things would get messy very quickly.

> When a tag is known, it will have a
> target which is either a native type or a generic node using
> canonical form, and then all nodes having that type are loaded
> using the given target (this is the strong case). If a tag is
> not known, then the node will be loaded into a generic node
> where strong equality is not supported; any comparisons using
> this node are done with weak equality; and issue a warning. I
> like this approach, because the distinction really is on a
> tag-by-tag level, not on the entire graph.

YAWTIL - Yet Another Way To Implement YAML. So? I could think of a dozen others. Should each way get its own set of info models?

> So, in summary, we have a single "graph" model.

-10.

Have fun,

Oren Ben-Kiki
|
From: Clark C. E. <cc...@cl...> - 2003-10-08 18:27:08
|
Ok. Oren and I spent some serious time on IRC this morning/afternoon fleshing out the models in greater detail. It is hard to express all of the details discussed, but let me try to summarize here.

An outline for the new model section:

4.1 Overview

  - motivates why we do models (as a shared vocabulary)
  - gives a pretty diagram (Brian's will work)
  - gives a very quick outline of the models (syntax -> serial -> graph)
  - discusses how there are actually two graph models: strong and weak
  - perhaps gives a "stacked" diagram showing how each model builds
    onto the previous model, making it more "syntaxish"

4.2 Strong (Native) Graph

  - quick overview of what this models: native objects or a generic
    model using canonical forms, or cave drawings
  - defines a scalar node as an opaque value with a mandatory tag
  - defines each scalar tag as having a canonical form
  - defines strong equality of scalars based on canonical form
  - defines strong equality of collections recursively
  - explains that this model represents how YAML views native systems
    or canonical representations of YAML information.

4.3 Weak (Generic) Graph

  - quick overview discussing how not every native binding will know
    about every scalar tag, and the problems caused
  - defines a scalar node as a string value with an optional tag
  - defines weak equality as a ternary operator: yes, no, unknown
  - specifies how YAML texts which comply with the weak graph may fail
    to load into the strong graph:
    (a) keys which are unknown in a mapping that are equivalent under
        strong equality,
    (b) scalar values which have a specified tag, but where the string
        provided does not match the tag's expectations, for example
        --- !int X
  - explains how this model represents what would be exposed via a
    random access (DOM-like) viewer API.

4.4 Serial (Tree) Model

  - quick overview of how this model adds only those elements necessary
    to break a graph (either weak or strong) into a sequential pipe
  - introduces alias/anchor
  - introduces key order
  - introduces scalar "chunking"
  - mentions that this model specifies the "minimum" requirements for
    a push/pull parser or emitter API.

4.5 Text (Syntax) Model

  - quick overview of how this model adds all of the YAML goodies that
    make human presentation nice
  - adds styles
  - adds comments
  - adds directives
  - adds details (new line endings, indentation, etc.)
  - mentions that this model specifies the requirements for a lexer
    API that preserves all aspects of a YAML text.

Unlike the current model section, I'd like this pass not only to clarify the "details" of the model, but also to provide a reasoning for the model as providing standard "feature sets" and a vocabulary for discussing YAML systems.

I was thinking of a diagram that is similar to Brian's...

       ->(parser) ->      ->(loader)->   STRONG
      /            \     /           \     .
     /              \   /             \    .
SYNTAX              SERIAL             GRAPH
     \              /   \             /    .
      \            /     \           /     .
       <-(emitter)<-      <-(dumper)<-   WEAK

Where the point is to show that the loader may create a strong or a weak graph, primarily dependent upon whether it encounters tags that it does not know about. It will say that "strong" nodes would usually be native objects, whereas a "weak" node would be a generic "yaml_node" object which is used when an appropriate native type is not found.

Then, we would have some sort of "stack" diagram similar to the one presented a long time ago...

+------------------------
| SYNTAX
|   (+) style
|   (+) indentation
| +----------------------
| | SERIAL
| |   (+) alias/anchor
| |   (+) key ordering
| |   (+) value chunking
| | +--------------------
| | | WEAK
| | |   (+) string value
| | |   (+) weakened equality
| | | +------------------
| | | | STRONG
| | | |   * tag
| | | |   * opaque value
| | | |   * equality
| | | +------------------
| | +--------------------
| +----------------------
+------------------------

So, let me try to answer Brian's assertions:

| * A loader/dumper works in conjunction with an application and its
|   object classes. The code in the object classes is responsible for
|   marshalling the object in and out of data.

right

| * A loader/dumper (together with application classes) can marshall
|   between any type of graph, including both target and canon graphs.

right, and it chooses the strong or weak graph primarily depending on its knowledge of the !tag; weak graphs are for unknown types

| * A viewer can "upgrade" a target graph into a canon graph.

a strong graph can always be "viewed" as a weak graph; abstractly, this is done by grabbing the opaque value, giving it to the node's tag handler, and out comes the string representation of the scalar

| * There is no named process for going from canon to target.

right, as choosing the weak or strong graph depends primarily upon how much knowledge the loader has (all of the tags, none of the tags, or only some of the tags).

| * A target binding "is a" graph.
| * A canon binding "is a" graph.

exactly, weak and strong are both graph structures

| * We do not define what the general properties of a graph are.

in the model we will define these properties in the 'strong' section, and then 'weaken' them by showing how the model handles the case where the !tag information is not known

| * We do define the minimum expected properties of a target binding are.
|   ie the Least Common Denominator
| * We do define exact expected properties of canon binding.
| * These two definitions are known as the "Target Model" and the "Canon
|   Model" respectively.

I think so, but I'm brain dead. ;)

| * There is no known Target Binding found in nature that meets the exact
|   criteria of a Canon Binding. The two concepts (models) are therefore
|   completely separate.

This is the issue that emerged in the talk with Oren. I'd like the model to reflect the "real world" case where a YAML text contains two tags, say !int and !!yikes: the loader would know about one, and could use native integers (strong), but would not be able to find a native binding for !!yikes, so it would use a generic yaml stub (weak) for these nodes. The result is something that is in the "weak" model, where equality is only partially defined (but fully defined for just integers). I hope that the above does an "ok" job of encapsulating this case; the important part, IMHO, is defining a ternary equality operator to deal with ugly cases like this.

| Studying the diagram above, it seems to me that we have three "levels of
| processing" and four "models", where model means "something we have
| decided to make rules about".
|
| We have the:
|
| * Syntax Level
| * Serial Level
| * Graph Level

Sure. Syntax - text model, Serial - tree model, and Graph - either strong or weak model depending on !tag knowledge of the environment.

| Under the Syntax and Serial Levels we have decided to define one model
| each. Under the Graph Level we have decided (possibly) to define two
| models.

Yep.

Best,

Clark
|
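[Editor's sketch] The ternary (yes / no / unknown) weak equality described in the message above might look roughly like the following. Everything here is an assumption made for illustration: the thread does not fix the rules, and the names `Scalar`, `CANONICAL` and `weak_equal`, the choice of canonical form for `!int`, and the rule that differently tagged scalars are never equal are all hypothetical. `None` stands for "unknown".

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Scalar:
    """Weak-model scalar: a string value with an optional tag."""
    tag: Optional[str]
    text: str


# Hypothetical canonicalizers for the tags this system knows.
# Assumed canonical form for !int: plain decimal digits.
CANONICAL: Dict[str, Callable[[str], str]] = {
    "!int": lambda s: str(int(s, 0)),  # e.g. "0x10" and "16" both give "16"
}


def weak_equal(a: Scalar, b: Scalar) -> Optional[bool]:
    """Return True (yes), False (no) or None (unknown)."""
    if a.tag == b.tag and a.text == b.text:
        return True   # same tag, same text: equal whatever the tag means
    if a.tag is None or b.tag is None:
        return None   # an untagged scalar could still resolve to anything
    if a.tag != b.tag:
        return False  # assumption: different tags never compare equal
    canon = CANONICAL.get(a.tag)
    if canon is None:
        return None   # unknown tag: different texts may share a canonical form
    # Known tag: strong equality via canonical form. Text that does not
    # fit the tag (such as "X" for !int) raises ValueError here.
    return canon(a.text) == canon(b.text)
```

Under these assumptions, two `!int` scalars with texts `0x10` and `16` compare equal, while two `!!yikes` scalars with different texts compare as unknown, which is the partially defined equality the message describes.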