From: Brian I. <in...@tt...> - 2004-09-01 03:17:09
Well, not really completely different, but... Yes, Clark, Oren and I had a wicked debate on IRC. They have tried to represent me in the follow-up posts, but have not really captured the vision of where I want this to go. I'll try to be explicit and blunt about my ideas in this post, although when I do this, things still seem to get misinterpreted. But here goes anyway...

I'll just start by making some assertions in no particular order:

1) A YAML *document* cannot by itself contain all the semantics that all applications need for successful processing. Well, it could of course, but then it would not be YAML. YAML is nice because it is clean and easy to read. There is just enough information there that a person can read the document and say, "yeah, I see where this is going."

2) Making a YAML document semantically complete in a given context requires a thingy we refer to as a "schema". This is a body of code and/or text that contains all the hints that get added to the information in the document to complete the YAML Information Model. A schema can be as simple as the instructions in a Python program that reads a config file:

    ---
    name: Brian Ingerson
    age: 21

and knows that age needs to be an integer.

3) A YAML *parser* knows nothing about typing and tagging. It does not cook any data that it parses. It just reports it all to the consumer of the parse events. That consumer must be "schema aware" in order to know what to do with the document. Again, this can be simple or complex. But all the complexity is outside the parser.

4) Implicit typing is all done outside the parser. The parser just passes along enough contextual information so that the consumer can make the right choice. When I say consumer, btw, I am thinking "Loader", so I'll use that term.

5) There are many many many applications where the producer of a YAML document is the sole consumer of it. This includes most serialization. YAML is, after all, primarily a serialization language.
In this case it does not matter what the schema is per se, as long as things round trip adequately. (NYN)

6) There are applications like config files where there is no mechanical producer, just a single application consuming them. The application is usually so specific that the schema is just hardwired into it.

7) There are other applications where documents are produced by one source and disseminated about the universe to be consumed by other processes. This seems to be the major thinking of XML heads (like Oren and Clark). In this case, it makes more sense for the documents to have a link of some kind to a formal schema language that is readily understood by any potential consumer.

8) There are other cases where YAML is neither produced nor consumed by machines, like in friendly emails to other YAML-speaking people. These use cases aren't really important for this discussion, though. If the sender messes up the YAML, the reader will likely still get it.

9) Tagging nodes in a YAML document is only important because we can't resolve schema ambiguity without them. If we could, I'd drop them like a lead brick. Tags tend to add a lot of noise to YAML. YAML is about eliminating noise.

10) That said, tags can be minimal. They don't need to be globally unique until they are consumed by the loader, that is, until they are ready to be judged against a schema to determine what they really mean. So a tag only needs enough info to clearly disambiguate a node of one meaning from a node of another.

11) The prefix systems being discussed currently are used to make tags globally unique in the document. Not only is this overkill, it's not necessary. I also tend to think it's not even desirable.

12) Oren stated that there needs to be at least one piece of information in the document that states how to globally resolve everything. I agree with that notion. There needs to be the concept of a GUID. Global Universal ID.
This is the only piece of information needed to resolve everything in the document.

13) A Parser must now report the GUID to the Loader. This is a new concept. Clark would have you believe that a parser cooks info from the GUID into each tag. I say no way. A parser reports the GUID and all the other simple events. Period.

14) Say it with me 3 times. ALL SCHEMA RESOLUTION HAPPENS IN THE LOADER.

15) A document can have 0 or 1 encoded GUIDs. In other words, you can specify it or not. If you don't, then the parser just fires off a start-document event with an *empty* GUID. This is valid. Why?

16) Many applications don't need the burden of an encoded GUID. The GUID is implied by the application. The serializing program doesn't need it to round trip things. It is just set up that way. The program isn't *sharing* the document with anyone else. The Loader that receives an empty GUID needs to decide for itself which action to take. Many times it is already programmed to not need the GUID.

17) Other Loaders absolutely need a GUID. A GUID is just a key to finding all the missing info that is needed to load and consume the document. If this type of Loader receives no GUID, it must raise an exception. This would include loaders that consume YAML syndication feeds etc. The XML-head stuff.

18) We don't need any directives at all. All extra info, including the YAML version, can be found at the other end of the GUID rainbow. NOTE: I could be wrong here. Directives are typically consumed by the parser. At least that was the intent. %YAML, %TABWIDTH, etc. I just don't think we need any of these right now.

19) The taguri scheme is an acceptable encoding for the GUID. We should probably stick with it. There can be all sorts of different ways that programs discover a schema, given a GUID, including a global registry. A registry is nice because it can indicate when two GUIDs are actually compatible.

20) This would allow completely clean YAML. Even when we need to add tag crapola.
Here is how dirty it would get:

    --- %ingy.net,2004/shiznit/%
    name: ingy
    birthdate: !date March 25th, 1964
    last sneezed: !iso/date 2004/04/01T01:02:03

The tags 'date' and 'iso/date' are simply unique within the document. It isn't until load time that they are expanded/validated against the schema implied by the GUID 'ingy.net,2004/shiznit/'. What does the schema in question look like? Nobody knows for sure at this point. But it is definitely aware of the differences between 'date' and 'iso/date', whatever those differences are.

21) Note that I abandon the directive syntax altogether. That's because I feel that this is the only directive needed. In actuality, it makes good sense to just use the directive syntax though:

    --- %GUID:ingy.net,2004/shiznit/

22) Let me reiterate! The GUID is *not* concatenated onto the tag by the parser. There is NO RESOLUTION IN THE PARSER.

23) Yes, the YAML is always ambiguous within the document alone. That's the whole point. We can't fit all the semantic information in the document without ruining YAML, so why even try? A YAML document only makes sense at the time it is consumed in a specific context. We can force the strictness using a GUID and a formal schema language if we want/need to.

24) The cool thing is that we can consider any tag in use today as valid. Messy but valid. Something unique to be resolved by a schema-aware loader. Just remember the tag isn't really globally unique in the document. It needs information alluded to by a GUID.

25) This makes backwards compatibility doable. So we get the best of all worlds.

OK, I'd better quit for tonight. Very busy schedule this week. Thanks for your time.

Cheers, Brian
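
For anyone who wants the parser/Loader split from points 13 through 17 and 22 in concrete form, here is a rough Python sketch. None of it comes from the YAML spec or from any existing library: the event classes, the `SCHEMAS` registry, the `load` function and its `require_guid` flag are all made up for illustration, and the events are written out by hand instead of coming from a real parser. The constructors only label the raw text, since nobody knows yet what the real schema looks like.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Optional, Union


@dataclass
class DocumentStart:
    """Start-document event; guid is None when the document encodes none."""
    guid: Optional[str]


@dataclass
class Pair:
    """One top-level key/value pair, reported raw by the (hypothetical) parser."""
    key: str
    tag: Optional[str]  # tag text exactly as written, never expanded
    value: str          # scalar text exactly as written, never cooked


Event = Union[DocumentStart, Pair]
Schema = Dict[str, Callable[[str], Any]]


class SchemaError(Exception):
    """Raised when the loader cannot resolve a document against a schema."""


# Placeholder registry (an assumption): GUID -> {tag -> constructor}.
SCHEMAS: Dict[str, Schema] = {
    "ingy.net,2004/shiznit/": {
        "date": lambda text: ("date", text),
        "iso/date": lambda text: ("iso/date", text),
    },
}


def load(events: Iterable[Event], schemas: Dict[str, Schema],
         require_guid: bool = False) -> Dict[str, Any]:
    """Resolve raw parse events against a schema. All resolution happens here."""
    stream = iter(events)
    start = next(stream)
    if not isinstance(start, DocumentStart):
        raise SchemaError("expected a start-document event first")

    if start.guid is None:
        # Empty GUID: this loader decides for itself what to do.
        if require_guid:
            raise SchemaError("no GUID in document, and this loader needs one")
        schema: Schema = {}
    elif start.guid in schemas:
        schema = schemas[start.guid]
    else:
        raise SchemaError(f"unknown GUID: {start.guid!r}")

    result: Dict[str, Any] = {}
    for event in stream:
        if event.tag is None:
            result[event.key] = event.value  # untagged: keep the raw text
        elif event.tag in schema:
            result[event.key] = schema[event.tag](event.value)
        else:
            raise SchemaError(f"tag {event.tag!r} is not defined by the schema")
    return result


# Hand-written events for the example document; a parser would report these.
EVENTS = [
    DocumentStart("ingy.net,2004/shiznit/"),
    Pair("name", None, "ingy"),
    Pair("birthdate", "date", "March 25th, 1964"),
    Pair("last sneezed", "iso/date", "2004/04/01T01:02:03"),
]

document = load(EVENTS, SCHEMAS)
```

The tags reach `load` as the bare strings 'date' and 'iso/date'; the GUID is only used there, as a lookup key, and is never glued onto a tag. A loader for a shared feed would pass `require_guid=True` and get an exception on an empty GUID, while a private round-trip loader would leave it off.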