From: Brian I. <in...@tt...> - 2004-09-01 03:17:09
|
Well, not really completely different, but...

Yes, Clark, Oren and I had a wicked debate on IRC. They have tried to represent me in the follow-up posts, but have not really captured the vision of where I want this to go. I'll try to be explicit and blunt about my ideas in this post, although when I do this, things still seem to get misinterpreted. But here goes anyway...

I'll just start by making some assertions in no particular order:

1) A YAML *document* cannot by itself contain all the semantics that all applications need for successful processing. Well, it could of course, but then it would not be YAML. YAML is nice because it is clean and easy to read. There is just enough information there that a person can read the document and say, yeah, I see where this is going.

2) To make a YAML document semantically complete in a given context requires a thingy we refer to as a "schema". This is a body of code and/or text that contains all the hints to add onto the information in the document that completes the YAML Information Model. A schema can be as simple as the instructions in a Python program that reads this config file:

    ---
    name: Brian Ingerson
    age: 21

and knows that age needs to be an integer.

3) A YAML *parser* knows nothing about typing and tagging. It does not cook any data that it parses. It just reports it all to the consumer of the parse events. That consumer must be "schema aware" in order to know what to do with the document. Again, this can be simple or complex. But all the complexity is outside the parser.

4) Implicit typing is all done outside the parser. The parser just passes along enough contextual information so that the consumer can make the right choice. When I say consumer, btw, I am thinking "Loader", so I'll use that term.

5) There are many many many applications where the producer of a YAML document is the sole consumer of it. This includes most serialization. YAML is after all a serialization language primarily.
In this case it does not matter what the schema is per se, as long as things round trip adequately. (NYN)

6) There are applications like config files where there is no mechanical producer, just a single application consuming them. The application is usually so specific that the schema is just hardwired into it.

7) There are other applications where documents are produced by one source and disseminated about the universe to be consumed by other processes. This seems to be the major thinking of XML heads (like Oren and Clark). In this case, it makes more sense for the documents to have a link of some kind to a formal schema language that is readily understood by any potential consumer.

8) There are other cases where YAML is neither produced nor consumed by machines. Like in friendly emails to other YAML speaking people. These use cases aren't really important though for this discussion. If the sender messes up the YAML the reader will likely still get it.

9) Tagging nodes in a YAML document is only important because we can't resolve schema ambiguity without them. If we could, I'd drop them like a lead brick. Tags tend to add a lot of noise to YAML. YAML is about eliminating noise.

10) That said, tags can be minimal. They don't need to be globally unique until they are consumed by the loader, that is, until they are ready to be judged against a schema to determine what they really mean. So a tag only needs enough info to clearly disambiguate a node of one meaning from a node of another.

11) The prefix systems being discussed currently are used to make tags globally unique in the document. Not only is this overkill, it's not necessary. I also tend to think it's not even desirable.

12) Oren stated that there needs to be at least one piece of information in the document that states how to globally resolve everything. I agree with that notion. There needs to be the concept of a GUID. Global Universal ID.
This is the only piece of information needed to resolve everything in the document.

13) A Parser must now report the GUID to the Loader. This is a new concept. Clark would have you believe that a parser cooks info from the GUID into each tag. I say no way. A parser reports the GUID and all the other simple events. Period.

14) Say it with me 3 times. ALL SCHEMA RESOLUTION HAPPENS IN THE LOADER.

15) A document can have 0 or 1 encoded GUIDs. In other words you can specify it or not. If you don't, then the parser just fires off a start-document event with an *empty* GUID. This is valid. Why?

16) Many applications don't need the burden of an encoded GUID. The GUID is implied by the application. The serializing program doesn't need it to round trip things. It is just set up that way. The program isn't *sharing* the document with anyone else. The Loader that receives an empty GUID needs to decide for itself which action to take. Many times it is already programmed to not need the GUID.

17) Other Loaders absolutely need a GUID. A GUID is just a key to finding all the missing info that is needed to load and consume the document. If this type of Loader receives no GUID it must raise an exception. This would include loaders that consume YAML syndication feeds etc. The XML-head stuff.

18) We don't need any directives at all. All extra info including YAML version can be found at the other end of the GUID rainbow.

NOTE: I could be wrong here. Directives are typically consumed by the parser. At least that was the intent. %YAML, %TABWIDTH, etc. I just don't think we need any of these right now.

19) The taguri scheme is an acceptable encoding for GUID. We should probably stick with it.

There can be all sorts of different ways that programs discover a schema, given a GUID. Including a global registry. This is nice because it can indicate when two GUIDs are actually compatible.

20) This would allow completely clean YAML. Even when we need to add tag crapola.
Here is how dirty it would get:

    --- %ingy.net,2004/shiznit/%
    name: ingy
    birthdate: !date March 25th, 1964
    last sneezed: !iso/date 2004/04/01T01:02:03

The tags 'date' and 'iso/date' are simply unique within the document. It isn't until load time that they are expanded/validated against the schema implied by the GUID 'ingy.net,2004/shiznit/'.

What does the schema in question look like? Nobody knows for sure at this point. But it is definitely aware of the differences between 'date' and 'iso/date', whatever those differences are.

21) Note that I abandon the directive syntax altogether. That's because I feel that this is the only directive needed. In actuality it makes good sense to just use the directive syntax though:

    --- %GUID:ingy.net,2004/shiznit/

22) Let me reiterate! The GUID is *not* concatenated onto the tag by the parser. There is NO RESOLUTION IN THE PARSER.

23) Yes, the YAML is always ambiguous within the document alone. That's the whole point. We can't fit all the semantic information in the document without ruining YAML, so why even try? A YAML document only makes sense at the time it is consumed in a specific context. We can force the strictness using a GUID and a formal schema language if we want/need to.

24) The cool thing is that we can consider any tag in use today as valid. Messy but valid. Something unique to be resolved by a schema aware loader. Just remember the tag isn't really globally unique in the document. It needs information alluded to by a GUID.

25) This makes backwards compatibility doable. So we get the best of all worlds.

OK, I'd better quit for tonight. Very busy schedule this week. Thanks for your time.

Cheers, Brian |
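To make points 13 through 17 concrete, here is a rough Python sketch of a loader that receives the GUID and the raw tags from the parser and does all the resolving itself. Everything in it is an assumption made for illustration: the SCHEMAS table, the MissingGuid exception, the load_nodes function and the "loose-date"/"iso-date" labels are invented names, and the parser output is faked as a list of (key, tag, text) triples rather than taken from any real YAML library.

```python
# Illustrative sketch only: SCHEMAS, MissingGuid and load_nodes are made-up
# names, not part of any real YAML library.


class MissingGuid(Exception):
    """Raised by a loader that cannot work without a document GUID."""


# Hypothetical schema registry keyed by GUID. Each schema maps the short,
# document-local tags to whatever the loader should build from them.
SCHEMAS = {
    "ingy.net,2004/shiznit/": {"date": "loose-date", "iso/date": "iso-date"},
}


def load_nodes(guid, nodes, implied_guid=None, require_guid=False):
    """Resolve raw (key, tag, text) triples as reported by the parser.

    The parser hands over the GUID (possibly empty) and the tags untouched;
    every decision about what a tag means is taken here, in the loader.
    """
    if not guid:
        if require_guid:
            raise MissingGuid("no GUID in document and none implied")
        guid = implied_guid  # the application already knows its schema
    schema = SCHEMAS.get(guid, {})
    resolved = []
    for key, tag, text in nodes:
        # Untagged or unknown nodes fall back to plain strings in this sketch.
        kind = schema.get(tag, "str")
        resolved.append((key, kind, text))
    return resolved


# What a non-cooking parser might report for the example document.
parsed = [
    ("name", None, "ingy"),
    ("birthdate", "date", "March 25th, 1964"),
    ("last sneezed", "iso/date", "2004/04/01T01:02:03"),
]
print(load_nodes("ingy.net,2004/shiznit/", parsed))
```

A serializer that only talks to itself would call load_nodes with an empty GUID and its own implied_guid; a syndication-style consumer would pass require_guid=True and get an exception when the document carries no GUID.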
From: Clark C. E. <cc...@cl...> - 2004-09-01 04:21:54
|
There are several concepts masquerading as 'schema'. Often when I use 'schema', I mean a set of types, each type identified by a tag. Let's call this a tagset, or namespace. Some tagsets, like YATL, may have tags that allow children with tags from other tagsets, without knowing what those tagsets are or what the tags in those tagsets mean. A transformation language is a good example of such a use case.

On Tue, Aug 31, 2004 at 08:10:47PM -0700, Brian Ingerson wrote:
| 10) That said, tags can be minimal. They don't need to be globally
|     unique until they are consumed by the loader. Until they are ready
|     to be judged against a schema to determine what they really mean. So
|     a tag only needs enough info to clearly disambiguate a node of one
|     meaning from a node of another. [in the context of a particular schema].

What happens if you have a document that takes nodes from two different tagsets? How are you going to automate the process of merging them, or does the burden of mingling tags from different sets fall onto the application developer? This doesn't work with my use case.

| 13) A Parser must now report the GUID to the Loader. This is a new
|     concept. Clark would have you believe that a parser cooks info from
|     the GUID into each tag. I say no way. A parser reports the GUID and
|     all the other simple events. Period.

You are trying to find a single GUID that represents the schema for the document? Where 'schema' here means complete knowledge of how one would "find" all other tags... without further use of GUIDs?

| 16) Many applications don't need the burden of an encoded GUID. The GUID
|     is implied by the application. The serializing program doesn't need
|     it to round trip things. It is just set up that way. The program
|     isn't *sharing* the document with anyone else. The Loader that
|     receives an empty GUID needs to decide for itself which action to
|     take. Many times it is already programmed to not need the GUID.
|
| 17) Other Loaders absolutely need a GUID. A GUID is just a key to
|     finding all the missing info that is needed to load and consume the
|     document. If this type of Loader receives no GUID it must raise an
|     exception. This would include loaders that consume YAML syndication
|     feeds etc. The XML-head stuff.

Ok.

| 18) We don't need any directives at all. All extra info including YAML
|     version can be found at the other end of the GUID rainbow.
|
| 19) The taguri scheme is an acceptable encoding for GUID. We should
|     probably stick with it.
|
| There can be all sorts of different ways that programs discover a schema,
| given a GUID. Including a global registry. This is nice because it can
| indicate when two GUIDS are actually compatible.

Ok.

| 20) This would allow completely clean YAML. Even when we need to add tag
|     crapola. Here is how dirty it would get:
|
|     --- %ingy.net,2004/shiznit/%
|     name: ingy
|     birthdate: !date March 25th, 1964
|     last sneezed: !iso/date 2004/04/01T01:02:03
|
| The tags 'date' and 'iso/date' are simply unique within the document. It
| isn't until load time that they are expanded/validated against the schema
| implied by the GUID 'ingy.net,2004/shiznit/'.
|
| What does the schema in question look like. Nobody knows for sure at this
| point. But it is definitely aware of the differences between 'date' and
| 'iso/date', whatever those differences are.

What happens if you want to mix data types from two different tagsets? Do you need to "mint" a new GUID that then knows about both of these tagsets? This doesn't help my use case.

| 21) Note that I abandon the directive syntax altogether. That's because
|     I feel that this is the only directive needed. In actuality it make
|     good sense to just use the directive syntax though:
|
|     --- %GUID:ingy.net,2004/shiznit
|
| 22) Let me reiterate! The GUID is *not* concatenated onto the tag by the
|     parser. There is NO RESOLUTION IN THE PARSER.

Ok.
So, you've introduced a handle by which one could find one or more appropriate schemas (perhaps in different dialects). This isn't a bad idea. But it isn't sufficient. I may have a %GUID:yaml.org,2005:yatl but an instance of my YATL schema may have nodes from another schema, say 'okay/news', that it tests for equality with. I wouldn't want to have my YATL schema know about okay/news, but yet, somehow I need to _find_ the schema for okay/news to make sure that it matches some other GUID in some other document.

| 23) Yes the YAML is always ambiguous within the document alone. That's
|     the whole point. We can't fit all the semantic information in the
|     document without ruining YAML so why even try? A YAML document only
|     makes sense at the time it is consumed in a specific context. We can
|     force the strictness using a GUID and a formal schema language if we
|     want/need to.

Because there is quite a bit you can do _without_ full semantic info.

| 24) The cool thing is that we can consider any tag in use today as
|     valid. Messy but valid. Something unique to be resolved by a schema
|     aware loader. Just remember the tag isn't really globally unique in the
|     document. It needs information alluded to by a GUID.

Well, we are going to have to agree to disagree here.

| 25) This make backwards compatibility doable. So we get the best of
|     all worlds.
|
| OK I'd better quit for tonight. Very busy schedule this week. Thanks for
| your time.

Likewise... behind schedule. ;(

Clark |
From: Oren Ben-K. <or...@be...> - 2004-09-01 07:19:32
|
OK, long answer... Sorry, I didn't have the time to make it shorter...

Summary: I think the compromise (suggested by Clark) that I posted is the best solution to the problem. It addresses all(most) Brian's points.

The only disagreement left is narrowed to a single question. Is a single GUID plus schema-specific "tag globalization" enough, or should there be a standard purely syntactical mechanism for "tag globalization"? With this in mind...

There are two different steps required in making a document be "completely known".

- Step #1: For each node, compute a unique id identifying its type. The spec calls this "tag resolution". Hence I use "tag globalization" for the act of converting a "!stuff" into a GUID. Less confusion.

- Step #2: Given you know the id of the type of the node, map it to the appropriate data structure/behavior/cave drawing. The spec calls this "constructing".

Both these steps are determined by the "schema". Not the parser's job. Fine.

The question remains: what is the _input_ to step #1?

A. GUID tags for some subset of the nodes (current spec, Clark, myself)... or

B. A single (possibly implicit) GUID identifying the schema plus arbitrary schema-specific tag-globalization mechanism per node (Brian, Onoma).

Both mechanisms work. However, they have very different trade-offs.

Approach (B) allows for much terser "hints" with the absolute minimum of "noise" (as Brian aptly put it). There's no problem mix-and-matching a zillion "tag sets" or "name spaces" using whatever the specific schema fancies.

But, in this approach, there's no standard way to designate different tag sets, or for that matter anything at all. Each schema writer can - no, _must_ - come up with his own way of deciding how to "globalize" tags.
For example, the schema could say you should compute the 32-bit CRC of the node's hint, XOR with the MD5 of the whole document, subtract the phase of the moon and use that to look up the node's unique id's tag in a database.

So far so good. It is assumed that "reasonable" people will use "reasonable" ways to designate types. If they don't, well, their problem - nobody forces you to use their schema, after all. Right?

Wrong. As long as the schema is "pure", you can use a simple word for each tag. But when you want to mix "spaces", you have several very reasonable choices.

Note this is not an esoteric use case. 99% of the documents will mix types from the "common YAML type repository" and application specific types. At any rate, given you mix "spaces", several "reasonable" options are:

- Use prefixes. Use ':' or '/' or '-' or '|' or '+' or whatever to separate the prefix from the suffix. Note that the prefixes are defined by the schema, not in directives.

- Use nothing. Choose your type tags in each "space" so there's no collision. After all, you control the full set of types used in your schema, right?

- Be case sensitive.

- Be case insensitive.

Now, imagine a world where some YAML documents use prefixes, some don't. Some use ':', some use '/'. Some use case-sensitive tag names (UNIX people). Some don't (Windows people).

You are now asked to write a new schema that should combine both Alice's graphics objects schema and Bob's time-tracking schema, for representing graphical time sheet data. Both these schemas make heavy use of tags to identify elements (e.g., graphic objects are notoriously polymorphic and their type can't always be deduced from their structure - an ellipse and a rectangle have the same properties).

Alice uses case-insensitive tags with prefixes for "spaces", '/' to separate prefix from suffix, so it calls integers "!Yaml/Int".
Bob uses no prefixes and calls integers "!int". Both reasonable.

What are _you_ going to do?

- Say that graphic data chunks use Alice's notation and time sheet chunks use Bob's notation. Cut & paste works. And you don't have to write conversion code! Everyone wins, except that new users of your system, who know nothing about Bob and Alice, are raising hell. "!Yaml/Int on even lines and !int on odd lines? Who let this guy design schemas, anyway?"

- Define a new tagging scheme for your schema. Say, case-sensitive using ':'. Your integers are "!yaml:int". When importing/exporting a chunk of graphics or time sheet data to Alice's and Bob's systems, convert all the tags back and forth. Your files are the most readable imaginable and interoperability is achieved. You finish debugging the conversion tables and install the system. The ungrateful users complain! "I can't read your files, all the tag names are different from what I'm used to". "Why can't I just cut & paste from my graphical theme file into your system using notepad?" "Wasn't YAML about interoperability?" Arrgghh!!!

- You curse YAML and all its ancestors. You heard there's an old technology called XML that allows you to use a single, unified prefixing mechanism. Sure, it has some warts, and you have to declare these prefixes at the header, but at this point, anything looks good... :-)

What would be a good solution for that? It should:

- Have zero cost for people who couldn't care less about this problem. For example, config file writers. Hence, no (mandatory) headers/directives.

- It should allow everyone to define his own interoperable schema and still have readable files. Hence, no central registry of short namespace ids with squatters and politics and so on.
- It should minimize the pain of cut&paste between Bob's and Alice's schemas. Hence, a unified mechanism for identifying "spaces" and "types" in spaces rather than schema-specific tricks.

The compromise Clark and I suggested becomes more-or-less inevitable, then:

- You just want to write a config file and could care less about all this XML-head mix-and-match mumbo-jumbo? Fine. Write your tags any way you like - EXACTLY what Brian described in his post. Just please don't use tags that start with "xxx:". That's all we ask. Hardly a sacrifice.

- You are writing a schema that is meant to play nice with others (be embedded in other, yet unwritten data types and applications)? Fine. Please mint a taguri for it. Use prefixes to minimize the user's pain.

I really don't see the problem. Everybody gets what they want!

A quick pass through Brian's points:

> 1) A YAML *document* cannot by itself contain all the semantics that all
>    applications need for successful processing.

+1

> 2) To make a YAML document semantically complete in a given context,
>    requires a thingy we refer to as a "schema".

+1

> 3) A YAML *parser* knows nothing about typing and tagging. It does not
>    cook any data that it parses. It just reports it all to the consumer
>    of the parse events.

+0.5. A parser "cooks" text styles. It strips away indentation. It expands escape sequences. It doesn't know what the text means, but it "cooks" it anyway. Similarly, having it concatenate two strings together in tags does _not_ imply the parser "knows" what a tag means.

> 4) Implicit typing is all done outside the parser.

+1

> 5) There are many many many applications where the producer of a yaml
>    document is the sole consumer of it.

+1. Just don't use 'xxx:' in your tags, please.

> 6) There are applications like config files where there is no mechanical
>    producer, just a single application consuming them.
> The application is usually so specific that the schema is just hardwired
>    into it.

+1. See point #5. You probably don't use tags at all, actually.

> 7) There are other applications where documents are produced by one
>    source and disseminated about the universe to be consumed by other
>    processes. This seems to be the major thinking of XML heads (like
>    Oren and Clark). In this case, it makes more sense for the documents
>    to have a link of some kind, to a formal schema language that is
>    readily understood by any potential consumer.

+1. Use taguri tags to provide GUID wherever it is called for (in 90% of the documents, just in the root node; if mix-and-matching, you also need GUIDs scattered through the data). Use %tag directives to minimize the pain of people reading and writing the document.

> 8) There are other cases where YAML is neither produced nor consumed by
>    machines. Like in friendly emails to other YAML speaking people.
>    These use cases aren't really important though for this discussion.
>    If the sender messes up the YAML the reader will likely still get it.

+1.

> 9) Tagging nodes in a YAML document is only important because we can't
>    resolve schema ambiguity without them. If we could, I'd drop them like
>    a lead brick. Tags tend to add a lot of noise to YAML. YAML is about
>    eliminating noise.

+10. See point #6. No tags at all!

> 10) That said, tags can be minimal. They don't need to be globally
>     unique until they are consumed by the loader.

+1/-1. +1 for config files etc. (points #5/#6). -1 for interoperable schemas that are designed to be mix-and-match (point #7). Hence, the dual mechanism Clark proposed.

> 11) The prefix systems being discussed currently are used to make tags
>     globally unique in the document. Not only is this overkill, it's not
>     necessary. I also tend to think it's not even desirable.

+1/-1.
See previous point.

> 12) Oren stated that there needs to be at least one piece of information
>     in the document that states how to globally resolve everything. I
>     agree with that notion. There needs to be the concept of a GUID.
>     Global Universal ID. This is the only piece of information needed to
>     resolve everything in the document.

+0.5 - when mix-and-matching, things get icky fast unless you can specify several GUIDs. At any rate this is only applicable to "interoperable" schemas (point #7).

> 13) A Parser must now report the GUID to the Loader.

+1.

> This is a new concept.

-1. In the current spec, the "GUID" is the explicit tag of the root node. Which is reported by the parser to the loader. And yes, it's optional :-)

> 14) Say it with me 3 times. ALL SCHEMA RESOLUTION HAPPENS IN THE LOADER.

+1. If "resolution" means "tag resolution" as defined in the spec :-)

-1. If "resolution" means "any processing whatsoever that happens to tags, including purely textual operations such as expanding escape sequences, concatenating prefixes, removing surrounding white space, and so on".

-1. If "resolution" actually means "tag globalization". Why? Because of these pesky "XML-head" needs as I explained above.

> 15) A document can have 0 or 1 encoded GUIDs. In other words you can
>     specify it or not.

+0.5. That's exactly the point of the dual mechanism Clark proposed. Use %tag to give a GUID. Don't use %tag and don't give a GUID. Only +0.5 because, for XML-head reasons, we'd like to be able to use more than one GUID. But this is only relevant to point #7. People using #5/#6 don't know and don't care.

> 16) Many applications don't need the burden of an encoded GUID. The GUID
>     is implied by the application.

+1. This is for #5/#6.

> 17) Other Loaders absolutely need a GUID.

+1. This is for #7.

> 18) We don't need any directives at all.
> All extra info including YAML version can be found at the other end of
>     the GUID rainbow.

-1. #5/#6 files have no GUID, so you can't deduce the %YAML version from it! #7 files need a GUID. That's a directive. Sure you could say the %YAML depends on it but that's a hack. And potentially limiting.

> NOTE: I could be wrong here. Directives are typically consumed by the
> parser. At least that was the intent. %YAML %TABWIDTH, etc. I just don't
> think we need any of these right now.

We need %YAML for _later_. Hardly a burden, and optional anyway. I agree about %TABWIDTH :-)

> 19) The taguri scheme is an acceptable encoding for GUID. We should probably
> stick with it.

+1

> 20) This would allow completely clean YAML. Even when we need to add tag
>     crapola. Here is how dirty it would get: ...

-1. While it's true that limiting the % to a single GUID cleans things up a bit for people who do use #7, it is insufficient for their needs, as I tried to show above.

> 21) Note that I abandon the directive syntax altogether.

-1. Yes, you get some simplification. But I'd like to hang on to %YAML as a "safety valve" (YAGNI or no YAGNI), and as for having a single GUID (rather than several), I've covered that above.

> 22) Let me reiterate! The GUID is *not* concatenated onto the tag by the
>     parser. There is NO RESOLUTION IN THE PARSER.

I assume s/RESOLUTION/GLOBALIZATION/.

-1. I fail to see why that's a big deal. This is a purely syntactical operation, just like concatenating folded text lines. And the _only_ reason we want it this way is to solve the "graphical time sheet problem" above.

> 23) Yes the YAML is always ambiguous within the document alone. That's
>     the whole point. We can't fit all the semantic information in the
>     document without ruining YAML so why even try?

+1. The discussion is - what is the minimal input required to be given in the document itself.
A single GUID isn't quite enough for #7 schemas (I wish it were, but it just isn't).

> 24) The cool thing is that we can consider any tag in use today as
>     valid. Messy but valid.

+1 - in documents that are #5/#6. But there you hardly use tags anyway.

-1 - in documents that are #7. A pity, really, but there's no helping it.

> 25) This make backwards compatibility doable. So we get the best of
>     all worlds.

+1/-1. Same as #24. I wish we could be more backward compatible...

OK, much too long, but I'm late for work as it is. I think it covers everything, though (at this length it had better).

Have fun,

Oren Ben-Kiki |
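The "purely syntactical" tag globalization argued for in this message can be sketched in a few lines of Python. This is only an illustration under assumptions: the globalize function name is invented, the prefix table stands in for whatever a %tag directive would declare, and the exact shape of the resulting global tag (declared taguri prefix followed by the local name) is a guess, since the thread does not pin it down.

```python
def globalize(tag, prefixes):
    """Expand 'prefix:name' using declared prefixes; leave other tags alone.

    This is plain string concatenation. Nothing here knows what a tag means,
    which is the sense in which the operation is "purely syntactical".
    """
    prefix, sep, name = tag.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + name
    # No declared prefix: the tag stays private to the document/application.
    return tag


# Hypothetical prefix declarations, as a %tag directive might supply them.
declared = {
    "artml": "tag:artml.rubyforge.org,2004:",
    "hobix": "tag:hobix.com,2004:",
}

print(globalize("artml:art", declared))
print(globalize("hobix:redcloth", declared))
print(globalize("int", declared))
```

Config-file writers who never declare a prefix pay nothing: their tags pass through unchanged and the loader treats them however the application likes.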
From: T. O. <tra...@ru...> - 2004-09-01 12:32:15
|
Thought it couldn't hurt to put together a few examples of what the possibilities offer. These examples are real use cases, albeit they are simple demos for a developing app. (If you're interested, it is called ArtML and is nearly usable.)

Here is the sample as it stands now:

    --- !artml.rubyforge.org,2004/art |
    hello_world_3
    +--------------------------------+
    | hello_text                     |
    +--------------------------------+
    | "Your Name:" [your_name]       |
    +--------------------------------+
    --- !artml.rubyforge.org,2004/meta
    title: "Hello World Example"
    --- !artml.rubyforge.org,2004/style
    your_name:
      font_size: 12pt
    --- !artml.rubyforge.org,2004/data
    hello_text:
      value: !hobix.com,2004/redcloth |
        h2. This is a test!
    your_name:
      value: "Put your name here."
    hello_world_3:
      action: index.html

This is the above example using Clark's and Oren's proposal:

    --- %tag:artml.rubyforge.org,2004:|artml !artml:art |
    hello_world_3
    +--------------------------------+
    | hello_text                     |
    +--------------------------------+
    | "Your Name:" [your_name]       |
    +--------------------------------+
    --- %tag:artml.rubyforge.org,2004:|artml !artml:meta |
    title: "Hello World Example"
    --- %tag:artml.rubyforge.org,2004:|artml !artml:style |
    your_name:
      font_size: 12pt
    --- %tag:artml.rubyforge.org,2004:|artml %tag:hobix.com,2004:|hobix !artml:data |
    hello_text:
      value: !hobix:redcloth |
        h2. This is a test!
    your_name:
      value: "Put your name here."
    hello_world_3:
      action: index.html

This is it under Brian's and MHO (I will use space rather than the more ominous GUID):

    --- %space:artml.rubyforge.org,2004 !art |
    hello_world_3
    +--------------------------------+
    | hello_text                     |
    +--------------------------------+
    | "Your Name:" [your_name]       |
    +--------------------------------+
    --- %space:artml.rubyforge.org,2004 !meta |
    title: "Hello World Example"
    --- %space:artml.rubyforge.org,2004 !style |
    your_name:
      font_size: 12pt
    --- %space:artml.rubyforge.org,2004 !data |
    hello_text:
      value: !redcloth h2. This is a test!
    your_name:
      value: "Put your name here."
    hello_world_3:
      action: index.html

Presently there is no schema, so in all three cases these domains and tags are manually registered in the consuming application. Eventually we might get a schema and we might actually be able to use it to read "external registration" info (from the internet or locally) with very little manual intervention. In that case we can add a schema doc to Brian's and MHO example that might look something like this:

    --- !yaml/schema
    name: artml.rubyforge.org,2004
    tags:
      art: artml.rubyforge.org,2004/art
      meta: artml.rubyforge.org,2004/meta
      style: artml.rubyforge.org,2004/style
      data: artml.rubyforge.org,2004/data
      redcloth: hobix.com,2004/redcloth

Presumably, Oren's and Clark's wouldn't need this, at least not this tags portion of the schema, b/c this is hard-coded into their doc itself.

Any modifications needed to any of these? Thoughts?

-- T. |
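For what it's worth, here is a rough Python sketch of how a consuming application might use a tags table like the one in the schema doc to turn short tags into full ones. It is only an assumption about how such a schema would be consumed: the table is written out as a Python dict instead of being read from YAML, and ARTML_SCHEMA and expand_tag are invented names.

```python
# Hypothetical in-memory copy of the schema doc's contents.
ARTML_SCHEMA = {
    "name": "artml.rubyforge.org,2004",
    "tags": {
        "art": "artml.rubyforge.org,2004/art",
        "meta": "artml.rubyforge.org,2004/meta",
        "style": "artml.rubyforge.org,2004/style",
        "data": "artml.rubyforge.org,2004/data",
        "redcloth": "hobix.com,2004/redcloth",
    },
}


def expand_tag(short_tag, schema):
    """Map a short document tag to its full name using the schema's table."""
    try:
        return schema["tags"][short_tag]
    except KeyError:
        # A tag the schema does not declare cannot be resolved by the loader.
        raise LookupError(
            f"tag !{short_tag} is not declared by schema {schema['name']}"
        ) from None


print(expand_tag("art", ARTML_SCHEMA))
print(expand_tag("redcloth", ARTML_SCHEMA))
```

Note that the table can point a short tag at another domain entirely (redcloth lives under hobix.com), so mixing tagsets is handled in the schema rather than in the document.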
From: Clark C. E. <cc...@cl...> - 2004-09-01 16:23:31
|
On Wed, Sep 01, 2004 at 10:19:23AM +0300, Oren Ben-Kiki wrote:
| The only disagreement left is narrowed to a single question. Is a
| single GUID plus schema-specific "tag globalization" enough, or should
| there be a standard purely syntactical mechanism for "tag globalization"?

This is exactly the question. Also, I like breaking the process into tag 'globalization' and tag 'resolution'. The former part converts a local tag into one that is globally unique, and the latter finds an appropriate native type. Clearly 'resolution' happens in the loader.

The status quo is that GUID tags are used for some subset of the nodes in a document. We currently have shortcuts and ^ to help the YAML user with this process. We could replace shortcuts and ^ with a %tag directive. But the intent is the same: tags for each node can be private, but may also be globally unique.

The proposal by Brian and Onoma is to only have at most one global identifier per document, and then all other tags are 'globalized' through an external mechanism. This allows for terse "hints" to minimize noise; mixing-and-matching is delegated to a tag-remapping procedure, in effect a limited sort of transformation.

My position is that the 'status quo' should remain possible, but that people who don't need globalization shouldn't have to pay for it. So, make Brian/Onoma's proposal easier, but still make sure that tag globalization can occur at the syntax level if one wishes.

...

| [I]magine a world where some YAML documents use prefixes, some
| don't. Some use ':', some use '/'. Some use case-sensitive tag names
| (UNIX people). Some don't (Windows people).
|
| You are now asked to write a new schema that should combine both Alice's
| graphics objects schema and Bob's time-tracking schema, for representing
| graphical time sheet data.
Both these schemas make heavy use of tags to | identify elements (e.g., graphic objects are notoriously polymorphic and | their type can't always be deduced from their structure - an ellipse and | a rectangle have the same properties). | | Alice uses case-insensitive tags with prefixes for "spaces", '/' to | seperate prefix from suffix, so it calls integers "!Yaml/Int". Bob uses | no prefixes without prefixes and calls integers "!int". Both reasonable. Good use case for 'mixing' namespaces, which is the core issue. | What would be a good solution for that? It should: | | - Have zero cost for people who couldn't care less about this problem. | For example, config file writers. Hence, no (mandatory) headers/directives. | | - It should allow everyone to define his own interoperable schema and | still have readable files. Hence, no central registry of short namespace | ids with squatters and politics and so on. | | - It should minimize the pain of cut&paste between Bob's and Alice's | schemas. Hence, a unified mechanism for identifying "spaces" and "types" | in spaces rather than schema-specific tricks. | | The compromise Clark and I suggested becomes more-or-less inevitable: | | - You just want to write a config file and could care less about all | this XML-head mix-and-match mumbo-jumbo? Fine. Write your tags any way | you like - EXACTLY what Brian described in his post. Just please don't | use tags that start with "xxx:". That's all we ask. Hardly a sacrifice. | | - You are writing a schema that is meant to play nice with others (be | embedded in other, yet unwritten data types and applications)? Fine. | Please mint a taguri for it. Use prefixes to minimize the user's pain. | | I really don't see the problem. Everobody gets what they want! |
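The split between tag 'globalization' and tag 'resolution' described above can be sketched in a few lines of Python. This is only an illustration of the two steps, not anything from the spec: the function names, the `RESOLVERS` table, and the `int` tag under `artml.rubyforge.org,2004` are all hypothetical.

```python
from typing import Callable, Dict, Optional


def globalize(local_tag: str, prefix: Optional[str]) -> str:
    """Step 1 (syntax level): make a local tag globally unique.

    With no prefix the tag simply stays private, so authors who do not
    need globalization pay nothing for it.
    """
    if prefix is None:
        return local_tag
    return prefix.rstrip("/") + "/" + local_tag


# Step 2 (loader level): map a tag to a native constructor.
# The tag below is a made-up example, not a registered type.
RESOLVERS: Dict[str, Callable[[str], object]] = {
    "artml.rubyforge.org,2004/int": int,
}


def resolve(tag: str, raw: str) -> object:
    """Find a native type for the tag; unknown tags fall back to str."""
    construct = RESOLVERS.get(tag, str)
    return construct(raw)


full = globalize("int", "artml.rubyforge.org,2004")
print(full)
print(repr(resolve(full, "21")))
print(repr(resolve(globalize("int", None), "21")))
```

The last call shows the "zero cost" case: a private `int` tag is never globalized, so this particular loader leaves the value as a string.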
From: Brian I. <in...@tt...> - 2004-09-01 08:55:43
|
On 01/09/04 10:19 +0300, Oren Ben-Kiki wrote: > OK, long answer... Sorry, I didn't have the time to make it shorter... > > Summary: I think the compromise (suggested by Clark) that I posted > is the best solution to the problem. It addresses all(most) > Brian's points. I'm not buying it. Looks like another irc session is in order. Prepare thine defenses... :P Cheerzzzzzz, Brian |
From: Clark C. E. <cc...@cl...> - 2004-09-01 14:34:19
|
T. Onoma writes:

Ok, so with Brian's proposal, you would change a document that used two namespaces from,

| --- !artml.rubyforge.org,2004/data
| hello_text:
|   value: !hobix.com,2004/redcloth |
|     h2. This is a test!
| your_name:
|   value: "Put your name here."
| hello_world_3:
|   action: index.html

to

| --- %space:artml.rubyforge.org,2004 !data
|
| hello_text:
|   value: !redcloth
|     h2. This is a test!
| your_name:
|   value: "Put your name here."
| hello_world_3:
|   action: index.html

If this is the case, why is there a problem with the current specification? Just write,

--- !artml.rubyforge.org,2004/^data
hello_text:
  value: !^redcloth |
    h2. This is a test!
your_name:
  value: "Put your name here."
hello_world_3:
  action: index.html

Problem solved. They are logically equivalent. If you want a 'schema', just use the tag mechanism to mark the root node and then bind all of your tags to what ever sub-tags you delegate to in your 'schema' specific loading process,

| --- !yaml/schema
| name: artml.rubyforge.org,2004
| tags:
|   art: artml.rubyforge.org,2004/art
|   meta: artml.rubyforge.org,2004/meta
|   style: artml.rubyforge.org,2004/style
|   data: artml.rubyforge.org,2004/data
|   redcoth: hobix.com,2004/redcloth

So... we actually _don't_ need a change to the spec afterall?

Cheers!

Clark

--
Clark C. Evans                      Prometheus Research, LLC.
                                    http://www.prometheusresearch.com/
    o                               office: +1.203.777.2550
  ~/ ,                              mobile: +1.203.444.0557
 //
((   Prometheus Research: Transforming Data Into Knowledge
 \\  ,
   \/    - Research Exchange Database
   /\    - Survey & Assessment Technologies
   ` \   - Software Tools for Researchers
    ~ *
|
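For readers unfamiliar with the `^` shorthand used in the example above, here is a rough Python sketch of how the expansion is being read in this thread. It is a simplification under stated assumptions: tags are given without the leading `!`, they are processed in document order, and node scoping of the prefix is ignored. The function name is made up for illustration.

```python
from typing import Iterable, List


def expand_caret_tags(tags: Iterable[str]) -> List[str]:
    """Expand '^' shorthands in document order.

    'prefix^suffix' sets the current prefix and yields prefix + suffix;
    '^suffix' reuses the most recently set prefix.
    """
    prefix = ""
    expanded = []
    for tag in tags:
        if "^" in tag:
            head, suffix = tag.split("^", 1)
            if head:  # e.g. "artml.rubyforge.org,2004/^data"
                prefix = head
            expanded.append(prefix + suffix)
        else:
            expanded.append(tag)  # no shorthand: leave untouched
    return expanded


print(expand_caret_tags(["artml.rubyforge.org,2004/^data", "^redcloth"]))
```

Note that under this reading `!^redcloth` lands under the `artml.rubyforge.org,2004/` prefix; mapping it on to `hobix.com,2004/redcloth` would be the job of the schema-specific loading step mentioned in the message.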
From: T. O. <tra...@ru...> - 2004-09-01 15:08:20
|
On Wednesday 01 September 2004 10:34 am, you wrote:
>
> --- !artml.rubyforge.org,2004/^data
> hello_text:
>   value: !^redcloth |
>     h2. This is a test!
> your_name:
>   value: "Put your name here."
> hello_world_3:
>   action: index.html
>
> Problem solved. They are logically equivalent. If you want a 'schema',
> just use the tag mechanism to mark the root node and then bind all of
> your tags to what ever sub-tags you delegate to in your 'schema'
> specific loading process,

[NOTE: clarify the above (see below about being "thrown off") ]

> [snip]
>
> So... we actually _don't_ need a change to the spec afterall?

I'll be damned! Full circle. It's basically the same spec only from a different *perspective*. Unless I'm missing something, (Brian?) that should work.

So let me see if I have this straight. The domain on the root node can serve as the "GUID", then all other tags can ^ in ref. to that. I like that.

I think eventually, if/when schemas do become the norm, some opinions might change, and full domain tags within the doc itself will be outlawed (i.e. only the root node will have it). But in the meantime it doesn't really matter; being able to put full domains throughout suits you and Oren, yes?

One question. What if the root node doesn't have an explicit type? i.e. just the implicit yaml. Doing this okay?

  -- !artml.rubyforge.org,2004/^
  # ...

This "new perspective" may still change some other things though. Private tags? The other shorthands? How will implementors be affected?

I have to admit, surprised but delighted!

I hope I'm not misunderstanding. Your "tag mechanism" comment throws me off a bit, you're not proposing we add %tag: on top of the current spec too, are you?

Other than that... Looks Cool!

T.

--
T.
|
From: Clark C. E. <cc...@cl...> - 2004-09-01 15:35:40
|
On Wed, Sep 01, 2004 at 11:08:11AM -0400, T. Onoma wrote:
| I'll be damned! Full circle. It's basically the same spec only from a
| different *perspective*. Unless I'm missing something, (Brian?) that
| should work.

The intent of the !tag mechanism was to provide unique handles that can be used to drive processing, e.g., locate handles, etc. In this proposal, one simply 'layers' on a schema system that retags stuff.

| So let me see if I have this straight. The domain on the root node can
| serve as the "GUID", then all other tags can ^ in ref. to that. I like
| that.

Yes.

| I think eventually, if/when schemas do become the norm, some opinions might
| change, and full domain tags within the doc itself will be outlawed (i.e.
| only the root node will have it).

I doubt it. For those that want UGLY YAML that directly reflects what is stored in the representation model, !tag on every node is required.

| One question. What if the root node doesn't have an explicit type?
| i.e. just the implicit yaml. Doing this okay?
|
|   -- !artml.rubyforge.org,2004/^
|   # ...

Should be legal.

| This "new perspective" may still change some other things though. Private
| tags? The other shorthands? How will implementors be affected?

Well, if the spec stays the same, shorthands and private tags remain as-is; current understanding may be a bit better, but the spec is unchanged.

| I have to admit, surprised but delighted!

Good.

| I hope I'm not misunderstanding. Your "tag mechanism" comment throws
| me off a bit, you're not proposing we add %tag: on top the current
| spec too, are you?

See my other thread. If we added %tag, we'd drop the shorthands and your document would be _all_ private types unless you had a %tag directive.

Bings,

Clark

--
Clark C. Evans Prometheus Research, LLC.
|
From: Sean O'D. <se...@ce...> - 2004-09-01 17:25:17
|
On Tuesday 31 August 2004 20:10, Brian Ingerson wrote: <snip> /agree with 1-10 > 11) The prefix systems being discussed currently are used to make tags > globally unique in the document. Not only is this overkill, it's not > necessary. I also tend to think it's not even desirable. <snip> > 15) A document can have 0 or 1 encoded GUIDs. In other words you can > specify it or not. If you don't then the parser just fires off a start- > document event with an *empty* GUID. This is valid. Why? Well, we will need more than one namespace per document. A single document can use more than one namespace. GUID is the unique ID of a namespace, I assume? We'll need more than one of those per document. > 19) The taguri scheme is an acceptable encoding for GUID. We should > probably stick with it. > > There can be all sorts of different ways that programs discover a schema, > given a GUID. Including a global registry. This is nice because it can > indicate when two GUIDS are actually compatible. How is the loader going to locate external schemas with just a taguri? If taguri is used instead of URLs, there's going to have to be, somewhere, a lookup table that associates a taguri with a resource location. That seems a bit troublesome when you could very easily just use URLs as your unique identifiers. What alternative is there with taguri other than a lookup table? > 21) Note that I abandon the directive syntax altogether. That's because > I feel that this is the only directive needed. In actuality it make > good sense to just use the directive syntax though: > > --- %GUID:ingy.net,2004/shiznit/ So, how would programmers hook their own private types into the loader? It's a lot of hassle to have to develop a schema just for a quickie private type. I use private types all the time, and that would be a major pain to not be able to create my own namespace for the purpose of creating my own private types. 
I would definitely markup my YAML with !mytype and then want to handle them loading as custom types; how would your proposed way prevent conflict with the schema of a document? I think we need multiple namespaces and possibly multiple schemas, so I think there needs to be some sort of directive to separate them up. Going without directives pretty much drops YAML to the point of, I feel, making not so much simple as primitive. Sean O'Dell |
From: Oren Ben-K. <or...@be...> - 2004-09-01 19:13:33
|
On Wednesday 01 September 2004 20:25, Sean O'Dell wrote: > How is the loader going to locate external schemas with just a taguri? If > taguri is used instead of URLs, there's going to have to be, somewhere, a > lookup table that associates a taguri with a resource location. That seems > a bit troublesome when you could very easily just use URLs as your unique > identifiers. What alternative is there with taguri other than a lookup > table? Too many to list. Going to http://yaml-taguri-service.org/tag=<the-taguri>, for example. There - you have your URL if you want it. > I think we need multiple namespaces and possibly multiple schemas, so I > think there needs to be some sort of directive to separate them up. Going > without directives pretty much drops YAML to the point of, I feel, making > not so much simple as primitive. I think so too, but realistically you can't force all the world to work this way. That's why we have "!!tag" in the spec (or any non-"xxx:" tag in the %tag proposal). Have fun, Oren Ben-Kiki |
From: Brian I. <in...@tt...> - 2004-09-01 19:22:04
|
On 01/09/04 10:25 -0700, Sean O'Dell wrote: > On Tuesday 31 August 2004 20:10, Brian Ingerson wrote: > > <snip> > > /agree with 1-10 > > > 11) The prefix systems being discussed currently are used to make tags > > globally unique in the document. Not only is this overkill, it's not > > necessary. I also tend to think it's not even desirable. > <snip> > > 15) A document can have 0 or 1 encoded GUIDs. In other words you can > > specify it or not. If you don't then the parser just fires off a start- > > document event with an *empty* GUID. This is valid. Why? > > Well, we will need more than one namespace per document. A single document > can use more than one namespace. GUID is the unique ID of a namespace, I > assume? We'll need more than one of those per document. No. GUID is the unique id of a *schema*, whatever/whereever that might be. The schema then provides all the namespaces. That's my whole point. In generic terms, you have one piece of unique information in the document (like the GUID) that can be used by the loader to resolve everything else. That is all I am really saying with all this. Instead of getting specific on all the details in the document, you add a "layer of abstraction" that leaves the document ambiguous but resolvable. This is my strawman. I have the gut feeling that trying to tie down everything in the document, prevents us from incorporating new ideas down the road without having to change everything. When you add a layer of abstraction you are freed from the handcuffs of trying to nail everything down now. That's how it goes with computer stuff. We started with machine code, but then someone said, "Hey let's make assembly language so we can abstract just a little". The someone said, "Let's make C so we can abstract a little more". And then on to higher and higher levels. 
(apologies for glossing over computer history in one analogy) I feel like trying to ensure global uniqueness of tags in the document, is trying to nail things down to soon. The benefit is that things are simpler for the time being. The detriment is that we are stuck with what we decide now. One thing I know from working with this group is that minds and requirements *will* change. > > 19) The taguri scheme is an acceptable encoding for GUID. We should > > probably stick with it. > > > > There can be all sorts of different ways that programs discover a schema, > > given a GUID. Including a global registry. This is nice because it can > > indicate when two GUIDS are actually compatible. > > How is the loader going to locate external schemas with just a taguri? If > taguri is used instead of URLs, there's going to have to be, somewhere, a > lookup table that associates a taguri with a resource location. That seems a > bit troublesome when you could very easily just use URLs as your unique > identifiers. What alternative is there with taguri other than a lookup > table? Good question. There are many options. I'll opt not to start this thread just yet. But we do need to figure this out in any scenario. > > 21) Note that I abandon the directive syntax altogether. That's because > > I feel that this is the only directive needed. In actuality it make > > good sense to just use the directive syntax though: > > > > --- %GUID:ingy.net,2004/shiznit/ > > So, how would programmers hook their own private types into the loader? It's > a lot of hassle to have to develop a schema just for a quickie private type. > I use private types all the time, and that would be a major pain to not be > able to create my own namespace for the purpose of creating my own private > types. I would definitely markup my YAML with !mytype and then want to > handle them loading as custom types; how would your proposed way prevent > conflict with the schema of a document? 
> > I think we need multiple namespaces and possibly multiple schemas, so I think > there needs to be some sort of directive to separate them up. Going without > directives pretty much drops YAML to the point of, I feel, making not so much > simple as primitive. > > Sean O'Dell |
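To make the "one GUID per document, schema supplies the rest" idea above concrete, here is a minimal Python sketch. It assumes a hypothetical in-process registry keyed by GUID, with contents borrowed from the `!yaml/schema` example earlier in the thread; how a real loader would discover the schema is exactly the question left open in this message.

```python
from typing import Dict

# Hypothetical registry: document GUID -> {short tag: globally unique tag}.
SCHEMAS: Dict[str, Dict[str, str]] = {
    "artml.rubyforge.org,2004": {
        "data": "artml.rubyforge.org,2004/data",
        "redcloth": "hobix.com,2004/redcloth",
    },
}


def globalize_with_schema(guid: str, short_tag: str) -> str:
    """Resolve a terse document tag through the schema named by the GUID."""
    if not guid:
        return short_tag  # empty GUID: nothing to resolve against
    tags = SCHEMAS.get(guid, {})
    # Tags the schema does not know about are returned unchanged.
    return tags.get(short_tag, short_tag)


print(globalize_with_schema("artml.rubyforge.org,2004", "redcloth"))
print(globalize_with_schema("artml.rubyforge.org,2004", "mytype"))
print(globalize_with_schema("", "redcloth"))
```

The second call is the case Sean asks about: a quick private `!mytype` that the schema has never heard of. In this sketch it simply passes through untouched, which avoids an error but does nothing to prevent a clash if the schema later defines `mytype` itself.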
From: Clark C. E. <cc...@cl...> - 2004-09-01 19:47:59
|
On Wed, Sep 01, 2004 at 11:44:50AM -0700, Brian Ingerson wrote: | No. GUID is the unique id of a *schema*, whatever/whereever that might | be. The schema then provides all the namespaces. That's my whole point. So you require an external document just to determine the full tag for a given node? | This is my strawman. I have the gut feeling that trying to tie down everything | in the document, prevents us from incorporating new ideas down the road | without having to change everything. The %tag proposal reserves a very small region from the 'tagspace' for globally unique identifiers; surely this will not prevent incorporating new and better mechanisms later. | When you add a layer of abstraction you are freed from the handcuffs of trying | to nail everything down now. On the contrary, you now must get agreement of all parties involved how to communicate the 'globalization' of tags. Also, you didn't answer the meat of Sean's post: | > So, how would programmers hook their own private types into the loader? It's | > a lot of hassle to have to develop a schema just for a quickie private type. | > I use private types all the time, and that would be a major pain to not be | > able to create my own namespace for the purpose of creating my own private | > types. I would definitely markup my YAML with !mytype and then want to | > handle them loading as custom types; how would your proposed way prevent | > conflict with the schema of a document? This is a real use case. %tag solves this. How does your proposal solve it? Clark |
From: Sean O'D. <se...@ce...> - 2004-09-01 19:52:11
|
On Wednesday 01 September 2004 11:44, you wrote:
>
> No. GUID is the unique id of a *schema*, whatever/wherever that might be.
> The schema then provides all the namespaces. That's my whole point.

Ah, thanks for explaining that.

> In generic terms, you have one piece of unique information in the document
> (like the GUID) that can be used by the loader to resolve everything else.
> That is all I am really saying with all this.

I think we're going to want multiple schemas, though. I think the basic information to locate the schemas and delineate the namespaces should be with the document itself. Namespaces and tags can do other things besides help schemas. You can create private types for the loader, assemble documents, break them apart. Lots of things besides schemas.

> Instead of getting specific on all the details in the document, you add a
> "layer of abstraction" that leaves the document ambiguous but resolvable.

I think namespaces, schema references and tags all need to be in the document. I know what sort of abstraction you are asking for, but handling types entirely from an external reference is hard. It requires a level of brain power that is really taxing and slow-going to develop. I would prefer only schemas were external, and the tags and references inside the document itself.

> This is my strawman. I have the gut feeling that trying to tie down
> everything in the document, prevents us from incorporating new ideas down
> the road without having to change everything.

Tied in the document is okay, but put in the document is not always okay. Schemas should be described externally, and anything else that has to do with loading should be described externally except for markup/references. It's a lot easier for programmers/people to mark up the document itself instead of an external document describing the document. That hurts my head just imagining trying to write a document in one file, then describe its structure in another and apply namespaces/schemas/etc. to specific points. That would not be fun.

> > How is the loader going to locate external schemas with just a taguri?
> > If taguri is used instead of URLs, there's going to have to be,
> > somewhere, a lookup table that associates a taguri with a resource
> > location. That seems a bit troublesome when you could very easily just
> > use URLs as your unique identifiers. What alternative is there with
> > taguri other than a lookup table?
>
> Good question. There are many options. I'll opt not to start this thread
> just yet. But we do need to figure this out in any scenario.

I just hope no one seriously considers a lookup table, like a MIME type database.

Sean O'Dell
|
From: Clark C. E. <cc...@cl...> - 2004-09-01 19:12:35
|
On Wed, Sep 01, 2004 at 10:25:11AM -0700, Sean O'Dell wrote: | > 11) The prefix systems being discussed currently are used to make tags | > globally unique in the document. Not only is this overkill, it's not | > necessary. I also tend to think it's not even desirable. | <snip> | > 15) A document can have 0 or 1 encoded GUIDs. In other words you can | > specify it or not. If you don't then the parser just fires off a | > start-document event with an *empty* GUID. This is valid. Why? | | Well, we will need more than one namespace per document. A single document | can use more than one namespace. GUID is the unique ID of a namespace, I | assume? We'll need more than one of those per document. Agreed. | > 21) Note that I abandon the directive syntax altogether. That's because | > I feel that this is the only directive needed. In actuality it make | > good sense to just use the directive syntax though: | > | > --- %GUID:ingy.net,2004/shiznit/ | So, how would programmers hook their own private types into the loader? | It's a lot of hassle to have to develop a schema just for a quickie | private type. I use private types all the time, and that would be a | major pain to not be able to create my own namespace for the purpose of | creating my own private types. I would definitely markup my YAML with | !mytype and then want to handle them loading as custom types; how would | your proposed way prevent conflict with the schema of a document? | | I think we need multiple namespaces and possibly multiple schemas, so I | think there needs to be some sort of directive to separate them up. | Going without directives pretty much drops YAML to the point of, I feel, | making not so much simple as primitive. Hear Hear! | > 19) The taguri scheme is an acceptable encoding for GUID. We should | > probably stick with it. | > | > There can be all sorts of different ways that programs discover a schema, | > given a GUID. Including a global registry. 
This is nice because it can | > indicate when two GUIDS are actually compatible. | | How is the loader going to locate external schemas with just a taguri? If | taguri is used instead of URLs, there's going to have to be, somewhere, a | lookup table that associates a taguri with a resource location. There are lots of problems with URLs as we've stated before, among them: - is http://www.clarkevans.com/schema and http://clarkevans.com/schema the same or different schema? - say I've got a ton of documents out there, and a court rules that clarkevans.com is actually violating the trademark of a well known musician -- and I lose my domain; or, if I simply forget to renew it and someone else takes the domain - what happens if my website is down or you don't have internet connectivity; or you are running in a secure environment that doesn't allow _any_ outside tcp connections -- does this change the parser's behavior? - what do you put at the end of the rainbow? is what you put at the end of the rainbow today going to be useful three years down the line? do you change what is at the end of the rainbow? - how do I have both a YASL schema, and also a YSchematron schema for my document; YASL will be good at checking structure, YSchematron may be perfect for checking particular semantics; assuming "one true schema language" is not scalable - if its a web address there is an expectation of human readability, namely that it return a web page... - what happens in 10 years when http falls out of style, and people start using xktp: -- do you force applications to continue to use http since that's how the schema is identified? what happens if you use https today, and in 4 years they find a protocol exploit making https unsafe? - you have to register and maintain a domain name to have a valid schema? how does one enforce what is at the end of the rainbow? or is it just 'try it and find out' ? | What alternative is there with taguri other than a lookup table? 
With http you will need a lookup table anyway. To solve most of these problems you'll have to create a 'cache' or registry mechanism anyway... so the URL doesn't buy you anything. It _seems_ like it is a good idea, but in practice it just doesn't make life any easier; it just confuses the issue. The goal of taguri is quite simple... to provide a globally unique identifier that is _always_ globally unique -- forever, not just today; an identifier that does not imply any interaction model, access protocols, schema mechanism, or any other policy than just being an identifier. Best, Clark |
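The argument above, that an identifier should be compared as an opaque string and located through a local 'cache' or registry, might look roughly like this in Python. Everything here is an assumption made for illustration: the registry contents, the file path, and the example identifier `clarkevans.com,2004/schema` do not come from the thread.

```python
from typing import Dict, Optional

# Hypothetical local registry: identifier -> where this installation
# keeps its copy of the schema. No network access is involved.
LOCAL_REGISTRY: Dict[str, str] = {
    "clarkevans.com,2004/schema": "/usr/share/yaml/schemas/clarkevans_schema.yaml",
}


def locate_schema(identifier: str) -> Optional[str]:
    """Look the identifier up locally; None means 'not installed here'."""
    return LOCAL_REGISTRY.get(identifier)


def same_identifier(a: str, b: str) -> bool:
    """Identifiers are opaque: equal only if the strings are equal."""
    return a == b


print(locate_schema("clarkevans.com,2004/schema"))
print(locate_schema("example.org,2004/unknown"))
print(same_identifier("http://www.clarkevans.com/schema",
                      "http://clarkevans.com/schema"))
```

The last line is the first bullet from the list of URL problems: compared as plain strings the two addresses differ, even though many web servers would hand back the same page for both.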
From: Sean O'D. <se...@ce...> - 2004-09-01 19:51:49
|
On Wednesday 01 September 2004 12:12, you wrote:
>
> There are lots of problems with URLs as we've stated before, among them:
>
>   - is http://www.clarkevans.com/schema and http://clarkevans.com/schema
>     the same or different schema?

Different; it's only in people's minds that www. is optional. In theory, it was never optional, but people have become lazy and web masters have accommodated them by making www.domain and domain resolve to the same IP. It was always just a convenience mechanism that you would not need in this case.

>   - say I've got a ton of documents out there, and a court rules that
>     clarkevans.com is actually violating the trademark of a well known
>     musician -- and I lose my domain; or, if I simply forget to renew it
>     and someone else takes the domain

We can't solve the problems of the internet. Whether we use URLs as namespaces, or use taguri and then look up the external reference (a URL in some cases) from a table, if the resource isn't there, it isn't there. The URL would still be unique, and would serve the purpose of providing a unique ID, plus it's the location of the resource itself.

A taguri->URL lookup table creates a level of indirection that is going to be a pain to maintain. I don't think it's something anyone is going to want, either as a user or as the person who has to create/maintain them. Taguri's are fine now, when we don't have external references, but when you have to start resolving them to external resources, I think it's going to feel a bit "disconnected". Maintaining such a lookup table would be like maintaining a MIME type database. It's always changing, never correct and never the same as anyone else's.

>   - what happens if my website is down or you don't have internet
>     connectivity; or you are running in a secure environment that
>     doesn't allow _any_ outside tcp connections -- does this change
>     the parser's behavior?

Don't take responsibility for internet connectivity issues; let people organize their schemas into locations they are comfortable with.

>   - how do I have both a YASL schema, and also a YSchematron schema
>     for my document; YASL will be good at checking structure,
>     YSchematron may be perfect for checking particular semantics;
>     assuming "one true schema language" is not scalable

I'm all for having multiple schemas applied to a single document.

>   - if it's a web address there is an expectation of human readability,
>     namely that it return a web page...

No, http only specifies a transport protocol, not html. An http server reports the MIME type it is emitting when a request is made. That's why it works so well for XML-RPC, SOAP, XML Schemas, etc.

>   - what happens in 10 years when http falls out of style, and people
>     start using xktp: -- do you force applications to continue to use
>     http since that's how the schema is identified? what happens if
>     you use https today, and in 4 years they find a protocol exploit
>     making https unsafe?

Again, leave this to the people using YAML. Let them decide how to manage their needs and their future. Transport is not something you should worry about. If http loses to another transport protocol in the future, then everyone has to change their web pages, business cards, letterhead paper, television ads, etc. It's not something anyone needs to worry about. It will be managed like such dramatic changes are always managed. It's not the job of the YAML developers to worry about that.

>   - you have to register and maintain a domain name to have a valid
>     schema? how does one enforce what is at the end of the rainbow? or
>     is it just 'try it and find out' ?

URL's can specify anything; web servers, local files, and you can even extend it to your own purposes, so you could use it to indicate another document in the same stream if you wanted. For example:

  http://yaml.org/2002/standard_schema
  file://usr/share/yaml/schemas/local_schema_name
  yaml_stream://quickie_schema_document_name

The only real rules to URLs are the format. There are only a couple of well-known protocols and there are standard ways people use them, but you can really do anything with them you like.

> | What alternative is there with taguri other than a lookup table?
>
> With http you will need a lookup table anyway. To solve most of these
> problems you'll have to create a 'cache' or registry mechanism
> anyway... so the URL doesn't buy you anything. It _seems_ like it is a
> good idea, but in practice it just doesn't make life any easier; it
> just confuses the issue.

No, a URL doesn't require a lookup table. The URL specifies exactly where to go to get the schema. If you used a URL as the ID, you can go straight to the resource, no lookups needed.

> The goal of taguri is quite simple... to provide a globally unique
> identifier that is _always_ globally unique -- forever, not just today;
> an identifier that does not imply any interaction model, access
> protocols, schema mechanism, or any other policy than just being an
> identifier.

So do URLs. There is no difference between the two in terms of uniqueness.

Sean O'Dell
|
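As a rough illustration of the "the URL says where to go" position above, a loader could dispatch on the scheme part of the identifier. This Python sketch only describes what it would do rather than fetching anything; the `HANDLERS` table and the `yaml_stream` scheme are taken from the examples in the message and are not part of any YAML specification. The split is done with plain string partitioning, on the assumption that a custom scheme containing an underscore may not be recognised by a standard URL parser.

```python
from typing import Dict, Tuple

# Hypothetical mapping from scheme to what a loader would do with it.
HANDLERS: Dict[str, str] = {
    "http": "fetch over the network",
    "file": "read from local disk",
    "yaml_stream": "look up another document in the same stream",
}


def split_scheme(url: str) -> Tuple[str, str]:
    """Split 'scheme://rest' into its two parts."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError("no scheme in %r" % url)
    return scheme, rest


def describe(url: str) -> str:
    """Say how this sketch would go about locating the schema."""
    scheme, rest = split_scheme(url)
    action = HANDLERS.get(scheme)
    if action is None:
        return "unsupported scheme %r" % scheme
    return "%s: %s" % (action, rest)


for example in (
    "http://yaml.org/2002/standard_schema",
    "file://usr/share/yaml/schemas/local_schema_name",
    "yaml_stream://quickie_schema_document_name",
):
    print(describe(example))
```

A design like this ties the identifier to a retrieval mechanism, which is the trade-off being argued over in this exchange.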
From: Clark C. E. <cc...@cl...> - 2004-09-01 20:09:06
|
On Wed, Sep 01, 2004 at 12:51:44PM -0700, Sean O'Dell wrote:
| >   - say I've got a ton of documents out there, and a court rules that
| >     clarkevans.com is actually violating the trademark of a well known
| >     musician -- and I lose my domain; or, if I simply forget to renew it
| >     and someone else takes the domain
...
| The URL would still be unique, and would serve the purpose of providing
| a unique ID, plus it's the location of the resource itself.

However, the musician who gets my old domain may also want to use http://clarkevans.com/bing for another schema, that isn't mine. Since he owns the domain now, who is right? Does the URI identify _my_ schema or his? Furthermore, he may not play nicely and keep my resource there, so this URL doesn't in fact locate the resource. Does this mean that my schema is now invalid?

| Maintaining such a lookup table would be like maintaining a
| MIME type database. It's always changing, never correct and never the same
| as anyone else's.

But, it has an advantage. Let's say I have a production process that uses your schema (located by a URL), when I launch my system it works great. Then, when I'm out for vacation, you change the schema... making my documents invalid, or something like that. This is brittle.

| >   - how do I have both a YASL schema, and also a YSchematron schema
| >     for my document; YASL will be good at checking structure,
| >     YSchematron may be perfect for checking particular semantics;
| >     assuming "one true schema language" is not scalable
|
| I'm all for having multiple schemas applied to a single document.

How do you do it when you only have one resource pointed to?

| >   - what happens in 10 years when http falls out of style, and people
| >     start using xktp: -- do you force applications to continue to use
| >     http since that's how the schema is identified? what happens if
| >     you use https today, and in 4 years they find a protocol exploit
| >     making https unsafe?
|
| Again, leave this to the people using YAML. Let them decide how to manage
| their needs and their future. Transport is not something you should worry
| about. If http loses to another transport protocol in the future, then
| everyone has to change their web pages, business cards, letterhead paper,
| television ads, etc. It's not something anyone needs to worry about. It
| will be managed like such dramatic changes are always managed. It's not the
| job of the YAML developers to worry about that.

So, you'd require everyone go back and change all of their archived documents to reflect the new namespace?

| URL's can specify anything; web servers, local files, and you can even extend
| it to your own purposes, so you could use it to indicate another document in
| the same stream if you wanted. For example:
|
| http://yaml.org/2002/standard_schema
| file://usr/share/yaml/schemas/local_schema_name

this one isn't globally unique; so not only is the URL brittle, it isn't necessarily globally unique; seems to me you are after a completely different animal

You are after a way to _directly_ resolve a schema from a YAML document, you are not after a globally unique identifier.
|
From: Sean O'D. <se...@ce...> - 2004-09-01 21:15:58
|
On Wednesday 01 September 2004 13:09, Clark C. Evans wrote:
> On Wed, Sep 01, 2004 at 12:51:44PM -0700, Sean O'Dell wrote:
> | >   - say I've got a ton of documents out there, and a court rules that
> | >     clarkevans.com is actually violating the trademark of a well known
> | >     musician -- and I lose my domain; or, if I simply forget to renew
> | >     it and someone else takes the domain
>
> | The URL would still be unique, and would serve the purpose of providing
> | a unique ID, plus it's the location of the resource itself.
>
> However, the musician who gets my old domain may also want to use
> http://clarkevans.com/bing for another schema that isn't mine. Since he
> owns the domain now, who is right? Does the URI identify _my_ schema
> or his? Furthermore, he may not play nicely and keep my resource there,
> so this URL doesn't in fact locate the resource. Does this mean that
> my schema is now invalid?

Again, I don't think this is something we should be managing. If you write
schemas that are located at a server somewhere, either be sure you keep the
domain, or don't use the domain at all because your YAML will outlive your
domain name usage. It's up to you to manage where your resources are located
and how long they need to be available.

It's inevitable anyway, you know. Schemas will be external and there will be
issues about whether or not they're still accessible. This isn't within
YAML's sphere of responsibility. It's a management issue up to the people
who use YAML.

> | Maintaining such a lookup table would be like maintaining a
> | MIME type database.  It's always changing, never correct and never the
> | same as anyone else's.
>
> But, it has an advantage. Let's say I have a production process
> that uses your schema (located by a URL); when I launch my system
> it works great. Then, when I'm out for vacation, you change
> the schema... making my documents invalid, or something like
> that. This is brittle.

What if I get a document that refers to types by taguri and I don't have the
table to look up the schema it's associated with? I see that as a much more
common problem than domain names going away: people getting documents marked
up with taguris and not getting the lookup table that associates taguris
with resource location information.

> | >   - how do I have both a YASL schema, and also a YSchematron schema
> | >     for my document; YASL will be good at checking structure,
> | >     YSchematron may be perfect for checking particular semantics;
> | >     assuming "one true schema language" is not scalable
> |
> | I'm all for having multiple schemas applied to a single document.
>
> How do you do it when you only have one resource pointed to?

You wouldn't; you would have multiple resources referenced in the document
header, I assume.

> | >   - what happens in 10 years when http falls out of style, and people
> | >     start using xktp: -- do you force applications to continue to use
> | >     http since that's how the schema is identified?  what happens if
> | >     you use https today, and in 4 years they find a protocol exploit
> | >     making https unsafe?
> |
> | Again, leave this to the people using YAML.  Let them decide how to
> | manage their needs and their future.  Transport is not something you
> | should worry about.  If http loses to another transport protocol in the
> | future, then everyone has to change their web pages, business cards,
> | letterhead paper, television ads, etc.  It's not something anyone needs
> | to worry about.  It will be managed like such dramatic changes are always
> | managed.  It's not the job of the YAML developers to worry about that.
>
> So, you'd require everyone to go back and change all of their archived
> documents to reflect the new namespace?

No, I don't even think about it. It's not something I think we should worry
about. If someone distributes a whole lot of YAML using a domain name as a
namespace, and then lets the domain name go, that's their problem, not ours.

I see the ID as having two purposes: to uniquely identify and to locate. If
the domain goes away, unique identification still exists. If the domain goes
away, any external references become inaccessible. However, even with
taguri, eventually you will still need to reference schemas externally, and
you WILL use URLs to obtain them (even if you don't, you will use SOME
location system). So, if you place your schemas on a server and refer to
them using your domain name and then let your domain name go, you have the
same problem.

In simpler words: you will have that problem of resources being inaccessible
due to "general internet failure" no matter what, and even when that
happens, it doesn't change the unique identification property of the URL.
The URL still works to uniquely identify.

So, if you use URLs as a unique ID, or if you use taguri and then use a
lookup table to obtain resource locations, you are going to have those very
same problems either way. So if those problems are always going to exist
anyway, why not skip that level of indirection, the lookup table, and make
URLs the unique IDs?

Or are you saying you want all schemas to be completely written out in the
same document as the data document? The only way you'll never have to obtain
external resources is if the schema is embedded in the document. If you are
going to have schemas external to the document, then you are going to have
those same problems.

> | URLs can specify anything; web servers, local files, and you can even
> | extend it to your own purposes, so you could use it to indicate another
> | document in the same stream if you wanted.  For example:
> |
> |   http://yaml.org/2002/standard_schema
> |   file://usr/share/yaml/schemas/local_schema_name
>
> This one isn't globally unique; so not only is the URL brittle, it
> isn't necessarily globally unique; seems to me you are after a completely
> different animal.
>
> You are after a way to _directly_ resolve a schema from a YAML
> document; you are not after a globally unique identifier.

It serves both purposes. Yes, all of those URLs I gave are completely
unique. It's based on standard unix file paths. You can't have a single file
path that refers to two different files, can you? What else could
"http://yaml.org/2002/standard_schema" possibly refer to? How could it
possibly be ambiguous?

Sean O'Dell
|
From: Oren Ben-K. <or...@be...> - 2004-09-01 21:17:17
|
On Wednesday 01 September 2004 22:51, Sean O'Dell wrote:
> Don't take responsibility for internet connectivity issues; let people
> organize their schemas into locations they are comfortable with.

Sean, I do consulting work for the IDF on occasion. They have a huge
internal network that is not and will not *ever* be connected to the
internet. EVER. Does this mean that they are barred forever from using any
of the schemas defined by normal internet users (specifically, yaml.org
types)? That's ridiculous.

My boss has a laptop he likes to work on on long flights. Does he have to
pay $1/minute to be able to open a YAML document, because its schema is
specified as http://something?

My friend's company has a narrow bandwidth internet connection (relative to
the number of people, that is :-). Should they have to allocate part of that
expensive bandwidth for fetching YAML schemas that change only once in a
blue moon?

The server for yaml.org went down due to a malfunction. Does this mean
every YAML user in the world can't open any document using
!http://yaml.org until it is back up?

My MP3-description schema is a huge hit. Millions of users downloaded
kazamuletorrent, my (legal stuff only!) p2p program that uses it. Do you
really expect all of them to fetch it from my poor ADSL based server?
Besides, I got bored with it and someone else bought the domain name. He got
so fed up with people hitting his server for schemas that he started sending
them maliciously wrong ones. People are getting angry and my E-mail inbox is
flooded with death threats from some lunatic who lost his entire 1000 CD
tracks catalog. He knows where I live. HELP!

I have this lovely YAML schema implemented using YSL (Yaml Schema Language)
which I keep on my server. The applications using it are written in Python.
Lately, many Perl users started using it, but Perl uses YASL (Yet another
Schema Language). Which one do I put on my web server?

I heard that Python is migrating to YSL 1.1, which has new features I want
to support. But if I use them in the new schema version (same schema!!!
better validation, that's all), how can I go on supporting the users of
older Python YSL versions?

I could come up with a dozen additional scenarios like that if it wasn't
midnight here. It is COMPLETELY AND UTTERLY UNREASONABLE to expect schemas
to be fetched directly from a URL GUID. It will NEVER happen. EVER.

The solution for ALL the above is simple. Perform a step mapping the schema
GUID to an accessible physical location. This isn't something "weird" or
"optional" or "only used by 0.01% of the users out there". It is a vital,
crucial step for having any chance of a practical, usable system.

Given this step is inevitable, there's absolutely no point whatsoever in
using a URL as the GUID for a schema. It makes much more sense to use a
well thought out scheme like taguri, which is the result of a lot of design
work for solving the question "how do I create a scheme for globally unique
names that is scalable, open, robust, practical, distributed,
time-resistant, etc.". Hint: http based URLs fail most of these criteria
(again, I could list scenarios, but you get the idea).

We use taguris _for a reason_. Now, if something better comes up, fine,
we'll consider it. So far it's the best we've got.

Have fun,

Oren Ben-Kiki
|
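The mapping step Oren describes (schema GUID to an accessible physical location) might look roughly like the sketch below. Nothing here comes from the thread or from any YAML library: the `resolve` function, the catalog layout, and the example locations are all hypothetical, chosen only to show the indirection.

```python
from typing import Callable, Mapping, Optional, Sequence

def resolve(
    guid: str,
    catalog: Mapping[str, Sequence[str]],
    is_available: Callable[[str], bool],
) -> Optional[str]:
    """Return the first reachable location registered for a schema GUID.

    `catalog` is a hypothetical lookup table: GUID -> candidate locations,
    listed in order of preference (for example a local copy first, then
    mirrors). `is_available` is supplied by the caller, so an offline site
    can simply reject every remote location.
    """
    for location in catalog.get(guid, ()):
        if is_available(location):
            return location
    return None  # The GUID still identifies the schema; it just cannot be fetched here.


# Hypothetical catalog entry; the paths and hosts are made up for illustration.
catalog = {
    "tag:yaml.org,2002:int": [
        "/usr/share/yaml/schemas/int.yml",
        "http://mirror.example.org/yaml/int.yml",
    ],
}
```

The point of the design is that the identifier never changes when a location does: an isolated network can ship a catalog that lists only local files, and a document written elsewhere still resolves.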
From: Sean O'D. <se...@ce...> - 2004-09-01 20:58:07
|
On Wednesday 01 September 2004 12:13, Oren Ben-Kiki wrote:
> On Wednesday 01 September 2004 20:25, Sean O'Dell wrote:
> > How is the loader going to locate external schemas with just a taguri?
> > If taguri is used instead of URLs, there's going to have to be,
> > somewhere, a lookup table that associates a taguri with a resource
> > location. That seems a bit troublesome when you could very easily just
> > use URLs as your unique identifiers. What alternative is there with
> > taguri other than a lookup table?
>
> Too many to list. Going to http://yaml-taguri-service.org/tag=<the-taguri>,
> for example. There - you have your URL if you want it.

Well, the only reason for a globally unique identifier that I can think of
would be to locate something. If the location is optional, then why does the
namespace need to be globally unique? As far as I can tell, it only needs to
be unique within the document.

Sean O'Dell
|
From: Clark C. E. <cc...@cl...> - 2004-09-01 21:14:22
|
On Wed, Sep 01, 2004 at 01:57:58PM -0700, Sean O'Dell wrote:
| Well, the only reason for a globally unique identifier that I can think
| of would be to locate something. If the location is optional, then why
| does the namespace need to be globally unique? As far as I can tell,
| it only needs to be unique within the document.

The only reason you want a globally unique identifier is to compare it
with some other identifier to see if it matches; it provides 'sameness'.
In other words, to identify the object under discussion. For example,
tag:yaml.org,2002:int can be used in a human discussion - no real
location involved at all. Certainly one could use identity comparison
in a given registry to find related information... a unique identifier
allows everyone to refer to the "same" thing. You can attach your
stuff, I can attach mine. No built-in bias or protocol implied.

Note that the identity operation is one way: if X and Y are the same
character for character, you know they are the same identifier and
both refer to the same set of thingies. However, if X and Y are
different, then you don't necessarily know that they don't refer to the
same set of thingies. Aliases are a possibility.

Sean, I suggest you go to the XML-DEV list or the URI-DISCUSS list and
spend some more time hearing alternative perspectives on this issue...
you don't need to take my or Oren's word for it.

Clark
|
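Clark's "one way" identity operation can be written down in a few lines. This is only an illustrative sketch; the `Sameness` enum and `compare_identifiers` function are hypothetical names, not part of any YAML implementation.

```python
from enum import Enum

class Sameness(Enum):
    SAME = "same"        # identical character for character
    UNKNOWN = "unknown"  # different strings; could still be aliases


def compare_identifiers(x: str, y: str) -> Sameness:
    """Compare two identifiers character for character.

    Equality proves the identifiers are the same. Inequality proves
    nothing, because two different strings may be aliases for one thing,
    so the result is UNKNOWN rather than "different".
    """
    if x == y:
        return Sameness.SAME
    return Sameness.UNKNOWN
```

Note that no lookup or network access is involved: the comparison works on the identifier text alone, which is the property Clark is arguing for.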
From: Sean O'D. <se...@ce...> - 2004-09-01 21:25:18
|
On Wednesday 01 September 2004 14:14, Clark C. Evans wrote:
> On Wed, Sep 01, 2004 at 01:57:58PM -0700, Sean O'Dell wrote:
> | Well, the only reason for a globally unique identifier that I can think
> | of would be to locate something. If the location is optional, then why
> | does the namespace need to be globally unique? As far as I can tell,
> | it only needs to be unique within the document.
>
> The only reason you want a globally unique identifier is to compare it
> with some other identifier to see if it matches; it provides 'sameness'.
> In other words, to identify the object under discussion. For example,
> tag:yaml.org,2002:int can be used in a human discussion - no real
> location involved at all. Certainly one could use identity comparison
> in a given registry to find related information... a unique identifier
> allows everyone to refer to the "same" thing. You can attach your
> stuff, I can attach mine. No built-in bias or protocol implied.

But why does it have to be global? So long as the namespaces don't clash
within the document, what does it matter if a namespace is used for two
different reasons in two different documents written by two different
people? If you're not referring to anything external, what does it matter?

> Sean, I suggest you go to the XML-DEV list or the URI-DISCUSS list and
> spend some more time hearing alternative perspectives on this issue...
> you don't need to take my or Oren's word for it.

Why would I need to do that? I don't see it as a complicated issue. We only
have two choices: a unique ID that doesn't offer location, or a unique ID
that does. taguri does not, URL does. We need to associate the unique ID
with a location for certain purposes anyway, such as obtaining access to
schemas. With taguri, we need a lookup system to associate the unique ID
with a URL. With a URL, we get both a unique ID and location information.

But also, like I said, whether those external references are accessible is
always going to be an issue with either taguri or URL as the identifier.
Sooner or later, a URL needed to obtain a schema or some other external
reference is going to be missing, and the loader is going to have to make
do without. That will happen even with taguri.

Sean O'Dell
|
From: T. O. <tra...@ru...> - 2004-09-01 21:57:03
|
On Wednesday 01 September 2004 05:25 pm, Sean O'Dell wrote:
> But also, like I said, whether those external references are accessible is
> always going to be an issue with either taguri or URL as the identifier.
> Sooner or later, a URL needed to obtain a schema or some other external
> reference is going to be missing, and the loader is going to have to make
> do without. That will happen even with taguri.

Hey Sean,

Not too long ago I would have made the same argument as you. But now I
think there are a couple of good reasons to use something that isn't a URL
per se.

The main good reason has to do with that lookup. See, if you have a taguri
and you do a lookup, then the lookup table can be mirrored, and even locally
mirrored. So you can find it. Also, that table might specify multiple
mirrors to find one schema. If we limit ourselves to one URL then that's our
only shot.

Nonetheless, taguri isn't far off from URL; my bet is if you wanted to you
could run it through some simple transformation rules and get a URL, and
vice-versa. You can certainly do that if you want. URL may be more direct,
but the taguri lookup is more versatile.

Also on the plus side, the tag system is looking up.

T.
|
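The "simple transformation rules" T. mentions could be sketched as follows. The rule used here (authority becomes the host, date becomes the first path segment) is an assumption made up for illustration; the thread does not define one, and `tag_to_url` and `url_to_tag` are hypothetical helpers.

```python
from urllib.parse import urlsplit

def tag_to_url(tag: str) -> str:
    """Turn tag:AUTHORITY,DATE:SPECIFIC into http://AUTHORITY/DATE/SPECIFIC."""
    if not tag.startswith("tag:"):
        raise ValueError(f"not a tag URI: {tag!r}")
    tagging_entity, colon, specific = tag[len("tag:"):].partition(":")
    authority, comma, date = tagging_entity.partition(",")
    if not (colon and comma and authority and date):
        raise ValueError(f"malformed tag URI: {tag!r}")
    return f"http://{authority}/{date}/{specific}"


def url_to_tag(url: str) -> str:
    """Inverse of tag_to_url, valid only for URLs shaped by the same rule."""
    parts = urlsplit(url)
    date, _, specific = parts.path.lstrip("/").partition("/")
    if not (parts.netloc and date):
        raise ValueError(f"cannot derive a tag URI from: {url!r}")
    return f"tag:{parts.netloc},{date}:{specific}"
```

Under this assumed rule, `tag:yaml.org,2002:int` maps to `http://yaml.org/2002/int` and back. The round trip only works for URLs that already follow the convention, which is part of why a lookup table is the more general mechanism.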
From: Sean O'D. <se...@ce...> - 2004-09-01 21:43:41
|
On Wednesday 01 September 2004 14:17, Oren Ben-Kiki wrote:
> On Wednesday 01 September 2004 22:51, Sean O'Dell wrote:
> > Don't take responsibility for internet connectivity issues; let people
> > organize their schemas into locations they are comfortable with.
>
> Sean, I do consulting work for the IDF on occasion. They have a huge
> internal network that is not and will not *ever* be connected to the
> internet. EVER. Does this mean that they are barred forever from using any
> of the schemas defined by normal internet users (specifically, yaml.org
> types)? That's ridiculous.

This is going to be a problem anyway. Schemas will be external documents
whether you use taguri or not.

In IDF's case, they will have to use a local copy of the schemas. A URL can
refer to a local file. You can also load local schemas using remote URLs by
simply directing the loader to look in a certain directory for the schema
first. Web browsers cache web pages locally and can refer to them by their
full remote URL. In fact, I would think it would be the norm to distribute a
YAML implementation with a whole slew of common schemas that are referred to
by their URL, but loaded locally.

> The server for yaml.org went down due to a malfunction. Does this mean
> every YAML user in the world can't open any document using
> !http://yaml.org until it is back up?

If they don't have a local copy, yes.

<snip>

> I could come up with a dozen additional scenarios like that if it wasn't
> midnight here. It is COMPLETELY AND UTTERLY UNREASONABLE to expect schemas
> to be fetched directly from a URL GUID. It will NEVER happen. EVER.

They will be anyway, sooner or later. No one is going to want YAML with
schemas embedded in their documents, so you are going to have to allow
external references sooner or later, and if you don't build in a delivery
mechanism you are asking people to pass around multiple files: the document
and its schema(s). Externally referred files are inevitable.

Think about how schemas are distributed. What are you going to do? Maintain
an archive of all schemas and distribute it? Ask people to email them around
to each other? That's not a good solution, I don't think.

I think these two statements are true:

1) Schemas will be external
2) All external documents need a delivery mechanism

> The solution for ALL the above is simple. Perform a step mapping the schema
> GUID to an accessible physical location. This isn't something "weird" or
> "optional" or "only used by 0.01% of the users out there". It is a vital,
> crucial step for having any chance of a practical, usable system.

But then you are still stuck with all those problems. You aren't solving the
general problems of external documents becoming inaccessible over the
internet.

Unless you mean to say that schemas must always be locally available. In
which case, you are saying that a document which refers to a local schema
must reside on the same file system as the schema itself. Meaning, if I
write a YAML document that uses a schema, and I send it to "bob", I must
also send "bob" a copy of the schema. That's going to cause a lot of
"inaccessibility" problems. People will forget to send the schema, or forget
to send updates when a schema changes, etc.

Also, for the people who have to do that, locating a copy of the schema and
distributing it with their documents is a big pain in the butt. I don't see
that as being too fun.

> Given this step is inevitable, there's absolutely no point whatsoever in
> using a URL as the GUID for a schema. It makes much more sense to use a
> well thought out scheme like taguri, which is the result of a lot of design

I don't agree. I think coming up with a unique identifier scheme is child's
play and not something anyone here should be afraid to tackle in favor of
something that offers advantages such as what URLs offer.

> work for solving the question "how do I create a scheme for globally unique
> names that is scalable, open, robust, practical, distributed,
> time-resistant, etc.". Hint: http based URLs fail most of these criteria
> (again, I could list scenarios, but you get the idea).

I would welcome even one example of an exception that somehow causes a URL
to not be globally unique.

> We use taguris _for a reason_. Now, if something better comes up, fine,
> we'll consider it. So far it's the best we've got.

I think URLs would work a lot better. They're unique and they avoid the
"lookup" indirection.

Sean O'Dell
|
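Sean's suggestion of keeping the remote URL as the name but looking in a local directory first, the way a browser cache does, might be sketched like this. The cache layout and the `local_cache_path` and `load_schema` helpers are assumptions for illustration only; the `fetch` callable is supplied by the caller, so the sketch itself performs no network access.

```python
from pathlib import Path
from typing import Callable
from urllib.parse import urlsplit

def local_cache_path(url: str, cache_dir: str) -> Path:
    """Map a schema URL to a file under cache_dir (hypothetical layout:
    <cache_dir>/<host>/<url path>)."""
    parts = urlsplit(url)
    return Path(cache_dir) / parts.netloc / parts.path.lstrip("/")


def load_schema(url: str, cache_dir: str, fetch: Callable[[str], str]) -> str:
    """Return the schema text for url, preferring a local copy.

    The remote fetch happens only when no local copy exists, and its
    result is stored so later loads work offline.
    """
    path = local_cache_path(url, cache_dir)
    if path.is_file():
        return path.read_text(encoding="utf-8")
    text = fetch(url)  # may raise if the network is unavailable
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return text
```

A site with no internet connection would pre-populate the cache directory and pass a `fetch` that always fails. This sketch does not sanitize the URL path, so a real loader would need to guard against paths that escape the cache directory.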
From: Clark C. E. <cc...@cl...> - 2004-09-01 21:59:44
|
On Wed, Sep 01, 2004 at 02:43:37PM -0700, Sean O'Dell wrote:
| In IDF's case, they will have to use a local copy of the schemas. A
| URL can refer to a local file.

Suppose I have two companies. Both of them use different accounting
schemas, but since the sysadmins think alike, they use the same URL
/usr/local/share/timesheet.yml to make sure that the schemas are
always accessible. One day, the companies merge, and they merge
their data. And they find out that the 'unique' URLs they were using
weren't really unique after all.

| I think URLs would work a lot better. They're unique and they avoid the
| "lookup" indirection.

If you are claiming that we don't need global unique identifiers,
this is a different claim; one that Brian's making. However, you
can't have it both ways. Either they are unique, or they can be
used to provide a global unique identifier. In any case, I'm tired
of providing counter examples as to why URLs make poor global
unique identifiers.

Best,

Clark
|
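Clark's merger scenario can be made concrete with a small check for identifiers that two registries both use but for different schemas. The `find_collisions` helper and the registry contents below are hypothetical and serve only to illustrate the clash.

```python
from typing import List, Mapping

def find_collisions(a: Mapping[str, str], b: Mapping[str, str]) -> List[str]:
    """Return identifiers present in both registries with different schemas.

    Each registry maps an identifier (here a local file path used as a
    'unique' name) to the schema content it denotes at that site.
    """
    return sorted(key for key in a.keys() & b.keys() if a[key] != b[key])


# Two sites that independently picked the same local path (made-up data).
company_a = {"/usr/local/share/timesheet.yml": "hours: float"}
company_b = {"/usr/local/share/timesheet.yml": "minutes: int"}
```

With this data, `find_collisions(company_a, company_b)` reports the shared path, because the same name denotes two different schemas once the data sets are merged.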