From: Oren Ben-K. <or...@ri...> - 2002-07-22 20:17:31
|
Hi guys,

I've been on vacation these last few days: lots of sun, water, and temperatures over 100 in the shade - if one could find some, that is :-) At any rate, I've just caught up with the messages.

I have mixed feelings about Clark's proposal. I can see the bright side, of course, but also several potential shady issues (pardon the analogy - I do hope the sun hasn't muddled my brain too much :-).

Here are some concerns, in no particular order:

- HTTP only?

Isn't this a bit restrictive? There *are* other protocols one can use to fetch web documents (e.g., ftp). And others might be added later on... What's wrong with keeping our shortcut notation, and merely limiting the URIs to URLs?

- Optional end of rainbow => URI allowed?

Clark mentioned making it optional (an obvious necessity for private types). If it is optional, what is the problem with using URIs, exactly? If one chooses to use an 'isbn:...' type family (Ugh), he's merely opted out of ever supplying a pot of gold at the end of the rainbow, forever. Which he is allowed to do anyway...

- XML namespaces

Would this mean giving up on using XML namespaces? That would make converting XML schemas to YAML that much harder. Now, we had this idea of *constructing* YAML type family names from the pair {namespace-URI, local-name}. If we could do that in a reasonable way - say, namespace-URI/local-name or whatever - we could probably find a way to preserve this "dual-personality" schema/namespace option for implementers. It may be vital for YAML gaining acceptance in the world.

- Fragments.

Clark has ruled them out, which is sensible. Supposing that any use of a fragment in an XML namespace means the author has gone the way of "no gold at the end of the rainbow, ever", and assuming that the above issue of mapping XML to YAML is resolved somehow (is '/' allowed in a fragment? have to check. Maybe we'll have to revert to '$')... anyway, assuming all this, can we view 'format' as a 'fragment'?
int: !int#dec 7 # http://type.yaml.org/int#dec

Presumably each format could be a top-level key so the above would make some sense... That's rather nicer than using '|' for this purpose. Again, only assuming the XML issues are resolved!

- Minimal key set

What's the minimum one would have to put in the "pot of gold"? I mean, if it is there at all. An empty document would presumably be legal (after all, this "pot of gold" *is* optional). But what must a non-empty document contain? *Should* there be a minimum set of keys, or just "recommended keys"?

- Relationship with RDDL and other meta-data standards...

Probably someone should set up some way they can be simply cut&pasted into this scheme, using appropriate top-level key(s) and structure. We should at least check that this is feasible/reasonable...

- Requirement from a YAML processor.

I think that accessing the "pot of gold" should be *very* optional, and it should be *crystal clear* it is perfectly possible to handle YAML without ever thinking about it. Further, I think it should also be made very clear that it is *very* unrealistic to expect, in the general case, that anything other than, say, schema validation, would be possible to achieve for a type family that is solely "known" by its "pot of gold". That is, any expectation that a YAML application will magically understand "the semantics" of a type through its "pot of gold" is, to say it kindly, naive.

Of course people could attach code in any interpreted language - scratch that, in *any* language (Windows X86 DLLs included - Ugh) - to the "pot of gold", thereby allowing dynamic loading of type family semantics. I find it to be more of a scary thought than a comforting one :-)

- Relationship with the schema mechanism.

Having a "pot of gold" document accessible via the type family immediately suggests that this should be "the way" the schema would be fetched. That's nice and all, but...
is it practical to chop the schema into multiple physical documents this way (one per type family used)? Putting aside efficiency issues, what about problems like version control and being easy to read/write? Keep in mind that a collection type family will constrain its contained sub-nodes to the n-th degree regardless of their type - or, I should say, *in addition* to the generic restrictions specified by their type.

- The risk factor.

There is a giant leap between "type families are unique IDs with human-readable definition" and this proposal. And unlike everything else we've done with YAML, this would be exploring into new lands because nobody I know of has done anything like it. Here we are speculating about what may or may not prove useful to application developers, and I for one do not have the personal experience in such dynamic-loading extendible-from-the-web-yet-strongly-typed systems to say whether this makes sense or not.

Actually, I have a great deal of experience (as do we all) in one such system, the HTML browser. And it is a terrible mess, a failure of standards to achieve anything like a sane system - the most we can learn from it is what *not* to do.

It may make more sense to use a DNS-like mechanism. Or just make direct use of DNS. Or LDAP. Or WebDav. Or something. It may make more sense to have each top-level key reside in its own physical document. I have no idea, because I don't have a good grasp of the use case. Speaking of which...

- The use case?

What *is* the use case (other than being able to answer the newbie about "what does a type family point to")? What is the class of applications that want to be schema-aware but not schema-specific? If the answer is "validating parsers and authoring tools" then I think that this proposal is a serious overkill. A simple schema language would do the trick for both.

Is it something like "web services"?
I have strong doubts about whether something like this proposal is actually useful for such services (given a schema language exists). Services require a much stronger knowledge of "semantics" than would be offered by the "pot of gold". IMVVHO, that is - since nobody ever saw "web services" actually working as hyped, that's all anybody has to offer, I'm afraid.

On the other hand, using "point-to-point" or "client-to-server" schema-specific XML-RPC/SOAP/etc. *is* working in practice. Again this only requires a schema language (if that).

- Effects on the spec?

If we agree the "pot of gold" is optional, and if we make it easy to look at a URL and say whether it is a "pot of gold" or not (simplest way: give it a distinctive mime type), is there really any reason to change the spec? It seems to me we can safely define this whole thing in a separate spec - "A convention for using YAML type families as URLs for fetching meta-data". We can start by giving some meta-data for our core type families as *an example*. If people like it and build on it - great. If it is useless for 99% of the people in the world (my suspicion at this point - feel free to set me right), no great loss, either. We'd have merely over-formalized a bit how we define type families.

Minor changes to the spec may still result (specifically, handling of fragments and formats - and mentioning that there *is* an *optional* convention for meta data planned/available at a separate spec). I would be more than happy to discuss them under such an approach.

- Effects on time table?

I suspect it will take ages to settle the issues this proposal raises. I'm less than enthused at the thought of wording such a chunk of functionality into our core spec. From the narrow point of view of "let's get a spec out the door", this proposal seems to be a serious problem. I could be wrong here - especially if it is worded as something optional, and would be rather loosely defined.
But still, at this point, my vote is to steer away from this whole thing in the YAML 1.0 *CORE* spec. Let's create a separate YAML 1.0 *META* spec for this instead. Our current spec is big enough as it is anyway...

Have fun,

Oren Ben-Kiki |
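The idea raised in this message of constructing a type family name from the pair {namespace-URI, local-name} can be sketched roughly as below. This is only an illustration: the function name `type_family_from_qname` and the choice to drop a trailing separator are assumptions, and the thread leaves open whether '/' or '$' would be the separator.

```python
def type_family_from_qname(namespace_uri: str, local_name: str, sep: str = "/") -> str:
    """Join an XML namespace URI and a local name into one type family name.

    Hypothetical helper, not part of any YAML spec. A trailing separator on
    the namespace is dropped to avoid a doubled separator, which means the
    mapping is not guaranteed to be reversible.
    """
    if not namespace_uri or not local_name:
        raise ValueError("both a namespace URI and a local name are required")
    if namespace_uri.endswith(sep):
        namespace_uri = namespace_uri[: -len(sep)]
    return namespace_uri + sep + local_name


if __name__ == "__main__":
    # The namespace below is used purely as sample input.
    print(type_family_from_qname("http://www.w3.org/1999/xhtml", "table"))
```

Because a trailing separator is discarded, two different namespaces could map to the same type family name; a real convention would have to settle that point.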
From: Oren Ben-K. <or...@ri...> - 2002-07-23 18:51:35
|
Clark C . Evans [mailto:cc...@cl...] wrote:
> | - HTTP only?
> | - Optional end of rainbow => URI allowed?
>
> how about amending the spec to say:
>
> - All URIs must be from the HTTP scheme
> - The parser should support other schemes so that
>   migration to a less restrictive mechanism is
>   possible.
>
> And then we wait for use cases that would require something
> other than the http scheme.

OK with a "minor" change:

- All URIs SHOULD be from the HTTP scheme
- The parser MUST support other schemes so that migration to a less restrictive mechanism is possible.

The other way makes little sense... It is the same as the printable characters case.

> | - XML namespaces
>
> I don't think this is a use case. If you are using XML for its
> information model (mixed content, etc.) then you won't really want
> to use YAML.

I beg to differ. People are defining XML namespaces right and left, including things like XML-RPC and SOAP, which do *not* make much (or any) use of mixed content etc. Some do mix attributes and elements, though. It would be very nice if we had a way to make it easier for them to interoperate/migrate to YAML.

> | - Fragments.
> |
> | Clark has ruled them out, which is sensible.
>
> ... snip stuff about mapping XML to YAML...

I guess that making life easy for XMLers using fragments isn't high on the priority list :-)

> | anyway, assuming all this, can we view 'format' as a 'fragment'?
> |
> | int: !int#dec 7 # http://type.yaml.org/int#dec
>
> I don't see why not.

NOTE: That means a change in the productions and the syntax! A nice one, I think, but implementers should take notice (Brian? Neil? Anyone?).

> | - Minimal key set
>
> I guess a title and summary would be good things to have,
> otherwise there isn't a point of even having it.
>
> | - Relationship with RDDL and other meta-data standards...
>
> By allowing keys with a period to belong to domain holders, each
> meta-data requirement can be done in its own bucket without prior
> planning on our part.
As far as RDDL; let's just stick with > something simple and let it grow from there. +1 there. > | - Requirement from a YAML processor. > | > | I think that accessing the "pot of gold" should be *very* > | optional... > > It may not, however, be possible for some YAML processes (such as a > validator) to proceed without access to the resource directory. Validators need *schemas*, not the full resources-pot-of-gold. As do all of the use cases. > I'd not like it to be too optional; it's ok if nothing is there, > but if *something* is there, it should (must?) be a yaml catalog. SHOULD. We should define a mime type for it - that would make it easy to detect. > | - Relationship with the schema mechanism. > | ... > | is it practical to chop the schema > | into multiple > | physical documents this way (one per type family used)? > | Putting aside > | efficiency issues, what about problems like version control > | and being easy to read/write? > > Good question. One approach is to only use the type family for > root nodes or for data islands. Which will not work for serializing data structures in Perl/Python/etc. > For version information, perhaps > if we allowed collection nodes to have a "format", aka "version". No thanks. Feel free to have !company.com/type/1 if you want to go that route. Out of scope for the YAML spec itself. I was more worried about the issues of version control in the physical sense (what happens if one gets an updated version of some of these "pots of gold", but not all, and so on). It seems like a potential sticky point. > ... I think that this is a huge discussion in and of itself > that we should probably start in on and will probably rage for a year > or more... I think we all have less experience in this domain and it > will take some time before we even know what all of the issues are. > > | - The risk factor. > > Right. The closest thing is RDDL which hasn't been adopted beyond > a few small groups. Exactly. 
> There is often a big debate over systems like
> this over two points of failure:
>
> - Inherently "centralized" approach, he who owns the domain
>   defines the resource file. This is all good, but just because
>   we have one centralized approach, doesn't stop other registry
>   like mechanisms from emerging.

Well, having just one physical file is certainly a problem this way. It allows the type family owner to censor what meta data is attached to the type family... except for if people set up "mirroring plus enhancement" versions of the pot-of-gold files. But if we assume this is going to be a common practice, the point of using only HTTP is greatly weakened. Any unique string can act as a key to such registries.

> - Problems with efficiency/caching, either servers get beat up
>   badly or caches become stale. This is true, those who don't
>   use "expires" header in HTTP are bound to have pains. However,
>   I think that this isn't an architectural problem as much as it
>   is an educational one.

Again, using specialized registries solves most of this problem. Plus reliability issues, and predictability, and control over updates.

If I was a corporation with a mission-critical YAML system, the last thing I'd do was to fetch the required meta data from an external HTTP server, regardless of any of the nice features this allows in theory.

> From someone who used XML "extensively" for a spell, not having a
> standard directory mechanism for items relating to a given vocabulary
> was one of the sore points that I felt (hence my involvement
> with RDDL).
> It's nice not to have to hunt-down a schema or to be able to click
> on a family name and retrieve a human-readable description of what
> the type is all about.

It sure is nice and I'm not against that. In fact, that is recommended in the spec even today. That's a far cry from the pot-of-gold proposal...

> | - The use case?
> | ...
> ...
I think the most important reason > is to solve these in a manner which allows for other information > about a type family to be provided by its 'owner'. I suppose so. I don't view it as a burning need, though. > | - Effects on the spec? > > It need not be in the specification proper, a > link to it from the spec would probably be a good idea. There's nothing to link to yet. We'll add a mention of a separate spec addressing this - that's the best we can do. > | Minor changes to the spec may still result (specifically, > | handling of > | fragments and formats - and mentioning that there *is* an *optional* > | convention for meta data planned/available at a separate > | spec). I would be more than happy to discuss them under such an approach. > > Ok. Fine. > | - Effects on time table? > | > | I suspect it will take ages to settle the issues this > | proposal raises... We seem to agree on this :-) Operationally: - Modify !type|format to !type#format - Add wording saying that the type family *should* be a URL pointing to a YAML document describing the type in a format to be specified in a separate spec; How do people feel about these specific changes to the spec? Brian? Have fun, Oren Ben-Kiki |
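The proposed switch from `!type|format` to `!type#format` can be illustrated with a small sketch. It is not taken from any YAML implementation: the helper names `split_tag` and `expand_core_tag` are made up for illustration, and the `http://type.yaml.org/` prefix is simply the one shown in the example earlier in the thread (`!int#dec` standing for `http://type.yaml.org/int#dec`). Only short core names are handled.

```python
from typing import Optional, Tuple

# Prefix taken from the example in the thread; assumed to apply to core types only.
CORE_PREFIX = "http://type.yaml.org/"


def split_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Split '!type#format' into (type, format); format is None when absent."""
    if not tag.startswith("!") or tag.startswith("!!"):
        raise ValueError("expected a non-private shorthand tag such as '!int#dec'")
    name, sep, fmt = tag[1:].partition("#")
    if not name:
        raise ValueError("empty type name")
    return name, (fmt if sep else None)


def expand_core_tag(tag: str) -> str:
    """Expand a core shorthand tag, treating the format as a URI fragment."""
    name, fmt = split_tag(tag)
    uri = CORE_PREFIX + name
    return uri if fmt is None else uri + "#" + fmt


if __name__ == "__main__":
    print(split_tag("!int#dec"))
    print(expand_core_tag("!int#dec"))
```

Treating the format as a fragment keeps the type family itself (`http://type.yaml.org/int`) identical across formats, which is the property the `#` notation is after.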
From: Clark C . E. <cc...@cl...> - 2002-07-23 20:07:26
|
On Tue, Jul 23, 2002 at 09:53:06PM +0300, Oren Ben-Kiki wrote:
| > There is often a big debate over systems like
| > this over two points of failure:
| >
| > - Inherently "centralized" approach, he who owns the domain
| >   defines the resource file. This is all good, but just because
| >   we have one centralized approach, doesn't stop other registry
| >   like mechanisms from emerging.
|
| Well, having just one physical file is certainly a problem this way. It
| allows the type family owner to censor what meta data is attached to the
| type family... except for if people set up "mirroring plus enhancement"
| versions of the pot-of-gold files. But if we assume this is going to be a
| common practice, the point of using only HTTP is greatly weakened. Any
| unique string can act as a key to such registries.
|
| > - Problems with efficiency/caching, either servers get beat up
| >   badly or caches become stale. This is true, those who don't
| >   use "expires" header in HTTP are bound to have pains. However,
| >   I think that this isn't an architectural problem as much as it
| >   is an educational one.
|
| Again, using specialized registries solves most of this problem. Plus
| reliability issues, and predictability, and control over updates.
|
| If I was a corporation with a mission-critical YAML system, the last thing I'd
| do was to fetch the required meta data from an external HTTP server,
| regardless of any of the nice features this allows in theory.

Ok. The primary problem that causes heart burn:

"If it walks like a duck, why isn't it a duck?"

There are two "extreme" solutions:

(a) use only HTTP URL and specify what's at the end of the rainbow

(b) use only URN or similar scheme using a reverse DNS/java like thingy that can't possibly look like a URL.
FYI, it appears as if the UDDI people seem to be choosing the latter approach; for example in their specification they use "urn:uddi:bla-bla-bla" often, and also I saw: 'uddi:ubr.uddi.org:identifier:dnb.com:D-U-N-S' although this doesn't seem to be registered anywhere.

Anyway, perhaps the second approach is good. The "duck" problem is just purely icky. It causes lots of problems on the XML list and I have no reason to believe that it wouldn't cause a mess here as well (if we got big enough).

So... perhaps we grab onto the UDDI coat-tail? Adopt URN? Perhaps we could get "rdns" as a reverse-dns URN at http://www.iana.org/assignments/urn-namespaces

Best,

Clark

--
Clark C. Evans Axista, Inc. http://www.axista.com 800.926.5525 XCOLLA Collaborative Project Management Software |
From: Oren Ben-K. <or...@ri...> - 2002-07-23 18:54:56
|
Donnal Walter [mailto:don...@ya...] wrote:
> > Second, I was thinking about private types; they are in general
> > rather useless since you can't even match on them; !!x in a
> > query language isn't the same as !!x in a document. Thus, I'm
> > questioning if we even need private types?

My bet is that they'll prove useful. A YPATH/YQUERY would have to allow matching on them, of course. They are also very useful for creating "fire and forget" documents. And there may be other uses, for example:

> Not sure I entirely understand the issues here, but I intend to use
> the !!x notation extensively in my data documents, but the private
> definitions will be in Python code, so perhaps this discussion does
> not apply to my situation.

> > Third, Steve says that _why has referred to "type family" as a
> > domain; this actually sounds like a better word than type family.
>
> Yes, IMO "domain" is preferable to "type family".

"Domain" doesn't carry the "data type" connotations, which is what a "type family" defines. Likewise "vocabulary". "Type family" is still the best name I can think of that isn't too generic (like "data type" or "value type"). I'll think about this some more...

Have fun,

Oren Ben-Kiki |
From: Oren Ben-K. <or...@ri...> - 2002-07-24 07:38:08
|
Clark C . Evans [mailto:cc...@cl...] wrote:
> ... The primary problem that causes heart burn:
>
> "If it walks like a duck, why isn't it a duck?"

First, it still may be a duck. Second, it may not be a duck because it is a wild goose! Chasing it is futile in this case :-)

That is: having a POG (Pot Of Gold) file at the end of a type family URL is nice and all, and I think that this approach should be "recommended". At the same time, this is optional: it is not required by the YAML spec, or by most YAML processing systems. Either way, it is certainly beyond the scope of the core YAML spec.

I think we agree on the above points. Correct?

> There are two "extreme" solutions:
>
> (a) use only HTTP URL and specify what's at the end
>     of the rainbow
>
> (b) use only URN or similar scheme using a reverse
>     DNS/java like thingy that can't possibly look
>     like a URL.

Right. The thing is, nobody has the faintest idea which is the better way, since the actual use cases only require a document schema rather than anything fancier.

Note it is clearly possible to implement (b) on top of (a), but not the other way around. At the same time, it seems clear to me that mission-critical systems will use a variant of (b) (whether or not the key is a URL or URI is beside the point).

I think this is somewhere where we should not set down rules right now but explore the possibilities as applications evolve - be guided by demons ratable application need rather than try to predict it.

I suggest we do the following:

- Leave the spec as it is right now (specifically, forget about using '#' for formats).

- Change the following in the spec:

--- |
3.1.2 Type Family

name

A URI used as a globally unique identifier for the type family. YAML does not require that this URI point to anything in particular. However, where possible, it is considered good practice to have the URI point to some human-readable document providing information about the type family.
...
# Into:

--- |
3.1.2 Type Family

name

A URI used as a globally unique identifier for the type family. YAML does not require that this URI point to anything in particular. In the common case where a URL name is used, it should point to a document describing the type family. Having a conventional format for such documents may prove useful, or even required, for a certain class of applications. As the existence of a description document and its format must not impact a YAML processor in any way, a definition of such a conventional format is beyond the scope of this specification.
...

- Note that I'm explicitly placing schema validation outside the scope of a "YAML processor" as defined in the core YAML spec. One *must* not depend on a schema to be able to process a YAML document (hence the "existence of a description document *must not* impact a YAML processor"). I'm aware that this is very different from XML. I feel rather strongly that this approach is a good thing... It was a main motivation for CommonXML, for example.

- Let us continue thinking about this issue completely decoupled from the YAML core spec. Clark can formulate his "pot of gold" proposal as an independent spec (a rather short one, I think). We could then create such documents for our core type families. There's nothing like playing with an idea with actual implementations to get some insights about it...

- Keep it in mind when we start to tackle schema/ypath issues. There may be a dependency between the schema definition and the POG proposal... and we'll only be able to know how they affect each other when we get around to seriously discussing schemas.

- Defer making any decisions about it until such a time when we have considered the implications on schemas and other related issues. Anything we do in the meanwhile (e.g., a hypothetical POG spec written by Clark) should be clearly marked as being "experimental".

Would this work for you?

Have fun,

Oren Ben-Kiki |
From: Clark C . E. <cc...@cl...> - 2002-07-24 12:06:29
|
On Wed, Jul 24, 2002 at 10:39:39AM +0300, Oren Ben-Kiki wrote:
| > (a) use only HTTP URL and specify what's at the end
| >     of the rainbow
| >
| > (b) use only URN or similar scheme using a reverse
| >     DNS/java like thingy that can't possibly look
| >     like a URL.
|
| Right. The thing is, nobody has the faintest idea which is the better way,
| since the actual use cases only require a document schema rather than
| anything fancier.
|
| Note it is clearly possible to implement (b) on top of (a), but not the
| other way around. At the same time, it seems clear to me that
| mission-critical systems will use a variant of (b) (whether or not the key
| is a URL or URI is beside the point).

In that case, I was thinking of having all of our short-cuts resolve to "urn:yaml:..." instead of "http://"

| I think this is somewhere where we should not set down rules right now but
| explore the possibilities as applications evolve - be guided by demons
| ratable application need rather than try to predict it.

The demon here is that if people see "http://" examples for our type family, then they will start to do the same. And then we will have lots of ducks out there that don't quack.

| --- |
| 3.1.2 Type Family
|
| name
|
| A URI used as a globally unique identifier for the type family. YAML does
| not require that this URI point to anything in particular. However, where
| possible, it is considered good practice to have the URI point to some
| human-readable document providing information about the type family.

Ok. Perhaps also some verbiage about URNs being preferred?

| --- |
| 3.1.2 Type Family
|
| name
|
| A URI used as a globally unique identifier for the type family. YAML does
| not require that this URI point to anything in particular. In the common
| case where a URL name is used, it should point to a document describing the
| type family. Having a conventional format for such documents may prove
| useful, or even required, for a certain class of applications.
| As the existence of a description document and its format must not impact a
| YAML processor in any way, a definition of such a conventional format is
| beyond the scope of this specification.

Ok. See above though.

| - Note that I'm explicitly placing schema validation outside the scope of a
| "YAML processor" as defined in the core YAML spec. One *must* not depend on
| a schema to be able to process a YAML document (hence the "existence of a
| description document *must not* impact a YAML processor"). I'm aware that
| this is very different from XML. I feel rather strongly that this approach
| is a good thing... It was a main motivation for CommonXML, for example.

Right.

| - Let us continue thinking about this issue completely decoupled from the
| YAML core spec. Clark can formulate his "pot of gold" proposal as an independent
| spec (a rather short one, I think). We could then create such documents for
| our core type families. There's nothing like playing with an idea with
| actual implementations to get some insights about it...

I actually did a bit of playing; it's a pain in the arse to maintain all of those files; let alone giving them useful information or the right mime type.... and I'm pretty good at html/apache with my own server. I can imagine someone who wants to use their own types, but is an http/html newbie.

| - Keep it in mind when we start to tackle schema/ypath issues. There may be
| a dependency between the schema definition and the POG proposal... and we'll
| only be able to know how they affect each other when we get around to
| seriously discussing schemas.

It might be that a schema defines multiple type families; in this case a URN would be preferred.

| - Defer making any decisions about it until such a time when we have
| considered the implications on schemas and other related issues. Anything we
| do in the meanwhile (e.g., a hypothetical POG spec written by Clark) should
| be clearly marked as being "experimental".
|
| Would this work for you?

Yes; but I think I'd rather swing in the other direction. Right now our spec prefers "http://" -- let's make it "urn:yaml:" or something similar. It can't be that hard to register a sub-type for urn:

Best,

Clark

--
Clark C. Evans Axista, Inc. http://www.axista.com 800.926.5525 XCOLLA Collaborative Project Management Software |
From: Oren Ben-K. <or...@ri...> - 2002-07-24 13:24:46
|
Clark C . Evans [mailto:cc...@cl...] wrote:
> | - Let us continue thinking about this issue...
> | ... There's nothing like playing with
> | an idea with
> | actual implementations to get some insights about it...
>
> I actually did a bit of playing; it's a pain in the arse to
> maintain all of those files; let alone giving them useful
> information or the right mime type.... and I'm pretty good
> at html/apache with my own server. I can imagine someone
> who wants to use their own types, but is an http/html newbie.

What happened to the pain of not having a standard place to go to get data about the type family? :-)

> | - Defer making any decisions about it...
> | ...
> | Would this work for you?
>
> Yes; but I think I'd rather swing in the other direction. Right now
> our spec prefers "http://" -- let's make it "urn:yaml:" or something
> similar. It can't be that hard to register a sub-type for urn:

We started with something like this (we considered having a top-level scheme 'yaml:'). I just looked up the relevant RFCs (2717 and 2611). Neither seems very encouraging, but there are some interesting options for "alternate trees" such as 'yaml-type:...' and we could always use something like 'x-yaml:...'. Doing a "proper" registration wouldn't be fast and wouldn't be easy. If you are feeling up to it, I suggest you examine these RFCs...

In addition, the whole issue of whether people should use 'urn:schema:...' vs. 'scheme:...' seems to be in debate. It doesn't seem as though 'urn:...' has been used much in practice (that I know of). Rather than resolve it, people have just been using 'http://domain.tld/' as a prefix. Which is why the duck problem came up in the first place...

If the *W3C* had defined a "urn:xml:domain.tld:whatever" namespace and used that for XHTML, XSLT etc., they would have set the tone and people would have followed.
I've no idea why they didn't do it - and perhaps this is one of XML's faults we could fix - but I'd like more data on the subject first.

What I'd really much prefer is to somehow be agnostic to the whole thing. It is just the shorthand mechanism that requires we take a stand. We either use a URL or a non-URL URI; if we use a URL we either provide a POG or we don't. And we have to decide "now" (as in this spec).

I don't know. I see the sense in using a URN - it would "set the tone" for preferring option (b), and avoid the "duck" question. I'm also wary of going against common practice (using http://). Thoughts?

Have fun,

Oren Ben-Kiki |
From: Clark C . E. <cc...@cl...> - 2002-07-25 05:00:03
|
On Wed, Jul 24, 2002 at 04:26:04PM +0300, Oren Ben-Kiki wrote:
| Clark C . Evans [mailto:cc...@cl...] wrote:
| > I actually did a bit of playing; it's a pain in the arse to
| > maintain all of those files; let alone giving them useful
| > information or the right mime type.... and I'm pretty good
| > at html/apache with my own server. I can imagine someone
| > who wants to use their own types, but is an http/html newbie.
|
| What happened to the pain of not having a standard place to go to get data
| about the type family? :-)

A few items to note:

- type families aren't quite like XML namespaces in that a given application may have many of them, where the very end of the string changes frequently. This would mean a lot of files to manage, yuck.

- after some reading with UDDI and trusted systems, etc., the whole mechanism of resolving the correct schema version is probably a complicated chore to get right; simply leaving it to pure http isn't going to cut it beyond simple documentation.

- finding documentation for a type family probably won't be that hard, if you are using a given type family chances are you know where you got it from, thus for simple documentation it is probably not worth the effort.

- RDDL hasn't exactly "taken off" in the XML world, instead the only thing related that seems to be gaining traction is this UDDI thingy.

So, given these considerations after a day or so of thinking and playing around; I respectfully withdraw my proposal.

...

However, this leaves the duck issue unresolved. Given that I now think that ducks (http resources) are probably not a spectacular idea; we probably don't want anything looking like ducks (http uris) to be meandering around our spec confusing newbies. Therefore, I now propose:

- We find a URN mechanism that uses DNS somehow to guarantee uniqueness, but doesn't imply any sort of access mechanism or protocol. Thus, it is a "pure identifier".
- If one isn't obvious, then we should propose a "dns" URN to the IETF which accomplishes this, for example, urn:dns:clarkevans.com/whatever This can use HTTP semantics (including #fragments) for everything following the "urn:dns:" so that if someone does want to try and use HTTP, they are more than welcome to try...

- We update our spec to replace "http://" with "urn:dns:" or its equivalent everywhere. Thus, "yaml.org/int" is equivalent to "urn:dns:yaml.org/int"

- We verify that a type family is a string identifier which uses the URI syntax (wording is important that the type family isn't a URI, but has the URI syntax). This wording change is something which a few old XML heads wish they had in the XML namespace spec...

- We recommend that only URNs, and in particular our chosen scheme, "urn:dns:" or its equivalent is the default. We say that if a URL is used, the item should be dereferenceable by the general public.

- Private types "!!x" should be mapped to "urn:dns:yaml.org/private/x" or its equivalent so that it can be compared (and used in a ypath)

Ok. I think this would fix things up. It leaves a pretty big task (namely the second one) if something like "urn:dns" doesn't already exist.

| We started with something like this (we considered having a top-level scheme
| 'yaml:'). I just looked up the relevant RFCs (2717 and 2611). Neither seems
| very encouraging, but there are some interesting options for "alternate
| trees" such as 'yaml-type:...' and we could always use something like
| 'x-yaml:...'. Doing a "proper" registration wouldn't be fast and wouldn't
| be easy. If you are feeling up to it, I suggest you examine these RFCs...

Right. I think it will be easier to get this to fit under the "urn:" top level scheme as a "dns" sub-scheme. I'd guess that top-level schemes are hard, but this second level scheme should be passable.

| What I'd really much prefer is to somehow be agnostic to the whole thing.
It | is just the shorthand mechanism that requires we take a stand. We either use | a URL or a non-URL URI; if we use a URL we either provide a POG or we don't. | And we have to decide "now" (as in this spec). Exactly. Best, Clark -- Clark C. Evans Axista, Inc. http://www.axista.com 800.926.5525 XCOLLA Collaborative Project Management Software |
From: Clark C . E. <cc...@cl...> - 2002-07-26 03:10:17
|
On Wed, Jul 24, 2002 at 04:26:04PM +0300, Oren Ben-Kiki wrote: | We started with something like this (we considered having a top-level scheme | 'yaml:'). I just looked up the relevant RFC (2717 and 2611). Neither seems | very encouraging, but there are some interesting options for "alternate | trees" such as 'yaml-type:...' and we could always use something like | 'x-yaml:...'. Doing a "proper" registration it wouldn't be fast and wouldn't | be easy. If you are feeling up to it, I suggest you examine these RFCs... URLs come with semantic expectations: - identify resources via a representation of their primary access mechanism (e.g., their network "location"), rather than identifying the resource by name or by some other attribute(s) - often named after protocols, but this does not imply that the only way to access the URL's resource is via the named protocol (the RFC mentions proxies and caching as examples) - URLs allow the most varied use of the syntax and often have a hierarchical namespace - URLs allow for relative identifiers by describing the difference within a hierarchical namespace between the current context and an absolute identifier of the resource. URNs come with a different set of expectations: - primary purpose is persistent labeling of a resource with an identifier; not through a primary access mechanism - refers to URIs which are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable - Uniform Resource Names (URNs) are intended to serve as persistent, location-independent, resource identifiers. Thus, it's clear to me that, if there is a distinction between these two "kinds" of URIs, what we want for our type families is a URN and not a URL. A type family is quite abstract and may not be available at all times; also documentation on the type family isn't the type family. So. 
Now it comes down to how this impacts YAML: - I think while we may allow for URLs (most URIs are URLs) we should strongly discourage them and provide URNs in most of our examples. - We should formalize an IETF URI scheme; either as a "urn" namespace or as a URN. If we do this, we may have to have a very quick "revisit" of our mapping rules. The only reason for the review is to verify that we are valid within the scope of a urn. - The URN characters forbid a few more chars, ? / # primarily so that they don't look like URLs. Unfortunately, the urn syntax (rfc2141) is more restrictive than the general uri syntax (rfc2396); and in particular excludes & <> [] ^ ` {} | ~ -- this is a problem - To be distinct from URLs and to be somewhat consistent with Java and python packages, perhaps we may want to use reverse DNS. | In addition, the whole issue of whether people should use 'urn:schema:...' | vs. 'scheme:...' seems to be in debate. It doesn't seem as though 'urn:...' | has been used much in practice (that I know of). Rather than resolve it, | people have just been using 'http://domain.tld/' as a prefix. After posting on the IETF list some, I think a URN is out. There is no way, for example, that Java or Python package names are going to meet their strict requirements for non-repudiation, etc. | Which is why the duck problem came up in the first place... If the *W3C* had | defined a "urn:xml:domain.tld:whatever" namespace and used that for XHTML, | XSLT etc., they would have set the tone and people would have followed. I've | no idea why they didn't do it - and perhaps this is one of XML's faults we | could fix - but I'd like more data on the subject first. I like our first pass approach of a <lang,identifier> pair "pkg" uri. This would probably be palatable. For now we could do it as x-pkg till the whole thing went through. However, our need isn't unique to us. A lot of people feel the same pain. | What I'd really much prefer is to somehow be agnostic to the whole thing. 
It | is just the shorthand mechanism that requires we take a stand. We either use | a URL or a non-URL URI; if we use a URL we either provide a POG or we don't. | And we have to decide "now" (as in this spec). Yep. | I don't know. I see the sense in using a URN - it would "set the tone" for | preferring option (b), and avoid the "duck" question. I'm also wary of going | against common practice (using http://). Thoughts? After sitting on the XML-DEV, and after reading the URI RFCs, I'm convinced that Tim Berners-Lee did us a huge disservice by using http URIs for XML namespaces. It's just *wrong*, being able to fetch documentation or not. It's not clean... and it sucks. That said, we aren't going to fix xml namespaces. We should focus on exactly what we need, a <language, class-identifier> pair. I'm sure it'll pass. It just has to be written correctly. Best, Clark -- Clark C. Evans Axista, Inc. http://www.axista.com 800.926.5525 XCOLLA Collaborative Project Management Software |
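The character restrictions mentioned in this message can be checked mechanically. What follows is a minimal sketch in Python and is not part of the thread: the function name `disallowed_nss_chars` is hypothetical, and the allowed set is transcribed from the RFC 2141 grammar as an assumption that should be checked against the RFC itself. It reports which characters in the namespace-specific part of a `urn:` string are either excluded or reserved.

```python
import string

# Assumption: letters, digits, the RFC 2141 "other" characters, and "%"
# (which introduces a %-escape). "/", "?" and "#" are reserved, and
# characters such as & < > [ ] ^ ` { } | ~ are excluded.
ALLOWED_NSS_CHARS = set(string.ascii_letters + string.digits + "()+,-.:=@;$_!*'" + "%")


def disallowed_nss_chars(urn):
    """Return the sorted characters of the namespace-specific string
    (everything after 'urn:<nid>:') that fall outside the allowed set."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn" or not parts[1] or not parts[2]:
        raise ValueError("expected a string of the form urn:<nid>:<nss>")
    nss = parts[2]
    return sorted({char for char in nss if char not in ALLOWED_NSS_CHARS})


print(disallowed_nss_chars("urn:dns:yaml.org/int"))  # ['/']
print(disallowed_nss_chars("urn:isbn:0451450523"))   # []
```

Under these assumptions the `urn:dns:yaml.org/int` form proposed earlier in the thread is flagged for its `/`, which is one of the characters the message above says URNs forbid.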
From: Oren Ben-K. <or...@ri...> - 2002-07-25 07:29:41
|
Clark C . Evans [mailto:cc...@cl...] wrote: > A few items to note: > > - type families aren't quite like XML namespaces in that > a given application may have many of them, where the > very end of the string changes frequently. This would > mean a lot of files to manage, yuck. Right... But see below. > - after some reading with UDDI and trusted systems, etc., > the whole mechanism of resolving the correct schema version > is probably a complicated chore to get right; simply > leaving it to pure http isn't going to cut it beyond > simple documentation. +10 on this. > - finding documentation for a type family probably won't > be that hard, if you are using a given type family > chances are you know where you got it from, thus for > simple documentation it is probably not worth the effort. I'm not certain here. It certainly would be nice to be able to paste the type family into a browser and get a single-page HTML document saying something about it, even if it is only "this is part of the XYZZY schema which you can find more about _here_". Not that this is a major convenience - but it *is* very nice. > - RDDL hasn't exactly "taken off" in the XML world, instead > the only thing related that seems to be gaining traction > is this UDDI thingy. UDDI tries to solve a more-or-less well-defined problem (in the context of web services), which is related to, but different from, our type family issue. All we need to worry about is that type families would "play nice" in a UDDI-like setting (ideally, with UDDI itself). We don't need to come up with our own. We all agree on this (now :-), I think. > So, given these considerations after a day or so of thinking > and playing around; I respectfully withdraw my proposal. OK. > ... > > However, this leaves the duck issue unresolved. 
Given that > I now think that ducks (http resources) are probably not a > spectacular idea; we probably don't want anything looking > like ducks (http uris) to be meandering around our spec > confusing newbies. Therefore, I now propose: Before responding to the details - I think that simply keeping things as they are today *is* a viable option. The spec recommends a human-readable document describing the type family in case the type family is a URL. The exact wording is "it is considered good practice". Which it is. One could just generate a small standard "stub" document for each type family that contains the name of the type and a link to a single overall schema description. Or even alias all the type family URLs to a single document describing the whole schema (or linking to such a description). Either way greatly reduces the pain of maintaining many files. I bet that using some trivial Perl code one could whip out a script that takes a file like:
base-url: ...
base-output-dir: ...
template-file: ...
real-documentation-url: ...
type-families:
  - id-relative-to-base-url: ...
    fragment-in-real-documentation: ...
    other-param-for-use-in-template: ...
  - id-relative-to-base-url: ...
    fragment-in-real-documentation: ...
    other-param-for-use-in-template: ...
  ...
And generate the whole set of files in a single call to the script. Perhaps we should offer something like that as a utility. This would make things convenient to a newbie. We could set the tone by doing something like this to our types - the "stub" approach isn't much of a problem (hey - a thought - our stubs could point directly at the YAML spec - with the fragment identifier pointing to the type family definition. NEAT!). Advantages of this approach: - No changes to the spec! :-) - "Everyone does it" (using http namespaces). - No need to register a new urn: sub-domain or whatever. - A duck would, in most cases, *be* a duck. If it isn't, the type family author isn't using "good practice". - Newbie friendly. 
Paste the URL to the browser, get some human-readable document. If you don't, the guy who writes the type family is a jerk. Are you sure you want to use his types? I wouldn't count this option out yet. At any rate, back to your new proposal: > - We find a URN mechanism that uses DNS somehow to > guarantee uniqueness, but doesn't imply any sort > of access mechanism or protocol. Thus, it is a > "pure identifier". > > - If one isn't obvious, then we should propose a "dns" > URN to the IETF which accomplishes this, for example, > urn:dns:clarkevans.com/whatever OK. > - We update our spec to replace "http://" with "urn:dns:" > or its equivalent everywhere. Thus, "yaml.org/int" > is equivalent to "urn:dns:yaml.org/int" Right. > - We verify that a type family is a string identifier > which uses the URI syntax (wording is important that > the type family isn't a URI, but has the URI syntax). > This wording change is something which a few old > XML heads wish they had in the XML namespace spec... You'll have to expand on that I'm afraid. I have noticed some wording in the RFC about not every URI-syntax string actually being a URI. Is the idea that URIs are only "valid URIs" and strings in URI syntax that don't "point" to an "existing" resource aren't "really" URIs? That is, a urn:isbn:.... isn't "really" a URI unless there's a book with that ISBN number? That's pretty bizarre IMVHO. At any rate, we'll use whatever wording is required to avoid such traps. If the above works, fine. BTW, this point also seems to apply to the current spec, whether we use urn:dns:... or http:// or anything else. > - We recommend that only URNs, and in particular our > chosen scheme, "urn:dns:" or its equivalent is the > default. We say that if a URL is used, the item > should be dereferenceable by the general public. The approach only seems consistent if we were to *require* namespaces to be in this URN. There's no problem of people having to register etc. 
because it is all implicit through their DNS registration. What's the down side of this? > - Private types "!!x" should be mapped to > "urn:dns:yaml.org/private/x" or its equivalent > so that it can be compared (and used in a ypath) Hmmm. I don't see why. A type family is just a string, whether or not it is a URI. If it starts with a '!' it is a private type and is *not* a URI; otherwise it is a URI. That's a pretty reasonable implementation strategy, I think. If someone must force all his type families to be URIs he can use whatever tricks he wants. The best one IMVHO (that applies to both your proposal and the current spec) is to say it is the URI 'x-private:<type-without-leading-!>'. That's a perfect use of the 'x-<scheme>' concept - a private scheme requiring no registration with dubious, shifting semantics. > Ok. I think this would fix things up. It leaves a pretty > big task (namely the second one) if something like "urn:dns" > doesn't already exist. It doesn't that I know of. Comparing the two options - the current one and your proposal - I tend to think the current one is better (due to the advantages I listed above). Thoughts? Have fun, Oren Ben-Kiki |
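The stub generator suggested in this message can be sketched in a few lines. The example below is a hedged illustration in Python rather than the Perl the message has in mind: the function names `render_stubs` and `write_stubs`, the inline HTML template (standing in for `template-file`), the `.html` file suffix, and the `example.org` URLs are all assumptions, and the configuration is taken as an already-parsed dictionary using the key names from the message.

```python
import html
import os
from string import Template

# Hypothetical inline template; the message assumes a separate template file.
STUB_TEMPLATE = Template(
    "<html><head><title>$type_url</title></head>\n"
    "<body><p>Type family <code>$type_url</code> is documented at\n"
    '<a href="$doc_url">$doc_url</a>.</p></body></html>\n'
)


def render_stubs(config):
    """Return {relative id: stub HTML} for every listed type family."""
    base_url = config["base-url"].rstrip("/")
    stubs = {}
    for family in config["type-families"]:
        rel_id = family["id-relative-to-base-url"]
        doc_url = config["real-documentation-url"]
        fragment = family.get("fragment-in-real-documentation")
        if fragment:
            # Point straight at the relevant section of the real documentation.
            doc_url = doc_url + "#" + fragment
        stubs[rel_id] = STUB_TEMPLATE.substitute(
            type_url=html.escape(base_url + "/" + rel_id),
            doc_url=html.escape(doc_url),
        )
    return stubs


def write_stubs(config):
    """Write one stub file per type family under base-output-dir."""
    for rel_id, page in render_stubs(config).items():
        path = os.path.join(config["base-output-dir"], rel_id + ".html")
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(page)


example_config = {
    "base-url": "http://example.org/types",
    "base-output-dir": "out",
    "real-documentation-url": "http://example.org/schema.html",
    "type-families": [
        {"id-relative-to-base-url": "invoice", "fragment-in-real-documentation": "invoice"},
        {"id-relative-to-base-url": "customer"},
    ],
}
stubs = render_stubs(example_config)
```

Calling `write_stubs(example_config)` would then produce the whole set of files in one call, which is the convenience the message is after; whether a real utility should alias all families to a single page instead is left open here, as it is in the thread.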
From: Clark C . E. <cc...@cl...> - 2002-07-25 14:42:38
|
On Thu, Jul 25, 2002 at 10:30:58AM +0300, Oren Ben-Kiki wrote: | This would make things convenient to a newbie. We could set the tone by | doing something like this to our types - the "stub" approach isn't much of a | problem (hey - a thought - our stubs could point directly at the YAML spec - | with the fragment identifier pointing to the type family definition. NEAT!). That's interesting. | Advantages of this approach: | | - No changes to the spec! :-) | - "Everyone does it" (using http namespaces). | - No need to register a new urn: sub-domain or whatever. | - A duck would, in most cases, *be* a duck. If it isn't, the type family | author isn't using "good practice". | - Newbie friendly. Paste the URL to the browser, get some human-readable | document. If you don't, the guy who writes the type family is a jerk. Are you | sure you want to use his types? Ok. So you'd prefer having the statement that if you are going to use "http", there should be an index or something human readable at the end of the rainbow. | > - We find a URN mechanism that uses DNS somehow to | > guarantee uniqueness, but doesn't imply any sort | > of access mechanism or protocol. Thus, it is a | > "pure identifier". | > | > - If one isn't obvious, then we should propose a "dns" | > URN to the IETF which accomplishes this, for example, | > urn:dns:clarkevans.com/whatever | | OK. | | > - We update our spec to replace "http://" with "urn:dns:" | > or its equivalent everywhere. Thus, "yaml.org/int" | > is equivalent to "urn:dns:yaml.org/int" | | Right. | | > - We verify that a type family is a string identifier | > which uses the URI syntax (wording is important that | > the type family isn't a URI, but has the URI syntax). | > This wording change is something which a few old | > XML heads wish they had in the XML namespace spec... | | You'll have to expand on that I'm afraid. 
If our type family is going to have all of the semantics of the URI used, then we can leave it as is, but we should be explicit that usage of a URI implies all of the semantics of the URI. | > - We recommend that only URNs, and in particular our | > chosen scheme, "urn:dns:" or its equivalent is the | > default. We say that if a URL is used, the item | > should be dereferenceable by the general public. | | The approach only seems consistent if we were to *require* namespaces to be | in this URN. There's no problem of people having to register etc. because it | is all implicit through their DNS registration. What's the down side of | this? We can "recommend" their usage, but leave the door open for other URIs down the road. Other than the initial work with the IETF, there isn't any registration since it's based on DNS. | > - Private types "!!x" should be mapped to | > "urn:dns:yaml.org/private/x" or its equivalent | > so that it can be compared (and used in a ypath) | | Hmmm. I don't see why. A type family is just a string, whether or not it is | a URI. If it starts with a '!' it is a private type and is *not* a URI; | otherwise it is a URI. That's a pretty reasonable implementation strategy, I | think. I'd rather have every type family be a URI; treating a type family as a string in one context and as a URI in another context is awfully complicated. A simple mapping to yaml.org/private works well. This keeps things simple. IMHO, I don't see the use case for private types at all. None of the arguments thus far have been useful for me. | It doesn't that I know of. Comparing the two options - the current one and | your proposal - I tend to think the current one is better (due to the | advantages I listed above). Thoughts? Yes; but the current situation has the duck problem and it is confusing. Is a type family a string or a URI? The current situation has both of the flaws XML has. Ick. 
Given that it's probably a bad idea to mandate "http" everywhere, let's just use a URN, and those who want to provide an http accessible resource can use an http URL. Our examples should use a URN to set precedent, however. Most people will just start to use types without giving them an http resolution, and this is a problem. Using URNs throughout our spec and tutorials will alleviate this pain. Clark -- Clark C. Evans Axista, Inc. http://www.axista.com 800.926.5525 XCOLLA Collaborative Project Management Software |
From: Steve H. <sh...@zi...> - 2002-07-25 14:53:38
|
> IMHO, I don't see the use case for private types at all.
> None of the arguments thus far have been useful for me.

Private types are my way of saying that I don't have the energy or even need to register a public type:

    ---
    taxdata: !!taxdata
      salary: ~
      dependents: 0
      states: [WA, SC, VA]

I don't want the whole world to know about my taxdata--all I want to do is mark the data as being "taxdata," and I want the implementation to prefer my "taxdata" implementation to any public types. I'm not crazy about the names "public" and "private." I am more interested in distinguishing "global" and "local." Or, even "registered" and "unregistered." The use case for unregistered types is simple--humans are lazy. Also, if I'm never sharing a document, but I'm just using YAML as a serialization mechanism, then it also makes sense to use "unregistered" types. |
From: Clark C . E. <cc...@cl...> - 2002-07-25 16:12:20
|
On Thu, Jul 25, 2002 at 10:53:12AM -0400, Steve Howell wrote: | > IMHO, I don't see the use case for private types at all. | > None of the arguments thus far have been useful for me. | | Private types are my way of saying that I don't have the energy | or even need to register a public type: Ok. You want it to be "lazy". That's cool; the syntax is much shorter: !!taxdata. Ok. Here's a compromise: this is mapped to "x-private:taxdata". The advantage is that our type families are all URIs. Best, Clark -- Clark C. Evans Axista, Inc. http://www.axista.com 800.926.5525 XCOLLA Collaborative Project Management Software |
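The compromise in this message amounts to a one-line normalization step. The following is a minimal sketch in Python, with a hypothetical function name `type_family_id`; it assumes that any tag not written with the `!!` private shorthand is already a URI-syntax string and leaves it untouched, since the thread does not settle the other shorthand rules.

```python
PRIVATE_SHORTHAND = "!!"
PRIVATE_SCHEME = "x-private:"


def type_family_id(tag):
    """Map a private tag such as '!!taxdata' to 'x-private:taxdata'.

    Assumption: anything else is already in URI syntax and is returned as is.
    """
    if tag.startswith(PRIVATE_SHORTHAND):
        name = tag[len(PRIVATE_SHORTHAND):]
        if not name:
            raise ValueError("a private type needs a name after '!!'")
        return PRIVATE_SCHEME + name
    return tag


print(type_family_id("!!taxdata"))             # x-private:taxdata
print(type_family_id("urn:dns:yaml.org/int"))  # urn:dns:yaml.org/int
```

With this in place every type family can be compared as a plain URI-syntax string, which is the advantage the message claims for the mapping.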
From: Clark C . E. <cc...@cl...> - 2002-07-22 22:35:48
|
This is a great post Oren, I'd like to hear what others think. On Mon, Jul 22, 2002 at 11:18:59PM +0300, Oren Ben-Kiki wrote: | - HTTP only? | | Isn't this a bit restrictive? There *are* other protocols one can use to | fetch web documents (e.g., ftp). And others might be added later on... | What's wrong with keeping our shortcut notation, and merely limiting the | URIs to URLs? HTTP is 99% of the use cases and if this is the only protocol, it makes it easy to support. We can always extend this later to include other URI schemes. | - Optional end of rainbow => URI allowed? | | Clark mentioned making it optional (an obvious necessity for private types). | If it is optional, what is the problem with using URIs, exactly? If one | chooses to use an 'isbn:...' type family (Ugh), he's merely opted out of | ever supplying a pot of gold at the end of the rainbow, forever. Which he is | allowed to do anyway... Extra flexibility that isn't really needed. Once again, we could fix YAML later on down the stream by opening these gates if someone gives us a good reason to do so; till then, why not be restrictive? The only thing that this flexibility seems to give XML is indigestion and confusion; debates over whether a namespace is an identifier, location, both, etc. These debates chew up an awful lot of bandwidth and don't produce value. | - XML namespaces | | Would this mean giving up on using XML namespaces? That would make | converting XML schemas to YAML that much harder. Now, we had this idea of | *constructing* YAML type family names from the pair {namespace-URI, | local-name}. If we could do that in a reasonable way - say, | namespace-URI/local-name or whatever - we could probably find a way to | preserve this "dual-personality" schema/namespace option for implementers. | It may be vital for YAML gaining acceptance in the world. I don't think this is a use case. If you are using XML for its information model (mixed content, etc.) then you won't really want to use YAML. 
The models are _so_ different that any generic embedding of XML into YAML will be butt ugly. I _thought_ that it was a use case, but after digging into the subject further, it's quite clear to me that it really isn't. Anyway, 95% of the XML namespaces out there are HTTP, so this isn't an issue. | - Fragments. | | Clark has ruled them out, which is sensible. ... snip stuff about mapping XML to YAML... | anyway, assuming all this, can we view 'format' as a 'fragment'? | | int: !int#dec 7 # http://type.yaml.org/int#dec I don't see why not. | - Minimal key set | | What's the minimum one would have to put in the "pot of gold"? I mean, if it | is there at all. An empty document would presumably be legal (after all, | this "pot of gold" *is* optional). But what must a non-empty document | contain? *Should* there be a minimum set of keys, or just "recommended | keys"? I guess a title and summary would be good things to have, otherwise there isn't a point of even having it. | - Relationship with RDDL and other meta-data standards... | | Probably someone should set up some way they can be simply cut&pasted into | this scheme, using appropriate top-level key(s) and structure. We should at | least check that this is feasible/reasonable... By allowing keys with a period to belong to domain holders, each meta-data requirement can be done in its own bucket without prior planning on our part. As far as RDDL; let's just stick with something simple and let it grow from there. | - Requirement from a YAML processor. | | I think that accessing the "pot of gold" should be *very* optional, and it | should be *crystal clear* it is perfectly possible to handle YAML without | ever thinking about it. It may not, however, be possible for some YAML processes (such as a validator) to proceed without access to the resource directory. I'd not like it to be too optional; it's ok if nothing is there, but if *something* is there, it should (must?) be a yaml catalog. 
| Further, I think it should also be made very clear that it is *very* | unrealistic to expect, in the general case, that anything other than, say, | schema validation, would be possible to achieve for a type family that is | solely "known" by its "pot of gold". That is, any expectation that YAML | application will magically understand "the semantics" of a type through its | "pot of gold" is, to say it kindly, naive. The goal of the resource index is to provide for information related to a particular node; what that information is or how it could be used is just not specified, let alone asserted. | Of course people could attach code in any interpreted language - scratch | that, in *any* language (Windows X86 DLLs included - Ugh) - to the "pot of | gold", thereby allowing dynamic loading of type family semantics. I find it | to be more of a scary thought than a comforting one :-) If someone wants to download "signed" code modules that can be used to "visualize" the data or some other process, so be it; to each his/her own. ;) | - Relationship with the schema mechanism. | | Having a "pot of gold" document accessible via the type family immediatly | suggests that this should be "the way" the schema would be fetched. That's | nice and all, but... is it practical to chop the schema into multiple | physical documents this way (one per type family used)? Putting aside | efficiency issues, what about problems like version control and being easy | to read/write? Good question. One approach is to only use the type family for root nodes or for data islands. For version information, perhaps if we allowed collection nodes to have a "format", aka "version". | Keep in mind that a collection type family will constrain its contained | sub-nodes to the n-th degree regardless of their type - or, I should say, | *in addition* to the generic restrictions specified by their type. Right. 
And I think that this is a huge discussion in and of itself that we should probably start in on and will probably rage for a year or more... I think we all have less experience in this domain and it will take some time before we even know what all of the issues are. | - The risk factor. | | There is a giant leap between "type families are unique IDs with | human-readable definition" and this proposal. And unlike everything else | we've done with YAML, this would be exploring into new lands because nobody | I know of has done anything like it. Right. The closest thing is RDDL which hasn't been adopted beyond a few small groups. There is often a big debate over systems like this over two points of failure: - Inherently "centralized" approach, he who owns the domain defines the resource file. This is all good, but just because we have one centralized approach, doesn't stop other registry-like mechanisms from emerging. - Problems with efficiency/caching, either servers get beat up badly or caches become stale. This is true, those who don't use the "expires" header in HTTP are bound to have pains. However, I think that this isn't an architectural problem as much as it is an educational one. | Here we are speculating about what may or may not prove useful to | application developers, and I for one do not have the personal experience in | such dynamic-loading extendible-from-the-web-yet-strongly-typed systems to | say whether this make sense or not. From someone who used XML "extensively" for a spell, not having a standard directory mechanism for items relating to a given vocabulary was one of the sore points that I felt (hence my involvement with RDDL). It's nice not to have to hunt down a schema, or to be able to click on a family name and retrieve a human-readable description of what the type is all about. | Actually, I have a great deal of experience (as do we all) in one such | system, the HTML browser. 
And it is a terrible mess, a failure of standards | to achieve anything like a sane system - the most we can learn from it is | what *not* to do. The biggest problem with the web is that it grew too fast. ;) | It may make more sense to use DNS-like mechanism. Or just make direct use of | DNS. Or LDAP. Or WebDav. Or something. It may make more sense to have each | top-level key reside in its own physical document. I have no idea, because I | don't have a good grasp of the use case. Speaking of which... HTTP is by far the best supported protocol out there... | - The use case? | | What *is* the use case (other than being able to answer the newbie about | "what does a type family point to")? What is the class of applications that | want to be schema-aware but not schema-specific? If the answer is | "validating parsers and authoring tools" than I think that this proposal is | a serious overkill. A simple schema language would do the trick for both. You hit most of them. But I think the most important reason is to solve these in a manner which allows for other information about a type family to be provided by its 'owner'. | Is it something like "web services"? I have strong doubts about whether | something like this proposal is actually useful for such services (given a | schema language exists). Services require a much stronger knowledge of | "semantics" than would be offered by the "pot of gold". IMVVHO, that is - | since nobody ever saw "web services" actually working as hyped, that's all | anybody has to offer, I'm afraid. On the other hand, using "point-to-point" | or "client-to-server" schema-specific XML-RPC/SOAP/etc. *is* working in | practice. Again this only requires a schema language (if that). It isn't that grand. | - Effects on the spec? 
| | If we agree the "pot of gold" is optional, and if we make it easy to look at | a URL and say whether it is a "pot of gold" or not (simplest way: give it a | distinctive mime type), is there really any reason to change the spec? It | seems to me we can safely define this whole thing in a separate spec - "A | convention for using YAML type families as URLs for fetching meta-data". We | can start by giving some meta-data for our type core families as *an | example*. This is true. It need not be in the specification proper; a link to it from the spec would probably be a good idea. | If people like it and build on it - great. If it is useless for 99% of the | people in the world (my suspicion at this point - feel free to set me | right), no great loss, either. We'd have merely over-formalized a bit how we | define type families. | | Minor changes to the spec may still result (specifically, handling of | fragments and formats - and mentioning that there *is* an *optional* | convention for meta data planned/available at a separate spec). I would be | more than happy to discuss them under such an approach. Ok. | - Effects on time table? | | I suspect it will take ages to settle the issues this proposal raises. I'm | less than enthused at the thought of wording such a chunk of functionality | into our core spec. From the narrow point of view of "let's get a spec out | the door", this proposal seems to be a serious problem. | | I could be wrong here - especially if it is worded as something optional, | and would be rather loosely defined. But still, at this point, my vote is to | otherwise steer away from this whole thing in the YAML 1.0 *CORE* spec. | Let's create a separate YAML 1.0 *META* spec for this instead. Our current | spec is big enough as it is anyway... Ok. But I think for now, I'd like to restrict the type family to be "http"; it's 95% or more of the use cases and we can always be more flexible later if required. Best, Clark -- Clark C. Evans Axista, Inc. 
http://www.axista.com 800.926.5525 XCOLLA Collaborative Project Management Software |
From: Clark C . E. <cc...@cl...> - 2002-07-22 22:37:22
|
This is a bit unrelated, but I was thinking that the whole name "type family" still sucks. Type is too generic, but type family just doesn't quite make it for me. I was thinking... how about calling this bugger a "vocabulary"? Best, Clark |
From: Clark C . E. <cc...@cl...> - 2002-07-23 16:28:01
|
On Mon, Jul 22, 2002 at 11:18:59PM +0300, Oren Ben-Kiki wrote: | - HTTP only? | | Isn't this a bit restrictive? There *are* other protocols one can use to | fetch web documents (e.g., ftp). And others might be added later on... | What's wrong with keeping our shortcut notation, and merely limiting the | URIs to URLs? | | - Optional end of rainbow => URI allowed? | | Clark mentioned making it optional (an obvious necessity for private types). | If it is optional, what is the problem with using URIs, exactly? If one | chooses to use an 'isbn:...' type family (Ugh), he's merely opted out of | ever supplying a pot of gold at the end of the rainbow, forever. Which he is | allowed to do anyway... First, I was talking with Steve about this one some; how about amending the spec to say: - All URIs must be from the HTTP scheme - The parser should support other schemes so that migration to a less restrictive mechanism is possible. And then we wait for use cases that would require something other than the http scheme. ... Second, I was thinking about private types; they are in general rather useless since you can't even match on them; !!x in a query language isn't the same as !!x in a document. Thus, I'm questioning if we even need private types? ... Third, Steve says that _why has referred to "type family" as a domain; this actually sounds like a better word than type family. Thoughts? Best, Clark |
From: Donnal W. <don...@ya...> - 2002-07-23 17:02:56
|
[Clark C . Evans]: > Second, I was thinking about private types; they are in general > rather useless since you can't even match on them; !!x in a > query language isn't the same as !!x in a document. Thus, I'm > questioning if we even need private types? Not sure I entirely understand the issues here, but I intend to use the !!x notation extensively in my data documents, but the private definitions will be in Python code, so perhaps this discussion does not apply to my situation. > Third, Steve says that _why has referred to "type family" as a > domain; this actually sounds like a better word than type family. > Thoughts? Yes, IMO "domain" is preferable to "type family". |
From: why t. l. s. <yam...@wh...> - 2002-07-23 19:46:17
|
Clark C . Evans (cc...@cl...) wrote:
> Second, I was thinking about private types; they are in general
> rather useless since you can't even match on them; !!x in a
> query language isn't the same as !!x in a document. Thus, I'm
> questioning if we even need private types?

I didn't think private types were necessary until you suggested the new
domain type standards. Now, I see a place for private types.

You have described domain types as very well-defined structures. For
YOD, I plan on using domain types to describe the schema for my document
structure. I also find them useful because they map directly to my YOD
classes. So for serialization and self-documentation, domain types make
sense.

But I see people-- and I'm thinking of Ruby YAML users in particular--
wanting to serialize classes quickly. Classes that may change in
structure frequently. My options for them are to have them use the
http://ruby.yaml.org/Object type. Or we could have them use private
types.

So if I have a ruby class called 'Comedian', which has a first name property
and a last name property, YAML4R could swing either way:

# Example One: Domain type
--- #YAML:1.0 !ruby/Object:Comedian
first name: Bob
last name: Odenkirk

# Example Two: Private type
--- #YAML:1.0 !!Comedian
first name: David
last name: Cross

First of all, I think the syntax for the second is cleaner. Now, let's
consider how this is unserialized by Ruby. This is where I see the
huge gain with private types. This is the meat of the discussion, I think.

With Example One (the domain type), the parser will attempt to construct
a new Object of the 'Comedian' class. If this class is not available,
the parser will be forced to throw an error, as the class is not defined
anywhere. I can see cases where this is advantageous. If the class isn't
found, then neither are the methods of the class and everything that makes
classes so wonderful.
On the other hand, when Example Two is parsed, the parser will create
a new class 'Comedian' descended from YAML4R::PrivateType. This class
will be an object without methods or proper inheritance, a generic
class with the structure supplied by the hash. Now, this may not seem
useful, but think of how useful this could be when passing objects
between languages which have no understanding of each other's classes.

I could pass the Comedian class to Perl. Perl could correct the actor's
name or something and return the Comedian class back to me with changes.
To Perl it's a private type, so its structure needs to stay intact. To
Ruby it's the Comedian class, methods and all.

I think this is tremendously cool and we should work to keep it.

> Third, Steve says that _why has referred to "type family" as a domain;
> this actually sounds like a better word than type family. Thoughts?

I thought I got that from the spec. Agg! I'm reading the spec so much
that it's saying things to me that it's not saying to the rest of you!

_why
|
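The difference between the two loading behaviours described above can be sketched in Python. This is only one reading of the message, not YAML4R's actual code: `PrivateType`, `construct_domain`, `construct_private`, and the `known_classes` registry are hypothetical names, and the fallback rule (use the real class when it exists, otherwise a generic holder) is an assumption drawn from the Perl/Ruby scenario.

```python
class PrivateType:
    """Generic holder for a private type whose class is unknown locally."""

    def __init__(self, type_name, fields):
        self.type_name = type_name
        self.fields = dict(fields)


class Comedian:
    """A locally defined class, methods and all."""

    def __init__(self, fields):
        self.fields = dict(fields)

    def full_name(self):
        return self.fields['first name'] + ' ' + self.fields['last name']


def construct_domain(type_name, fields, known_classes):
    """Domain type: the class must exist, otherwise loading fails."""
    if type_name not in known_classes:
        raise LookupError('class %r is not defined' % (type_name,))
    return known_classes[type_name](fields)


def construct_private(type_name, fields, known_classes):
    """Private type: fall back to a generic holder if the class is missing."""
    cls = known_classes.get(type_name)
    if cls is None:
        return PrivateType(type_name, fields)
    return cls(fields)


fields = {'first name': 'David', 'last name': 'Cross'}
with_class = construct_private('Comedian', fields, {'Comedian': Comedian})
without_class = construct_private('Comedian', fields, {})
```

In this sketch a program that has never heard of `Comedian` still gets an object carrying the type name and the original fields, which is what it would need in order to hand the data back unchanged in structure.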
From: Steve H. <sh...@zi...> - 2002-07-23 20:04:32
|
---- Original Message -----
From: "why the lucky stiff" <yam...@wh...
>
> But I see people-- and I'm thinking of Ruby YAML users in particular--
> wanting to serialize classes quickly. Classes that may change in
> structure frequently. My options for them are to have them use the
> http://ruby.yaml.org/Object type. Or we could have them use private
> types.
>
> So if I have a ruby class called 'Comedian', which has a first name property
> and a last name property, YAML4R could swing either way:
>
> # Example One: Domain type
> --- #YAML:1.0 !ruby/Object:Comedian
> first name: Bob
> last name: Odenkirk
>
> # Example Two: Private type
> --- #YAML:1.0 !!Comedian
> first name: David
> last name: Cross
>
> First of all, I think the syntax for the second is cleaner. Now, let's
> consider how this is unserialized by Ruby. This is where I see the
> huge gain with private types. This is the meat of the discussion, I think.
>
> With Example One (the domain type), the parser will attempt to construct
> a new Object of the 'Comedian' class. If this class is not available,
> the parser will be forced to throw an error, as the class is not defined
> anywhere. I can see cases where this is advantageous. If the class isn't
> found, then neither are the methods of the class and everything that makes
> classes so wonderful.
>
> On the other hand, when Example Two is parsed, the parser will create
> a new class 'Comedian' descended from YAML4R::PrivateType. This class
> will be an object without methods or proper inheritance, a generic
> class with the structure supplied by the hash. Now, this may not seem
> useful, but think of how useful this could be when passing objects
> between languages which have no understanding of each other's classes.
>
> I could pass the Comedian class to Perl. Perl could correct the actor's
> name or something and return the Comedian class back to me with changes.
> To Perl it's a private type, so its structure needs to stay intact. To
> Ruby it's the Comedian class, methods and all.
>
> I think this is tremendously cool and we should work to keep it.
>

I like the lightweight syntax of private types. The current Python
implementation has an extremely naive approach to types. If Python's loader
sees a structure that's preceded by a bang specifier (either private or public),
it hands it off to a user-provided object that supports a resolveType method.
The user can then convert that into an object as needed. Here is an example
usage:

    def testDomainType(self):
        class MyYamlConfig:
            def resolveType(self, data, url):
                # Map each private type tag to a native Python value.
                if url == '!!name':
                    return 'Foo ' + data
                elif url == '!!coords':
                    return {'x': data[0], 'y': data[1]}
                else:
                    raise ValueError('url not passed in correctly')
        data = YamlTest.loadHere("""
            name: !!name Barson
            coords: !!coords
                - 10
                - 20
            """,
            MyYamlConfig())
        self.assertEqual(data[0],
            {'name': 'Foo Barson',
             'coords': {'x': 10, 'y': 20}})

A more sophisticated application would dispatch its private types with something
a little more elegant than an "if" statement, but the basic idea is that
application developers will often want control over their types. I am seeing
the type conversion as being somewhat orthogonal to the main Python loader.

A more web-savvy application might use a smarter type-resolver plugin that
understands HTTP and actually goes looking for Python code on the web to convert
the raw YAML data structures into real Python objects. I don't think that's too
far-fetched a use case. But the more common use case, I think, is that folks
are gonna have YAML documents with a few private types, and they're just gonna
write some throwaway code to deal with those types.
|
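One way the "more elegant than an if statement" dispatch could look is a table of handlers keyed by tag. This is a sketch only: `DispatchResolver` and its `register` decorator are hypothetical and not part of the Python YAML loader; only the `resolveType(data, url)` method name and argument order are taken from the example above.

```python
class DispatchResolver:
    """Resolve tagged nodes by looking up a handler registered per tag."""

    def __init__(self):
        self._handlers = {}

    def register(self, tag):
        """Decorator that records a handler function for one tag."""
        def decorator(func):
            self._handlers[tag] = func
            return func
        return decorator

    def resolveType(self, data, url):
        # Same signature as the hand-written resolver in the example above.
        try:
            handler = self._handlers[url]
        except KeyError:
            raise ValueError('no handler registered for %r' % (url,))
        return handler(data)


resolver = DispatchResolver()


@resolver.register('!!name')
def resolve_name(data):
    return 'Foo ' + data


@resolver.register('!!coords')
def resolve_coords(data):
    return {'x': data[0], 'y': data[1]}


name = resolver.resolveType('Barson', '!!name')
coords = resolver.resolveType([10, 20], '!!coords')
```

Adding a new private type then means registering one more function, and an unknown tag still fails loudly instead of passing through silently.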
From: why t. l. s. <yam...@wh...> - 2002-07-23 20:17:09
|
Steve Howell (sh...@zi...) wrote:
> ---- Original Message -----
> From: "why the lucky stiff" <yam...@wh...
> > I could pass the Comedian class to Perl. Perl could correct the actor's
> > name or something and return the Comedian class back to me with changes.
> > To Perl it's a private type, so its structure needs to stay intact. To
> > Ruby it's the Comedian class, methods and all.
>
> I like the lightweight syntax of private types. The current Python
> implementation has an extremely naive approach to types. If Python's loader
> sees a structure that's preceded by a bang specifier (either private or public),
> it hands it off to a user-provided object that supports a resolveType method.
> The user can then convert that into an object as needed. Here is an example
> usage:
>
>     def testDomainType(self):
>         class MyYamlConfig:
>             def resolveType(self, data, url):
>                 # Map each private type tag to a native Python value.
>                 if url == '!!name':
>                     return 'Foo ' + data
>                 elif url == '!!coords':
>                     return {'x': data[0], 'y': data[1]}
>                 else:
>                     raise ValueError('url not passed in correctly')
>         data = YamlTest.loadHere("""
>             name: !!name Barson
>             coords: !!coords
>                 - 10
>                 - 20
>             """,
>             MyYamlConfig())
>         self.assertEqual(data[0],
>             {'name': 'Foo Barson',
>              'coords': {'x': 10, 'y': 20}})

So we agree on private types, but I think we need to work out the details,
because our implementations differ severely.

You pass the following...

coords: !!coords
  - 10
  - 20

...into PyYaml. Round trip it and you get back out...

coords:
  x: 10
  y: 20

But I'd like to see private types stay intact. The same care should be given to
domain types. XML people would want to see their RSS go in one way and come out
another!

I think you can implement the resolveType in such a way that it doesn't lose
its transfer method.

I'm totally with you on everything else you brought up.

_why
|
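A resolver that keeps the tag and the original node alongside the converted value would let a dumper write the document back the way it came in. The Python sketch below illustrates that idea under stated assumptions: the `Tagged` wrapper, `resolve`, and the deliberately tiny `emit` function are hypothetical, and `emit` handles only scalars and flat sequences, not general YAML.

```python
class Tagged:
    """Converted value plus what is needed to re-emit the original node."""

    def __init__(self, tag, raw, value):
        self.tag = tag      # e.g. '!!coords'
        self.raw = raw      # the node as it appeared in the document
        self.value = value  # the application's converted form


def resolve(raw, tag):
    """Convert known private types, but never discard the tag or raw node."""
    if tag == '!!coords':
        return Tagged(tag, raw, {'x': raw[0], 'y': raw[1]})
    if tag == '!!name':
        return Tagged(tag, raw, 'Foo ' + raw)
    return Tagged(tag, raw, raw)


def emit(key, node):
    """Emit one top-level key as a list of output lines."""
    if isinstance(node, Tagged):
        tag, raw = ' ' + node.tag, node.raw
    else:
        tag, raw = '', node
    if isinstance(raw, list):
        return ['%s:%s' % (key, tag)] + ['  - %s' % (item,) for item in raw]
    return ['%s:%s %s' % (key, tag, raw)]


coords = resolve([10, 20], '!!coords')
coords_lines = emit('coords', coords)
name_lines = emit('name', resolve('Barson', '!!name'))
```

The application still works with `coords.value` as a dictionary, while the emitted text keeps the `!!coords` tag and the original two-item sequence.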
From: why t. l. s. <yam...@wh...> - 2002-07-23 20:26:43
|
why the lucky stiff (yam...@wh...) wrote:
> But I'd like to see private types stay intact. The same care should be given to
> domain types. XML people would want to see their RSS go in one way and come out
> another!

Err.. correction. They "wouldn't want to see their RSS.."

I write like egg yolk.

_why
|