This is a great post, Oren; I'd like to hear what others think.
On Mon, Jul 22, 2002 at 11:18:59PM +0300, Oren Ben-Kiki wrote:
| - HTTP only?
|
| Isn't this a bit restrictive? There *are* other protocols one can use to
| fetch web documents (e.g., ftp). And others might be added later on...
| What's wrong with keeping our shortcut notation, and merely limiting the
| URIs to URLs?
HTTP covers 99% of the use cases, and if it is the only
protocol, it is easy to support. We can always
extend this later to include other URI schemes.
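A minimal Python sketch of what the proposed restriction might look like in a processor. It assumes a type family is checked as a plain string. The function name and the `allowed_schemes` parameter are hypothetical; widening that tuple is one way "extend this later" could work:

```python
from urllib.parse import urlsplit


def is_allowed_type_family(uri, allowed_schemes=("http",)):
    """Return True if uri is an absolute URL with an allowed scheme and a host.

    Hypothetical helper: by default only "http" is accepted, so other
    schemes (ftp, isbn, ...) are rejected until the tuple is widened.
    """
    parts = urlsplit(uri)  # urlsplit lowercases the scheme for us
    return parts.scheme in allowed_schemes and bool(parts.netloc)
```

For example, `is_allowed_type_family("ftp://example.com/types/foo")` is rejected today, but passing `("http", "ftp")` would accept it without touching anything else.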
| - Optional end of rainbow => URI allowed?
|
| Clark mentioned making it optional (an obvious necessity for private types).
| If it is optional, what is the problem with using URIs, exactly? If one
| chooses to use an 'isbn:...' type family (Ugh), he's merely opted out of
| ever supplying a pot of gold at the end of the rainbow, forever. Which he is
| allowed to do anyway...
Extra flexibility that isn't really needed. Once again, we
could open these gates later on down the road if someone gives
us a good reason to do so; till then, why not be restrictive?
The only thing this flexibility seems to give XML is indigestion
and confusion: debates over whether a namespace is an identifier,
a location, both, etc. These debates chew up an awful lot of
bandwidth and don't produce value.
| - XML namespaces
|
| Would this mean giving up on using XML namespaces? That would make
| converting XML schemas to YAML that much harder. Now, we had this idea of
| *constructing* YAML type family names from the pair {namespace-URI,
| local-name}. If we could do that in a reasonable way - say,
| namespace-URI/local-name or whatever - we could probably find a way to
| preserve this "dual-personality" schema/namespace option for implementers.
| It may be vital for YAML gaining acceptance in the world.
I don't think this is a use case. If you are using XML for its
information model (mixed content, etc.) then you won't really want
to use YAML. The models are _so_ different that any generic
embedding of XML into YAML will be butt ugly. I _thought_ that
it was a use case, but after digging into the subject further, it's
quite clear to me that it really isn't. Anyway, 95% of the XML
namespaces out there are HTTP URLs, so this isn't an issue.
| - Fragments.
|
| Clark has ruled them out, which is sensible.
... snip stuff about mapping XML to YAML...
| anyway, assuming all this, can we view 'format' as a 'fragment'?
|
| int: !int#dec 7 # http://type.yaml.org/int#dec
I don't see why not.
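To make the idea concrete, here is a small Python sketch of treating the format as a URL fragment. It assumes that the shorthand `!name` expands to `http://type.yaml.org/name`, as in the example above. `CORE_BASE`, `expand_tag` and `split_format` are hypothetical names, not part of any spec:

```python
from typing import Tuple
from urllib.parse import urldefrag

# Hypothetical base URL for core type families, taken from the example above.
CORE_BASE = "http://type.yaml.org/"


def expand_tag(tag: str) -> str:
    """Expand a shorthand such as '!int#dec' into a full URL (assumed mapping)."""
    if not tag.startswith("!"):
        raise ValueError("expected a '!' shorthand tag")
    return CORE_BASE + tag[1:]


def split_format(url: str) -> Tuple[str, str]:
    """Split a type family URL into (family, format); format is '' if absent."""
    family, fragment = urldefrag(url)
    return family, fragment
```

With this reading, the family stays the resource one would fetch, and the format rides along as a fragment that the client handles itself.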
| - Minimal key set
|
| What's the minimum one would have to put in the "pot of gold"? I mean, if it
| is there at all. An empty document would presumably be legal (after all,
| this "pot of gold" *is* optional). But what must a non-empty document
| contain? *Should* there be a minimum set of keys, or just "recommended
| keys"?
I guess a title and a summary would be good things to have;
otherwise there isn't much point in having it at all.
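A sketch of how a checker might apply that minimum. It assumes the catalog has already been loaded into a Python dict by some YAML loader, and that the key names are literally `title` and `summary`. Both the names and the helper are assumptions. An empty catalog passes, since the whole thing is optional:

```python
from typing import List

# Assumed minimum keys for a non-empty resource catalog ("pot of gold").
REQUIRED_KEYS = ("title", "summary")


def missing_keys(catalog: dict) -> List[str]:
    """Return the required keys absent from a non-empty catalog.

    An empty catalog is treated as legal, because the catalog is optional.
    """
    if not catalog:
        return []
    return [key for key in REQUIRED_KEYS if key not in catalog]
```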
| - Relationship with RDDL and other meta-data standards...
|
| Probably someone should set up some way they can be simply cut&pasted into
| this scheme, using appropriate top-level key(s) and structure. We should at
| least check that this is feasible/reasonable...
By letting keys that contain a period belong to domain holders,
each meta-data requirement can live in its own bucket without
prior planning on our part. As for RDDL, let's just stick with
something simple and let it grow from there.
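One possible reading of the "keys with a period" rule, sketched in Python: dotted keys such as `rddl.org` belong to whoever holds that domain, and undotted keys are reserved for the shared core vocabulary. The example key and the `partition_keys` helper are hypothetical:

```python
from typing import Dict, Tuple


def partition_keys(catalog: Dict[str, object]) -> Tuple[dict, dict]:
    """Split catalog entries into (core, domain-owned) buckets.

    Assumption: a key containing a period (e.g. 'rddl.org') is owned by the
    holder of that domain; keys without a period belong to the core set.
    """
    core = {k: v for k, v in catalog.items() if "." not in k}
    domains = {k: v for k, v in catalog.items() if "." in k}
    return core, domains
```

Under this scheme, a processor that doesn't recognize a domain bucket can simply skip it. No one has to coordinate key names ahead of time.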
| - Requirement from a YAML processor.
|
| I think that accessing the "pot of gold" should be *very* optional, and it
| should be *crystal clear* it is perfectly possible to handle YAML without
| ever thinking about it.
It may not, however, be possible for some YAML processors (such
as a validator) to proceed without access to the resource
directory. I'd rather it not be too optional; it's OK if nothing
is there, but if *something* is there, it should (must?) be a
YAML catalog.
| Further, I think it should also be made very clear that it is *very*
| unrealistic to expect, in the general case, that anything other than, say,
| schema validation, would be possible to achieve for a type family that is
| solely "known" by its "pot of gold". That is, any expectation that a YAML
| application will magically understand "the semantics" of a type through its
| "pot of gold" is, to say it kindly, naive.
The goal of the resource index is to provide information
related to a particular node; what that information is, or how
it could be used, is simply not specified, let alone asserted.
| Of course people could attach code in any interpreted language - scratch
| that, in *any* language (Windows X86 DLLs included - Ugh) - to the "pot of
| gold", thereby allowing dynamic loading of type family semantics. I find it
| to be more of a scary thought than a comforting one :-)
If someone wants to download "signed" code modules that can be
used to "visualize" the data or some other process, so be it; to
each his/her own. ;)
| - Relationship with the schema mechanism.
|
| Having a "pot of gold" document accessible via the type family immediately
| suggests that this should be "the way" the schema would be fetched. That's
| nice and all, but... is it practical to chop the schema into multiple
| physical documents this way (one per type family used)? Putting aside
| efficiency issues, what about problems like version control and being easy
| to read/write?
Good question. One approach is to use the type family only for
root nodes or for data islands. For version information, perhaps
we could allow collection nodes to have a "format", a.k.a. "version".
| Keep in mind that a collection type family will constrain its contained
| sub-nodes to the n-th degree regardless of their type - or, I should say,
| *in addition* to the generic restrictions specified by their type.
Right. And I think this is a huge discussion in and of itself,
one we should probably start on, and it will probably rage for a
year or more... I think we all have less experience in this domain,
and it will take some time before we even know what all of the
issues are.
| - The risk factor.
|
| There is a giant leap between "type families are unique IDs with
| human-readable definition" and this proposal. And unlike everything else
| we've done with YAML, this would be exploring into new lands because nobody
| I know of has done anything like it.
Right. The closest thing is RDDL, which hasn't been adopted beyond
a few small groups. Systems like this usually draw a big debate
over two points of failure:
- An inherently "centralized" approach: he who owns the domain
defines the resource file. This is all well and good, but having
one centralized approach doesn't stop other registry-like
mechanisms from emerging.
- Problems with efficiency/caching: either servers get beaten up
badly or caches become stale. This is true; those who don't
use the "Expires" header in HTTP are bound to have pains. However,
I think this isn't so much an architectural problem as an
educational one.
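As an illustration of the "use Expires" point, here is a client-side freshness check in Python that relies only on the standard library's `email.utils.parsedate_to_datetime`. HTTP/1.1 says an invalid `Expires` value (such as "0") must be treated as already expired. Treating a *missing* header as stale is a conservative assumption of this sketch, not something HTTP requires:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires_header, now=None):
    """Return True if a cached catalog is still fresh per its Expires header."""
    if not expires_header:
        return False  # assumption: no header means "refetch"
    try:
        expires = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        return False  # invalid dates count as already expired (HTTP/1.1)
    if expires.tzinfo is None:
        expires = expires.replace(tzinfo=timezone.utc)  # HTTP dates are GMT
    if now is None:
        now = datetime.now(timezone.utc)
    return now < expires
```

If the catalog's owner sets a sensible `Expires` value, a cache only has to go back to the server occasionally. That is roughly the "educational" fix described above.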
| Here we are speculating about what may or may not prove useful to
| application developers, and I for one do not have the personal experience in
| such dynamic-loading extendible-from-the-web-yet-strongly-typed systems to
| say whether this makes sense or not.
As someone who used XML "extensively" for a spell, I found the
lack of a standard directory mechanism for items relating to a
given vocabulary to be one of its sore points (hence my
involvement with RDDL). It's nice not to have to hunt down a
schema, and to be able to click on a family name and retrieve a
human-readable description of what the type is all about.
| Actually, I have a great deal of experience (as do we all) in one such
| system, the HTML browser. And it is a terrible mess, a failure of standards
| to achieve anything like a sane system - the most we can learn from it is
| what *not* to do.
The biggest problem with the web is that it grew too fast. ;)
| It may make more sense to use DNS-like mechanism. Or just make direct use of
| DNS. Or LDAP. Or WebDAV. Or something. It may make more sense to have each
| top-level key reside in its own physical document. I have no idea, because I
| don't have a good grasp of the use case. Speaking of which...
HTTP is by far the best-supported protocol out there...
| - The use case?
|
| What *is* the use case (other than being able to answer the newbie about
| "what does a type family point to")? What is the class of applications that
| want to be schema-aware but not schema-specific? If the answer is
| "validating parsers and authoring tools" then I think that this proposal is
| a serious overkill. A simple schema language would do the trick for both.
You hit most of them. But I think the most important reason
is to solve these in a manner that allows other information
about a type family to be provided by its 'owner'.
| Is it something like "web services"? I have strong doubts about whether
| something like this proposal is actually useful for such services (given a
| schema language exists). Services require a much stronger knowledge of
| "semantics" than would be offered by the "pot of gold". IMVVHO, that is -
| since nobody ever saw "web services" actually working as hyped, that's all
| anybody has to offer, I'm afraid. On the other hand, using "point-to-point"
| or "client-to-server" schema-specific XML-RPC/SOAP/etc. *is* working in
| practice. Again this only requires a schema language (if that).
It isn't that grand.
| - Effects on the spec?
|
| If we agree the "pot of gold" is optional, and if we make it easy to look at
| a URL and say whether it is a "pot of gold" or not (simplest way: give it a
| distinctive mime type), is there really any reason to change the spec? It
| seems to me we can safely define this whole thing in a separate spec - "A
| convention for using YAML type families as URLs for fetching meta-data". We
| can start by giving some meta-data for our core type families as *an
| example*.
This is true. It need not be in the specification proper,
though a link to it from the spec would probably be a good idea.
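Oren's "distinctive mime type" idea would make the "is this a pot of gold?" test trivial for a client. A Python sketch follows. The media type string is purely hypothetical, because no type was chosen in this thread:

```python
# Hypothetical media type; nothing was actually registered or agreed on.
CATALOG_MEDIA_TYPE = "application/x-yaml-catalog"


def is_catalog(content_type: str) -> bool:
    """Return True if a Content-Type header value names the catalog type.

    Parameters such as '; charset=utf-8' are ignored, and the comparison is
    case-insensitive because media types are case-insensitive.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == CATALOG_MEDIA_TYPE
```

A processor could then skip any URL whose response fails this check. That keeps the whole convention outside the core spec.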
| If people like it and build on it - great. If it is useless for 99% of the
| people in the world (my suspicion at this point - feel free to set me
| right), no great loss, either. We'd have merely over-formalized a bit how we
| define type families.
|
| Minor changes to the spec may still result (specifically, handling of
| fragments and formats - and mentioning that there *is* an *optional*
| convention for meta data planned/available at a separate spec). I would be
| more than happy to discuss them under such an approach.
Ok.
| - Effects on time table?
|
| I suspect it will take ages to settle the issues this proposal raises. I'm
| less than enthused at the thought of wording such a chunk of functionality
| into our core spec. From the narrow point of view of "let's get a spec out
| the door", this proposal seems to be a serious problem.
|
| I could be wrong here - especially if it is worded as something optional,
| and would be rather loosely defined. But still, at this point, my vote is to
| otherwise steer away from this whole thing in the YAML 1.0 *CORE* spec.
| Let's create a separate YAML 1.0 *META* spec for this instead. Our current
| spec is big enough as it is anyway...
OK. But for now, I'd like to restrict type families to
"http"; it's 95% or more of the use cases, and we can always
be more flexible later if required.
Best,
Clark
--
Clark C. Evans Axista, Inc.
http://www.axista.com 800.926.5525
XCOLLA Collaborative Project Management Software