From: Oren Ben-K. <or...@ri...> - 2001-07-17 09:12:24
Clark C . Evans [mailto:cc...@cl...] wrote:
> | So what you seem to be saying is that it isn't the
> | information model which says whether a value is binary
> | or text, it is the application which decides. But in
> | this case, there's no need to provide a special
> | syntax/model/API for binary values. A trivial wrapper
> | around the incremental read/write methods, applied at
> | the discretion of the application, would do the trick
> | just as well.
>
> Not quite. I'm saying that you can consider all scalars
> binary by fixing on an encoding of UTF8 for unicode scalars.
> Thus, given a scalar you could have two methods:
>
>   asBytes()  -- The scalar's binary value (UTF8 if string)
>   asString() -- The scalar's native unicode value

So why not also:

    asInteger() -- The scalar's integer value
    asReal()    -- The scalar's real value
    asDate()    -- The scalar's date value

Etc.? Why is "byte[]" such a special data type that we have
to provide special syntax for it?

> | Again, this is the same problem as being forced to
> | pass strings instead of integer values. Which is probably
> | a much more common problem then binary values.
>
> Good point, but this is what class is for; perhaps
> the "C" API needs to have one more growth spurt to
> natively understand class. (A registered class
> can have a reader/writer pair from the byte stream
> to provide one more function:
>
>   asNative()
>
> This begs the question if Unicode string is a
> class... from the information model perspective,
> and UTF8, UTF16LE, UTF16BE, etc., are just
> transfer encodings like BASE64.

You could argue the same thing about dates - DMY, YMD, ISO, etc.
all being "transfer encodings". Does that mean YAML has to have
explicit syntax, in the core spec, for date formats? Ugh.

> | Seems good enough for me. Of course,
> |
> |   value: [!gif %base64] <value>
> |
> | Seems even better :-)
>
> Hmm. So the % color is the transfer encoding.
> I think you've shown with this example why
> "base64" isn't a class...

It isn't, but "byte[]" is. A "raw" binary value would be written as:

    value: [!bytes %base64] <value>

> as it is a transfer
> encoding. Why not just build-in one binary
> transfer encoding -- base64 -- and be done
> with it?

- Because other transfer encodings make sense (gzip etc.).

- Because 'bytes' is a rather useless data type by itself -
  definitely less useful than 'int'. Why should 'bytes' be a
  first-class data type and not 'int'?

- Because mapping 'bytes' to the native data structures in
  scripting languages is a mess (e.g., for JavaScript it is
  flatly impossible; possibly also for TCL). In general, the
  only data type we can always rely on is the text string.
  So, at a minimum, output of 'bytes' will be shaky.

- Because I think that YAML should have an "everything is a
  string" approach. Sure, we should allow for converting these
  strings/maps/lists to whatever class you want. But I strongly
  feel that the information model should be based on a single
  scalar type - text string. Just an example: this makes the
  mapping between the serialized format and the information
  model 1-1. So when one is working on a YAML file as text one
  stays within the YAML information model (of course, there's
  the application information model - the "schema" - to consider
  as well).

- Because applications *will* need to handle multiple date
  formats etc. It may seem a trivial issue, but multiple formats
  for the same data is a common, problematic issue. I'd rather
  we had a way to tackle it.

- Because I think we should be pushing for 'color' as a way to
  solve such issues, and adding a special 'bytes' data type is
  a cop-out to avoid it.

In short, I think that having a 'bytes' scalar data type in the
core data model is a bad idea (and yes, I realize this is
different from what I said a month ago :-)

As for the shorthand notation I proposed.

Jason Diamond [mailto:ja...@in...]
wrote:
> I think that I'm in the minority here but I believe that YAML
> is currently already flexible enough to express just about
> any concept while still being simple enough for most people
> to grasp and work with.

I tend to agree, with a qualification - I don't think YAML can do
it in a *readable* way at the moment. Hence the shorthand notation
(which is just that).

> Every time I read a message about
> color or typing or formatting I get worried that YAML will get
> bogged down in complexity. Don't get me wrong--I think that
> the concept of color is great and should be used but I don't
> believe that it needs to be mandated by any spec.

I completely agree. Nothing in the spec should mandate that color
be used. What I suggest is that we provide a shorthand notation
for a certain set of map keys, so that people who do want to use
color could do it in a readable fashion. This shorthand notation
won't affect the information model (simple map/list/scalar) at
all. It would just be an alternative, more readable way to write
certain types of maps. Nothing more.

> And I also
> don't believe that any more semantics need to be layered onto
> YAML other than the simple scalars, lists, and maps that
> it currently has.

I agree as long as you qualify it by "be layered ... in the core
spec". Such layering has its place - in separate spec(s).

> And while I'm at it, I no longer believe
> that the API should be part of the spec, either. Let's be
> realistic--one API will not fit everybody's needs.

Again I agree. I think we should have a format/info model spec,
roll it out the door, and make the API spec(s) separately. We've
done one step towards it by breaking the spec into
almost-independent sections. I think that the format/info model
section is close to being complete (barring the new block syntax
proposals and the issue of a shorthand notation).

I have some free time this weekend.
I could modify the block syntax to whatever we agree on, add the
shorthand notation for "color" maps, and create a standalone
"YAML 1.0 Core Spec" release candidate. Clark?

Have fun,

    Oren Ben-Kiki
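
For illustration only, here is a minimal Python sketch of the
"everything is a string" position argued above: the core model
stores nothing but text, and the class and transfer encoding
("color") are looked up in registries that the application
supplies. All names here (Scalar, DECODERS, READERS, as_native)
are hypothetical and are not taken from any YAML spec or API;
the sketch only assumes that a class maps to a reader and a
transfer encoding maps to a decoder.

```python
import base64
from dataclasses import dataclass


@dataclass(frozen=True)
class Scalar:
    """Hypothetical scalar: text is the only value the core model stores."""
    text: str
    cls: str = "str"      # assumed class tag, as in [!bytes ...]
    encoding: str = ""    # assumed transfer encoding, as in [... %base64]


# Hypothetical registries. Applications add entries (gzip, date
# formats, ...) without any change to the core information model.
DECODERS = {
    "": lambda text: text.encode("utf-8"),
    "base64": lambda text: base64.b64decode(text),
}
READERS = {
    "str": lambda raw: raw.decode("utf-8"),
    "int": lambda raw: int(raw.decode("ascii")),
    "bytes": lambda raw: raw,
}


def as_native(scalar: Scalar):
    """Undo the transfer encoding, then hand the bytes to the class reader."""
    raw = DECODERS[scalar.encoding](scalar.text)
    return READERS[scalar.cls](raw)


if __name__ == "__main__":
    # Rough equivalent of: value: [!bytes %base64] aGVsbG8=
    print(as_native(Scalar("aGVsbG8=", cls="bytes", encoding="base64")))
    print(as_native(Scalar("42", cls="int")))
```

In this sketch 'bytes' and 'int' are on equal footing: both are
just entries in a registry, and neither needs special syntax in
the core model.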