From: T. O. <tra...@ru...> - 2003-12-04 15:21:34
|
I'm using YAML to marshal objects, and I need some means of post-initialization once an object is loaded. Loading a marshalled object has similarities to object cloning, so I was thinking that perhaps this could be easily achieved by having YAML::load call initialize_copy after it loads a Ruby object. That would certainly work for me. Otherwise I have to call the post-init method separately every time I load the object. Of course I can always build my own class method, but this seems like something that would be worth having in general.

T. |
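The request above can be approximated without touching the loader by wrapping the load call. This is only a sketch under stated assumptions: `load_with_post_init`, `post_initialize`, and the `Roster` class are hypothetical names made up for illustration, not part of any YAML library, and the `unsafe_load` check assumes a newer Ruby where plain `YAML.load` refuses to revive arbitrary objects.

```ruby
require 'yaml'

# Hypothetical wrapper (not a library API): load a document, then give the
# restored object a chance to rebuild state that was not serialized.
def load_with_post_init(text)
  # Assumption: newer Psych versions need unsafe_load to revive arbitrary
  # Ruby objects; older versions only offer YAML.load.
  obj = YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(text) : YAML.load(text)
  obj.post_initialize if obj.respond_to?(:post_initialize)
  obj
end

# Hypothetical class used only to demonstrate the hook.
class Roster
  attr_reader :names, :index

  def initialize(names)
    @names = names
  end

  # Rebuilds a lookup table derived from @names.
  def post_initialize
    @index = {}
    @names.each_with_index { |name, i| @index[name] = i }
  end
end

text = YAML.dump(Roster.new(%w[ann bob]))
restored = load_with_post_init(text)
```

The wrapper keeps the hook opt-in: objects that do not define `post_initialize` load exactly as before. Reviving arbitrary objects should only be done with trusted input.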
From: Oren Ben-K. <or...@be...> - 2003-12-05 17:31:48
|
I just went through the types specs, by way of preparing them for the coming-soon-now release candidate draft. I'd like to suggest the following two changes:

- Rename 'special' keys to 'yaml' keys. That better captures their intent - the value associated with each such key is a YAML "thing" (a tag, an alias or an anchor). So the example becomes:

    # The following node should NOT be serialized this way.
    encoded YAML node :
      !yaml '!' : '!type'
      !yaml '&' : 12
      = : value
    # The proper way to serialize the above node is as follows:
    node : !!type &12 value

- Change the semantics of a timestamp without a time part to 00:00 UTC instead of 12:00 UTC.

The current spec puts it at noon UTC of that date, on the grounds that no matter what time zone is applied to it, it remains within the same date. It turns out this isn't quite correct - there are both +12 and -12 time zones, believe it or not. So it only _almost_ works (though personally I feel that whoever lives at time zone +12 deserves all the time-related software bugs he gets. I mean, what were they *thinking*?).

The main problem with the current interpretation is that it isn't how most code is written; people are used to truncating timestamps to obtain dates, which works fine, but also to multiplying dates by 24h to obtain times, which doesn't. Of course, this gets them into trouble with time zones... But time zones are a pain whatever we do.

(Side rant:

I was discussing date/time formats with a friend the other day and I hit upon the following format:

    yyyy-mm-dd[+hh:mm:ss[+tz]]

For example:

    2003-12-05+18:33:00+02

Using a "+" makes the timestamp a "single thing" again and is rather readable. It is also tempting to look at the result as an expression for computing the time in seconds in UTC: <day> + <time-of-day> + <time-zone>. Alas, the time zone sign is reversed... It always bugged me to write <time>+<tz>. You never add the <time> and the <tz>, it makes no sense. You have to compute <time> *minus* <tz> to get the UTC time.

I guess having a standard, sensible date format has been a lost cause for decades. Given that we already promote a non-standard format (using white space), adding another non-standard format (using '+') is useless - an uncomfortable compromise between being standard and being readable. Oh well.)

But deciding on the interpretation of a date-only timestamp _is_ important. We'll be freezing things soon and I'd much rather not freeze it the wrong way.

OK, off the soap box... :-)

Thoughts, feedback, anything else you don't like in the spec?

Have fun,

Oren Ben-Kiki |
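The two date-only interpretations discussed above can be compared with a few lines of Ruby. This is a rough sketch, not anything taken from the spec: `local_date` is a helper invented here, and it uses fixed numeric UTC offsets rather than named time zones.

```ruby
# Calendar date seen at a fixed UTC offset (in hours) for a given UTC instant.
# Hypothetical helper, for illustration only.
def local_date(utc_time, offset_hours)
  utc_time.getlocal(offset_hours * 3600).strftime('%Y-%m-%d')
end

noon     = Time.utc(2003, 12, 5, 12, 0, 0) # date-only value read as 12:00 UTC
midnight = Time.utc(2003, 12, 5, 0, 0, 0)  # date-only value read as 00:00 UTC

noon_at_plus12     = local_date(noon, 12)     # rolls over into the next day
noon_at_minus12    = local_date(noon, -12)    # stays on the same day
midnight_at_minus1 = local_date(midnight, -1) # falls back to the previous day

# "time minus tz": 18:33:00 at +02 corresponds to 16:33:00 UTC.
utc_from_local = Time.new(2003, 12, 5, 18, 33, 0, '+02:00').getutc
```

Neither choice keeps the calendar date stable under every offset: noon UTC breaks at +12, while midnight UTC shifts the date for any negative offset but matches the truncate-and-multiply habit described above.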
From: Oren Ben-K. <or...@be...> - 2003-12-05 18:03:49
|
Another issue: I dislike the name "omap". We have seq(uence), map(ping), pairs and set, which are all nice and descriptive names. "omap" stands out like a sore thumb. Can we reconsider calling it dict(ionary)? It is a perfect match for what we mean - an ordered list of unique keys and their associated values.

I know that "dictionary" is used in some languages as an (unordered) mapping. I don't think that merely because they misuse the word, we should avoid using it.

And speaking of names - we had this discussion about what we call a YAML "document". We finally settled on leaving it as a "document". It is OK as far as YAML itself is concerned, but it is weak when it comes to discussing YAML vs. XML. "YAML is for data, XML is for documents" loses its punch when one discusses YAML "documents".

So... Just a notion - how about using the word "object"? It is a good match for what we mean, even better than "graph" because "object" automatically implies being some programming data and having a single root.

Ten years ago using the term "object" would have been controversial due to the argument of whether "everything is an object". I think that today, people have come to accept that outside the restricted context of a particular programming language, a "data object" can refer to any piece of data.

Thoughts?

Have fun,

Oren Ben-Kiki |
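Whatever the type ends up being called, "an ordered list of unique keys and their associated values" can be sketched in a few lines. The example assumes the data arrives as a sequence of one-pair mappings; `ordered_pairs` is a hypothetical helper name, not something defined by the spec or by any YAML library.

```ruby
# Hypothetical helper: turn a sequence of one-entry mappings into an ordered
# list of [key, value] pairs, rejecting duplicate keys.
def ordered_pairs(entries)
  seen = {}
  entries.map do |entry|
    raise ArgumentError, 'each entry must hold exactly one pair' unless entry.size == 1

    key, value = entry.first
    raise ArgumentError, "duplicate key: #{key.inspect}" if seen.key?(key)

    seen[key] = true
    [key, value]
  end
end

pairs = ordered_pairs([{ 'one' => 1 }, { 'two' => 2 }, { 'three' => 3 }])
```

Dropping the uniqueness check would give the behaviour of "pairs" instead, which is the only difference between the two types as described in this thread.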
From: why t. l. s. <yam...@wh...> - 2003-12-05 20:04:02
|
On Friday 05 December 2003 11:03 am, Oren Ben-Kiki wrote:
>
> So... Just a notion - how about using the word "object"? It is a good
> match for what we mean, even better than "graph" because "object"
> automatically implies being some programming data and having a single
> root.
>
> Ten years ago using the term "object" would have been controversial due
> to the argument of whether "everything is an object". I think that
> today, people have come to accept that outside the restricted context of
> a particular programming language, a "data object" can refer to any
> piece of data.
>

I'm hesitant to mention this, because I don't know all the semantic implications (you're 9 levels above, Oren), but often the specific object is referred to as an "instance". I have no objections to any of the proposals so far, I'm just assisting in the brainstorm.

I believe that "instance" has been used in the database world to refer to a particular available portion of data. A script is executed to create the instance. These instances, however, often include schema and data together. Is this right?

I think it could be safe to refer to a file containing YAML as an instance. Although one could also see YAML as the script which creates the instance, available within the language's environment.

_why |
From: Oren Ben-K. <or...@be...> - 2003-12-05 21:23:24
|
why the lucky stiff wrote:
> I believe that "instance" has been used in the database world
> to refer to a particular available portion of data. A script
> is executed to create the instance. These instances, however,
> often include schema and data together. Is this right?

I don't know the database world that well. To me, "instance" means "instance of a class" and is (paradoxically) more strongly connected with the OO world than "object", which I've grown to view as a generic term. A YAML document/object/instance certainly doesn't carry the schema with it.

> I think it could be safe to refer to a file containing YAML
> as an instance. Although one could also see YAML as the script
> which creates the instance, available within the language's
> environment.

Hmmm. Interesting view - I can see how it would seem that way from a database perspective.

I guess we are stuck with "document", when all is said and done. Everything else just carries too many connotations. I was just trying "object" on for size, as it were. It isn't that bad. After all, the YAML (text) file is a (text) document...

Have fun,

Oren Ben-Kiki |
From: Randy W. S. <RandyS@ThePierianSpring.org> - 2003-12-05 18:53:30
|
On 12/5/2003 12:31 PM, Oren Ben-Kiki wrote:
> (Side rant:
>
> I was discussing date/time formats with a friend the other day and I hit
> upon the following format:
>
> yyyy-mm-dd[+hh:mm:ss[+tz]]
>
> For example:
>
> 2003-12-05+18:33:00+02
>
> Using a "+" makes the timestamp a "single thing" again and is rather
> readable. It is also tempting to look at the result as an expression for
> computing the time in seconds in UTC: <day> + <time-of-day> +
> <time-zone>. Alas, the time zone sign is reversed... It always bugged me
> to write <time>+<tz>. You never add the <time> and the <tz>, it makes no
> sense. You have to compute <time> *minus* <tz> to get the UTC time.
>
> I guess having a standard, sensible date format has been a lost cause
> for decades. Given that we already promote a non-standard format (using
> white space), adding another non-standard format (using '+') is useless
> - an uncomfortable compromise between being standard and being readable.
> Oh well.
>
> But deciding on the interpretation of a date-only timestamp _is_
> important. We'll be freezing things soon and I'd much rather not freeze
> it the wrong way.
>

Why not use the ISO 8601 Date Format?

<http://www.google.com/search?q=ISO+8601>

> OK, off the soap box... :-)
>
> Thoughts, feedback, anything else you don't like in the spec?
>
> Have fun,
>
> Oren Ben-Kiki
> |
From: Oren Ben-K. <or...@be...> - 2003-12-05 19:15:51
|
Randy W. Sims wrote:
> Why not use the ISO 8601 Date Format?

We are (see www.yaml.org/type/timestamp). We aren't using _all_ of ISO 8601 (mercifully). And we do allow one "extension" (using white space instead of 'T'), for readability.

The question of the timestamp meaning of a time-less date is beyond the scope of ISO 8601.

Have fun,

Oren Ben-Kiki |
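The white-space "extension" mentioned above is easy to layer on top of a strict ISO 8601 parser. The sketch below covers only that one substitution, not the full timestamp type, and `parse_spaced_timestamp` is a hypothetical helper name; `Time.iso8601` comes from Ruby's standard `time` library.

```ruby
require 'time'

# Hypothetical helper: accept white space instead of 'T' between the date
# and the time, then hand the text to the strict ISO 8601 parser.
def parse_spaced_timestamp(text)
  normalized = text.sub(/\A(\d{4}-\d{2}-\d{2})[ \t]+(?=\d)/, '\1T')
  Time.iso8601(normalized)
end

strict = parse_spaced_timestamp('2003-12-05T18:33:00+02:00')
spaced = parse_spaced_timestamp('2003-12-05 18:33:00+02:00')
```

Both calls describe the same instant, so a consumer that only understands the strict form loses nothing when the readable form is normalized first.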
From: Rick M. <r.m...@au...> - 2003-12-06 20:06:04
|
Here are a few minor points about the spec:

o B4 example has some html hieroglyphs
o C6 example: is a '>' implied?
o E1 example: is a '>' implied for comments text?
o 'the the serialized text' - double 'the'
o example at end of 1.3.1 needs '{' and '}' for invoice map on one line

I find the grammar difficult to read, even though I know about grammars and have read and written many of them. My PhD was related to programming languages. I've tried several times to understand the yaml grammar, as I have a parser in progress, and each time gave up :-( .

I don't understand why, except maybe it's because it mixes lexical and syntactical stuff all together. Maybe because I don't understand the point of some of the ideas and why they ended up in that form.

Maybe I'm missing the right way to look at it?

How do others find it?

Cheers, Rick |
From: Oren Ben-K. <or...@be...> - 2003-12-07 00:06:03
|
Rick Mugridge wrote:
> o B4 example has some html hieroglyphs

Yeah, I caught that.

> o C6 example: is a '>' implied?
> o E1 example: is a '>' implied for comments text?

Nope. It is a plain scalar. This is a bit surprising, I guess... Think about it like this: a plain scalar is "flow" (in-line), true, but it can span multiple lines. And there's no requirement preventing the part of it which is on the first line from being empty. Hence, it can start on a following line.

We should have examples covering this. In general the examples need a thorough going-over.

> o 'the the serialized text' - double 'the'

Got it, thanks.

> o example at end of 1.3.1 needs '{' and '}' for invoice map
> on one line

Right. It uses '[' ']' which is a mistake, of course. Nice catch.

Thanks for the proof-reading. It is so hard to do when you write the text yourself - you keep seeing what you intend instead of what you have written. Anyone know of a good spell-checker for HTML/XML?

> I find the grammar difficult to read, even though I know
> about grammars and have read and written many of them. My PhD
> was related to programming languages. I've tried several times
> to understand the yaml grammar, as I have a parser in progress,
> and each time gave up :-( .

I had a rough time with it myself, writing it. So, "if it was hard to write"... :-) But seriously, this is a problem.

> I don't understand why, except maybe it's because it mixes
> lexical and syntactical stuff altogether.

Yeah. It isn't like C where you define your tokens and then mix them together. In C, you don't even mention white space in the BNF! YAML is more context-dependent - it is very Perl-ish in that sense. Which makes it hard to write a context-*free* grammar for it :-)

> Maybe because I don't understand the point
> of some of the ideas and why they ended up in that form.

A lot of the time it is because of having to "carry the context". For example, the various forms of the plain scalar. Or the whole distinction between "flow-*" and "top-*" rules. Things just work differently inside a flow collection and outside it...

BTW, the name "top" is a sore point. "Flow" seems good enough a name, but "top" is really a bad name for "not flow". A better name would be appreciated.

> Maybe I'm missing the right way to look at it?

The Hungarian notation helps. l-* rules match whole lines. *-node includes the node properties (anchor, tag) while *-value does not; top-* only matches outside flow collections, while flow-* also matches inside them. This leaves white space, which is tracked by noticing ns-* vs. s-* names. b-* for break, i-* for indentation... That about covers it. Once you start paying attention to this, it becomes a bit easier.

A thing that hurts readability and that we can improve on is the use of things like 'c-mapping-entry' instead of simply writing ':'. This means that the reader has no visible way to map each BNF production to the physical characters. I just fixed that all over the productions, which again should help at least a bit. Compare:

    c-ns-directive ::= c-directive ns-ns-directive-name
                       c-mapping-entry ns-ns-directive-value
    ns-ns-directive-name ::= ( ns-char - c-mapping-entry )+
    ns-ns-directive-value ::= ns-char+

With the new version:

    c-ns-directive ::= "%" ns-ns-directive-name ":" ns-ns-directive-value
    ns-ns-directive-name ::= ( ns-char - ":" )+
    ns-ns-directive-value ::= ns-char+

(the "%" and ":" work as links to the c-directive and c-mapping-entry productions, so the hyper-text quality of the BNF is not harmed).

When all is said and done, though, the core problem - YAML not being context-free - can't be helped. Of course, if anyone has a notion of how to better re-arrange the productions, I'll be happy to use it.

> How do others find it?

Difficult, I'm certain. I doubt most readers of the spec go beyond the text and the examples. Or I would have got more complaints :-)

Have fun,

Oren Ben-Kiki |
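To see how the rewritten directive production maps onto physical characters, it can be transcribed almost literally into a regular expression. This is a sketch rather than a conforming parser: it assumes ns-char can be approximated by "any non-white-space character" (`\S`), and `DIRECTIVE` and `parse_directive` are names invented for the example.

```ruby
# Sketch of:
#   c-ns-directive        ::= "%" ns-ns-directive-name ":" ns-ns-directive-value
#   ns-ns-directive-name  ::= ( ns-char - ":" )+
#   ns-ns-directive-value ::= ns-char+
# Assumption: ns-char is approximated by \S (any non-white-space character).
DIRECTIVE = /\A%(?<name>[^\s:]+):(?<value>\S+)\z/

# Hypothetical helper: returns [name, value], or nil when the text does not
# match the production.
def parse_directive(text)
  match = DIRECTIVE.match(text)
  match && [match[:name], match[:value]]
end

parsed   = parse_directive('%YAML:1.0')
rejected = parse_directive('%:1.0') # empty name, so no match
```

Each quoted character in the production ("%" and ":") appears verbatim in the pattern, which is the readability gain the rewritten BNF is after.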
From: T. O. <tra...@ru...> - 2003-12-09 20:47:35
|
On Thursday 04 December 2003 07:14 am, T. Onoma wrote:
> I'm using YAML to marshal objects and I am needing some means for post
> initialization once the object is loaded. Loading a marshalled object has
> similarities to object cloning so I was thinking that perhaps this could be
> easily achieved by having YAML::load call initialize_copy after it loads a
> Ruby object. That would certainly work for me. Otherwise I have to call
> the post init method separately every time I load the object. Of course I
> can always build my own class method, but this seems like something that
> would be worth having in general.

was this not a good idea? or am i missing some obvious way to already do this?

thanks, t. |