From: Kenneth D. <ke...@se...> - 2007-04-17 14:59:30
Hello!

I am a recent convert to YAML, using spyc to load YAML files into PHP arrays. Very, very nice.

There are many places where we have assignment pairs that look like "primary_key: Y" or "uisearch: Y", and YAML is interpreting these as booleans and turning them into 1/0 assignments.

It seems that with tags I ought to be able to force the values of those particular keys to read as strings. But alas, the documentation, while very complete, is also very abstract and lacks (as far as I could find) an example of this basic task. Can anybody clue me in?

Thanks.

-- 
Kenneth Downs
Secure Data Software, Inc.
www.secdat.com www.andromeda-project.org
631-379-7200 Fax: 631-689-0527
From: Clark C. E. <cc...@cl...> - 2007-04-17 21:41:26
Kenneth,

short answer:
  You should be able to quote those Y/N items to force them
  to be a string.

long answer:
  You might be able to configure your parser to not "implicitly
  type" Y/N as a boolean value.

longest answer:

You've hit the #1 usability problem with YAML. It's called "implicit
type resolution", and different implementations are doing it
differently.

The original goal was to make it easy to type in integers and have
them show up as integers w/o littering your text with "!!int".
Unfortunately, deciding where that line should be drawn is a bit hard.

In the next pass of YAML, I am going to recommend that all parsers
_only_ do implicit typing on:

  (a) symbolic values, such as <<, which can be used to
      augment the YAML syntax with very nice hooks

  (b) numbers, "true", "false" and "null", following
      the JSON standard (for compatibility)

At least, this should, IMHO, be the default. I think Ingy begs
to differ and believes the default should be *all strings* with
no implicit typing. Whatever we come up with, getting there
is sure to be unpleasant, but probably far less unpleasant than
the current state of affairs.

Cheers,

Clark
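To make Clark's short answer concrete, here is a minimal Python sketch of implicit type resolution on a flat `key: value` file. It is not spyc's code or any library's API; the boolean patterns are an assumption modeled on the YAML 1.1 type repository, whose `bool` type lists `y|Y|yes|Yes|YES|...`, which matches the behavior Kenneth reports. Only plain (unquoted) scalars are implicitly typed, so quoting a value always keeps it a string.

```python
import re

# Assumed patterns, modeled on the YAML 1.1 "bool" type (which includes y/Y and n/N).
BOOL_TRUE = re.compile(r"y|Y|yes|Yes|YES|true|True|TRUE|on|On|ON")
BOOL_FALSE = re.compile(r"n|N|no|No|NO|false|False|FALSE|off|Off|OFF")
INT = re.compile(r"[-+]?[0-9]+")


def resolve_scalar(token):
    """Resolve one scalar token; a quoted token is never implicitly typed."""
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "\"'":
        return token[1:-1]  # quoted: always a string
    if BOOL_TRUE.fullmatch(token):
        return True
    if BOOL_FALSE.fullmatch(token):
        return False
    if INT.fullmatch(token):
        return int(token)
    return token  # everything else stays a string


def load_pairs(text):
    """Load flat 'key: value' lines, a tiny illustrative subset of YAML."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            result[key.strip()] = resolve_scalar(value.strip())
    return result


print(load_pairs('primary_key: Y\nuisearch: "Y"'))
# {'primary_key': True, 'uisearch': 'Y'}
```

Under this resolver the plain `Y` comes back as a boolean (the PHP 1 Kenneth saw), while the quoted `"Y"` stays the one-character string.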
From: Kenneth D. <ke...@se...> - 2007-04-18 10:41:34
Clark C. Evans wrote:
> Kenneth,
>
> short answer:
>   You should be able to quote those Y/N items to force them
>   to be a string.

OK, this is what I'm not doing now.

> long answer:
>   You might be able to configure your parser to not "implicitly
>   type" Y/N as a boolean value.

This had crossed my mind, but I was hoping you wouldn't say that :(

> longest answer:
>
> You've hit the #1 usability problem with YAML. It's called "implicit
> type resolution", and different implementations are doing it
> differently.
>
> The original goal was to make it easy to type in integers and have
> them show up as integers w/o littering your text with "!!int".
> Unfortunately, deciding where that line should be drawn is a bit hard.

Hard to argue with the original goal.

> In the next pass of YAML, I am going to recommend that all parsers
> _only_ do implicit typing on:
>
>   (a) symbolic values, such as <<, which can be used to
>       augment the YAML syntax with very nice hooks
>
>   (b) numbers, "true", "false" and "null", following
>       the JSON standard (for compatibility)

What about adding the feature which I thought from the docs was already present: a kind of variable declaration that tells the parser how to handle certain names? In my case I would want to say

"For name:value combinations where name=="primary_key", make value a string."

This ought to give the best of all worlds, in that it would allow

a) implicit typing, which has some historical weight and strong arguments
b) name-level overrides, perhaps even descendant names like CSS
c) value-level overrides, which I believe are already in there (putting !!int everywhere?)
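Kenneth's proposal, a declaration such as "for name:value combinations where name == primary_key, make value a string", can be sketched as a per-key override table consulted before implicit typing. This is a hypothetical feature, not something the YAML spec or spyc provides; `load_with_overrides` and the `overrides` table are illustrative names, and the fallback typing is a loose approximation of YAML 1.1 (it lower-cases before matching, which real YAML 1.1 does not).

```python
import re

# Loose approximation of YAML 1.1 implicit booleans (case-insensitive here).
BOOLS = {"y": True, "yes": True, "true": True, "on": True,
         "n": False, "no": False, "false": False, "off": False}


def implicit(value):
    """Fallback implicit typing for keys without an override."""
    if value.lower() in BOOLS:
        return BOOLS[value.lower()]
    if re.fullmatch(r"[-+]?[0-9]+", value):
        return int(value)
    return value


def load_with_overrides(text, overrides):
    """Load flat 'key: value' lines; `overrides` maps a key name to a type.

    Hypothetical name-level declaration: a listed key bypasses implicit typing.
    """
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        declared = overrides.get(key)
        result[key] = declared(value) if declared else implicit(value)
    return result


doc = "primary_key: Y\nuisearch: Y"
print(load_with_overrides(doc, {"primary_key": str}))
# {'primary_key': 'Y', 'uisearch': True}
```

Matching on dotted paths (such as `item.prop_2nd`) instead of bare key names would be one way to approximate the CSS-like descendant names Kenneth mentions; that extension is not shown.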
From: Ionous <li...@io...> - 2007-04-19 17:00:06
i've been writing about YAML on my blog on and off, and wrote a little bit on this topic <http://www.ionous.net/2007/01/31/yamls-missing-type/> if anyone's interested....

my argument there was that since any language using YAML to parse data already has to do some minimal negotiation of types as data gets read in (for error handling, for instance), treating strings and scalars uniformly could help smooth out some of the implicit typing problems that can occur.

rather than eliminate implicit typing, though, my thought was that you implicitly type both scalars and strings, basically treating quoted strings as just a way to include YAML operators in a scalar value.... meaning both "3" and 3 are the same thing.

i know, as programmers, that sounds a bit unusual, but i think it would actually make a lot of sense to the average non-programmer document author.... and it even has a little bit of precedent in some places: HTML attributes, for instance, are always quoted whether the DOM is going to treat the value as a number, a string, or an enum.

-simon.
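A minimal sketch of simon's idea, assuming quotes are kept only as a way to embed YAML indicator characters (such as `: ` or `#`) and carry no type information, so the quoted and plain forms of a scalar resolve identically. `resolve_uniform` is a hypothetical name, and only integers are typed to keep the example short.

```python
import re


def resolve_uniform(token):
    """Strip quotes if present, then type the contents exactly as a plain scalar."""
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "\"'":
        token = token[1:-1]  # quotes only protect special characters
    if re.fullmatch(r"[-+]?[0-9]+", token):
        return int(token)
    return token


print(resolve_uniform('"3"'), resolve_uniform("3"))  # 3 3
print(resolve_uniform('"a: b"'))  # a: b
```

The trade-off is that the document itself can no longer force "3" to be a string; under this scheme that decision moves to the consuming application (or to an explicit tag).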
From: Kenneth D. <ke...@se...> - 2007-04-26 22:53:58
Ingy dot Net wrote:
> On 17/04/07 17:40 -0400, Clark C. Evans wrote:
>> In the next pass of YAML, I am going to recommend that all parsers
>> _only_ do implicit typing on:
>>
>>   (a) symbolic values, such as <<, which can be used to
>>       augment the YAML syntax with very nice hooks
>>
>>   (b) numbers, "true", "false" and "null", following
>>       the JSON standard (for compatibility)
>>
>> At least, this should, IMHO, be the default. I think Ingy begs
>> to differ and believes the default should be *all strings* with
>> no implicit typing. Whatever we come up with, getting there
>> is sure to be unpleasant, but probably far less unpleasant than
>> the current state of affairs.
>
> I disagree, but in specifics, not in spirit. The "Parser" should not do typing
> of any form. It reports, for each scalar, a char-string value and whether the
> scalar was plain or not.
>
> A YAML "Load" operation consists of at least 3 steps: "parse", "compose",
> "construct". According to the spec, introduction of the node "tag" (aka "type")
> happens in the composer.
>
> Whatever...
>
> Your point is that we got too cute with the default implicit types. String,
> Integer and Number are fine as a default.
>
> I would recommend that all implementations support an everything-is-a-string mode.

I would ask: what is simplest and most consistent? And also, how did the conversation start?

The conversation started because these two elements (is that the right word?) give different results:

    item:
      prop_1st: value   # yields the string "value"
      prop_2nd: Y       # yields a numeric 1! Newbie says huh??

What options are available?

1) Tweak default behavior.
   Pro: Might satisfy this case.
   Con: Might break older files, or require them to be version-stamped.
   Con: Will just be an invitation to more tweaking, and nobody will ever be happy.
   Con: Will create a list of incompatible versions with incomprehensible variations (this is the end result of taking this road). Think HTML 3, HTML 3 for IE, HTML 4, HTML 4 for IE, HTML 4 for Mozilla pre-6, Mozilla 6, etc., etc.

2) Support header directives for the possible options. Possible directives:

   none: follow the behavior from before directives became available, so my Y above becomes a 1.

   booltrue: Y, 1, Yes, YES. Some kind of explicit list of values that will be treated as boolean true. If my Y is not listed, it won't be treated as a boolean.

   boolfalse: same as booltrue, a list of false values.

   date: xx-xx-xxxx. Anything that fits the picture is treated as a date.

   numdigits: treat any string composed only of numerals as a number.

   ...others as they come to mind. Those are off the top of my head; a real effort would have to be made to find the list of directives that serves all purposes without overlap or missing possibilities.

3) Declaring types for named properties. In the above example I would declare in the header that "prop_2nd" is a string.

4) Type-casting at the definition, which I believe is supported now with "tags".

If Ken were calling the shots, option 1 would be thrown out, and option 4 is already supported, so supporting options 2 and 3 would produce the general solution. Then it becomes a matter of programmer preference, and you wait for best practices to emerge through community use of the various approaches.
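Ingy's point that typing belongs after parsing, combined with Kenneth's option 2, can be sketched as a three-stage load for a single scalar: the parser reports only the characters and whether the scalar was plain, the composer assigns a tag, and the constructor builds a native value. The `booltrue`/`boolfalse` directive names come from Kenneth's hypothetical list and are not real YAML directives; the `tag:yaml.org,2002:...` strings are the standard tag URIs from the YAML type repository. The class and function names are illustrative only.

```python
import re
from dataclasses import dataclass

STR = "tag:yaml.org,2002:str"
BOOL = "tag:yaml.org,2002:bool"
INT = "tag:yaml.org,2002:int"


@dataclass
class ScalarEvent:
    """Parser output: the characters, and whether the scalar was plain."""
    value: str
    plain: bool


@dataclass
class ScalarNode:
    """Composer output: the same characters plus a resolved tag."""
    value: str
    tag: str


def parse(token):
    # No typing here: only text and the plain/quoted distinction.
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "\"'":
        return ScalarEvent(token[1:-1], plain=False)
    return ScalarEvent(token, plain=True)


def compose(event, directives):
    # Tag resolution, driven by the (hypothetical) header directives.
    if not event.plain:
        return ScalarNode(event.value, STR)
    bools = set(directives.get("booltrue", ())) | set(directives.get("boolfalse", ()))
    if event.value in bools:
        return ScalarNode(event.value, BOOL)
    if re.fullmatch(r"[-+]?[0-9]+", event.value):
        return ScalarNode(event.value, INT)
    return ScalarNode(event.value, STR)


def construct(node, directives):
    # Turn a tagged node into a native Python value.
    if node.tag == BOOL:
        return node.value in directives.get("booltrue", ())
    if node.tag == INT:
        return int(node.value)
    return node.value


def load_scalar(token, directives):
    return construct(compose(parse(token), directives), directives)


header = {"booltrue": ["true", "Yes"], "boolfalse": ["false", "No"]}
print(load_scalar("Y", header), load_scalar("Yes", header))  # Y True
```

Because `Y` is not in the `booltrue` list, `prop_2nd: Y` stays a string here; with an empty directive table, this sketch treats no plain scalar as a boolean at all.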
From: Kenneth D. <ke...@se...> - 2007-04-30 14:19:17
Ingy dot Net wrote:
> On 26/04/07 18:53 -0400, Kenneth Downs wrote:
>> If Ken were calling the shots, option 1 would be thrown out, option 4 is
>> already supported, so supporting options 2 and 3 would produce the
>> general solution, and then it becomes a matter of programmer preference
>> and then you wait for best practices to emerge through community use of
>> the various approaches.
>
> It's a little more involved than picking a typing solution.

Actually, my problem is a typing problem, and solving my own problem is exactly as involved as picking a typing solution. The only situations it touches for other parties are typing situations.

> YAML is intended
> to be used in both closed systems and open. In situations with a single
> producer and consumer, and in those with many. In single programming languages
> and multi. etc.

And this is relevant how? A general solution that allows a file to be self-describing solves all cases. Different languages can make use of the typing directives as needed/able. The typing method used by YAML cannot change the fundamental typing abilities of any given language, so the best way to handle lots of languages is to have the most flexible way to describe the data.

Actually, to round out the general solution, the processor itself might accept run-time parameters that override the directives inside the file.

> In small closed systems the YAML tool in question should be assumed to do the
> right thing.

Except when it doesn't, and you end up squeezing the balloon and always watching it pop out somewhere else. The problem is that one person's Right Thing is another person's Wrong Thing. Except in trivial cases, you can never assume that code which is making assumptions will always make the right ones.

> If it doesn't, this can easily be fixed by local code.

Yikes! Pushing the problem to code! A very strange approach in a data-serialization project. I would expect more focus on the possibilities of data-driven configuration.

> There are also times when a document should be considered appropriate for
> the masses, and data typing must be perfect and enforced. So far we really
> only have tags for this. But we have talked here for years about defining a
> "Schema" language for YAML. And documents could be given a schema, perhaps in
> a header directive...
>
> So we need to do that. But that will take effort...
>
> But the real question to answer and make clear for now is when to apply
> implicit typing.

Given two parties using it, you'll get two opinions. Three parties, three opinions. Good luck.
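Kenneth's closing idea, run-time parameters that override the directives inside the file, amounts to a precedence order: built-in defaults, then in-file directives, then run-time parameters. A minimal sketch with the standard library's `collections.ChainMap`, assuming hypothetical setting names (`booltrue`, `boolfalse`, `string_keys`) that no YAML processor actually defines:

```python
from collections import ChainMap

# Hypothetical built-in defaults; the setting names are illustrative only.
DEFAULTS = {"booltrue": ("true",), "boolfalse": ("false",), "string_keys": ()}


def effective_settings(file_directives, runtime_params):
    """Look up run-time parameters first, then in-file directives, then defaults."""
    return ChainMap(runtime_params, file_directives, DEFAULTS)


settings = effective_settings(
    file_directives={"booltrue": ("Y", "true")},
    runtime_params={"string_keys": ("primary_key",)},
)
print(settings["booltrue"], settings["string_keys"], settings["boolfalse"])
# ('Y', 'true') ('primary_key',) ('false',)
```

`ChainMap` keeps the three layers as separate mappings rather than copying them into one dict, so a processor built this way could still tell which layer supplied a given setting.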
From: Oren Ben-K. <or...@be...> - 2007-05-01 23:11:39
My $.02 about implicit typing, as it seems to be biting many people...

We seem to have different interpretations of how the YAML type repository should be used. This ambiguity has leaked into the YAML implementations, and we need to clear things up. The following is MVHO; Ingy/Clark may have a different angle on things:

The YAML type repository is NOT intended to be interpreted as "here is a set of types to use by default". While _some_ of the types there certainly should be available by default, not all should. Instead, the repository should be interpreted as "here are some types; if you need them, use them in this way to maximize portability between applications". This is why the types are not part of the spec itself: this set of types is dynamic and can grow without bound, without affecting parser implementations.

This punts on the question "what should be the default set of implicit types?" We have intentionally avoided defining one so far. As Clark points out, however, we now have a "gold standard" to work with, JSON, which defines a particular set of types that "everyone" has learned to accept. It is small and limited, but very "unsurprising". Therefore, if implementations reduced the set of built-in types to JSON-compatible types (plus possibly some useful types we designate as default that are sufficiently "unsurprising", such as << etc.), the problem would, mostly, go away.

The boolean case raises a second issue, however. There is a conflict between "correctly" guessing the types of data with utter lack of schema information, and "correctly" interpreting the value of data when the schema is known. In the boolean case, this means that when reading a random-off-the-street file without any knowledge of what it means, interpreting Y as a string seems to be the right thing to do. At the same time, when a server application is reading a boolean field in its own dedicated configuration file, interpreting the same Y character as the boolean value true also seems to be the right thing to do.

This leaves us in something of a bind. The immediate solution that comes to mind is:

- First, for the same type (e.g. boolean), there would be two sets of regexps. One would be used to detect the type (assuming it is supported) in the basic set of types. The second would allow for additional variants, but would only be applicable if the application "knows" the field type already.

- Second, we specify that unless the schema is explicitly specified (using a tag), a parser should only use the basic set of types. Specifying a tag for a node (such as the root node) informs the parser of the expected type of the node, and if this type is a complex type, it may induce types on some of the contained nodes. For example, the type 'map' gives the parser no knowledge of the contained types, but the type 'point' specifies the type 'float' for its 'x' and 'y' values.

- Finally, we need a formal way to specify schemas/tags/types that will allow parsers to do the right thing.

An immediate problem with this approach is the infamous DTD problem: parsing a document with and without a DTD yielded different results, which was (and still is) a big pain when working with XML. In our case, parsing a document containing tags with and without knowledge of what they mean would yield different results.

YAML side-steps this problem, neatly or messily depending on your point of view. The spec clearly defines what it means to create a "partial" representation of a YAML file. Specifically, if the parser does not recognize the explicit tags used in the document, it is expected to report this fact and not attempt to assign a type to these nodes anyway. In contrast, in XML, parsing an XML document with and without a DTD yields the same information model, so there is no way for the application to realize it is missing information.

That said, a YAML parser may choose to forge ahead and use the basic set of types on the data, hoping for the best. Doing so, however, potentially changes the semantics of the data (for example, Y becomes a string instead of a boolean). Hence this "damn the torpedoes, full steam ahead" approach should only be used with care, and never as the default option (for a parser library, anyway).

Hope this helps,

Oren Ben-Kiki

P.S. About the spec status: I have tinkered with my YamlReference parser implementation, and it is now a fully streaming parser that can handle arbitrarily large inputs (Haskell is a tricky language...). I am waiting for some technical work on the yaml.org servers to upload an HTML interface that will allow people to view the results of parsing YAML fragments, and use these to report bugs or dispute the spec. Once this is up, I'll start working on an updated spec version; hopefully this will be "the" final 1.1 spec, period.
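Oren's proposal can be sketched under several assumptions: the basic set is taken to be JSON's `true`/`false`/`null` plus integers, the extended boolean spellings are borrowed from the YAML 1.1 type repository, and a tag is modeled as a name that maps field names to expected types. `point` is Oren's example; the `server-config` schema and its `verbose` field are hypothetical. Without a recognized tag only the basic set applies, and under an unrecognized tag the values are left untyped and flagged as a partial result instead of guessed.

```python
import re

BASIC_BOOL = re.compile(r"true|false")  # JSON-style, used when nothing is known
JSON_INT = re.compile(r"-?(?:0|[1-9][0-9]*)")
# Extra spellings, accepted only when the field is already known to be boolean.
EXT_TRUE = re.compile(r"y|Y|yes|Yes|YES|true|True|TRUE|on|On|ON")
EXT_FALSE = re.compile(r"n|N|no|No|NO|false|False|FALSE|off|Off|OFF")

# A tag induces types on contained nodes ('point' from Oren, plus a made-up one).
KNOWN_SCHEMAS = {
    "point": {"x": "float", "y": "float"},
    "server-config": {"verbose": "bool"},
}


def resolve(text, expected=None):
    if expected == "bool":
        if EXT_TRUE.fullmatch(text):
            return True
        if EXT_FALSE.fullmatch(text):
            return False
        raise ValueError(f"not a boolean: {text!r}")
    if expected == "float":
        return float(text)
    if BASIC_BOOL.fullmatch(text):
        return text == "true"
    if text == "null":
        return None
    if JSON_INT.fullmatch(text):
        return int(text)
    return text


def load_mapping(pairs, tag=None):
    """Return (mapping, complete); an unrecognized tag yields untyped strings."""
    if tag is not None and tag not in KNOWN_SCHEMAS:
        return dict(pairs), False  # report the gap instead of guessing
    schema = KNOWN_SCHEMAS.get(tag, {})
    return {k: resolve(v, schema.get(k)) for k, v in pairs.items()}, True


print(load_mapping({"verbose": "Y"}))                       # ({'verbose': 'Y'}, True)
print(load_mapping({"verbose": "Y"}, tag="server-config"))  # ({'verbose': True}, True)
```

The same `Y` is a string for a generic reader and a boolean for an application that declares the schema, which is the split Oren describes.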
From: Oren Ben-K. <or...@be...> - 2007-05-03 18:19:53
On Wed, 2007-05-02 at 01:09 -0700, Ingy dot Net wrote:
> > The boolean case raises a second issue however. There is a conflict
> > between "correctly" guessing the types of data with utter lack of schema
> > information, and "correctly" interpreting the value of data when the
> > schema is known. In the boolean case, this means that when reading a
> > random-off-the-street file without any knowledge of what it means,
> > interpreting Y as a string seems to be the right thing to do. At the
> > same time, when a server application is reading a boolean field
> > in its own dedicated configuration file, interpreting the same Y
> > character as the boolean value true also seems to be the right thing to do.
> >
> > This leaves us in something of a bind. The immediate solution that comes
> > to mind is:
>
> Why does this leave us in a bind?
>
> I would say that your server application would be aware that its
> configuration file had implicit booleans, and would then set the appropriate
> option in the YAML Loader module/object/whatever to DTRT.

So far, so good. The problem is that, say, a YAML pretty-printer would feel justified in stripping away the quotes around a "Y", not knowing that the server application will then start treating it as a boolean (assume a polymorphic field for a second). The challenge is to define rules that allow the specific application to do what it wants without a fuss, while keeping the generic programs safe. The spec addresses this for the most part; the missing piece is defining the "basic set of types". How do you feel about defining such a basic set and requiring an explicit tag when venturing outside it?

> One more thing... could you stop referring to the "Parser" when talking about
> typing? IMO, no typing activity happens at the parser level. Use the term
> Composer, or at least Loader.

My bad, you are absolutely correct. The name of the relevant processing phase is "compose", so I suppose I should have said "composer" :-)

Have fun,

Oren Ben-Kiki
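Oren's pretty-printer hazard suggests a conservative rule for generic tools: keep the quotes on a string whenever its plain form could be read back as something else, not only under the basic set but also under extra spellings an application might enable (for example through the loader option Ingy describes). A minimal sketch; `emit_string` is a hypothetical helper, the patterns are assumptions (JSON-style literals for the basic set, YAML 1.1 boolean spellings for the extended set), and it ignores other reasons to quote, such as indicator characters like `: ` or `#`.

```python
import json
import re

BASIC_NON_STRING = re.compile(r"true|false|null|-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?")
EXTENDED_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)


def emit_string(s, conservative=True):
    """Emit a string scalar, dropping quotes only when that cannot change its type."""
    risky = BASIC_NON_STRING.fullmatch(s) or (
        conservative and EXTENDED_BOOL.fullmatch(s)
    )
    if risky or s == "":
        # For ordinary text, a JSON string literal is also a valid
        # YAML double-quoted scalar.
        return json.dumps(s)
    return s


print(emit_string("Y"), emit_string("Y", conservative=False), emit_string("42"))
# "Y" Y "42"
```

With `conservative=True`, the generic tool leaves `"Y"` quoted even though the basic set alone would read a plain `Y` as a string, so a server that enables extended booleans still sees a string.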