On 28/02/04 13:10 +0200, Oren Ben-Kiki wrote:
> Brian wrote:
> > To get the right thing, you'd need to do:
> >
> > use YAML;
> > use YAML::int;
> > use YAML::str;
> > $map = {int => 12, str => '13'};
> > YAML::int->tag($map->{int});
> > YAML::str->tag($map->{str});
> > print Dump $map;
> >
> > Producing:
> >
> > ---
> > int: 12
> > str: '13'
>
> Two questions.
>
> First: If I load the above document, using YAML::int, doesn't YAML::int
> call "YAML::int->tag" automatically for all the integer nodes created by
> "Load"? I thought that was the whole point of using it. If it doesn't do
> that, what _does_ it do?
YAML::int is intended to load a YAML implicit integer as a real Perl
integer. And doing this is almost pointless from a Perl application's
point of view. I suppose you're right, it should shadow all the loaded
integers.
But this doesn't help the N->Y case. Example:

    use YAML;
    use YAML::int;
    print Dump {int => 12, str => '13'};

yields:

    ---
    int: 12
    str: 13

To get the desired result I can do one of two things:
1) I can force the programmer to call YAML::int->tag on the integers.
2) I can use C to determine if the scalar was stored as an integer.
I kind of like 2 better, because most of the time it just works. Like in
the example above it would just work. If the programmer happened to use
the integer in string context, then she would have to tag it back. But
for the most part option 2 would result in less burden on the
programmer.
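
For what it's worth, option 2 might not need any new C at all, since the core B module exposes the same internal flags. Here is a minimal sketch of the check; stored_as is a made-up helper name, not something in YAML.pm:

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: report how Perl currently holds a scalar internally.
# The string flag is checked first, because a scalar can carry several
# flags at once.
sub stored_as {
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    return 'str' if $flags & B::SVf_POK();
    return 'int' if $flags & B::SVf_IOK();
    return 'num' if $flags & B::SVf_NOK();
    return 'other';
}

my $map = { int => 12, str => '13' };
printf "%s is stored as %s\n", $_, stored_as($map->{$_}) for sort keys %$map;
```

For freshly assigned values like these, it reports int for 12 and str for '13'. Once the integer has been used in string context the flags can change, and the details may differ between Perl versions, which is exactly the "tag it back" situation.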
> Second: If I use YAML::int, doesn't YAML.pm call "YAML::str->tag" for
> all the plain scalars that were _not_ integers? Or, alternatively, just
> implicitly assume on output that all scalars that are not
> "Something->tag"-ed are strings?
When YAML::int (or any other YAML casting class) is used, *all* nodes
are passed to it for casting.
It would probably call YAML::str->tag on all the scalars that looked
(regexp) like integers but actually were not (C API). And that would be
a cue to the dumper to quote those strings. And untagged scalars would
be emitted plain.
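
To make that concrete, here is a rough sketch of the casting step and the cue it leaves for the dumper. Everything named here (%shadow, shadow_tag, shadow_of, cast_scalar, dump_style) is invented for illustration, and the core B module stands in for the C API; a real YAML::str->tag could look quite different.

```perl
use strict;
use warnings;
use B ();
use Scalar::Util qw(refaddr);

my %shadow;    # hypothetical shadow table: scalar address => tag name

# @_ aliases the caller's scalars, so \$_[0] is the address of the real
# hash or array slot rather than of a copy.
sub shadow_tag { my $tag = shift; $shadow{ refaddr(\$_[0]) } = $tag; return }
sub shadow_of  { return $shadow{ refaddr(\$_[0]) } }

# Casting step: shadow-tag as 'str' any scalar that matches the integer
# regexp but is not held as an integer internally.
sub cast_scalar {
    return unless defined $_[0];
    my $is_int = B::svref_2object(\$_[0])->FLAGS & B::SVf_IOK();
    my $text   = $_[0];    # match against a copy; leave the original untouched
    shadow_tag('str', $_[0]) if !$is_int && $text =~ /\A-?\d+\z/;
    return;
}

# Dumper side: a 'str' shadow tag is the cue to quote; the rest stay plain.
sub dump_style {
    my $tag = shadow_of($_[0]) // '';
    return $tag eq 'str' ? 'quoted' : 'plain';
}

my $map = { int => 12, str => '13', word => 'foo' };
cast_scalar($_) for values %$map;
printf "%s: %s\n", $_, dump_style($map->{$_}) for sort keys %$map;
```

Only the '13' entry ends up shadow-tagged, so only it is marked for quoting. A table keyed by address goes stale when a scalar is freed, so a real implementation would need some cleanup story.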
> It seems to me that the answer for both questions should be "yes". The
> semantics I suggest are:
>
> Case 0: If I don't use YAML::int or anything like that, then the parser
> implicitly assumes Perl semantics. All implicit scalars are loaded and
> dumped as strings, no tags are necessary, etc. If I load and dump the
> above document in this mode, '13' will be converted to 13, and that's
> OK.
+1
> Case 1: If I use YAML::int and similar modules, then on loading the
> parser invokes "YAML::int->tag" for all integer nodes it creates. On
> output, it emits integers as plain scalars. Scalars that are _not_
> shadow-tagged, and that could be misinterpreted as integers, are quoted;
> otherwise they may be emitted as plain. So '13' will remain quoted but
> 'foo' may lose its quotes.
+1, although things that could be misinterpreted would be shadow-tagged
as str, i.e. none of these heuristics happen at the emitter level. They
are all handled by either the application or the casting classes.
Here is my view of how and where things happen.
For loading YAML:
  0) The application loads/registers the appropriate casting classes.
  1) The parser returns cooked string values, tags, and the plain-flag.
  2) The loader passes each node to each casting class in order, until
     one of them performs a cast. The result is loaded into the graph.
     The casting class is allowed to perform a side-effect (like
     shadow-tagging) as part of the cast.
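
The loading steps could be sketched roughly like this. The class names, the cast() return convention and the node hash layout are all assumptions made up for the example, not the YAML.pm interface:

```perl
use strict;
use warnings;

# Hypothetical casting classes. Each cast() returns (1, $value) when it
# handles the node, or an empty list to pass the node along.
{
    package My::Cast::Int;
    sub cast {
        my ($class, $node) = @_;
        return unless $node->{plain} && $node->{value} =~ /\A-?\d+\z/;
        return (1, $node->{value} + 0);    # a real Perl integer
    }
}
{
    package My::Cast::Null;
    sub cast {
        my ($class, $node) = @_;
        return unless $node->{plain} && $node->{value} eq '~';
        return (1, undef);
    }
}

# Step 0: the application registers casting classes, in priority order.
my @casters = ('My::Cast::Int', 'My::Cast::Null');

# Step 2: the loader offers the node to each class until one performs a cast.
sub load_scalar {
    my ($node) = @_;
    for my $class (@casters) {
        my ($done, $value) = $class->cast($node);
        return $value if $done;
    }
    return $node->{value};    # nobody cast it, so it stays a string
}

# Step 1: what the parser hands over (cooked value, tag, plain-flag).
my @nodes = (
    { value => '12', tag => undef, plain => 1 },
    { value => '13', tag => undef, plain => 0 },    # was quoted in the source
);
print join(', ', map { load_scalar($_) } @nodes), "\n";
```

The first node comes back as a number and the second as a string, even though both print the same way. A cast() is also where a side-effect such as shadow-tagging would go.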
For dumping to YAML:
  0) The application loads/registers the appropriate casting classes.
  00) The application may also perform any shadowing/tagging it deems
      necessary before the dump.
  1) The dumper passes each node to each casting class in order, until
     one of them performs a cast.
  11) The dumper takes into consideration the results of casts and also
      any shadowing done by the application; it then makes appropriate
      calls to the emitter.
  2) The emitter takes dump calls consisting of a string value, a tag,
     and a plain-flag.
  22) The emitter can then do heuristics to determine the specific
      quoting style, how it wants to uncook the string value, etc. But
      it must obey the plain-flag. And it should have no direct
      involvement with the casting classes or the application shadow
      tables. All that work has been done by then.
Doing things this way lets you hook a parser to an emitter with no
loader or dumper involved. It is a clean separation of concepts.
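
As a sketch of that separation, here is a toy emitter fed directly with parser-style events. emit_scalar and the event layout are hypothetical, and the quoting heuristic is just one possible choice:

```perl
use strict;
use warnings;

# Hypothetical emitter entry point: takes a cooked string value, a tag and
# the plain-flag, and returns the text for one scalar.
sub emit_scalar {
    my ($value, $tag, $plain) = @_;
    my $text;
    if ($plain) {
        $text = $value;                  # must obey the plain-flag
    }
    elsif ($value =~ /[\\"\n]/) {        # style heuristic is the emitter's call
        (my $esc = $value) =~ s/\\/\\\\/g;
        $esc =~ s/"/\\"/g;
        $esc =~ s/\n/\\n/g;
        $text = qq{"$esc"};
    }
    else {
        (my $esc = $value) =~ s/'/''/g;  # single-quoted style doubles quotes
        $text = qq{'$esc'};
    }
    return defined $tag ? "$tag $text" : $text;
}

# Parser hooked straight to the emitter: these events stand in for parser
# output, and no loader or dumper touches them on the way through.
my @events = (
    [ int => '12', undef, 1 ],
    [ str => '13', undef, 0 ],
);
print "---\n";
printf "%s: %s\n", $_->[0], emit_scalar(@{$_}[1 .. 3]) for @events;
```

Because the plain-flag travels with each scalar, 12 comes out plain and '13' keeps its quotes without any casting class or shadow table being consulted.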
> Programmatically case 0 and case 1 are just one case, where there's a
> list of implicit types that's empty in case 0 and non-empty in case 1.
>
> Case 2: If I use YAML::int and similar modules, but also YAML::str, then
> on loading the parser shadow-tags *all* scalar nodes, using either
> "YAML::int->tag" or "YAML::str->tag". On output, the parser complains if
> it encounters a node that is not explicitly shadow-tagged. It emits
> integers as plain scalars, and quotes all strings that would be mistaken
> for integers.
I can see what you want to accomplish, but I would want to think more
about this specific manner of accomplishing it. Anyway, see below.
> Programmatically case 2 is almost the same as case 1 except for
> additional "Croak" calls here and there verifying that all scalar nodes
> are properly tagged.
>
> > Now that is ghastly.
>
> What is "ghastly" is creating a document (or a value) with a non-Perl
> schema from scratch, or performing "major" modifications (that change
> the type) on a node within such a document. These require explicitly
> tagging the scalar nodes (in case 1, just the non-str nodes, in case 2,
> all the nodes). So you can't "trivially" write a non-Perl schema from a
> Perl program. Hardly a surprise.
>
> Read-only operations are always trivial (using case 0, or possibly case
> 1 if you are interested in the non-Perl schema distinctions). So is safe
> round-tripping. Just "use YAML::plain". It behaves like YAML::int, but
> unlike it, YAML::plain matches every value. Given the semantics of case
> 1 or case 2 above, emitting the loaded document will preserve the plain
> vs. non-plain status of each scalar node. This allows you to trivially
> write a YAML pretty printer in Perl and be 100% certain that no matter
> what the original schema is, the semantics will be preserved.
>
> > Sigh. This seems like a fragile way to do things, but it is
> > explainable. I also hate to introduce C but I can figure that
> > part out. (The C is easy; keeping it from hindering YAML.pm
> > adoption is harder).
>
> I don't see that using C will solve the problem, because it goes beyond
> integers (and floats). It also affects *any* type that anyone may ever
> decide to use as an implicit type (Booleans, prices, whatever). The only
> safe solution is something along the lines I described above. Which
> doesn't require C at all.
No. You're not getting it. Booleans, prices, whatever get loaded into
objects. Then I can key off the object class to determine what to do.
But the Perl SV is a different animal. It's three types in one, and you
can't tell its state without going to C. And even if you go to C and
find out, you have to *assume* that since the application last used this
value as a string, then it was meant to be *typed* as a string.
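
A small illustration of the "three types in one" point, using only Scalar::Util from core. The variable names are arbitrary:

```perl
use strict;
use warnings;
use Scalar::Util qw(dualvar);

# One SV can hold unrelated numeric and string values at the same time.
my $sv = dualvar(13, 'thirteen');
printf "as number: %d, as string: %s\n", $sv + 0, "$sv";

# Ordinary scalars drift the same way: numeric use of a string caches a
# number inside it, so the flags reflect how the value has been used,
# not what the author meant it to be.
my $str = '13';
my $sum = $str + 1;    # $str is still the string '13' to the application
print "still a string: $str, sum: $sum\n";
```

Nothing at the Perl level distinguishes these states, which is why any type guess made from the flags is only ever an assumption about intent.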
> > It all comes down to specific use cases. The programmer must
> > be aware of what she is trying to accomplish, and use the
> > right settings. There is no perfect setting that always does
> > the right thing.
>
> +1 on that.
And I'm not sure that boiling everything down to 3 use cases today is
really worth the cycles. Let's just see what use cases arise.
> > Hopefully I will be able to keep the easy use cases easy and
> > the hard use cases possible. But then again I'm a Perl programmer :)
>
> The 3 "settings" I suggest above should do just that:
>
> Case 0 is the "Perl Native" schema case, which should be the most common
> (for Perl programmers, that is). It is as easy as it gets - just "use
> YAML.pm" and ignore shadows and tags. As you pointed out, any implicit
> type that is implemented as an object (e.g. dates) should work fine in
> this case.
>
> Case 1 is the "Lazy non-Perl" case, which should be less common. It
> minimizes the pain (you only tag implicit non-str types that aren't
> implemented as objects). For read-only applications, there's no pain at
> all. Using YAML::plain, it allows you to trivially implement generic
> YAML tools such as Y-Pretty-Print.
>
> Case 2 is the "Strict non-Perl" case, which would be the least common.
> It incurs the most pain (tagging all scalars), and would only be used by
> people doing things like schema processing that want to ensure no
> untagged nodes slip through the cracks of heavy structural
> transformations and so on.
>
> There's no C required, it satisfies the YAML spec requirements, and it
> basically requires only a single implementation mechanism (with a few
> conditional Croaks for the strict people).
>
> Have fun,
>
> Oren Ben-Kiki