Joe Lapp [mailto:joe@...] wrote:
> I think we do need a low-level incremental API, but I don't
> think it should be random access.
Well, we disagree. Somewhat. The low-level incremental API
should allow for both push and pull modes; it should be
possible to write efficient single-pass programs based on
it. But I'd also like to be able to mix it easily with
the "DOM" API. Specifically, I'd rather view the low-level
API as an iterator (pull) or visitor (push) applied to the
node API, with the possibility of creatively mixing all
three modes - node access, iterator and visitor.
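To make the pull/push duality concrete, here is a minimal sketch. All the names (YamlEvent, YamlVisitor, PullPushDemo) are invented for illustration and are not part of any spec; the point is just that pull mode is an Iterator over events, and push mode drives a visitor from that same iterator, so the two modes share one event vocabulary:

```java
// Hypothetical sketch only -- all names here are invented for
// illustration, not part of any proposed spec.
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class PullPushDemo {
    // A minimal event: a kind tag plus optional scalar text.
    public static final class YamlEvent {
        public final String kind;   // "map-start", "scalar", "map-end", ...
        public final String value;  // scalar text, or null
        public YamlEvent(String kind, String value) {
            this.kind = kind;
            this.value = value;
        }
    }

    // Push side: a visitor receives events one at a time.
    public interface YamlVisitor { void visit(YamlEvent e); }

    // Pull mode is just an Iterator<YamlEvent>; push mode drives a
    // visitor from the same stream, so the modes compose freely.
    public static void drive(Iterator<YamlEvent> pull, YamlVisitor visitor) {
        while (pull.hasNext()) visitor.visit(pull.next());
    }

    // Demo event stream for the document "{ key: value }".
    public static List<YamlEvent> sampleStream() {
        List<YamlEvent> events = new LinkedList<YamlEvent>();
        events.add(new YamlEvent("map-start", null));
        events.add(new YamlEvent("scalar", "key"));
        events.add(new YamlEvent("scalar", "value"));
        events.add(new YamlEvent("map-end", null));
        return events;
    }

    // Collect all scalar values via push mode.
    public static List<String> scalars(List<YamlEvent> events) {
        final List<String> out = new LinkedList<String>();
        drive(events.iterator(), new YamlVisitor() {
            public void visit(YamlEvent e) {
                if (e.kind.equals("scalar")) out.add(e.value);
            }
        });
        return out;
    }

    public static void main(String[] args) {
        System.out.println(scalars(sampleStream())); // [key, value]
    }
}
```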
Clark has done a great job in unifying the iterator and
visitor APIs into a coherent scheme. I feel that with a
little bit of effort it should be possible to unify it
with the node API. But I haven't had the time to really
work on it (I focused on the format spec). What I'd like
to do is get the format spec out of the way and focus on
this issue... Perhaps I should take some time this
weekend to work on it anyway.
> Once it's random access it
> has an object tree/graph. A layer that builds on it may
> choose to discard that tree/graph, making waste of all the
> time and memory given to the low-level objects. For example,
> the YamlParser you describe would likely throw it all away.
There's a neat "sliding window" approach you can take which
avoids most of this. A "streaming parser" would keep only
the nodes along the path from the root to the current node
at any point in time...
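A rough sketch of that sliding-window idea, assuming an event-driven parse (the class name PathTracker is invented): each finished subtree is popped and discarded, so memory stays proportional to nesting depth rather than document size:

```java
// Hypothetical sketch of the "sliding window": keep only the nodes on
// the path from the root to the current node during a streaming parse.
import java.util.ArrayDeque;
import java.util.Deque;

public class PathTracker {
    // The live path; everything outside it has already been discarded.
    private final Deque<String> path = new ArrayDeque<String>();
    private int maxDepth = 0;

    public void enter(String node) {   // e.g. on map-start / seq-start
        path.push(node);
        if (path.size() > maxDepth) maxDepth = path.size();
    }

    public void leave() {              // on map-end / seq-end the finished
        path.pop();                    // subtree is dropped, so memory is
    }                                  // O(depth), not O(document size)

    public int liveNodes() { return path.size(); }
    public int maxDepth()  { return maxDepth; }
}
```

For a wide, shallow document the live path stays tiny no matter how many sibling subtrees stream past.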
> I agree that the high-level APIs test the value of the model,
> but from the little coding exploration I just did, I find it
> impossible to write the high-level APIs without the low-level
> ones. I think the low-level ones ought to be a brainless
> barebones node extractor. You have to extract a node's
> information before you can build something from it.
Obviously you need a low-level API in order to write a
higher level one. What I suggested was that we don't settle
on "the" low-level API yet, even though we write one (several,
in fact). It is much easier to settle on "the" highest-level
API (load/save) first.
> Are you sure you want a Vector/Hash/String implementation?
As a start, yes. That's because it trivially allows every
YAML document to be read into the system.
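To illustrate why this is the lowest common denominator: any YAML document can be held as nested Vector (sequence), Hashtable (mapping), and String (scalar). This is only a sketch (buildSample is an invented name), but the shape is the point:

```java
// Sketch of the generic Vector/Hashtable/String representation that
// can hold any YAML document. buildSample() is invented for the demo.
import java.util.Hashtable;
import java.util.Vector;

public class NativeTreeDemo {
    // Build the native form of:
    //   name: test
    //   items: [one, two]
    public static Hashtable buildSample() {
        Hashtable map = new Hashtable();
        map.put("name", "test");
        Vector items = new Vector();
        items.add("one");
        items.add("two");
        map.put("items", items);
        return map;
    }

    public static void main(String[] args) {
        Hashtable doc = buildSample();
        System.out.println(doc.get("name"));                     // test
        System.out.println(((Vector) doc.get("items")).size());  // 2
    }
}
```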
> What about an array/member-variable/String implementation?
This can't read arbitrary YAML documents. However, if we're
doing something like JSX, then you'd be able to round-trip
Java objects this way, which would be nice.
> What about an implementation that allows for traversing
> arbitrarily sized (or even unending) YAML streams?
That would be a future goal, requiring the "incremental" API.
> If a YAML
> scalar is somehow representing binary data (that is not a
> serialized class), wouldn't you want to deserialize this into
> a byte?
If I were forced to treat an object as a blob, I'd serialize
it to bytes, base64 them, and emit that together with the
accompanying class name.
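That fallback could look something like this. It uses java.util.Base64, a modern convenience this discussion predates, and the method names are invented; the idea is just "class name plus base64 payload":

```java
// Sketch of the blob fallback: serialize the object to bytes,
// base64-encode them, and keep the class name alongside for
// deserialization. Method names are invented for illustration.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Base64;

public class BlobDemo {
    // Emit a two-entry record: [class name, base64 payload].
    public static String[] toBlob(java.io.Serializable obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(obj);
            out.close();
            return new String[] {
                obj.getClass().getName(),
                Base64.getEncoder().encodeToString(bytes.toByteArray())
            };
        } catch (java.io.IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Reverse the process: decode the payload and deserialize it.
    public static Object fromBlob(String[] blob) {
        try {
            byte[] raw = Base64.getDecoder().decode(blob[1]);
            ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(raw));
            return in.readObject();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String[] blob = toBlob("hello");
        System.out.println(blob[0]);        // java.lang.String
        System.out.println(fromBlob(blob)); // hello
    }
}
```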
> What do you think of allowing many different serialization options?
I don't think there are that many, actually. String, Vector
and Hash have a natural (de)serialization. So that's the
first thing. Next we can layer a (de)serialization mechanism
on top of that - ideally interacting with Java's (JSX-like).
That's all there is to it, really.
> >... Think of classes such as 'Date' - I doubt Perl
> >would like it to
> >be called java.lang.Date, right?
> Hmmm. How far do you take this? I'm all for it, but I see
> YAML-schema creeping in.
The way I see it, every application has its vocabulary of
types, formats, valid YAML elements, valid YAML structure,
etc. Call this "schema" for lack of a better word. It is
spread around the application in various ways. The fact that
we haven't created a "YAML-Schema" spec doesn't change this.
The mapping from class-in-YAML-document to class-in-native-code
is definitely a part of the application's "schema". If/when
we formalize "schemas", this will be an issue, yes.
> >A much simpler way would be to avoid the serialization
> >mechanism and instead
> >require some sort of a YAML specific interface from the
> >classes [...]
> I like it! I think we ought to do registered
> serializers/deserializers initially, if only because it gives
> us all the functionality without giving us all the headache.
Yes, but interacting with the Java serialization is so
much more powerful and elegant :-) and more work :-(
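For comparison, the registered serializer/deserializer approach could be sketched like this, layered on the generic Hashtable form. Every name here (YamlSerializer, RegistryDemo, the "!class" tag key) is invented for illustration:

```java
// Sketch of the "registered serializers" idea: a registry maps a class
// to a handler that converts between the native object and the generic
// Hashtable form. All names here are invented for illustration.
import java.util.Hashtable;

public class RegistryDemo {
    public interface YamlSerializer {
        Hashtable toMap(Object obj);
        Object fromMap(Hashtable map);
    }

    private static final Hashtable registry = new Hashtable();

    public static void register(Class cls, YamlSerializer s) {
        registry.put(cls.getName(), s);
    }

    public static Hashtable serialize(Object obj) {
        YamlSerializer s =
            (YamlSerializer) registry.get(obj.getClass().getName());
        Hashtable map = s.toMap(obj);
        map.put("!class", obj.getClass().getName()); // tag for round-trip
        return map;
    }

    public static Object deserialize(Hashtable map) {
        YamlSerializer s = (YamlSerializer) registry.get(map.get("!class"));
        return s.fromMap(map);
    }

    // Example payload class with a registered handler.
    public static class Point {
        public final int x, y;
        public Point(int x, int y) { this.x = x; this.y = y; }
    }

    static {
        register(Point.class, new YamlSerializer() {
            public Hashtable toMap(Object obj) {
                Point p = (Point) obj;
                Hashtable m = new Hashtable();
                m.put("x", Integer.toString(p.x));
                m.put("y", Integer.toString(p.y));
                return m;
            }
            public Object fromMap(Hashtable m) {
                return new Point(Integer.parseInt((String) m.get("x")),
                                 Integer.parseInt((String) m.get("y")));
            }
        });
    }
}
```

This gives round-tripping without touching Java's own serialization machinery, which is exactly the "all the functionality without the headache" trade-off.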
> I think it's safe to finalize version 1 if there's enough
> implementation experience for everything in version 1. I
> don't think we want to finalize anything before we have
> proven implementations. We could make that another lesson
> learned from the W3C. I know work is already underway in
> other languages, so there's no need to wait for the Java.
If you are hinting at the serialization issue, then note
Brian is working on YAML.pm and he's definitely going
to include serialization in there! He wants to be able
to round-trip arbitrary Perl data - which also means
he'll be using references (for graphs). So there's
nothing in the spec which isn't going to be implemented.