Re: [Yaml-core] Streams vs. Trees

On Tue, Jun 05, 2001 at 11:43:57PM -0700, Jason Diamond wrote:
| 
| >   public interface IYamlFlatIterator
| >   {
| >     YamlNodeType NodeType { get; }
| >     string Key { get; }
| >     int Read(...);
| >
| > *   bool Next();
| > *   IYamlFlatIterator FirstChild();
| > *   IYamlFlatIterator ParentIterator();
| >   }
| 
| Given following example code, how can you possibly implement this interface
| so that it's forward only over the YAML document?
| 
| IYamlFlatIterator document = new YamlParser("foo.yaml");
| IYamlFlatIterator child1 = document.FirstChild();
| IYamlFlatIterator parent = child1.ParentIterator();
| IYamlFlatIterator child2 = parent.FirstChild();

Ahh.  Yes, there is an implicit assumption that
FirstChild() can only be called once on a Map or List.
If it is called on a Scalar, or called a second time, then a
FunctionSequenceError, NodeNotSequence, or
ForwardOnlyException would be raised...
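
Here is a minimal sketch of that call-once rule, written in Python
rather than C# just to keep it short.  The names (FlatIterator,
ForwardOnlyError, open_document) are made up for illustration, and
the "stream" is faked with nested dicts and lists; it is not a
proposed API.

```python
class ForwardOnlyError(Exception):
    """Raised when a call would need input that was already consumed."""


class FlatIterator:
    """Hypothetical forward-only iterator over nested dicts, lists, scalars."""

    def __init__(self, pairs, parent=None):
        self._pairs = iter(pairs)   # (key, value) pairs, readable once
        self._parent = parent
        self._descended = False
        self.key = None
        self.value = None

    def next(self):
        """Move to the next sibling; return False when there is none."""
        try:
            self.key, self.value = next(self._pairs)
        except StopIteration:
            return False
        self._descended = False
        return True

    def first_child(self):
        """Return an iterator on the first child, at most once per node."""
        if isinstance(self.value, dict):
            pairs = self.value.items()
        elif isinstance(self.value, list):
            pairs = enumerate(self.value)
        else:
            raise ForwardOnlyError("current node is a scalar")
        if self._descended:
            raise ForwardOnlyError("first_child() was already called here")
        self._descended = True
        child = FlatIterator(pairs, parent=self)
        return child if child.next() else None

    def parent_iterator(self):
        """Give back the enclosing iterator (ancestor stack, not a rewind)."""
        return self._parent


def open_document(data):
    """Return an iterator positioned on the root node of `data`."""
    root = FlatIterator([(None, data)])
    root.next()
    return root
```

With this, your four-line example fails on its last line: the
second FirstChild() on the same node raises instead of rewinding.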

Sorry for not making this clear.  This is very similar 
to the fact that ReadCharacters() is called until
it returns 0 (no more characters), right?  

| Parsing is always done with at least two layers. Lexical analysis and then
| syntactic analysis. Usually we add a third layer--semantic analysis--but
| that's application specific so we won't go there.
|
| Lexical analysis (scanning, tokenizing, whatever) is always (to my
| knowledge) exposed through a forward-only sequence of tokens (not counting
| lookahead). Syntactic analysis (often referred to as just parsing) usually
| builds the tree so that it can be processed later. These are basically the
| two layers that I'm talking about.

Right.  We are on the same page... this interface should not
provide "random access". If you add the limitation that
FirstChild() is callable only once per node, then the above is
sequential access (plus the current ancestor stack).

| What I'm trying to ensure is that we have the flexibility to build the kind
| of tree that we want to build without building some intermediary and
| sometimes unnecessary tree in the middle.

Yep!  We are on the same page here!

| Did my tokenizer/parser analogy above make sense? You can't parse without
| tokenizing. You can't build a YAML tree without knowing what type of node
| you're currently pointing at in the middle of a character stream. We need
| both of these layers, regardless. If we expose the lower level interface, we
| give the developers the ability to choose how they process YAML data. Some
| data can be processed more efficiently as a stream and some as a tree. We
| can't make that decision for them.

Once again, we agree.  However, I'd like the stream
interface and the tree interface to be "identical".

Perhaps you could "allow" FirstChild() to be called
more than once if the input source had random access.
So... FirstChild() could be allowed to throw a
ForwardOnly exception, but wouldn't be mandated to
do so.  Hmmm.
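
A rough sketch of that "allowed but not mandated" behaviour, again
in Python with made-up names (SequenceNode, ForwardOnlyError).  The
test for "random access" here is simply whether the source is a
list or tuple, which is an assumption for the example only.

```python
class ForwardOnlyError(Exception):
    """Raised when children are requested again from a one-shot source."""


class SequenceNode:
    """Hypothetical map/list node whose children come from `source`."""

    def __init__(self, source):
        # A list or tuple can be re-read; anything else is treated
        # as a one-shot stream that can only be consumed once.
        self._seekable = isinstance(source, (list, tuple))
        self._source = source
        self._opened = False

    def first_child(self):
        """Start iterating the children; repeatable only if seekable."""
        if self._opened and not self._seekable:
            raise ForwardOnlyError("children were already consumed")
        self._opened = True
        return iter(self._source)
```

A caller that wants to work with both kinds of source just never
calls first_child() twice; a caller that knows it has a random
access source can take advantage of it.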

| To me, though, Iterator implies that you can move in only
| one direction: forward. I would gladly change the interface
| name to use Iterator instead of Reader.

Iterator has the same implication for me as well.  ;)

| 
| I absolutely agree with this and would go so far as to say 
| that if more than one interface was required to read or 
| iterate a stream of YAML nodes of different types then 
| the design is broken. But iterating a stream isn't as
| capable as navigating a tree and so requires a restricted 
| interface.

Right. 

| >   Why?  Fundamentally, they are both forward-only
| >   sequential access interfaces.
| 
| The ParentIterator() method effectively makes it a random access iterator
| when looking at the document as a whole. Am I missing something?

Yep.  FirstChild() is only callable once... *smile*

| The hierarchical iterator is a lower level interface simply because it does
| not impose a higher level structure on the data source--not because it's
| more efficient or easier to implement.

The only thing which the Hierarchy of Iterators implies
is that the stack of (Type, Index/Key, Anchor) is available.
This is a rather minimal amount of information.
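
To make the size of that stack concrete, here is a small Python
sketch (the function name walk is hypothetical) that visits every
scalar while keeping only the (Type, Index/Key, Anchor) entries of
its ancestors.  Plain Python data carries no anchors, so the anchor
slot is always None here.

```python
def walk(node, stack=None):
    """Yield (ancestor stack snapshot, scalar) for each scalar, depth first."""
    stack = [] if stack is None else stack
    if isinstance(node, dict):
        kind, children = "map", node.items()
    elif isinstance(node, list):
        kind, children = "list", enumerate(node)
    else:
        # Scalar: report it together with a copy of the current stack.
        yield tuple(stack), node
        return
    for key, child in children:
        stack.append((kind, key, None))   # anchor is unknown in this sketch
        yield from walk(child, stack)
        stack.pop()
```

The stack never grows beyond the nesting depth of the document, no
matter how many nodes have gone by.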

| We need a minimum of two interfaces: stream and tree.

Ok.  First, YAML isn't a tree, it's a graph.  Thus 
a "Parent" object may exist on a given node, but it
would have to throw a "MultipleParent" exception if
the node had two incoming arrows.   The "Parent" 
property *only* makes sense in the context of a given
iterator.
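
A sketch of the two-incoming-arrows case, in Python with made-up
names (GraphNode, MultipleParentError); it only illustrates why a
node-level "Parent" cannot always answer.

```python
class MultipleParentError(Exception):
    """Raised when a node reachable from several parents is asked for one."""


class GraphNode:
    """Hypothetical graph node that records every incoming arrow."""

    def __init__(self, value):
        self.value = value
        self.parents = []
        self.children = []

    def add_child(self, child):
        """Add an arrow from this node to `child`."""
        self.children.append(child)
        child.parents.append(self)

    @property
    def parent(self):
        """The single parent, None for a root, or an error if ambiguous."""
        if len(self.parents) > 1:
            raise MultipleParentError("node has more than one parent")
        return self.parents[0] if self.parents else None
```

An iterator does not have this problem, since it always knows
which path it took to arrive at the node.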

Second, by designing our Iterator well, we can 
merge both interfaces...  so that the random access
interface is an *extension* of the sequential 
access interface.  
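
Roughly what I mean by "extension", sketched in Python with
hypothetical names (SequentialIterator, RandomAccessIterator,
rewind, collect).  Code written against the sequential interface
runs unchanged on the random access one.

```python
class SequentialIterator:
    """Forward-only: works over any iterable, including a one-shot stream."""

    def __init__(self, source):
        self._source = iter(source)
        self.current = None

    def next(self):
        """Advance to the next item; return False when exhausted."""
        try:
            self.current = next(self._source)
        except StopIteration:
            return False
        return True


class RandomAccessIterator(SequentialIterator):
    """Adds rewinding; needs a source that can be re-read (copied to a list)."""

    def __init__(self, source):
        self._items = list(source)
        super().__init__(self._items)

    def rewind(self):
        """Start again from the beginning of the stored items."""
        self._source = iter(self._items)
        self.current = None


def collect(iterator):
    """Consumer that relies only on the sequential interface."""
    out = []
    while iterator.next():
        out.append(iterator.current)
    return out
```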

SAX implements the visitor pattern.  It is at the
*same* level that you are focusing on, only that
it is a push interface instead of a pull interface.

The difference between push or pull is who has
the "while loop".  In push, it is the producer,
in pull it is the consumer. 
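
The same event sequence both ways, as a Python sketch.  The event
names and the functions pull_events and push_events are invented
for the example; they are not SAX's or anybody else's API.

```python
def pull_events(data):
    """Pull style: a generator, so the consumer owns the loop."""
    if isinstance(data, dict):
        yield ("start_map", None)
        for key, value in data.items():
            yield ("key", key)
            yield from pull_events(value)
        yield ("end_map", None)
    elif isinstance(data, list):
        yield ("start_list", None)
        for value in data:
            yield from pull_events(value)
        yield ("end_list", None)
    else:
        yield ("scalar", data)


def push_events(data, handler):
    """Push style: the producer owns the loop and calls the handler."""
    for kind, value in pull_events(data):
        handler(kind, value)
```

In the pull case the caller writes "for event in pull_events(doc)";
in the push case the caller only supplies handler and the loop
lives inside push_events.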

| The Visitor pattern is definitely my favorite from 
| the GoF

It's a great book, isn't it!

| I need to look at your Event class more as I don't quite 
| grasp how it fits into things as of yet.

Yes!

| I, too, only wish to see two simple interfaces exposed. 
| The names that we're using are unfortunate, however, and 
| it may be clouding the issue. I prefer to think of them 
| as the stream-based interface and the tree-based interface.

Understood.  I'm trying to have a *single* interface
that can be used to (forward-only) *iterate* over a 
random access structure as well as over an incoming 
text stream.

| If we wanted to give these interfaces names, Iterator strikes me as being
| forward only and appropriate for streams whereas something like Navigator
| might be more appropriate for an "iterator" over a tree. All it would take
| to turn an Iterator into a Navigator would be to extend it with a single
| Parent property--thus enabling random access to the whole tree.

The problem isn't the parent property, which only gives
you access to the ancestor stack.  The problem is that the
FirstChild() method appears as if it can be called twice
on the same sequence (map or list) node.

Certainly the ( Type, Key/Index, Anchor) tuple on
the stack may take some memory... but not enough
to be concerned about.  And it certainly doesn't
give random access! 

| No, thank you for allowing me to try to contribute.

Well... a lot of implementers implement instead of
humoring fellas like me.  I implement too... but
often after I've talked the subject to death.

Kind Regards,

;) Clark