From: Clark C . E. <cc...@cl...> - 2001-06-06 12:46:58
On Tue, Jun 05, 2001 at 11:43:57PM -0700, Jason Diamond wrote:
|
| > public interface IYamlFlatIterator
| > {
| >     YamlNodeType NodeType { get; }
| >     string Key { get; }
| >     int Read(...);
| >
| >   * bool Next();
| >   * IYamlFlatIterator FirstChild();
| >   * IYamlFlatIterator ParentIterator();
| > }
|
| Given the following example code, how can you possibly implement this
| interface so that it's forward only over the YAML document?
|
|     IYamlFlatIterator document = new YamlParser("foo.yaml");
|     IYamlFlatIterator child1 = document.FirstChild();
|     IYamlFlatIterator parent = child1.ParentIterator();
|     IYamlFlatIterator child2 = parent.FirstChild();

Ahh. Yes, there is an implicit assumption that FirstChild() can only
be called once on a Map or List. If it is called on a Scalar, or a
second time, then a FunctionSequenceError, NodeNotSequence, or
ForwardOnlyException would be raised... Sorry for not making this
clear. This is very similar to the fact that ReadCharacters() is
called until it returns 0 (no more characters), right?

| Parsing is always done with at least two layers. Lexical analysis and then
| syntactic analysis. Usually we add a third layer--semantic analysis--but
| that's application specific so we won't go there.
|
| Lexical analysis (scanning, tokenizing, whatever) is always (to my
| knowledge) exposed through a forward-only sequence of tokens (not counting
| lookahead). Syntactic analysis (often referred to as just parsing) usually
| builds the tree so that it can be processed later. These are basically the
| two layers that I'm talking about.

Right. We are on the same page... this interface should not provide
"random access". If you add this limitation, then the above is
sequential access (plus the current ancestor stack).

| What I'm trying to ensure is that we have the flexibility to build the kind
| of tree that we want to build without building some intermediary and
| sometimes unnecessary tree in the middle.

Yep! We are on the same page here!
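The "FirstChild() is only callable once" rule discussed above can be sketched roughly as follows. This is an illustration in Python rather than the C#-style interface quoted in the thread, and it makes several assumptions that are not part of the proposal: an in-memory nested dict/list stands in for the parser, and the names `FlatIterator`, `open_document`, and `ForwardOnlyError` are hypothetical.

```python
class ForwardOnlyError(Exception):
    """Raised when a call would require re-reading already consumed input."""


class FlatIterator:
    """Forward-only cursor positioned on one node among its siblings."""

    def __init__(self, current, rest, parent=None):
        self.key, self.value = current  # node the cursor is positioned on
        self._rest = rest               # remaining siblings, consumed once
        self._parent = parent
        self._descended = False         # True once first_child() was used

    def next(self):
        """Advance to the next sibling; return False when there is none."""
        nxt = next(self._rest, None)
        if nxt is None:
            return False
        self.key, self.value = nxt
        self._descended = False
        return True

    def first_child(self):
        """Descend into the current map or list, at most once per node."""
        if not isinstance(self.value, (dict, list)):
            raise ForwardOnlyError("scalar nodes have no children")
        if self._descended:
            raise ForwardOnlyError("first_child() was already called here")
        self._descended = True
        if isinstance(self.value, dict):
            pairs = iter(self.value.items())
        else:
            pairs = iter(enumerate(self.value))
        first = next(pairs, None)
        if first is None:
            return None                 # empty map or list
        return FlatIterator(first, pairs, parent=self)

    def parent_iterator(self):
        """Return the iterator this one was created from (ancestor access)."""
        return self._parent


def open_document(data):
    """Hypothetical entry point: a cursor positioned on the root node."""
    return FlatIterator((None, data), iter(()), parent=None)


document = open_document({"a": [1, 2]})
child1 = document.first_child()
parent = child1.parent_iterator()
try:
    parent.first_child()                # second descent on the same node
except ForwardOnlyError as exc:
    print("refused:", exc)
```

With this rule in place, the four-line example quoted above fails on its last line instead of silently requiring the source to be rewound, which is what keeps the interface sequential.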
| Did my tokenizer/parser analogy above make sense? You can't parse without
| tokenizing. You can't build a YAML tree without knowing what type of node
| you're currently pointing at in the middle of a character stream. We need
| both of these layers, regardless. If we expose the lower level interface, we
| give the developers the ability to choose how they process YAML data. Some
| data can be processed more efficiently as a stream and some as a tree. We
| can't make that decision for them.

Once again, we agree. However, I'd like to have the interface
"identical". Perhaps you could "allow" FirstChild() to be called
more than once if the input source had random access. So...
FirstChild() could be allowed to throw a ForwardOnly exception,
but wouldn't be mandated to do so. Hmmm.

| To me, though, Iterator implies that you can move in only
| one direction: forward. I would gladly change the interface
| name to use Iterator instead of Reader.

Iterator has the same implication for me as well. ;)

|
| I absolutely agree with this and would go so far to say
| that if more than one interface was required to read or
| iterate a stream of YAML nodes of different types then
| the design is broken. But iterating a stream isn't as
| capable as navigating a tree and so requires a restricted
| interface.

Right.

| > Why? Fundamentally, they are both forward-only
| > sequential access interfaces.
|
| The ParentIterator() method effectively makes it a random access iterator
| when looking at the document as a whole. Am I missing something?

Yep. FirstChild() is only callable once... *smile*

| The hierarchical iterator is a lower level interface simply because it does
| not impose a higher level structure on the data source--not because it's
| more efficient or easier to implement.

The only thing which the Hierarchy of Iterators implies is that
the stack of (Type, Index/Key, Anchor) is available. This is a
rather minimal amount of information.
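To make the size of that ancestor stack concrete, here is a rough Python sketch of a forward-only walk that exposes nothing beyond a stack of (Type, Index/Key, Anchor) entries. The names `Frame`, `node_type`, and `walk` are hypothetical, a nested dict/list again stands in for the parser, and the anchor is always `None` because plain Python data carries no YAML anchors.

```python
from collections import namedtuple

# One stack entry per open ancestor: (Type, Index/Key, Anchor).
Frame = namedtuple("Frame", ["type", "key", "anchor"])


def node_type(node):
    """Classify a node as 'map', 'list', or 'scalar'."""
    if isinstance(node, dict):
        return "map"
    if isinstance(node, list):
        return "list"
    return "scalar"


def walk(node, key=None, stack=()):
    """Yield (ancestor_stack, frame, scalar_value) in document order.

    The stack holds one Frame per ancestor, so it grows with nesting
    depth and not with the size of the document.
    """
    # Assumption: anchors are not modelled in this sketch, hence None.
    frame = Frame(node_type(node), key, None)
    value = node if frame.type == "scalar" else None
    yield stack, frame, value

    if frame.type == "map":
        children = node.items()
    elif frame.type == "list":
        children = enumerate(node)
    else:
        children = ()
    for child_key, child in children:
        yield from walk(child, child_key, stack + (frame,))


for ancestors, frame, value in walk({"a": [1, 2]}):
    path = [f.key for f in ancestors] + [frame.key]
    print(frame.type, path, value)
```

Nothing here lets a consumer revisit a sibling that has already gone by; the stack only describes the path from the root to the current node.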
| We need a minimum of two interfaces: stream and tree.

Ok. First, YAML isn't a tree, it's a graph. Thus a "Parent" object
may exist on a given node, but it would have to throw a
"MultipleParent" exception if the node had two incoming arrows.
The "Parent" property *only* makes sense in the context of a
given iterator.

Second, by designing our Iterator well, we can merge both
interfaces... so that the random access interface is an
*extension* of the sequential access interface.

| > Issue #3: Push vs Pull
| > ~~~~~~~~~
| >
| > I'm glad you understand the need for a pull
| > interface. This is great. I hope you understand
| > the need for a push interface as well, right?
| > The printer (emitter) should be using the Visitor
| > interface. Is this clear?
|
| Yes and I have no problems with it. But it operates
| at a higher level than I was focusing on.

SAX implements the visitor pattern. It is at the *same* level
that you are focusing on, only that it is a push interface
instead of a pull interface. The difference between push and
pull is who has the "while loop". In push, it is the producer;
in pull, it is the consumer.

| The Visitor pattern is definitely my favorite from
| the GoF

It's a great book, isn't it!

| I need to look at your Event class more as I don't quite
| grasp how it fits into things as of yet.

Yes!

| I, too, only wish to see two simple interfaces exposed.
| The names that we're using are unfortunate, however, and
| it may be clouding the issue. I prefer to think of them
| as the stream-based interface and the tree-based interface.

Understood. I'm trying to have a *single* interface that can
be used to (forward-only) *iterate* over a random access structure
as well as over an incoming text stream.

| If we wanted to give these interfaces names, Iterator strikes me as being
| forward only and appropriate for streams whereas something like Navigator
| might be more appropriate for an "iterator" over a tree.
| All it would take to turn an Iterator into a Navigator would be to extend
| it with a single Parent property--thus enabling random access to the whole
| tree.

The problem isn't the parent property, which only gives you access
to the ancestor stack. The problem is that the FirstChild() method
appears as if it can be called twice on the same sequence (map or
list) node. Certainly the (Type, Key/Index, Anchor) tuple on the
stack may take some memory... but not enough to be concerned about.
And it certainly doesn't give random access!

| No, thank you for allowing me to try to contribute.

Well... a lot of implementers implement instead of humoring fellas
like me. I implement too... but often after I've talked the
subject to death.

Kind Regards, ;)

Clark
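A small Python sketch of the push versus pull distinction drawn earlier in this message ("who has the while loop"). All names here (`scalars`, `CollectingVisitor`, `push_scalars`) are hypothetical, and a nested dict/list stands in for a YAML source; this is not SAX or any proposed YAML API.

```python
def scalars(node):
    """Pull side: a generator that the consumer drives with its own loop."""
    if isinstance(node, dict):
        for value in node.values():
            yield from scalars(value)
    elif isinstance(node, list):
        for value in node:
            yield from scalars(value)
    else:
        yield node


class CollectingVisitor:
    """A trivial visitor that records every scalar it is handed."""

    def __init__(self):
        self.seen = []

    def visit_scalar(self, value):
        self.seen.append(value)


def push_scalars(node, visitor):
    """Push side: the producer owns the loop and calls the visitor."""
    for value in scalars(node):
        visitor.visit_scalar(value)


data = {"name": "yaml", "tags": ["a", "b"]}

# Pull: the loop lives in the consumer's code.
pulled = []
for value in scalars(data):
    pulled.append(value)

# Push: the consumer only supplies callbacks; the loop is in push_scalars().
visitor = CollectingVisitor()
push_scalars(data, visitor)
```

Both paths see the same nodes in the same order; only the ownership of the loop differs, which is why the two styles can sit at the same level of the design.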