From: Brian I. <in...@tt...> - 2003-03-12 05:23:08

Since I am now interested in pluggable parsers and emitters, it behooves
us to discuss a standard API for these.

For scripting language tools I am just interested in "push" style
parsing and "push" style emitting (as opposed to "pull"). This means
that when you have a situation like this:

    parser ==> application ==> emitter

the parser pushes data to the application (in the form of callbacks) and
the application pushes data to the emitter (in the form of method calls).

Since processing is always initiated by the application, the application
has to register event callbacks with the parser and then turn over control
to the parser. In contrast, the emitter functions are simply called by
the application.

A "pull" parser environment, by contrast, requires that the application
simply ask for the next event and then the application needs to decide
what to do with the event.

For the purposes of user APIs, I tend to like push parsers better.
(Clark probably has something to say here :)

Here is my idea of the API I would like to see for a parser:

    - start_stream()
    - end_stream()
    - start_document(directives)
    - end_document()
    - start_mapping(anchor, typeuri)
    - end_mapping()
    - start_key()
    - end_key()
    - start_sequence(anchor, typeuri)
    - end_sequence()
    - start_entry()
    - end_entry()
    - start_scalar(anchor, typeuri, string)
    - more_scalar(string)
    - end_scalar()
    - node_alias(anchor)

optional optimizations:

    - scalar_key(string):
        - start_key()
        - start_scalar(NULL, NULL, string)
        - end_scalar()
        - end_key()
    - scalar_value(string):
        - start_scalar(NULL, NULL, string)
        - end_scalar()

Also some or all of the end_* methods might be reduced to:

    - end_scope()

Then the parser wouldn't have to assign types to ending tokens.

The nice thing about this API is that it reads perfectly as an Emitter
API as well. And that is important, because chaining parsers to emitters
to form "filters" is important.

Thoughts?

Cheers, Brian
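A minimal Python sketch of the push API proposed above, to make the shape
of the callbacks concrete. The method names come from Brian's list; the
class names (EventHandler, TraceEmitter) and the push_node/push_stream
drivers are hypothetical stand-ins, and the "parser" here simply walks
native Python data rather than reading YAML text.

```python
class EventHandler:
    """No-op receiver for the push events in the proposed API."""
    def start_stream(self): pass
    def end_stream(self): pass
    def start_document(self, directives): pass
    def end_document(self): pass
    def start_mapping(self, anchor, typeuri): pass
    def end_mapping(self): pass
    def start_key(self): pass
    def end_key(self): pass
    def start_sequence(self, anchor, typeuri): pass
    def end_sequence(self): pass
    def start_entry(self): pass
    def end_entry(self): pass
    def start_scalar(self, anchor, typeuri, string): pass
    def more_scalar(self, string): pass
    def end_scalar(self): pass
    def node_alias(self, anchor): pass


class TraceEmitter(EventHandler):
    """Hypothetical emitter: collects scalar text instead of writing YAML."""
    def __init__(self):
        self.scalars = []

    def start_scalar(self, anchor, typeuri, string):
        self.scalars.append(string)

    def more_scalar(self, string):
        # A continuation chunk extends the scalar that was just started.
        self.scalars[-1] += string


def push_node(node, handler):
    """Stand-in for a parser: pushes events for one native Python node."""
    if isinstance(node, dict):
        handler.start_mapping(None, None)
        for key, value in node.items():
            handler.start_key()
            push_node(key, handler)
            handler.end_key()
            push_node(value, handler)
        handler.end_mapping()
    elif isinstance(node, list):
        handler.start_sequence(None, None)
        for item in node:
            handler.start_entry()
            push_node(item, handler)
            handler.end_entry()
        handler.end_sequence()
    else:
        handler.start_scalar(None, None, str(node))
        handler.end_scalar()


def push_stream(documents, handler):
    """Pushes a whole stream of documents to the handler."""
    handler.start_stream()
    for doc in documents:
        handler.start_document([])
        push_node(doc, handler)
        handler.end_document()
    handler.end_stream()
```

Because the driver only ever calls handler methods, any object with the
same methods can sit on the receiving end, which is the "reads as an
Emitter API as well" property: a filter would be a handler that forwards
each call to another handler.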
From: Clark C. E. <cc...@cl...> - 2003-03-12 08:19:24

| For scripting language tools I am just interested in "push" style
| parsing and "push" style emitting. (As opposed to "pull").

definitions:
  push: >
    in this interface, the sender has control of
    the program stack, and the caller typically
    registers a 'callback' structure
  pull: >
    in this interface, the receiver has control
    of the program stack, and the sender provides
    an 'iterator' structure

notes:
  - >
    The standard Unix IO interface uses a 'pull' model for
    the read functions, and a 'push' model for write operations,
    with an end result that the application has the
    program stack
  - >
    it is a cakewalk to change a 'pull' interface into a
    'push' interface; a standard YAML library could provide
    such an adapter with very little effort
  - >
    converting from 'push' to 'pull' requires that either
    the whole tree is read into memory, or that a threaded
    solution (with inter-thread communication) is needed
  - >
    it is much easier to code if you can use the program
    stack and don't have to keep your state in a 'heap'
    object (such as a callback or iterator structure)

| This means that when you have a situation like this:
|
|   parser ==> application ==> emitter
|
| the parser pushes data to the application (in the form of callbacks) and
| the application pushes data to the emitter (in the form of method calls)
|
| Since processing is always initiated by the application; the application
| has to register event-callbacks to the parser and then turn over control
| to the parser. In contrast, the emitter functions are simply called to
| by the application.
|
| A "pull" parser environment, by contrast, requires that the application
| simply ask for the next event and then the application needs to decide
| what to do with the event.
|
| For the purposes of user API's, I tend to like push parsers better.
| (Clark probably has something to say here :)

I suggest that the parser API be a 'pull' iterator, and that
the emitter API be a 'push' callback-structure.
If someone *really* wants to use a 'push' interface, the library could
also provide an adapter which converts them:

    convert(pull-iterator, push-callback)
        while (nextnode = pull-iterator.next())
            push-callback(nextnode)

A real 'converter' isn't more than say 20-30 lines of very
reusable code... the one I wrote for a prototype interface
over two years ago was around 20 lines of C.

| Here is my idea of the API I would like to see for a parser:
|
|   - start_stream()
|   - end_stream()
|   - start_document(directives)
|   - end_document()
|   - start_mapping(anchor, typeuri)
|   - end_mapping()
|   - start_key()
|   - end_key()
|   - start_sequence(anchor, typeuri)
|   - end_sequence()
|   - start_entry()
|   - end_entry()
|   - start_scalar(anchor, typeuri, string)
|   - more_scalar(string)
|   - end_scalar()
|   - node_alias(anchor)

Hmm. I'd report key/value/items as a single event
without the start/end pair. Each of these events
can have a 'continuation' flag if the given scalar
won't fit...

...

For a pull-iterator interface there are two options:

  a single iterator which is hierarchical -- in this case,
  only a next() function is provided; when the next()
  function returns a sequence/mapping, the following
  calls to next() give the children, until a special
  'endBranch' marker is returned

  a hierarchy of flat iterators -- in this case, a given
  iterator only visits children of a given node (but not
  grandchildren); each 'tree' node has a 'getChildren()'
  which returns an iterator to visit its children; this
  allows the application to skip over children if it
  wishes ... the getChildren() call may throw an exception
  if the parent's next() is used... ie, if the caller
  has moved on to the next collection.

Of the two, the latter is the most user-friendly, but
requires a bit of use; further, the latter is exactly
the interface you'd want for a random-access data
structure; the only difference between the sequential-access
pull structure and the random-access structure is that
some methods can raise an AlreadyPassed() event or some
other notion that the information requested has already
been asked for or skipped.

So, in conclusion, I'd like...

  1. A pull interface using a hierarchy of flat iterators
     for the parser
  2. A single hierarchical callback interface for
     the emitter
  3. A 'no-op' converter which pulls from the iterators
     and pushes to the callbacks

But then again, I'm not writing the parsers... ;)

| The nice thing about this API is that it reads perfectly as an Emitter
| API as well. And that is important, because chaining parsers to emitters
| to form "filters" is important.

Yes, but you don't need to limit people to just a 'push'
interface. Many times a pull interface, or even a pull
based filter is useful...

    Parser -> (pull) -> Converter -> (push) -> Emitter

    parser -> PULL -> (pull-filter) -> PULL -> (pull/push app)
           -> PUSH -> (push-filter) -> PUSH -> emitter

In this way you have the best of both worlds, depending
on your requirements...

...

That said, I won't fault you for a push based parser ... it
is significantly easier than a pull parser. *winks*

;) Clark
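The pull-to-push converter and the "single hierarchical iterator" option
described in this message can be sketched in Python as follows. This is
an illustration under assumptions, not Clark's C prototype: the event
tuples, the END_BRANCH marker value, and the pull_events stand-in parser
(which walks native Python data instead of YAML text) are all
hypothetical.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple

Event = Tuple[str, Any]

# Hypothetical marker closing the most recently opened collection.
END_BRANCH: Event = ("endBranch", None)


def pull_events(node: Any) -> Iterator[Event]:
    """Single hierarchical iterator over native Python data.

    After a 'sequence' or 'mapping' event, the following events are the
    children, until END_BRANCH is returned.
    """
    if isinstance(node, dict):
        yield ("mapping", None)
        for key, value in node.items():
            yield from pull_events(key)
            yield from pull_events(value)
        yield END_BRANCH
    elif isinstance(node, list):
        yield ("sequence", None)
        for item in node:
            yield from pull_events(item)
        yield END_BRANCH
    else:
        yield ("scalar", str(node))


def convert(pull_iterator: Iterable[Event],
            push_callback: Callable[[Event], None]) -> None:
    """Pull each event from the iterator and push it to the callback."""
    for event in pull_iterator:
        push_callback(event)
```

Usage: convert(pull_events(data), handler) drives any callable handler,
so an application that prefers callbacks can still sit on top of a pull
parser. The converter itself stays a plain loop, which is why this
direction is the cheap one.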
From: Clark C. E. <cc...@cl...> - 2003-03-12 09:21:37

On Wed, Mar 12, 2003 at 08:35:52AM +0000, Clark C. Evans wrote:
| | For scripting language tools I am just interested in "push" style
| | parsing and "push" style emitting. (As opposed to "pull").
|
| I'd like...
|
|   1. A pull interface using a hierarchy of flat iterators
|      for the parser
|   2. A single hierarchical callback interface for
|      the emitter
|   3. A 'no-op' converter which pulls from the iterators
|      and pushes to the callbacks
...
|
| That said, I won't fault you for a push based parser ... it
| is significantly easier than a pull parser. *winks*

I meant that it is harder to write a pull parser. Basically,
implementing pull parsers isn't that hard conceptually (although
it requires discipline); you just have to simulate your own
call-stack and execute method... so that you can resume your
'call-stack' later on when the iterator's next() method is called.

I'll post what I mean with Python code in a few days...

;) Clark
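One way to picture the "simulate your own call-stack" idea in Python is
an iterator that keeps an explicit stack of pending children, so that
each call to next() resumes where the previous one stopped. This is only
a sketch and not the code Clark promised: the PullParser class and its
event tuples are hypothetical, and it walks native Python data rather
than parsing YAML text.

```python
class PullParser:
    """Pull iterator that keeps its traversal state on its own stack."""

    def __init__(self, root):
        # Each frame is an iterator over the children still to be visited.
        self._stack = [iter([root])]

    def __iter__(self):
        return self

    def __next__(self):
        while self._stack:
            try:
                node = next(self._stack[-1])
            except StopIteration:
                # The current collection is exhausted: drop its frame.
                self._stack.pop()
                if self._stack:
                    return ("endBranch", None)
                continue  # that was the outer wrapper frame; we are done
            if isinstance(node, list):
                self._stack.append(iter(node))
                return ("sequence", None)
            if isinstance(node, dict):
                # Flatten to key, value, key, value, ...
                pairs = [part for item in node.items() for part in item]
                self._stack.append(iter(pairs))
                return ("mapping", None)
            return ("scalar", str(node))
        raise StopIteration
```

The discipline lies in returning from __next__ at every event while
leaving self._stack in a state the next call can pick up from. A Python
generator function does the same bookkeeping implicitly, since the
interpreter suspends and resumes the function's frame at each yield.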
From: Brian I. <in...@tt...> - 2003-03-12 17:08:58

On 12/03/03 08:35 +0000, Clark C. Evans wrote:
> | For scripting language tools I am just interested in "push" style
> | parsing and "push" style emitting. (As opposed to "pull").
>
> definitions:
>   push: >
>     in this interface, the sender has control of
>     the program stack, and the caller typically
>     registers 'callback' structure
>   pull: >
>     in this interface, the receiver has control
>     of the program stack, and the sender provides
>     a 'iterator' structure

Sure. I was just trying to phrase the descriptions without using terms
like "program stack", which might not be familiar to all.

> notes:
>   - >
>     The standard Unix IO interface uses a 'pull' model for
>     the read functions, and a 'push' model for write operations,
>     with an end result that the application has the
>     program stack
>   - >
>     it is a cakewalk to change 'pull' interface into a
>     'push' interface, a standard YAML library could provide
>     such an adapter with very little effort
>   - >
>     converting from 'push' to 'pull' requires that either
>     the whole tree is read into memory, or that a threaded
>     solution (with inter-thread communication) is needed

For parsing? Could you explain why? It seems like you just need to have
one or more events in your stack (from push) to return in your pull
interface. No?

>   - >
>     it is much easier to code if you can use the program
>     stack and don't have to keep your state in a 'heap'
>     object (such as a callback or iterator structure)
>
> | This means that when you have a situation like this:
> |
> |   parser ==> application ==> emitter
> |
> | the parser pushes data to the application (in the form of callbacks) and
> | the application pushes data to the emitter (in the form of method calls)
> |
> | Since processing is always initiated by the application; the application
> | has to register event-callbacks to the parser and then turn over control
> | to the parser. In contrast, the emitter functions are simply called to
> | by the application.
> |
> | A "pull" parser environment, by contrast, requires that the application
> | simply ask for the next event and then the application needs to decide
> | what to do with the event.
> |
> | For the purposes of user API's, I tend to like push parsers better.
> | (Clark probably has something to say here :)
>
> I suggest that the parser API be a 'pull' iterator, and that
> the emitter API be a 'push' callback-structure. If someone
> *really* wants to use a 'push' interface, the library could
> also provide an adapter which converts them:
>
>     convert(pull-iterator, push-callback)
>         while (nextnode = pull-iterator.next())
>             push-callback(nextnode)
>
> A real 'converter' isn't more than say 20-30 lines of very
> reusable code... the one I wrote for a prototype interface
> over two years ago was around 20 lines of C.

Well, let's get a few things clear. I am talking about the standard API I
am going to require for Perl plugins. But just because
YAML::Parser::libyaml is a push parser, that doesn't require libyaml
itself to be one.

>
> | Here is my idea of the API I would like to see for a parser:
> |
> |   - start_stream()
> |   - end_stream()
> |   - start_document(directives)
> |   - end_document()
> |   - start_mapping(anchor, typeuri)
> |   - end_mapping()
> |   - start_key()
> |   - end_key()
> |   - start_sequence(anchor, typeuri)
> |   - end_sequence()
> |   - start_entry()
> |   - end_entry()
> |   - start_scalar(anchor, typeuri, string)
> |   - more_scalar(string)
> |   - end_scalar()
> |   - node_alias(anchor)
>
> Hmm. I'd report key/value/items as a single event
> without the start/end pair. Each of these events
> can have a 'continuation' flag if the given scalar
> won't fit...

Think past scalars. Keys can be collections.

> ...
>
> For a pull-iterator interface there are two options:
>
>   a single iterator which is hierarchical -- in this case,
>   only a next() function is provided, when the next()
>   function returns a sequence/mapping, then following
>   calls to next() give the children, until a special
>   'endBranch' marker is returned
>
>   a hierarchy of flat iterators -- in this case, a given
>   iterator only visits children of a given node (but not
>   grand children); each 'tree' node has a 'getChildren()'
>   which returns an iterator to visit its children; this
>   allows the application to skip over children if it
>   wishes ... the getChildren() call may throw an exception
>   if the parent's next() is used... ie, if the caller
>   has moved on to the next collection.
>
> Of the two, the latter is the most user-friendly, but
> requires a bit of use, further the latter is exactly
> the interface you'd want for a random-access data
> structure; the only difference between the sequential-access
> pull structure and the random-access structure is that
> some methods can raise an AlreadyPassed() event or some
> other notion that the information requested has already
> been asked for or skipped.
>
> So, in conclusion, I'd like...
>
>   1. A pull interface using a hierarchy of flat iterators
>      for the parser
>   2. A single hierarchical callback interface for
>      the emitter
>   3. A 'no-op' converter which pulls from the iterators
>      and pushes to the callbacks
>
> But then again, I'm not writing the parsers... ;)

True that.

Cheers, Brian
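On the push-to-pull question raised earlier in this reply, one reading of
Clark's note is that a push parser keeps the program stack until parsing
ends, so a plain callback can only append events to a buffer that the
caller sees afterwards (the "whole tree in memory" case). The threaded
alternative can be sketched in Python as below. The pull_from_push
function and the run_push_parser(callback) signature are hypothetical;
queue.Queue and threading.Thread are from the standard library.

```python
import queue
import threading

# Private sentinel marking the end of the event stream.
_DONE = object()


def pull_from_push(run_push_parser, maxsize=1):
    """Expose a push parser as a pull-style generator of events.

    run_push_parser(callback) is assumed to call callback(event) once per
    event and to return when parsing is finished.
    """
    events = queue.Queue(maxsize=maxsize)

    def worker():
        try:
            # put() blocks while the queue is full, so the parser thread
            # runs at most `maxsize` events ahead of the consumer.
            run_push_parser(events.put)
        finally:
            events.put(_DONE)

    # Daemon thread: if the consumer stops pulling early, the blocked
    # worker will not keep the process alive.
    threading.Thread(target=worker, daemon=True).start()

    while True:
        event = events.get()
        if event is _DONE:
            return
        yield event
```

The non-threaded version would be events = []; run_push_parser(events.append),
which is simpler but hands nothing back until the parser has pushed
every event.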