| For scripting language tools I am just interested in "push" style
| parsing and "push" style emitting. (As opposed to "pull").
In a 'push' interface, the sender has control of
the program stack, and the receiver typically
registers a 'callback' structure.
In a 'pull' interface, the receiver has control
of the program stack, and the sender provides
an 'iterator' structure.
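To make the distinction concrete, here is a minimal sketch of the same event source exposed both ways. The EVENTS list and the names push_parse and pull_parse are made up for illustration; they are not part of any YAML library.

```python
# Hypothetical event stream, used only for illustration.
EVENTS = ["start_sequence", "scalar:a", "scalar:b", "end_sequence"]

def push_parse(callback):
    # Push: the sender owns the loop and calls the receiver's callback.
    for event in EVENTS:
        callback(event)

def pull_parse():
    # Pull: the sender hands back an iterator; the receiver owns the loop.
    return iter(EVENTS)

# Receiver using the push interface: it only registers a callback.
received_by_push = []
push_parse(received_by_push.append)

# Receiver using the pull interface: it drives the loop itself.
received_by_pull = []
for event in pull_parse():
    received_by_pull.append(event)
```

Both receivers end up with the same events; what differs is who holds the loop, and therefore who gets to keep state on the program stack.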
The standard Unix IO interface uses a 'pull' model for
the read functions, and a 'push' model for write operations,
with an end result that the application has the
It is a cakewalk to change a 'pull' interface into a
'push' interface; a standard YAML library could provide
such an adapter with very little effort.
Converting from 'push' to 'pull' requires either that
the whole tree be read into memory or that a threaded
solution (with inter-thread communication) be used.
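As a rough sketch of the threaded route, assuming a hypothetical push-style parser named push_parse, a bounded queue can carry events from the pushing thread to a pulling generator. The helper name push_to_pull and the sentinel are my own choices, not an existing API.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the event stream

def push_parse(callback):
    # Hypothetical push-style parser: it owns the loop and calls back.
    for event in ["start_mapping", "key", "value", "end_mapping"]:
        callback(event)

def push_to_pull(push_source, maxsize=1):
    """Expose a push-style source as a pull-style generator."""
    channel = queue.Queue(maxsize=maxsize)

    def run():
        try:
            # put() blocks while the queue is full, so the parser
            # never runs far ahead of the consumer.
            push_source(channel.put)
        finally:
            channel.put(_DONE)

    threading.Thread(target=run, daemon=True).start()
    while True:
        event = channel.get()
        if event is _DONE:
            return
        yield event

pulled = list(push_to_pull(push_parse))
```

The alternative without threads is to let the push parser fill a list first and iterate over it afterwards, which is the "whole tree in memory" case.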
It is much easier to code if you can use the program
stack and don't have to keep your state in a 'heap'
object (such as a callback or iterator structure).
| This means that when you have a situation like this:
| parser ==> application ==> emitter
| the parser pushes data to the application (in the form of callbacks) and
| the application pushes data to the emitter (in the form of method calls)
| Since processing is always initiated by the application, the application
| has to register event-callbacks with the parser and then turn over control
| to the parser. In contrast, the emitter functions are simply called
| by the application.
| A "pull" parser environment, by contrast, requires that the application
| simply ask for the next event and then the application needs to decide
| what to do with the event.
| For the purposes of user APIs, I tend to like push parsers better.
| (Clark probably has something to say here :)
I suggest that the parser API be a 'pull' iterator, and that
the emitter API be a 'push' callback-structure. If someone
*really* wants to use a 'push' interface, the library could
also provide an adapter which converts them:
while (nextnode = pull-iterator.next())
A real 'converter' isn't more than, say, 20-30 lines of very
reusable code... the one I wrote for a prototype interface
over two years ago was around 20 lines of C.
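For illustration only (this is not the C prototype mentioned above), the pull-to-push direction can be sketched in a few lines. The names pull_to_push, nodes and seen are hypothetical, and the callback is assumed to take one node per call.

```python
def pull_to_push(pull_iterator, callback):
    # Drain the pull iterator and push each node to the callback.
    for node in pull_iterator:
        callback(node)

# Usage: any iterator can feed any one-argument callback.
nodes = iter(["document", "mapping", "scalar"])
seen = []
pull_to_push(nodes, seen.append)
```

A real converter would also have to map collection boundaries onto the matching start/end callbacks, which is where the remaining lines would go.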
| Here is my idea of the API I would like to see for a parser:
| - start_stream()
| - end_stream()
| - start_document(directives)
| - end_document()
| - start_mapping(anchor, typeuri)
| - end_mapping()
| - start_key()
| - end_key()
| - start_sequence(anchor, typeuri)
| - end_sequence()
| - start_entry()
| - end_entry()
| - start_scalar(anchor, typeuri, string)
| - more_scalar(string)
| - end_scalar()
| - node_alias(anchor)
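To show how the scalar events in this list might be consumed, here is a small handler sketch. The method names come from the list above; the class name ScalarCollector and the reading of more_scalar() as "another chunk of the same scalar" are assumptions on my part.

```python
class ScalarCollector:
    """Handles the three scalar events and joins the chunks."""

    def __init__(self):
        self.scalars = []    # completed scalar values
        self._parts = None   # chunks of the scalar being received

    def start_scalar(self, anchor, typeuri, string):
        # First chunk of a new scalar; anchor and typeuri are ignored here.
        self._parts = [string]

    def more_scalar(self, string):
        # Assumed to deliver a further chunk of the current scalar.
        self._parts.append(string)

    def end_scalar(self):
        self.scalars.append("".join(self._parts))
        self._parts = None

# A parser would make these calls; here they are made by hand.
handler = ScalarCollector()
handler.start_scalar(None, None, "Hello, ")
handler.more_scalar("world")
handler.end_scalar()
```

Note that the handler has to keep the partial scalar in an object field between calls, since it does not own the loop.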
Hmm. I'd report key/value/items as a single event
without the start/end pair. Each of these events
can have a 'continuation' flag if the given scalar
For a pull-iterator interface there are two options:
  - a single iterator which is hierarchical -- in this case,
    only a next() function is provided; when the next()
    function returns a sequence/mapping, the following
    calls to next() give the children, until a special
    'endBranch' marker is returned
  - a hierarchy of flat iterators -- in this case, a given
    iterator only visits children of a given node (but not
    grandchildren); each 'tree' node has a 'getChildren()'
    which returns an iterator to visit its children; this
    allows the application to skip over children if it
    wishes ... the getChildren() call may throw an exception
    if the parent's next() has already been called, i.e., if
    the caller has moved on to the next collection.
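A minimal sketch of the first option, assuming nested Python lists stand in for sequences and plain strings for scalars. The generator name hierarchical and the string values used for the markers are illustrative only.

```python
END_BRANCH = "endBranch"  # stands in for the special marker

def hierarchical(node):
    """Single hierarchical iterator: a collection is followed by its
    children and then by END_BRANCH."""
    if isinstance(node, list):
        yield "sequence"
        for child in node:
            yield from hierarchical(child)
        yield END_BRANCH
    else:
        yield node

# Pulling everything shows where the branch markers fall.
flat = list(hierarchical(["a", ["b", "c"], "d"]))
```

With this shape the caller cannot skip a subtree; it has to read (and count) events until the matching marker arrives.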
Of the two, the latter is the more user-friendly, but
takes a bit of getting used to. Further, the latter is exactly
the interface you'd want for a random-access data
structure; the only difference between the sequential-access
pull structure and the random-access structure is that,
in the sequential case, some methods can raise an
AlreadyPassed() event or otherwise signal that the
information requested has already been asked for or skipped.
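The second option might look roughly like this. The sketch fakes sequential access over an in-memory list, so it only demonstrates the interface; FlatIterator, Node and get_children are hypothetical names, and AlreadyPassed is modeled as an exception.

```python
class AlreadyPassed(Exception):
    """Raised when children are requested after the parent moved on."""

class Node:
    def __init__(self, value, owner, index):
        self.value = value
        self._owner = owner   # the FlatIterator that produced this node
        self._index = index   # position of this node within its parent

    def get_children(self):
        # Only the node the parent iterator currently points at is valid.
        if self._owner.position != self._index:
            raise AlreadyPassed(repr(self.value))
        items = self.value if isinstance(self.value, list) else []
        return FlatIterator(items)

class FlatIterator:
    """Visits the children of one node, but not the grandchildren."""

    def __init__(self, items):
        self._items = items
        self.position = -1

    def __iter__(self):
        return self

    def __next__(self):
        self.position += 1
        if self.position >= len(self._items):
            raise StopIteration
        return Node(self._items[self.position], self, self.position)

top = FlatIterator(["a", ["b", "c"], "d"])
first = next(top)                                  # scalar "a"
second = next(top)                                 # nested sequence
inner = [n.value for n in second.get_children()]   # visit its children
third = next(top)                                  # moves past the sequence
try:
    second.get_children()
    passed = False
except AlreadyPassed:
    passed = True
```

Dropping the position check in get_children() turns the same classes into a random-access structure, which is the point made above.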
So, in conclusion, I'd like...
1. A pull interface using a hierarchy of flat iterators
for the parser
2. A single hierarchical callback interface for
the emitter
3. A 'no-op' converter which pulls from the iterators
and pushes to the callbacks
But then again, I'm not writing the parsers... ;)
| The nice thing about this API is that it reads perfectly as an Emitter
| API as well. And that is important, because chaining parsers to emitters
| to form "filters" is important.
Yes, but you don't need to limit people to just a 'push'
interface. Many times a pull interface, or even a pull
based filter is useful...
Parser -> (pull) -> Converter -> (push) -> Emitter
PULL -> (pull-filter) ->
PULL -> (pull/push app) ->
PUSH -> (push-filter) ->
PUSH -> emitter
In this way you have the best of both worlds, depending
on your requirements...
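The chain above could be sketched as follows, with a generator standing in for the parser and a list's append method standing in for the emitter. All names here (parser, pull_filter, push_filter, emitted) are made up for the example.

```python
def parser():
    # Pull source: a generator standing in for the parser.
    yield from ["a", "", "b"]

def pull_filter(events):
    # Pull filter: drops empty scalars while pulling from upstream.
    return (e for e in events if e != "")

def push_filter(downstream):
    # Push filter: upper-cases each event, then pushes it downstream.
    def handle(event):
        downstream(event.upper())
    return handle

emitted = []
emitter = emitted.append          # push sink standing in for the emitter

sink = push_filter(emitter)
for event in pull_filter(parser()):   # the pull/push application
    sink(event)
```

The loop in the middle is the converter: everything before it is pulled, everything after it is pushed.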
That said, I won't fault you for a push based parser ... it
is significantly easier than a pull parser. *winks*