From: Andreas L. <no...@sb...> - 2001-10-21 20:29:03
On Sun, Oct 21, 2001 at 08:31:01PM +0000, Franck Arnaud wrote:
>
> Regarding the XML parser interface discussions, I agree with
> Andreas that the bridge pattern seems to be a bit overkill
> and leads to (interface) code duplication, plus it makes
> reading the code a bit of a pain (OO spaghetti), especially
> because of the callback interface of the parser as it goes
> through indirections in all directions.
>
> In Nenie XML, I have a similar event interface without the
> bridge pattern. Still it suffers from some issues mentioned
> by Berend.

Very interesting idea. What do you think about not using your
proprietary XE_INTERFACE as a _streaming source_, but rather
KI_OUTPUT_STREAM? Or rather let XE_INTERFACE inherit from
KI_OUTPUT_STREAM. The generic type for KI_OUTPUT_STREAM would then be
some kind of (named or unnamed) TUPLE that holds all parameters.

> First, you inherit from a specific implementation (not an
> issue in nxml but not nice generally).
>
> Also, I do various post-processing layers through the
> inherited event interface. These layers add more event
> procedures, sometimes building on parent events with
> Precursor but not always very clearly. This is cumbersome,
> layers really have dependencies -- you cannot really
> reorder them -- and combining them is not very convenient.
> In the end the layering is not very reusable.
>
> I had some loose thoughts about trying to solve those
> issues differently, and with the discussion on this list
> starting, I actually wrote a prototype yesterday which
> may help. It is available at:
>
> http://www.nenie.org/eiffel/xml/xmlprototype-oct2001.zip
>
> It's only compiled with SmallEiffel, not extensively
> tested, and uses Nenie XML in one class, but the point
> is to show the design rather than the details: it's code
> for reading rather than executing (but it does run). You
> don't need to know Nenie XML to understand it.
>
> The two ideas I have implemented are:
>
> (1) Event processing layer composition is implemented with
>     delegation rather than inheritance.
>
> (2) The event interface is simplified to one callback
>     routine with simple types (enumerations+string).
>
> Both ideas are to some extent (but not completely, more
> later) independent.
>
> The event interface tries to model all events with three
> variables, 2 enumerations and a string:
>  * location = Attribute | Element | Comment ....
>  * type = Start | Finish | Name_prefix | Name_local | Data ...
>  * value = [UC_]STRING
>
> A short example:
>
>   <!-- comment -->
>   <a>
>     <x:b attr='value'>content</x:b>
>   </a>
>
> Would lead to the following event stream:
>
> LOCATION    TYPE          VALUE
> -------------------------------------------
> Comment     Start         ""
> Comment     Data          " comment "
> Comment     Finish        ""
> Element     Start         ""
> Element     Name_local    "a"
> Attribute   Start         ""
> Attribute   Finish        ""
> Element     Start         ""
> Element     Name_prefix   "x"
> Element     Name_local    "b"
> Attribute   Start         ""
> Attribute   Name_local    "attr"
> Attribute   Data          "value"
> Attribute   Finish        ""
> Element     Data          "content"
> Element     Name_prefix   "x"
> Element     Name_local    "b"
> Element     Finish        ""
> Element     Name_local    "a"
> Element     Finish        ""
>
> See XE_INTERFACE for a pseudo-grammar of the event flow.
>
> Note this does not try to do DTDs and probably wouldn't
> scale. I think DTDs are legacy and the issue of having a
> clean interface for DTDs is not important, and as Berend
> said it's worth having a distinct subset for the core events.
>
> Layered events are implemented by each descendant forwarding
> events to the same interface it responds to:
>
> deferred class XE_INTERFACE
> feature -- Event interface
>
>     on_event (a_where: XE_LOCATION; a_what: XE_TYPE; a_value: STRING) is
>             -- XML event.
>             -- See invariant for allowed event sequence.
>         deferred
>     ....
>
> deferred class XE_FILTER_INTERFACE
> inherit
>     XE_INTERFACE
>
> feature -- Next
>     next: XE_INTERFACE
>
>
> Actual event processors inherit from XE_FILTER_INTERFACE, and call
> next.on_event (...) within their own implementation of on_event.
>
> Then, using the filters is just a question of making a pipe,
> given functions that create filters and bind 'next':
>
>     a_parser.set_interface (debug_printer (null))
>         -- XML parser event source -> print events -> null
>
>     a_parser.set_interface (namespace_resolver (pretty_printer (null)))
>         -- parser -> resolve namespaces -> print canonical xml -> null
>
> 'null' is an XE_INTERFACE that does nothing, to finish the pipe.
> 'a_parser' is the event source starting the pipe.
>
> This is where the simplified interface comes in: filters are
> allowed to change or add events. So, the namespace resolver
> adds an event with the namespace URI before each 'name, local
> part' event. Without the simplified interface, any downstream
> event interface would need to be typed with the new event -- e.g.
> an extra routine or a replacement routine with a different
> signature.
>
> At the cost of somewhat weaker typing, although it is still
> as checkable as before dynamically with contracts, we get
> easier composition of filter pipes. This is useful for
> filters which can do generic processing and forward
> unknown events generically. An example is a value sharing
> filter which reduces the number of live strings:
>
> class XE_VALUE_SHARER
> inherit XE_FILTER_INTERFACE
> creation set_next
> feature
>
>     on_event (a_where: XE_LOCATION; a_what: XE_TYPE; a_value: STRING) is
>             -- Event.
>         do
>             -- ...
>             if not values.has (a_value) then
>                 values.force (a_value)
>             end
>             next.on_event (a_where, a_what, values.item (a_value))
>         end
>
> feature {NONE}
>     values: DS_HASH_SET[STRING]
>
> end
>
> This filter can be placed before or after the namespace
> resolver; that would not be possible if the namespace resolver
> changed the static types of the event interface.
>
> With a simple interface and this composition scheme, no
> client code would inherit from a parser; it would just call
> set(XE_INTERFACE). If it does use the bridge pattern, the
> small size of the interface and forwarding for one routine
> makes it more acceptable.
>
>
> So in the end, it seems this design allows:
>
> - easy composition of layers of event processing
> - clean client interface (no inheritance from parser)
> - bridge vs. factory does not matter for the parser

Yep, all we would have are sources and sinks. And since a source can be
a sink, combining and reusing components becomes very easy. Man, I like
the stream pattern (;

> At the cost of:
>
> - more dynamic typing of event types
> - somewhat different coding style

If static typing is really an issue, we can still provide a sink that
has the old interface, at the cost of one more dispatch. This would be
needed anyway to preserve backward compatibility.

> On coding style, the namespace resolver I wrote for the
> prototype is 2/3 of the code of the one in Nenie XML while
> being more reusable. I'm not sure it is due to the API or
> some other reason. The API is different enough, and
> the problem small, so that I quickly stopped trying to
> copypaste my original code into the new API.
>
> One thing I considered was having a single polymorphic
> parameter in on_event, and then processing either using
> something like the visitor pattern or reverse assignments.
> The visitor pattern would be dangerously close to the
> existing event interfaces (worse with more indirections?)
> and reverse assignment type dispatching is ugly.
>
> A thing I didn't investigate but which may be worth looking
> at, is seeing how this integrates with the pipe classes
> in Gobo.

Well, then we would only have one polymorphic parameter. Whatever that
would be. Actually I don't see the problem of the visitor pattern plus
stream pattern. It would be just one additional sink, that one can use
or choose not to use. I agree that reverse assignment would be ugly.

>
> I hope people can have a look at the code and comment
> whether it could be a good API. It's just a few hours'
> hack, nothing set in stone even if it were adopted for
> Gobo XML (rewriting a similar idea with different details
> is not much work).

Let's see if others have comments on it as well. Also, performance is
one issue. Although I think that with this new approach performance
should not suffer.

This approach simplifies the whole event interface a lot. The tree
interface could drop the bridge pattern then as well, since it was never
really needed. I only did it in the beginning to allow for a DOM parser
to plug in, but I think in practice this is too cumbersome anyway.

regards,
Andreas
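
P.S. For anyone who wants to play with the source/sink idea without
unpacking the prototype, here is a rough sketch of the pipe. It is in
Python rather than Eiffel purely so that it runs as a single file. Sink,
Filter and ValueSharer loosely mirror XE_INTERFACE, XE_FILTER_INTERFACE
and XE_VALUE_SHARER from the quoted mail; Recorder and feed_comment are
hypothetical helpers made up for this illustration and are not part of
the prototype or of Gobo.

```python
from enum import Enum, auto


class Location(Enum):
    # Subset of the 'location' enumeration described in the mail.
    ELEMENT = auto()
    ATTRIBUTE = auto()
    COMMENT = auto()


class EventType(Enum):
    # Subset of the 'type' enumeration described in the mail.
    START = auto()
    FINISH = auto()
    NAME_PREFIX = auto()
    NAME_LOCAL = auto()
    DATA = auto()


class Sink:
    """Rough counterpart of XE_INTERFACE: one callback for every event."""

    def on_event(self, where, what, value):
        raise NotImplementedError


class NullSink(Sink):
    """Does nothing; terminates a pipe (the 'null' of the mail)."""

    def on_event(self, where, what, value):
        pass


class Filter(Sink):
    """Rough counterpart of XE_FILTER_INTERFACE: a sink with a 'next'."""

    def __init__(self, next_sink):
        self.next = next_sink


class ValueSharer(Filter):
    """Forwards every event, reusing one string object per distinct value."""

    def __init__(self, next_sink):
        super().__init__(next_sink)
        self._values = {}

    def on_event(self, where, what, value):
        # setdefault returns the first object stored for an equal key.
        shared = self._values.setdefault(value, value)
        self.next.on_event(where, what, shared)


class Recorder(Filter):
    """Hypothetical stand-in for debug_printer: records, then forwards."""

    def __init__(self, next_sink):
        super().__init__(next_sink)
        self.events = []

    def on_event(self, where, what, value):
        self.events.append((where, what, value))
        self.next.on_event(where, what, value)


def feed_comment(sink, text):
    """Hypothetical event source: emits the three events of one comment."""
    sink.on_event(Location.COMMENT, EventType.START, "")
    sink.on_event(Location.COMMENT, EventType.DATA, text)
    sink.on_event(Location.COMMENT, EventType.FINISH, "")


# source -> share values -> record -> null
recorder = Recorder(NullSink())
pipe = ValueSharer(recorder)
feed_comment(pipe, " comment ")
```

Because every stage has the same on_event signature, ValueSharer and
Recorder can be swapped or stacked in any order, which is the point of
the single-callback interface discussed above.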