From: Franck A. <fr...@ne...> - 2001-10-21 19:32:21
Regarding the XML parser interface discussions, I agree with Andreas that the bridge pattern seems a bit overkill. It leads to (interface) code duplication and makes the code a bit of a pain to read (OO spaghetti). This is especially true of the parser's callback interface, which goes through indirections in all directions.

In Nenie XML, I have a similar event interface without the bridge pattern. Still, it suffers from some of the issues mentioned by Berend. First, you inherit from a specific implementation (not an issue in nxml, but not nice in general). Also, I do various post-processing layers through the inherited event interface. These layers add more event procedures, sometimes building on parent events with Precursor, but not always very clearly. This is cumbersome: the layers really have dependencies (you cannot really reorder them), and combining them is not very convenient. In the end, the layering is not very reusable.

I had some loose thoughts about solving those issues differently. With the discussion on this list starting, I actually wrote a prototype yesterday which may help. It is available at:

  http://www.nenie.org/eiffel/xml/xmlprototype-oct2001.zip

It has only been compiled with SmallEiffel and is not extensively tested. It also uses Nenie XML in one class, but the point is to show the design rather than the details: it's code for reading rather than executing (though it does run). You don't need to know Nenie XML to understand it.

The two ideas I have implemented are:

  (1) Event processing layers are composed by delegation rather than inheritance.
  (2) The event interface is simplified to one callback routine with simple types (enumerations + string).

The two ideas are to some extent independent, though not completely (more on that later). The event interface tries to model all events with three variables, two enumerations and a string:

  * location = Attribute | Element | Comment ...
  * type = Start | Finish | Name_prefix | Name_local | Data ...
  * value = [UC_]STRING

A short example:

  <!-- comment --> <a> <x:b attr='value'>content</x:b> </a>

would lead to the following event stream:

  LOCATION    TYPE          VALUE
  -------------------------------------------
  Comment     Start         ""
  Comment     Data          " comment "
  Comment     Finish        ""
  Element     Start         ""
  Element     Name_local    "a"
  Attribute   Start         ""
  Attribute   Finish        ""
  Element     Start         ""
  Element     Name_prefix   "x"
  Element     Name_local    "b"
  Attribute   Start         ""
  Attribute   Name_local    "attr"
  Attribute   Data          "value"
  Attribute   Finish        ""
  Element     Data          "content"
  Element     Name_prefix   "x"
  Element     Name_local    "b"
  Element     Finish        ""
  Element     Name_local    "a"
  Element     Finish        ""

See XE_INTERFACE for a pseudo-grammar of the event flow. Note that this does not try to handle DTDs, and the scheme probably wouldn't scale to them. I think DTDs are legacy, and having a clean interface for them is not important. As Berend said, it's worth having a distinct subset for the core events.

Layered events are implemented by having each descendant forward events to the same interface it responds to:

    deferred class XE_INTERFACE

    feature -- Event interface

        on_event (a_where: XE_LOCATION; a_what: XE_TYPE; a_value: STRING) is
                -- XML event.
                -- See invariant for allowed event sequence.
            deferred
            end
        ....

    deferred class XE_FILTER_INTERFACE

    inherit
        XE_INTERFACE

    feature -- Next

        next: XE_INTERFACE
        ....

Actual event processors inherit from XE_FILTER_INTERFACE and call next.on_event (...) within their own implementation of on_event. Using the filters is then just a matter of building a pipe, given functions that create filters and bind 'next':

    a_parser.set_interface (debug_printer (null))
        -- XML parser event source -> print events -> null

    a_parser.set_interface (namespace_resolver (pretty_printer (null)))
        -- parser -> resolve namespaces -> print canonical XML -> null

'null' is an XE_INTERFACE that does nothing; it terminates the pipe. 'a_parser' is the event source that starts the pipe.

This is where the simplified interface comes in: filters are allowed to change or add events.
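The following is a minimal Python model of the single-callback interface and the filter pipe. The class names only mirror XE_INTERFACE and XE_FILTER_INTERFACE. Location, Type, Null, DebugPrinter and replay are hypothetical stand-ins, not the prototype's code. replay plays back the event stream from the example instead of parsing XML.

```python
from abc import ABC, abstractmethod
from enum import Enum, auto
from typing import List, Tuple


class Location(Enum):
    """Where an event happens (hypothetical mirror of XE_LOCATION)."""
    ATTRIBUTE = auto()
    ELEMENT = auto()
    COMMENT = auto()


class Type(Enum):
    """What kind of event it is (hypothetical mirror of XE_TYPE)."""
    START = auto()
    FINISH = auto()
    NAME_PREFIX = auto()
    NAME_LOCAL = auto()
    DATA = auto()


class Interface(ABC):
    """Mirrors XE_INTERFACE: one callback for every event."""
    @abstractmethod
    def on_event(self, where: Location, what: Type, value: str) -> None: ...


class Filter(Interface):
    """Mirrors XE_FILTER_INTERFACE: a stage that knows the next stage."""
    def __init__(self, next_stage: Interface) -> None:
        self.next = next_stage


class Null(Interface):
    """Terminates a pipe by ignoring every event."""
    def on_event(self, where, what, value):
        pass


class DebugPrinter(Filter):
    """Prints each event (and keeps the line), then forwards it unchanged."""
    def __init__(self, next_stage: Interface) -> None:
        super().__init__(next_stage)
        self.lines: List[str] = []

    def on_event(self, where, what, value):
        line = f"{where.name.capitalize():<10} {what.name.capitalize():<12} {value!r}"
        self.lines.append(line)
        print(line)
        self.next.on_event(where, what, value)


# Event stream for <!-- comment --> <a> <x:b attr='value'>content</x:b> </a>
L, T = Location, Type
EXAMPLE: List[Tuple[Location, Type, str]] = [
    (L.COMMENT, T.START, ""), (L.COMMENT, T.DATA, " comment "), (L.COMMENT, T.FINISH, ""),
    (L.ELEMENT, T.START, ""), (L.ELEMENT, T.NAME_LOCAL, "a"),
    (L.ATTRIBUTE, T.START, ""), (L.ATTRIBUTE, T.FINISH, ""),
    (L.ELEMENT, T.START, ""), (L.ELEMENT, T.NAME_PREFIX, "x"), (L.ELEMENT, T.NAME_LOCAL, "b"),
    (L.ATTRIBUTE, T.START, ""), (L.ATTRIBUTE, T.NAME_LOCAL, "attr"),
    (L.ATTRIBUTE, T.DATA, "value"), (L.ATTRIBUTE, T.FINISH, ""),
    (L.ELEMENT, T.DATA, "content"),
    (L.ELEMENT, T.NAME_PREFIX, "x"), (L.ELEMENT, T.NAME_LOCAL, "b"), (L.ELEMENT, T.FINISH, ""),
    (L.ELEMENT, T.NAME_LOCAL, "a"), (L.ELEMENT, T.FINISH, ""),
]


def replay(events, pipe: Interface) -> None:
    """Stand-in for the parser: pushes recorded events into the pipe."""
    for where, what, value in events:
        pipe.on_event(where, what, value)


# Rough equivalent of: a_parser.set_interface (debug_printer (null))
printer = DebugPrinter(Null())
replay(EXAMPLE, printer)
```

Building a pipe is just nesting constructors, which here plays the role of factory functions like debug_printer (null). Adding a stage means wrapping one more constructor around the chain.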
For example, the namespace resolver adds an event carrying the namespace URI before each 'name, local part' event. Without the simplified interface, every downstream event interface would need to be typed with the new event. That means an extra routine, or a replacement routine with a different signature. We accept somewhat weaker static typing, though the event flow remains just as checkable at run time with contracts. In exchange, we get easier composition of filter pipes. This is useful for filters that do generic processing and forward events they don't know about. An example is a value-sharing filter, which reduces the number of live strings:

    class XE_VALUE_SHARER

    inherit
        XE_FILTER_INTERFACE

    creation
        set_next

    feature

        on_event (a_where: XE_LOCATION; a_what: XE_TYPE; a_value: STRING) is
                -- Event.
            do
                -- ...
                if not values.has (a_value) then
                    values.force (a_value)
                end
                next.on_event (a_where, a_what, values.item (a_value))
            end

    feature {NONE}

        values: DS_HASH_SET [STRING]

    end

This filter can be placed before or after the namespace resolver. That would not be possible if the namespace resolver changed the static types of the event interface.

With a simple interface and this composition scheme, no client code would inherit from a parser. Clients would just call set_interface with an XE_INTERFACE. If the parser does use the bridge pattern, the small size of the interface (a single routine to forward) makes it more acceptable.

So, in the end, this design seems to allow:

  - easy composition of layers of event processing
  - a clean client interface (no inheritance from the parser)
  - bridge vs. factory no longer matters for the parser

at the cost of:

  - more dynamic typing of event types
  - a somewhat different coding style

On coding style, the namespace resolver I wrote for the prototype is about two thirds the size of the one in Nenie XML, while being more reusable. I'm not sure whether that is due to the API or to some other reason. The APIs are different enough, and the problem small enough, that I quickly stopped trying to copy and paste my original code into the new API.
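The reordering claim can be checked with a similar Python sketch. Its toy namespace resolver adds a hypothetical NAME_URI event before every local name. It uses a fixed prefix table and ignores xmlns declarations, default namespaces and scoping. The value sharer works on whatever events reach it, including that new one. All names here are illustrative assumptions, not the prototype's code. The sketch runs the same events through both filter orders and records the results.

```python
from abc import ABC, abstractmethod
from enum import Enum, auto
from typing import Dict, List, Tuple


class Location(Enum):
    ATTRIBUTE = auto()
    ELEMENT = auto()
    COMMENT = auto()


class Type(Enum):
    START = auto()
    FINISH = auto()
    NAME_PREFIX = auto()
    NAME_LOCAL = auto()
    NAME_URI = auto()  # hypothetical event type added by the resolver
    DATA = auto()


class Interface(ABC):
    @abstractmethod
    def on_event(self, where: Location, what: Type, value: str) -> None: ...


class Filter(Interface):
    def __init__(self, next_stage: Interface) -> None:
        self.next = next_stage


class Recorder(Interface):
    """End of the pipe that keeps every event for inspection."""
    def __init__(self) -> None:
        self.events: List[Tuple[Location, Type, str]] = []

    def on_event(self, where, what, value):
        self.events.append((where, what, value))


class ValueSharer(Filter):
    """Replaces each value by one canonical string object per distinct value."""
    def __init__(self, next_stage: Interface) -> None:
        super().__init__(next_stage)
        self.values: Dict[str, str] = {}

    def on_event(self, where, what, value):
        # Ignores event types entirely, so new event kinds pass through as-is.
        self.next.on_event(where, what, self.values.setdefault(value, value))


class NamespaceResolver(Filter):
    """Toy resolver: fixed prefix table, no xmlns handling or scoping."""
    def __init__(self, bindings: Dict[str, str], next_stage: Interface) -> None:
        super().__init__(next_stage)
        self.bindings = bindings
        self.prefix = ""

    def on_event(self, where, what, value):
        if what is Type.NAME_PREFIX:
            self.prefix = value
        elif what is Type.NAME_LOCAL:
            # Extra event: the namespace URI, just before the local name.
            self.next.on_event(where, Type.NAME_URI, self.bindings.get(self.prefix, ""))
            self.prefix = ""
        self.next.on_event(where, what, value)


L, T = Location, Type
EVENTS = [
    (L.ELEMENT, T.START, ""), (L.ELEMENT, T.NAME_PREFIX, "x"),
    (L.ELEMENT, T.NAME_LOCAL, "b"), (L.ATTRIBUTE, T.START, ""),
    (L.ATTRIBUTE, T.NAME_LOCAL, "attr"), (L.ATTRIBUTE, T.DATA, "value"),
    (L.ATTRIBUTE, T.FINISH, ""), (L.ELEMENT, T.DATA, "content"),
    (L.ELEMENT, T.NAME_PREFIX, "x"), (L.ELEMENT, T.NAME_LOCAL, "b"),
    (L.ELEMENT, T.FINISH, ""),
]


def run(pipe: Interface) -> None:
    for event in EVENTS:
        pipe.on_event(*event)


bindings = {"x": "urn:example:x"}  # hypothetical namespace URI
sharer_first, resolver_first = Recorder(), Recorder()
run(ValueSharer(NamespaceResolver(bindings, sharer_first)))
run(NamespaceResolver(bindings, ValueSharer(resolver_first)))
```

The sharer only looks at the value string, so it needs no change when an upstream filter invents a new event type. With one typed routine per event, the resolver's extra event would instead have forced a new routine into every downstream interface.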
One thing I considered was having a single polymorphic parameter in on_event, and then dispatching on it using either something like the visitor pattern or reverse assignments. The visitor pattern would be dangerously close to the existing event interfaces (possibly worse, with even more indirections), and type dispatching through reverse assignment is ugly.

Something I didn't investigate, but which may be worth looking at, is how this integrates with the pipe classes in Gobo.

I hope people can have a look at the code and comment on whether it could be a good API. It's just a few hours' hack, so nothing is set in stone even if it were adopted for Gobo XML. Rewriting a similar idea with different details is not much work.

--
fr...@ne...