[gobo-eiffel-develop] A prototype XML event interface

Regarding the XML parser interface discussions, I agree with
Andreas that the bridge pattern seems to be overkill: it leads
to duplicated interface code and makes the code painful to
read (OO spaghetti), especially because the parser's callback
interface goes through indirections in all directions.

In Nenie XML, I have a similar event interface without the
bridge pattern. It still suffers from some of the issues
mentioned by Berend.

First, you inherit from a specific implementation (not an 
issue in nxml but not nice generally). 

Also, I do various post-processing layers through the
inherited event interface. These layers add more event
procedures, sometimes building on parent events with
Precursor, but not always very clearly. This is cumbersome:
the layers have real dependencies (you cannot reorder them)
and combining them is not very convenient. In the end the
layering is not very reusable.

I had some loose thoughts about trying to solve those 
issues differently, and with the discussion on this list 
starting, I actually wrote a prototype yesterday which 
may help. It is available at:

http://www.nenie.org/eiffel/xml/xmlprototype-oct2001.zip

It has only been compiled with SmallEiffel, is not extensively
tested, and uses Nenie XML in one class, but the point
is to show the design rather than the details: it's code
for reading rather than executing (but it does run). You
don't need to know Nenie XML to understand it.

The two ideas I have implemented are:

(1) Event processing layer composition is implemented with 
delegation rather than inheritance.

(2) The event interface is simplified to one callback 
routine with simple types (enumerations+string).

The two ideas are largely, though not completely,
independent (more on this later).

The event interface tries to model all events with three
variables: two enumerations and a string.
 * location = Attribute | Element | Comment .... 
 * type = Start | Finish | Name_prefix | Name_local | Data ...
 * value = [UC_]STRING

A short example:

<!-- comment -->
<a>
   <x:b attr='value'>content</x:b>
</a>

Would lead to the following event stream:

LOCATION  TYPE         VALUE
-------------------------------------------
Comment   Start        ""
Comment   Data         " comment "
Comment   Finish       ""
Element   Start        ""
Element   Name_local   "a"
Attribute Start        ""
Attribute Finish       ""
Element   Start        ""
Element   Name_prefix  "x"
Element   Name_local   "b"
Attribute Start        ""
Attribute Name_local   "attr"
Attribute Data         "value"
Attribute Finish       ""
Element   Data         "content"
Element   Name_prefix  "x"
Element   Name_local   "b"
Element   Finish       ""
Element   Name_local   "a"
Element   Finish       ""

See XE_INTERFACE for a pseudo-grammar of the event flow.
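
To make the mapping from the example document to the event stream
concrete, here is a minimal sketch. It is written in Python rather
than Eiffel only so that it runs standalone, and it is not the
prototype's code. The names Location, Type and events_for are
hypothetical. Two assumptions are made: the Attribute Start/Finish
pair brackets the whole attribute list (inferred from the empty pair
emitted for <a> above), and whitespace-only text between elements is
dropped, as in the table.

```python
from enum import Enum
from xml.parsers import expat


class Location(Enum):
    COMMENT = "Comment"
    ELEMENT = "Element"
    ATTRIBUTE = "Attribute"


class Type(Enum):
    START = "Start"
    FINISH = "Finish"
    NAME_PREFIX = "Name_prefix"
    NAME_LOCAL = "Name_local"
    DATA = "Data"


EXAMPLE = """<!-- comment -->
<a>
   <x:b attr='value'>content</x:b>
</a>"""


def events_for(xml_text):
    """Return the (location, type, value) triples for a small document."""
    events = []

    def emit(where, what, value=""):
        events.append((where, what, value))

    def emit_name(where, qname):
        # Split "x:b" into prefix "x" and local part "b".
        prefix, colon, local = qname.rpartition(":")
        if colon:
            emit(where, Type.NAME_PREFIX, prefix)
        emit(where, Type.NAME_LOCAL, local)

    def start_element(name, attrs):
        emit(Location.ELEMENT, Type.START)
        emit_name(Location.ELEMENT, name)
        # Assumption: one Start/Finish pair around the attribute list.
        emit(Location.ATTRIBUTE, Type.START)
        # attrs is a flat [name, value, name, value, ...] list.
        for attr_name, attr_value in zip(attrs[0::2], attrs[1::2]):
            emit_name(Location.ATTRIBUTE, attr_name)
            emit(Location.ATTRIBUTE, Type.DATA, attr_value)
        emit(Location.ATTRIBUTE, Type.FINISH)

    def end_element(name):
        emit_name(Location.ELEMENT, name)
        emit(Location.ELEMENT, Type.FINISH)

    def character_data(data):
        if data.strip():  # assumption: skip whitespace-only text
            emit(Location.ELEMENT, Type.DATA, data)

    def comment(data):
        emit(Location.COMMENT, Type.START)
        emit(Location.COMMENT, Type.DATA, data)
        emit(Location.COMMENT, Type.FINISH)

    # No namespace processing here, so "x:b" arrives as a plain name.
    parser = expat.ParserCreate()
    parser.ordered_attributes = True
    parser.buffer_text = True  # deliver each text run in one callback
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = character_data
    parser.CommentHandler = comment
    parser.Parse(xml_text, True)
    return events


for where, what, value in events_for(EXAMPLE):
    print(f"{where.value:<10}{what.value:<13}{value!r}")
```

The point is that every callback funnels into one emit routine with
the same three arguments, which is what lets later filters stay
generic.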

Note that this does not try to handle DTDs, and the scheme
probably wouldn't scale to them. I think DTDs are legacy, so
a clean interface for DTDs is not important, and as Berend
said it's worth having a distinct subset for the core events.

Layered events are implemented by each descendant forwarding 
events to the same interface it responds to:

deferred class XE_INTERFACE
feature -- Event interface

  on_event (a_where: XE_LOCATION; a_what: XE_TYPE; a_value: STRING) is
    -- XML event.
    -- See invariant for allowed event sequence.
    deferred
....

deferred class XE_FILTER_INTERFACE
inherit
  XE_INTERFACE

feature -- Next
  next: XE_INTERFACE

Actual event processors inherit from XE_FILTER_INTERFACE and call
next.on_event (...) within their own implementation of on_event.

Then, using the filters is just a question of making a pipe, 
given functions that create filters and bind 'next':

 a_parser.set_interface (debug_printer (null))
 -- XML parser event source -> print events -> null

 a_parser.set_interface (namespace_resolver (pretty_printer (null)))
 -- parser -> resolve namespaces -> print canonical xml -> null

'null' is an XE_INTERFACE that does nothing, to finish the pipe.
'a_parser' is the event source starting the pipe.
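
The pipe idea can be tried out in isolation with a small sketch,
again in Python only so that it runs standalone. The class names
are hypothetical counterparts of the Eiffel ones: ReplaySource
stands in for the parser, CommentDropper is an invented example
filter, and the two enumerations are shortened to plain strings.

```python
class Interface:
    """Counterpart of XE_INTERFACE: a single callback routine."""

    def on_event(self, where, what, value):
        raise NotImplementedError


class Null(Interface):
    """Does nothing; terminates a pipe."""

    def on_event(self, where, what, value):
        pass


class Filter(Interface):
    """Counterpart of XE_FILTER_INTERFACE: holds the next stage."""

    def __init__(self, next_stage):
        self.next = next_stage


class DebugPrinter(Filter):
    """Records one line per event, then forwards the event unchanged."""

    def __init__(self, next_stage, lines):
        super().__init__(next_stage)
        self.lines = lines

    def on_event(self, where, what, value):
        self.lines.append(f"{where:<10}{what:<13}{value!r}")
        self.next.on_event(where, what, value)


class CommentDropper(Filter):
    """Hypothetical filter: swallows comment events, forwards the rest."""

    def on_event(self, where, what, value):
        if where != "Comment":
            self.next.on_event(where, what, value)


class ReplaySource:
    """Stand-in for the parser: replays a fixed list of events."""

    def __init__(self, events):
        self.events = events
        self.interface = Null()

    def set_interface(self, interface):
        self.interface = interface

    def run(self):
        for where, what, value in self.events:
            self.interface.on_event(where, what, value)


null = Null()
lines = []
source = ReplaySource([
    ("Comment", "Start", ""),
    ("Comment", "Data", " comment "),
    ("Comment", "Finish", ""),
    ("Element", "Start", ""),
    ("Element", "Name_local", "a"),
    ("Attribute", "Start", ""),
    ("Attribute", "Finish", ""),
    ("Element", "Name_local", "a"),
    ("Element", "Finish", ""),
])
# source -> drop comments -> record events -> null
source.set_interface(CommentDropper(DebugPrinter(null, lines)))
source.run()
print("\n".join(lines))
```

Each stage only knows the single on_event routine of the stage
after it, so stages can be added, removed or reordered by changing
the one line that builds the pipe.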

This is where the simplified interface comes in: filters are
allowed to change or add events. For example, the namespace
resolver adds an event with the namespace URI before each
'name, local part' event. Without the simplified interface,
every downstream event interface would need to be statically
typed for the new event, e.g. with an extra routine or a
replacement routine with a different signature.

At the cost of somewhat weaker static typing (the event
sequence is still as checkable as before dynamically, with
contracts), we get easier composition of filter pipes. This
is useful for filters that do generic processing and forward
events they do not know about unchanged. An example is a
value-sharing filter which reduces the number of live strings:

class XE_VALUE_SHARER 
inherit XE_FILTER_INTERFACE
creation set_next 
feature

  on_event (a_where: XE_LOCATION; a_what: XE_TYPE; a_value: STRING) is
    -- Event.
    do
      -- ...
      if not values.has (a_value) then
        values.force (a_value)
      end
      next.on_event (a_where, a_what, values.item (a_value))
    end

feature {NONE}
  values: DS_HASH_SET[STRING]

end

This filter can be placed before or after the namespace
resolver. That would not be possible if the namespace resolver
changed the static types of the event interface.
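
The reordering claim can be checked with a small sketch, in Python
only so that it runs standalone. All names are hypothetical. In
particular the 'Name_uri' event type, the fixed prefix-to-URI table
and the example.org URI are assumptions made for illustration; a
real resolver would track xmlns declarations and their scope, and
would treat unprefixed attributes specially.

```python
class Collector:
    """Terminal stage that records every event it receives."""

    def __init__(self):
        self.events = []

    def on_event(self, where, what, value):
        self.events.append((where, what, value))


class Filter:
    """Holds the next stage of the pipe."""

    def __init__(self, next_stage):
        self.next = next_stage


class NamespaceResolver(Filter):
    """Adds a Name_uri event before each Name_local event."""

    def __init__(self, next_stage, uris):
        super().__init__(next_stage)
        self.uris = uris  # assumption: fixed prefix -> URI table
        self.prefix = ""

    def on_event(self, where, what, value):
        if what == "Name_prefix":
            self.prefix = value
        elif what == "Name_local":
            uri = self.uris.get(self.prefix, "")
            self.next.on_event(where, "Name_uri", uri)
            self.prefix = ""
        # Every incoming event, known or not, is forwarded as is.
        self.next.on_event(where, what, value)


class ValueSharer(Filter):
    """Forwards every event, replacing equal strings by one object."""

    def __init__(self, next_stage):
        super().__init__(next_stage)
        self.values = {}

    def on_event(self, where, what, value):
        shared = self.values.setdefault(value, value)
        self.next.on_event(where, what, shared)


EVENTS = [
    ("Element", "Start", ""),
    ("Element", "Name_prefix", "x"),
    ("Element", "Name_local", "b"),
    ("Attribute", "Start", ""),
    ("Attribute", "Finish", ""),
    ("Element", "Data", "content"),
    ("Element", "Name_prefix", "x"),
    ("Element", "Name_local", "b"),
    ("Element", "Finish", ""),
]
URIS = {"x": "http://example.org/x"}  # hypothetical namespace URI


def run(build_pipe):
    """Feed EVENTS through the pipe built on top of a fresh Collector."""
    sink = Collector()
    pipe = build_pipe(sink)
    for where, what, value in EVENTS:
        pipe.on_event(where, what, value)
    return sink.events


resolver_first = run(lambda sink: NamespaceResolver(ValueSharer(sink), URIS))
sharer_first = run(lambda sink: ValueSharer(NamespaceResolver(sink, URIS)))
print(resolver_first == sharer_first)
```

The value sharer never looks at the event type, so it passes the
extra Name_uri events through without knowing they exist. That is
the property that would be lost with one statically typed routine
per event.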

With a simple interface and this composition scheme, no
client code would inherit from a parser; it would just pass
an XE_INTERFACE to the parser's set_interface. If the parser
does use the bridge pattern, the small size of the interface,
with forwarding needed for only one routine, makes it more
acceptable.

So in the end, it seems this design allows:

- easy composition of layers of event processing
- clean client interface (no inheritance from parser)
- bridge vs. factory does not matter for the parser

At the cost of:

- more dynamic typing of event types
- somewhat different coding style

On coding style, the namespace resolver I wrote for the
prototype is about two thirds the size of the one in Nenie
XML while being more reusable. I'm not sure whether that is
due to the API or to some other reason. The API is different
enough, and the problem small enough, that I quickly stopped
trying to copy-paste my original code into the new API.

One thing I considered was having a single polymorphic
parameter in on_event, and then processing it using either
something like the visitor pattern or reverse assignments.
The visitor pattern would be dangerously close to the
existing event interfaces (worse, with more indirections?)
and type dispatching by reverse assignment is ugly.

One thing I didn't investigate, but which may be worth
looking at, is how this integrates with the pipe classes
in Gobo.

I hope people can have a look at the code and comment on
whether it could be a good API. It's just a few hours'
hack, and nothing is set in stone even if it were adopted
for Gobo XML (rewriting a similar idea with different
details is not much work).

--
fr...@ne...