Re: [Xswt-developer] the XML parser

> -----Original Message-----
> Date: Thu, 29 Jun 2006 20:42:40 -0500
> From: "David J. Orme" <dj...@co...>
> Subject: Re: [Xswt-developer] the XML parser
> I'm still of two minds about this.
> 
> One of the things we all like about XSWT is that it _isn't 
> XUL_.  It comes from a "less is more" design philosophy.

I like that philosophy, as it reduces the threshold for accepting it e.g. as
a part of other projects.

> On the + side for this idea, having such a layer would make 
> this migration easier

The layer is supposed to be light-weight, to hide the difference in API of
different XML parsers. The goal is to focus on what XSWT needs from a
parser, to give developers (including us) more freedom in which underlying
parser to use. There are other arguments I'd like to mention:
- I don't know all the contexts that use XSWT, but it is best if XSWT does
not dictate the parser used for other parts. I know from my own experience
that it's sometimes difficult to use several parsers within one application.
- I believe we need this freedom ourselves, so we can work on the style
mechanism before we settle on the parser.
- Using a tree-oriented API will clean up XSWT, as the event-based pull
parsing code is more difficult to read than equivalent code based on
tree traversal.
- If XSWT is to be tightly coupled with an application, like an editor,
passing a tree is leaner and quicker than writing and reading a file.

> If you feel strongly that having the abstraction is worth more than the
> weight it would add, I think I'll defer to your opinion.  If you're not
> totally convinced yourself, then let's keep discussing it.

I'm convinced I want to, but I don't want to force it on XSWT without a
proper discussion. My suggestion for API is shown below. Design decisions:
- It is based on opaque objects, where the parser knows how to interpret
them. This makes it easier to support different representations and does not
require specific Element and Attribute classes. If a parser already builds
an object tree, it can return that tree as is, and the parser just needs to
return the appropriate information on request, giving little extra
overhead.
- There is no separate document object, as XSWT currently just uses the
root.
- It is based on positional access instead of Iterators. I don't feel
strongly about this, as XSWT only uses sequential access. Positional
traversal uses less space, but may be slower if it requires scanning the list
each time (e.g. if the source list interleaves different child nodes, like
text and elements). Note that we can introduce Iterators for either child
elements or attributes or both.
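
To illustrate the last point, here is a minimal sketch of an Iterator layered
on top of positional access. ChildAccess and ChildElementIterator are
hypothetical names used only for illustration and are not part of the
proposal; ChildAccess simply repeats the two child-access methods so the
sketch compiles on its own.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical: just the two positional child-access methods of the proposal.
interface ChildAccess {
    int getChildElementCount(Object element);
    Object getChildElement(Object element, int i);
}

// Hypothetical adapter that presents positional child access as an Iterator.
final class ChildElementIterator implements Iterator<Object> {
    private final ChildAccess parser;
    private final Object parent;
    private final int count;   // child count, read once at construction
    private int next = 0;      // position of the next child to return

    ChildElementIterator(ChildAccess parser, Object parent) {
        this.parser = parser;
        this.parent = parent;
        this.count = parser.getChildElementCount(parent);
    }

    public boolean hasNext() {
        return next < count;
    }

    public Object next() {
        if (next >= count) {
            throw new NoSuchElementException();
        }
        return parser.getChildElement(parent, next++);
    }

    // The tree is read-only in this sketch.
    public void remove() {
        throw new UnsupportedOperationException();
    }
}
```

Such an adapter only saves work if getChildElement is cheap for the
underlying representation; a parser that has to rescan an interleaved node
list would be better served by an Iterator it implements itself.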

A test implementation wrapping KXML, with one fairly generic class for
representing the tree and one subclass for building it, uses 230 lines (110
+ 120, respectively).

Please comment on both the general idea of an XML tree abstraction and the
suggested interface.

import java.io.InputStream;

public interface IMinimalParser {
	// Builds an object tree from an InputStream input
	public Object build(InputStream input);

	// Returns the name of the element
	public String getElementName(Object element);
	// Returns the namespace (URI) of the element
	public String getElementNamespace(Object element);

	// Returns the number of children of element
	public int getChildElementCount(Object element);
	// Returns the child element of element at the given position 
	public Object getChildElement(Object element, int i);

	// Returns the number of attributes of element
	public int getAttributeCount(Object element);

	// Returns the name of the given attribute of element
	public String getAttributeName(Object element, int i);
	// Returns the namespace (URI) of the given attribute of element
	public String getAttributeNamespace(Object element, int i);
	// Returns the text value of the given attribute of element
	public String getAttributeValue(Object element, int i);

	// Returns the text content of element
	public String getElementText(Object element);
}
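
As a rough illustration of the opaque-object idea, the sketch below backs the
interface with the JDK's built-in DOM parser instead of KXML, so the opaque
Object is simply an org.w3c.dom.Element returned as is. DomMinimalParser and
TreeDump are hypothetical names, the interface is repeated (without the
redundant public modifiers) so the sketch compiles on its own, and wrapping
all parse errors in a RuntimeException is an assumption, since the proposal
does not say how errors are reported.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// The proposed interface, repeated so that this sketch is self-contained.
interface IMinimalParser {
    Object build(InputStream input);
    String getElementName(Object element);
    String getElementNamespace(Object element);
    int getChildElementCount(Object element);
    Object getChildElement(Object element, int i);
    int getAttributeCount(Object element);
    String getAttributeName(Object element, int i);
    String getAttributeNamespace(Object element, int i);
    String getAttributeValue(Object element, int i);
    String getElementText(Object element);
}

// Hypothetical DOM-backed implementation: every opaque Object is a DOM Element.
class DomMinimalParser implements IMinimalParser {
    public Object build(InputStream input) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(input).getDocumentElement();
        } catch (Exception e) {
            // Assumption: errors surface as unchecked exceptions.
            throw new RuntimeException(e);
        }
    }

    public String getElementName(Object e) { return ((Element) e).getLocalName(); }
    public String getElementNamespace(Object e) { return ((Element) e).getNamespaceURI(); }

    // DOM interleaves text and element nodes, so each positional call rescans
    // the child list; this is the cost mentioned in the design notes.
    private List<Element> children(Object e) {
        List<Element> result = new ArrayList<Element>();
        NodeList nodes = ((Element) e).getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i) instanceof Element) {
                result.add((Element) nodes.item(i));
            }
        }
        return result;
    }

    public int getChildElementCount(Object e) { return children(e).size(); }
    public Object getChildElement(Object e, int i) { return children(e).get(i); }

    // Note: a namespace-aware DOM also reports xmlns declarations as attributes.
    private Node attribute(Object e, int i) { return ((Element) e).getAttributes().item(i); }
    public int getAttributeCount(Object e) { return ((Element) e).getAttributes().getLength(); }
    public String getAttributeName(Object e, int i) { return attribute(e, i).getLocalName(); }
    public String getAttributeNamespace(Object e, int i) { return attribute(e, i).getNamespaceURI(); }
    public String getAttributeValue(Object e, int i) { return attribute(e, i).getNodeValue(); }

    // Returns the concatenated text of the element and all its descendants.
    public String getElementText(Object e) { return ((Element) e).getTextContent(); }
}

// Hypothetical client: a recursive tree traversal that sees only the interface.
final class TreeDump {
    static void dump(IMinimalParser p, Object element, String indent, StringBuilder out) {
        out.append(indent).append(p.getElementName(element));
        for (int i = 0; i < p.getAttributeCount(element); i++) {
            out.append(' ').append(p.getAttributeName(element, i))
               .append('=').append(p.getAttributeValue(element, i));
        }
        out.append('\n');
        for (int i = 0; i < p.getChildElementCount(element); i++) {
            dump(p, p.getChildElement(element, i), indent + "  ", out);
        }
    }
}
```

The traversal in TreeDump never names a DOM type, so the same client code
would work unchanged against a KXML-backed implementation of the interface.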