Re: [Vtd-xml-users] xerces discussion on VTD

--- Jimmy Zhang <cra...@co...> wrote:

> > There are really 2 major limitations I think:
> >
> > (a) It does not (and probably will not) handle DTDs:
>
> DTD use seems to have declined somewhat, e.g. in SOAP,
> various industry-specific vocabularies, RDF... XPath and
> XQuery don't even have the notion of a DTD anymore...

Right, I am not arguing whether it is very relevant or not,
just pointing out that for cases where it is (a minority,
perhaps?), it is a limitation.

> > (b) Namespace handling is not very complete, nor
> > efficient; this is because the way namespaces work
> > somewhat conflicts with the way VTD-XML does its
> > processing to obtain high speed. So I am not sure
> > if it should be used for documents with namespaces.
>
> The namespace handling is quite efficient... in most
> cases I have not seen much difference at all... as to the

Well, let me put it this way: since it is done on-demand,
its effects accumulate. It obviously does not add overhead
during the parsing stage (which is good), but it seems to me
that the processing done on-demand is much more extensive
than if it were computed incrementally during parsing.
Perhaps a better way to put it is that, in general, VTD
minimizes up-front costs at the price of somewhat higher
on-demand costs. My favourite example is text access: if you
must access content as Strings, it is a little bit slower with
VTD, since it has to decode the content twice in those cases
(first during parsing, to find indexes/offsets; a second time
when constructing the String). The same pattern is used
extensively, and I would consider it an architectural/design
decision.
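
To make the "decode twice" point concrete, here is a minimal
sketch in Python. It is not VTD-XML's actual API or record
layout: `Token`, `index_text_tokens` and `text_of` are
hypothetical names, and the scan is a toy that only looks for
text runs between '>' and '<'. It assumes UTF-8 input. The first
pass walks the bytes only to record offsets and lengths; the
second pass over the same bytes happens later, when a string is
actually requested.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    """Hypothetical index record: where a text run sits in the buffer."""
    offset: int  # byte offset into the original buffer
    length: int  # byte length of the text run


def index_text_tokens(buf: bytes) -> List[Token]:
    """First pass: scan the bytes and record offsets only (no strings built)."""
    tokens = []
    pos = 0
    while True:
        start = buf.find(b">", pos)
        if start == -1:
            break
        end = buf.find(b"<", start + 1)
        if end == -1:
            break
        if end > start + 1:  # skip empty runs such as "><"
            tokens.append(Token(start + 1, end - start - 1))
        pos = end
    return tokens


def text_of(buf: bytes, tok: Token) -> str:
    """Second pass, on demand: revisit the same bytes to build a string."""
    return buf[tok.offset:tok.offset + tok.length].decode("utf-8")


doc = "<a><b>héllo</b><c>world</c></a>".encode("utf-8")
tokens = index_text_tokens(doc)        # up-front cost: offsets only
first_text = text_of(doc, tokens[0])   # on-demand cost: decode again
```

If the application never asks for a given string, the second
pass is never paid for, which is the up-front versus on-demand
trade-off described above.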

Anyway, I do not think performance aspects are the limiting
factor at this point, but they might become one if
non-conformant cases were handled. Specifically, whereas
parsing in general checks well-formedness constraints, this
does not cover namespace validity checking. Attributes ns1:a
and ns2:a may be duplicates (if both prefixes are bound to the
same namespace URI), and this is not checked at this point (as
far as I know). If those were checked during parsing (as is the
case for straight duplicate checks), the parser would either
have to keep namespace binding state (which might not be too
bad), or dynamically derive the bindings (as is now done
on-demand).
Similarly, checks for unbound namespace prefixes would have to
be added if namespace conformance were to be completed.
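
A minimal sketch of the two checks mentioned above, in Python.
This is an illustration only, not how VTD-XML (or any other
parser) implements them: `check_attributes` is a hypothetical
helper, and it assumes the caller already holds the in-scope
prefix-to-URI bindings for the element and passes only ordinary
attributes (no xmlns declarations). Two attributes count as
duplicates when their (namespace URI, local name) pairs match,
even if their prefixes differ.

```python
from typing import Dict, List, Tuple


def split_qname(qname: str) -> Tuple[str, str]:
    """Split 'prefix:local' into (prefix, local); prefix is '' if absent."""
    prefix, sep, local = qname.partition(":")
    return (prefix, local) if sep else ("", qname)


def check_attributes(qnames: List[str],
                     bindings: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return pairs of attribute names that collide after prefix resolution.

    Raises ValueError for a prefix that has no binding in `bindings`.
    """
    seen: Dict[Tuple[str, str], str] = {}
    duplicates = []
    for qname in qnames:
        prefix, local = split_qname(qname)
        if prefix == "":
            uri = ""  # unprefixed attributes are in no namespace
        elif prefix not in bindings:
            raise ValueError(f"unbound namespace prefix: {prefix!r}")
        else:
            uri = bindings[prefix]
        key = (uri, local)  # the expanded name
        if key in seen:
            duplicates.append((seen[key], qname))
        else:
            seen[key] = qname
    return duplicates


# Both prefixes are bound to the same URI, so ns1:a and ns2:a collide.
same_uri = {"ns1": "urn:example:x", "ns2": "urn:example:x"}
collisions = check_attributes(["ns1:a", "ns2:a"], same_uri)
```

The `bindings` argument is exactly the state in question: a
parser must either maintain it while parsing or re-derive it
from the enclosing elements whenever the check is run.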

> completion part, I have two thoughts: one is that the
> namespace spec itself has issues, the second is that adding

It is a bit complicated to implement, and not necessarily
defined in an optimal way. But at least it has no
ambiguities that I know of.

> complete support of it should be neither difficult nor
> inefficient...

If so, I am sure this will be done in future! ;-)

...
> > Clearly understanding limitations and benefits, and
> > choosing the right tools based on this, is essential.
> >
>
> I don't know why some keep thinking VTD is a compressed
> DOM... does it look like a compressed DOM to you?

No, I think that is wrong. It may be because some
people nowadays think any XML tree representation
is a DOM. That's just not right.

-+ Tatu +-
