From: Tatu S. <cow...@ya...> - 2007-01-17 20:14:24
--- Jimmy Zhang <cra...@co...> wrote:

> > There are really 2 major limitations I think:
> >
> > (a) It does not (and probably will not) handle DTDs:
>
> DTD seems to have been depleted somewhat, e.g. in SOAP, various
> types of industry specific vocabularies, RDFs... XPath and XQuery
> doesn't even have the notion of DTD anymore...

Right, I am not arguing whether it is very relevant or not, just
pointing out that for cases where it is (a minority, perhaps?), it is
a limitation.

> > (b) Namespace handling is not very complete, nor
> > efficient; this because the way namespaces work is
> > somewhat conflicting with the way VTD-XML does its
> > processing, to obtain high speed. So I am not sure if
> > it should be used for documents with namespaces.
>
> The name space handling is quite efficient... most cases
> I have not seen much difference at all... as to the

Well, let me put it this way: since it is done on-demand, its effects
accumulate. It obviously does not add overhead during the parsing
stage (which is good), but it seems to me that the processing done
on-demand is much more extensive than it would be if it were computed
incrementally during parsing.

Perhaps a better way to put it is that in general VTD minimizes
up-front costs, at the price of somewhat higher on-demand costs. My
favourite example is text access: if you must access content as
Strings, it is a little bit slower with VTD, since it has to decode
content twice in those cases (first during parsing, to find
indexes/offsets; a second time when constructing the String). The same
pattern is used extensively, and I would consider it an
architectural/design decision.

Anyway, I do not think performance aspects are the limiting factor at
this point; but performance might become one if non-conformant cases
were handled. Specifically, whereas parsing in general checks
well-formedness constraints, this does not cover namespace validity
checking.
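
To make the difference between plain well-formedness and namespace
validity concrete, here is a minimal sketch. It uses the JDK's bundled
SAX parser, not VTD-XML, and the class name NsCheck and both sample
documents are made up for illustration. With namespace processing
turned off, both documents pass as well-formed; with it turned on, the
JDK parser is expected to reject them, because ns1:a and ns2:a resolve
to the same expanded name and the prefix foo is never declared.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only (not part of VTD-XML).
final class NsCheck {
    // Two prefixes bound to the same URI: ns1:a and ns2:a name the same attribute.
    static final String DUPLICATE =
        "<r xmlns:ns1='urn:x' xmlns:ns2='urn:x'><e ns1:a='1' ns2:a='2'/></r>";

    // The prefix 'foo' is used but never declared.
    static final String UNBOUND = "<r><e foo:a='1'/></r>";

    // Returns true if the parser accepts the document, false if parsing
    // fails with a SAXException (DefaultHandler rethrows fatal errors).
    static boolean accepts(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        try {
            factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("duplicate, ns off: " + accepts(DUPLICATE, false));
        System.out.println("duplicate, ns on:  " + accepts(DUPLICATE, true));
        System.out.println("unbound, ns off:   " + accepts(UNBOUND, false));
        System.out.println("unbound, ns on:    " + accepts(UNBOUND, true));
    }
}
```

The point of the sketch is only that these two checks need prefix
bindings to be known at the time the element is processed; it says
nothing about how VTD-XML would or should implement them.
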
Attributes ns1:a and ns2:a may be duplicates (if both prefixes are
bound to the same namespace URI), and this is not checked at this
point (as far as I know). If those were checked during parsing (as is
the case for straight duplicate checks), the parser would either have
to keep namespace binding state (which might not be too bad), or
dynamically derive bindings (as is now done on-demand). Similarly,
checks for unbound namespace prefixes would have to be added if
namespace conformance were to be completed.

> completion part, I have two thoughts, one is that the namespace
> spec itself has issues, the second is that adding

It is a bit complicated to implement, and not necessarily defined in
an optimal way. But at least it has no ambiguities that I know of.

> complete support of it should not be difficult nor inefficient...

If so, I am sure this will be done in the future! ;-)

...

> > Clearly understanding limitations and benefits, and
> > choosing right tools based on this is essential.
>
> I don't know why some kept thinking VTD is compressed
> DOM...does it look like compressed DOM to you?

No, I think that is wrong. It may be because some people nowadays
think any XML tree representation is DOM. That's just not right.

-+ Tatu +-