I spent some time during my holiday thinking about how to implement updates.
The main problem is to efficiently update index structures. I have thus started
to simplify the created indexes and I'm nearly finished with that (see below).
Given these changes it should be much easier to implement the DOM methods for
node updates (if I had another week or two of spare time, I would surely be
able to implement a good part of XUpdate ;-). Exactly what kinds of updates
would you require for your project? Maybe I can start by implementing the
things that are most important to you.
P.S.: It looks like the latest changes will greatly improve performance - at
least that's my personal impression, but I have to do some tests to verify
that... The major changes are:
Most important, the main index (contained in dom.dbx) is much smaller now. This
index is actually not required by most operations. It provides direct access to
distinct nodes by their unique node identifier (for example, to display search
results). I have decided to drop large parts of this index. My current approach
is to index only the top-level elements. All other nodes are reached by
traversing from the nearest indexed ancestor node. As a result, the dom.dbx
file gets much smaller: in fact, when indexing the Shakespeare collection, it
is smaller than the original XML source data. Additionally, the index is
much easier to maintain, which is very important for later updates.
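
To make the idea of a sparse node index concrete, here is a minimal Java sketch. It is only an illustration under stated assumptions, not the actual dom.dbx code: the class and method names (SparseNodeIndex, Node, lookup) are hypothetical, the tree is kept in memory, and node identifiers are assumed to follow a level-order numbering of a complete K-ary tree so that a parent's id can be computed from a child's id. Only nodes registered with indexNode() are stored in the index; every other node is found by climbing to the nearest indexed ancestor and walking back down.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: only some nodes are indexed, the rest are reached
// by traversal from the nearest indexed ancestor.
final class SparseNodeIndex {

    // Assumption: every node has at most K children (complete K-ary numbering).
    static final int K = 3;

    static final class Node {
        final long id;
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(long id, String name) {
            this.id = id;
            this.name = name;
        }

        // The i-th child (0-based) of node n gets the id K*(n-1)+2+i.
        Node addChild(String childName) {
            if (children.size() >= K) {
                throw new IllegalStateException("at most " + K + " children in this sketch");
            }
            Node child = new Node(K * (id - 1) + 2 + children.size(), childName);
            children.add(child);
            return child;
        }
    }

    // Inverse of the numbering above: compute the parent's id from a child's id.
    static long parentId(long id) {
        return (id - 2) / K + 1;
    }

    private final Map<Long, Node> index = new HashMap<>();

    // Register a node (for example a top-level element) in the index.
    void indexNode(Node n) {
        index.put(n.id, n);
    }

    // Find a node by id: climb by id arithmetic to the nearest indexed
    // ancestor, then descend through the children along the recorded path.
    Node lookup(long id) {
        List<Long> path = new ArrayList<>();
        long current = id;
        while (!index.containsKey(current)) {
            if (current <= 1) {
                return null; // reached the root without finding an indexed ancestor
            }
            path.add(current);
            current = parentId(current);
        }
        Node node = index.get(current);
        for (int i = path.size() - 1; i >= 0; i--) {
            long wanted = path.get(i);
            int childPos = (int) (wanted - (K * (node.id - 1) + 2));
            if (childPos < 0 || childPos >= node.children.size()) {
                return null; // no such node in the document
            }
            node = node.children.get(childPos);
        }
        return node;
    }
}
```

The trade-off this sketch shows is the one described above: the index holds far fewer entries, at the cost of a short traversal for nodes that are not indexed directly.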
I have also improved the elements and full-text indexes, which map element
names and document terms to a list of matching node-identifiers: the
node-identifier lists (consisting of long integers) are now stored in
variable-byte coding to save storage space. Additionally, only the difference
between the previous and the current node-id is saved (delta coding). This way
I managed to reduce the size of the full-text index for the Shakespeare
collection by 50-70%.
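
The combination of delta coding and variable-byte coding can be sketched in a few lines of Java. This is a generic illustration of the two techniques, not the actual index code: the class name NodeIdListCodec and its methods are hypothetical, and the sketch assumes the node-id list is sorted in ascending order and contains no negative values. Each id is replaced by its difference to the previous id, and each difference is written using 7 payload bits per byte, with the high bit marking that more bytes follow.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical sketch of delta + variable-byte coding for sorted node-id lists.
final class NodeIdListCodec {

    // Encode ascending, non-negative ids as variable-byte coded deltas.
    public static byte[] encode(long[] sortedIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long previous = 0;
        for (long id : sortedIds) {
            long delta = id - previous; // delta coding: store only the gap
            if (delta < 0) {
                throw new IllegalArgumentException("ids must be non-negative and sorted ascending");
            }
            previous = id;
            // Variable-byte coding: low 7 bits first, high bit = "more bytes follow".
            while ((delta & ~0x7FL) != 0) {
                out.write((int) ((delta & 0x7F) | 0x80));
                delta >>>= 7;
            }
            out.write((int) delta);
        }
        return out.toByteArray();
    }

    // Decode the byte sequence back into the original list of ids.
    public static long[] decode(byte[] data) {
        long[] ids = new long[data.length]; // upper bound: at least one byte per id
        int count = 0;
        long previous = 0;
        int pos = 0;
        while (pos < data.length) {
            long delta = 0;
            int shift = 0;
            int b;
            do {
                b = data[pos++] & 0xFF;
                delta |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += delta; // undo the delta coding
            ids[count++] = previous;
        }
        return Arrays.copyOf(ids, count);
    }
}
```

For example, the five ids 3, 7, 130, 131 and 20000 have the deltas 3, 4, 123, 1 and 19869, which take 7 bytes in this encoding instead of 40 bytes as plain long integers. The savings depend on how close together the ids are, so the numbers for a real collection will differ.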