From: courtial n. <nat...@fr...> - 2013-03-30 11:29:37
Hi Wolfgang,

Many thanks for your feedback. So the fact that no index is used could lead to data corruption, and rewriting the XPath should prevent a database crash.

About the other questions: in this case we are trying to isolate one document in the database depending on parameters passed to the XQuery. You can view the XQuery code in /db/storedModules/rpc.xqm (rpc:get-invoice-document) in our sample. The algorithm is the following:

1. Try to extract data from a parameter which can be used to reduce the subset of data to look into. When this parameter is passed there doesn't seem to be a problem, because the subset is very restricted; when it is absent, the following steps run on the whole document subset.
2. If an id has been passed, look for the document by the given id on the subset computed in step 1. In this case the values are very similar, with a fixed format (prefix_incrementalValue or prefix_date_incremental_value).
3. If no id has been passed, try to get the document by the combination of the other parameters on the subset computed in step 1.

When the document is found, only small operations are performed on it. We are trying to rewrite this module, which has been identified as causing a lot of corruption every week.

Just for info: when trying to get a unique document by an XPath lookup on its fully indexed value, it takes approximately 10 seconds on a range of 200,000 documents. How happy we would be if you found a way to optimize equality operations on a range index search ;) We would be proud to help you improve this part, so don't hesitate to ask us to test it on our very large database.

Many thanks for your precious help.

Cheers,
Nathan

On 30/03/2013 10:49, Wolfgang Meier wrote:
> Hi Nathan,
>> The only thing I have noticed when using the XQuery profiler is that the XPath
>> expression didn't use an index when running.
>> We have found a way to force the XPath expression to use indexes.
>>
>> Do you think this issue could be the cause of the data corruption?
> If the query engine has to scan a large number of documents instead of using an index, memory consumption can increase considerably, potentially resulting in out-of-memory errors and throwing the db into an inconsistent state. Out-of-memory errors should certainly be avoided, and it is important to identify any bottleneck expression.
>> On the other hand, in order to speed up the index search, we are looking
>> for other lines of analysis, and we wonder if there wouldn't be a better
>> index than a range index to deal with an XPath value like
>> [value="INV_20130201_XXXXXXX"]?
> It depends on the context. Is the value you are trying to look up quite frequent? Is the filter applied to a larger subset of the db or just to a few ancestor elements? If the context sequence is small and you need to do a lot of processing on single items, it can be beneficial to force the root item into memory first (using util:expand). However, this really depends on the context of the expression.
>
> I believe I still have some backups of your test data set somewhere, so if you tell me the concrete expression you are trying to optimize, I could have a closer look.
>> Many thanks in advance for your help, and I wish you a good Easter weekend.
> I have actually reserved the following days to work on a rewrite of the range index, which will further improve optimizations on simple key/value lookup expressions ;-)
>
> Best,
>
> Wolfgang
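[Editor's note: for readers following this thread, a range index that serves an equality lookup like [value="INV_20130201_XXXXXXX"] is declared in a collection.xconf document stored under /db/system/config. A minimal sketch, assuming the element is literally named `value` (adjust the qname to the actual invoice schema); the collection must be reindexed after the config changes:]

```xml
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index>
        <!-- range index on the invoice id element; the declared type must
             match the type used in query comparisons (string equality here) -->
        <create qname="value" type="xs:string"/>
    </index>
</collection>
```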
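[Editor's note: the three-step lookup Nathan describes can be sketched in XQuery as below. This is a hypothetical reconstruction, not the real rpc.xqm code: the function signature, collection path, and element names ($scope, date, customer) are all assumptions for illustration. The point is that the equality predicates stay in the plain `element = $value` form that the query optimizer can rewrite to an index lookup:]

```xquery
declare function local:get-invoice(
    $scope as xs:string?, $id as xs:string?,
    $date as xs:string?, $customer as xs:string?
) as element(invoice)? {
    (: step 1: narrow the subset when a scoping parameter was passed :)
    let $subset :=
        if ($scope) then collection("/db/invoices")//invoice[@scope = $scope]
        else collection("/db/invoices")//invoice
    return
        (: step 2: direct lookup by id, served by the range index :)
        if ($id) then $subset[value = $id]
        (: step 3: otherwise combine the remaining parameters :)
        else $subset[date = $date][customer = $customer]
};
```

Note that wrapping the indexed node in a function call inside the predicate (e.g. [string(value) = $id]) typically prevents the optimizer from using the index, which matches the "force xpath expression to use indexes" rewriting mentioned earlier in the thread.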
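[Editor's note: Wolfgang's util:expand suggestion refers to eXist's util extension module, which copies a persistent node into an in-memory fragment. A minimal sketch, again with assumed element names, of forcing a single matched document into memory before doing heavier node processing on it:]

```xquery
let $doc := collection("/db/invoices")//invoice[value = $id]
(: copy the small context item into memory; subsequent navigation
   no longer touches the persistent DOM :)
let $in-memory := util:expand($doc)
return $in-memory//line-item
```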