Hi Andre,
Apologies if the email formatting is a bit off - I'm being forced to use outlook 2003.. Replies are inline.
________________________________
From: andrea.aime@... [mailto:andrea.aime@...] On Behalf Of Andrea Aime
Sent: Monday, 3 December 2012 04:02
To: Geoff Williams
Cc: geotools-gt2-users@...
Subject: Re: [Geotools-gt2-users] ShapeFileDataStoreFactory uses createIndex variable to control whether index is used in shapefiles [SEC=UNCLASSIFIED]
On Fri, Nov 30, 2012 at 8:17 AM, Geoff Williams <G.Williams2@...>> wrote:
Hi List,
I think I've found a bug in the logic of the ShapeFileDataStoreFactory class.
In the createNewDataStore() method, I've noticed that the createIndex variable is used to actually control whether the index is used. When enabled, indexes are always (re)created. Here's the relevant code:
if (createIndex) {
store = new IndexedShapefileDataStore(url, namespace,
useMemoryMappedBuffer, cacheMemoryMaps, true,
IndexType.QIX, dbfCharset);
} else {
store = new ShapefileDataStore(url, namespace,
useMemoryMappedBuffer, cacheMemoryMaps, dbfCharset);
}
Looked at the patch, which stats with a rather scary code:
boolean createIndex = isCreateSpatialIndex.booleanValue() && isLocal;
+ boolean enableIndex = true && isLocal;
true && isLocal, really? :-)
I was going to add the code to support the params map here but I thought I'd run it past the list first, your right this is just plain wrong at the moment ;-)
Anyways, the result is that the store ends up creating a IndexedShapefileDataStore every time
the shapefile is local.
Yeah I think that would be a good default. If i develop the param map support that Michael mentioned then the patch should please everyone
Hum... makes some sense, but doesn't the datastore end up checking if the index file exists
every time it has to use it anyways?
Wondering how checking for file existance is so much lighter than comparing two last modification
dates: don't get me wrong, it's certainly lighter, but I'm surprised it makes a difference between
grinding the system to a halt and making it work fine... what operating system and file system
are you using?
It seems to make a huge difference - at least in my own tests. I'm using centos5 32 bit with ext3 filesystems running under virtualbox 4.1 on windows XP. My disk images are growing files on an old desktop and give truly terrible IO performance. I developed a JMeter project to do a get capabilities request and then make 50 getFeature getMap requests in parallel. With createIndex enabled the VM becomes unusable to the point of needing rebooting (quicker then waiting for everything to finish - I'm sure control would return eventually), with the IndexedShapefileDataStore enabled but set to not update the index, performance is still really good and looking up by featureID is lightening fast too. I didn't look at the actual code that does the update, perhaps it does something else too.
Anyways, not opposed to the change per se, the original code was there because the indexed
store was experimental, but years have passed by and it's not experimental anymore... I'm
actually in the process of throwing both of them away and introduce a single new implementation
in the 9.x series.
Sounds great, in the meantime it might be handy for some users to have this fix in a future version of geoserver. I'm going to refine the patch with suggestions Michael, will post to Jira when ready: https://jira.codehaus.org/browse/GEOT-4336
Thanks for looking into this. Cheers,
Geoff
Cheers
Andrea
--
==
Our support, Your Success! Visit http://opensdi.geo-solutions.it for more information.
==
Ing. Andrea Aime
@geowolf
Technical Lead
GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549
http://www.geo-solutions.it
http://twitter.com/geosolutions_it
-------------------------------------------------------
|