From: Geoff W. <G.W...@bo...> - 2012-11-30 07:17:37
Attachments:
shapefile_index_switch.patch
|
Hi List, I think I've found a bug in the logic of the ShapeFileDataStoreFactory class. In the createNewDataStore() method, I've noticed that the createIndex variable is used to actually control whether the index is used. When enabled, indexes are always (re)created. Here's the relevant code: if (createIndex) { store = new IndexedShapefileDataStore(url, namespace, useMemoryMappedBuffer, cacheMemoryMaps, true, IndexType.QIX, dbfCharset); } else { store = new ShapefileDataStore(url, namespace, useMemoryMappedBuffer, cacheMemoryMaps, dbfCharset); } I've been using GeoTools 8.3 as part of geoserver 2.2.1 to deliver WFS data from shapefiles. I found that under load (generated with jmeter) , the GeoTools check to see if the index was up-to-date was causing my test system to grind to a halt so I disabled index creation/updating in geoserver only to find out that doing so completely disabled the use of the index. I think the above block of code should be changed to introduce a separate new variable to enable use of the index and the createIndex variable should be passed in the place of the hardcoded 'true' argument. I developed a patch against GeoTools 8.3 to enable indexes for local files. I haven't included support for setting the new variable in the params Map but I'd be happy to code this up and make any changes necessary to get this to work with GeoTools 8.4 - any interest? I'd really like to get this fix into GeoTools if possible - for performance reasons, I need to have index updating off but I still need to be able to use the indexes (which I create offline). Thanks, Geoff |
From: Michael B. <mic...@gm...> - 2012-11-30 22:28:15
|
Hi Geoff, Thanks for the bug report and the patch. I've created a Jira issue with your patch attached: https://jira.codehaus.org/browse/GEOT-4336 I'm afraid it's hard to say when this will be looked at. Andrea, who maintains the shapefile module, is one of the most over-worked developers on both GeoTools and GeoServer. However, it will have more chance of being reviewed sooner if you can extend the patch to: - take care of the params map - include one or more unit tests (if possible, though it might not be obvious how to test in this case) cheers Michael On 30 November 2012 18:17, Geoff Williams <G.W...@bo...> wrote: > > Hi List, > > I think I've found a bug in the logic of the ShapeFileDataStoreFactory class. > > In the createNewDataStore() method, I've noticed that the createIndex variable is used to actually control whether the index is used. When enabled, indexes are always (re)created. Here's the relevant code: > > if (createIndex) { > store = new IndexedShapefileDataStore(url, namespace, > useMemoryMappedBuffer, cacheMemoryMaps, true, > IndexType.QIX, dbfCharset); > } else { > store = new ShapefileDataStore(url, namespace, > useMemoryMappedBuffer, cacheMemoryMaps, dbfCharset); > } > > I've been using GeoTools 8.3 as part of geoserver 2.2.1 to deliver WFS data from shapefiles. I found that under load (generated with jmeter) , the GeoTools check to see if the index was up-to-date was causing my test system to grind to a halt so I disabled index creation/updating in geoserver only to find out that doing so completely disabled the use of the index. > > I think the above block of code should be changed to introduce a separate new variable to enable use of the index and the createIndex variable should be passed in the place of the hardcoded 'true' argument. > > I developed a patch against GeoTools 8.3 to enable indexes for local files. I haven't included support for setting the new variable in the params Map but I'd be happy to code this up and make any changes necessary to get this to work with GeoTools 8.4 - any interest? > > I'd really like to get this fix into GeoTools if possible - for performance reasons, I need to have index updating off but I still need to be able to use the indexes (which I create offline). > > Thanks, > Geoff > > ------------------------------------------------------------------------------ > Keep yourself connected to Go Parallel: > TUNE You got it built. Now make it sing. Tune shows you how. > http://goparallel.sourceforge.net > _______________________________________________ > GeoTools-GT2-Users mailing list > Geo...@li... > https://lists.sourceforge.net/lists/listinfo/geotools-gt2-users > |
From: Andrea A. <and...@ge...> - 2012-12-02 17:02:29
|
On Fri, Nov 30, 2012 at 8:17 AM, Geoff Williams <G.W...@bo...>wrote: > > Hi List, > > I think I've found a bug in the logic of the ShapeFileDataStoreFactory > class. > > In the createNewDataStore() method, I've noticed that the createIndex > variable is used to actually control whether the index is used. When > enabled, indexes are always (re)created. Here's the relevant code: > > if (createIndex) { > store = new IndexedShapefileDataStore(url, namespace, > useMemoryMappedBuffer, cacheMemoryMaps, true, > IndexType.QIX, dbfCharset); > } else { > store = new ShapefileDataStore(url, namespace, > useMemoryMappedBuffer, cacheMemoryMaps, dbfCharset); > } > Looked at the patch, which stats with a rather scary code: boolean createIndex = isCreateSpatialIndex.booleanValue() && isLocal; + boolean enableIndex = true && isLocal; true && isLocal, really? :-) Anyways, the result is that the store ends up creating a IndexedShapefileDataStore every time the shapefile is local. Hum... makes some sense, but doesn't the datastore end up checking if the index file exists every time it has to use it anyways? Wondering how checking for file existance is so much lighter than comparing two last modification dates: don't get me wrong, it's certainly lighter, but I'm surprised it makes a difference between grinding the system to a halt and making it work fine... what operating system and file system are you using? Anyways, not opposed to the change per se, the original code was there because the indexed store was experimental, but years have passed by and it's not experimental anymore... I'm actually in the process of throwing both of them away and introduce a single new implementation in the 9.x series. Cheers Andrea -- == Our support, Your Success! Visit http://opensdi.geo-solutions.it for more information. == Ing. Andrea Aime @geowolf Technical Lead GeoSolutions S.A.S. Via Poggio alle Viti 1187 55054 Massarosa (LU) Italy phone: +39 0584 962313 fax: +39 0584 1660272 mob: +39 339 8844549 http://www.geo-solutions.it http://twitter.com/geosolutions_it ------------------------------------------------------- |
From: Geoff W. <G.W...@bo...> - 2012-12-03 01:14:53
|
Hi Andre, Apologies if the email formatting is a bit off - I'm being forced to use outlook 2003.. Replies are inline. ________________________________ From: and...@gm... [mailto:and...@gm...] On Behalf Of Andrea Aime Sent: Monday, 3 December 2012 04:02 To: Geoff Williams Cc: geo...@li... Subject: Re: [Geotools-gt2-users] ShapeFileDataStoreFactory uses createIndex variable to control whether index is used in shapefiles [SEC=UNCLASSIFIED] On Fri, Nov 30, 2012 at 8:17 AM, Geoff Williams <G.W...@bo...<mailto:G.W...@bo...>> wrote: Hi List, I think I've found a bug in the logic of the ShapeFileDataStoreFactory class. In the createNewDataStore() method, I've noticed that the createIndex variable is used to actually control whether the index is used. When enabled, indexes are always (re)created. Here's the relevant code: if (createIndex) { store = new IndexedShapefileDataStore(url, namespace, useMemoryMappedBuffer, cacheMemoryMaps, true, IndexType.QIX, dbfCharset); } else { store = new ShapefileDataStore(url, namespace, useMemoryMappedBuffer, cacheMemoryMaps, dbfCharset); } Looked at the patch, which stats with a rather scary code: boolean createIndex = isCreateSpatialIndex.booleanValue() && isLocal; + boolean enableIndex = true && isLocal; true && isLocal, really? :-) I was going to add the code to support the params map here but I thought I'd run it past the list first, your right this is just plain wrong at the moment ;-) Anyways, the result is that the store ends up creating a IndexedShapefileDataStore every time the shapefile is local. Yeah I think that would be a good default. If i develop the param map support that Michael mentioned then the patch should please everyone Hum... makes some sense, but doesn't the datastore end up checking if the index file exists every time it has to use it anyways? Wondering how checking for file existance is so much lighter than comparing two last modification dates: don't get me wrong, it's certainly lighter, but I'm surprised it makes a difference between grinding the system to a halt and making it work fine... what operating system and file system are you using? It seems to make a huge difference - at least in my own tests. I'm using centos5 32 bit with ext3 filesystems running under virtualbox 4.1 on windows XP. My disk images are growing files on an old desktop and give truly terrible IO performance. I developed a JMeter project to do a get capabilities request and then make 50 getFeature getMap requests in parallel. With createIndex enabled the VM becomes unusable to the point of needing rebooting (quicker then waiting for everything to finish - I'm sure control would return eventually), with the IndexedShapefileDataStore enabled but set to not update the index, performance is still really good and looking up by featureID is lightening fast too. I didn't look at the actual code that does the update, perhaps it does something else too. Anyways, not opposed to the change per se, the original code was there because the indexed store was experimental, but years have passed by and it's not experimental anymore... I'm actually in the process of throwing both of them away and introduce a single new implementation in the 9.x series. Sounds great, in the meantime it might be handy for some users to have this fix in a future version of geoserver. I'm going to refine the patch with suggestions Michael, will post to Jira when ready: https://jira.codehaus.org/browse/GEOT-4336 Thanks for looking into this. Cheers, Geoff Cheers Andrea -- == Our support, Your Success! Visit http://opensdi.geo-solutions.it for more information. == Ing. Andrea Aime @geowolf Technical Lead GeoSolutions S.A.S. Via Poggio alle Viti 1187 55054 Massarosa (LU) Italy phone: +39 0584 962313 fax: +39 0584 1660272 mob: +39 339 8844549 http://www.geo-solutions.it http://twitter.com/geosolutions_it ------------------------------------------------------- |
From: Geoff W. <G.W...@bo...> - 2012-12-03 06:17:05
|
Hi Michael, Thanks for the feedback, I've updated Jira with an updated patch including both of your suggestions. > > https://jira.codehaus.org/browse/GEOT-4336 > > - take care of the params map done > - include one or more unit tests (if possible, though it might not be > obvious how to test in this case) Use instanceof operator to check correct type of class was being returned. Hope this is useful. cheers, Geoff |
From: Michael B. <mic...@gm...> - 2012-12-03 07:12:29
|
Thanks Geoff - I saw your previous message to Andrea too. Looks like things are underway. Always happy to help the met bureau :) Michael On 3 December 2012 17:16, Geoff Williams <G.W...@bo...> wrote: > Hi Michael, > > Thanks for the feedback, I've updated Jira with an updated patch including both of your suggestions. >> >> https://jira.codehaus.org/browse/GEOT-4336 >> >> - take care of the params map > done >> - include one or more unit tests (if possible, though it might not be >> obvious how to test in this case) > Use instanceof operator to check correct type of class was being returned. > > Hope this is useful. > > cheers, > Geoff |