#3 How to find duplicates like the one in the attached shapefile?

Milestone: v1.0 (example)
Status: open
Owner: nobody
Labels: Duplicates (1)
Priority: 5
Updated: 2013-10-14
Created: 2013-10-10
Creator: Eric Jarvies
Private: No

I have attached a couple of shapefiles: one has 156 polygon features and the other has 157 linestring features, both pulled from a data set containing a few million features. I have been unable to figure out which tool/operation within OpenJUMP will let me compare two layers and extract only those geoms that are different, and I have been unable to get even a simple Delete Duplicates operation to work.

In the 157 shapefile, I marked which feature is the duplicate (the record with 'duplicate' in the notes column), but I did this manually, as I was unable to identify it with any of the operations I understand so far in OpenJUMP... I would've thought that at least the Delete Duplicates operation would have worked, but it did not, so perhaps it would work on a Windows version? Without looking at the attributes, see if you can use Delete Duplicates to identify the problem feature, because it would not work for me.

Thanks,

Eric

1 Attachments

Discussion

  • You probably want to compare polygon "boundaries" with the linestring layer?
    I can see 2 tools for that:
    1) Tools > QA > Calculate Geometry Differences...
    2) Extension > Matching > Matching (needs to be installed)
    The first one is very efficient if you want to locate every small difference between the shapes of two datasets that are supposed to be equal. I'm not sure it can answer your specific problem, though (I don't know whether geometries are compared with strict equality or topological equality).
    The second one can find exactly what you are looking for (it has options to test geometric/topological equality or 2D/3D equality), but you must first convert your polygons to linestrings, because a linestring and a polygon will never be considered equal.
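
As a rough illustration of the "convert polygons to linestrings first" step, here is a minimal Python sketch that works on WKT text only. It is not how the Matching extension works internally; the helper names (`parse_coords`, `polygon_rings`, `linestring_coords`) and the sample geometries are made up for the example, and it assumes simple 2D `POLYGON` / `LINESTRING` WKT.

```python
import re

def parse_coords(text):
    """Turn 'x y, x y, ...' into a list of (x, y) float tuples."""
    return [tuple(float(v) for v in pt.split()[:2]) for pt in text.split(",")]

def polygon_rings(wkt):
    """Return every ring of a 'POLYGON ((...), (...))' string as a coordinate
    list, i.e. the polygon boundary expressed as closed linestrings."""
    if not wkt.strip().upper().startswith("POLYGON"):
        raise ValueError("expected a POLYGON")
    # Each ring is the text between an innermost pair of parentheses.
    return [parse_coords(ring) for ring in re.findall(r"\(([^()]+)\)", wkt)]

def linestring_coords(wkt):
    """Return the coordinate list of a 'LINESTRING (...)' string."""
    if not wkt.strip().upper().startswith("LINESTRING"):
        raise ValueError("expected a LINESTRING")
    return parse_coords(re.search(r"\(([^()]+)\)", wkt).group(1))

# Hypothetical sample data: a polygon and a line tracing its outer ring.
poly = "POLYGON ((0 0, 4 0, 4 3, 0 3, 0 0))"
line = "LINESTRING (0 0, 4 0, 4 3, 0 3, 0 0)"

rings = polygon_rings(poly)
print(rings[0] == linestring_coords(line))  # True
```

Once both layers are plain coordinate lists, the question of strict versus topological equality becomes a question of how the two lists are compared.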

     
    • Eric Jarvies
      2013-10-14

      The difficulty is in finding/identifying/knowing these specific geoms.

      This is what seems to work best:

      1. Tools > QA > Delete Duplicate Geometries
      2. Tools > Edit Geometry > Convert > Extract Segments
      3. Tools > Edit Geometry > Noder

      The Noder tool will point out any geoms that overlap/intersect, which is one way of finding these types of duplicates, which are 'almost' identical but not quite (as ede points out below).

      In this particular data set (elevation bands/islands linestrings), there should not be any geoms that intersect or overlap, so the Noder tool clearly does the trick.

      However, if this data contained geoms that legitimately intersect or overlap, the Noder tool would not be the right tool for finding the 'almost duplicate' geoms found in example layer 157.
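
The idea behind the workflow above (split lines into segments, then look for segments that coincide) can be sketched in plain Python. This is not the Noder tool: it only detects segments with exactly identical endpoints, not lines that merely cross. The function names, feature ids and coordinates are invented for the example.

```python
from collections import defaultdict

def segments(coords):
    """Yield each segment of a linestring as a direction-independent point pair."""
    for a, b in zip(coords, coords[1:]):
        if a != b:                      # skip zero-length segments
            yield tuple(sorted((a, b)))

def features_sharing_segments(features):
    """features maps feature id -> list of (x, y) tuples.

    Returns {segment: sorted ids} for every segment used by 2+ features."""
    owners = defaultdict(set)
    for fid, coords in features.items():
        for seg in segments(coords):
            owners[seg].add(fid)
    return {seg: sorted(ids) for seg, ids in owners.items() if len(ids) > 1}

# Hypothetical data: features 1 and 2 are the same ring with a different
# start vertex; feature 3 is unrelated.
features = {
    1: [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)],
    2: [(4, 3), (0, 3), (0, 0), (4, 0), (4, 3)],
    3: [(10, 10), (12, 10), (11, 12), (10, 10)],
}

shared = features_sharing_segments(features)
print(sorted({fid for ids in shared.values() for fid in ids}))  # [1, 2]
```

In a data set where lines should never overlap, any feature id that shows up in the result is a candidate duplicate. As noted above, this would not help where overlaps are legitimate.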

       
  • ede
    2013-10-14

    Well, I checked your test set, and the duplicate you marked in 157 line.shp is not exactly identical (see below: same vertices, but the ring starts at a different point), so of course Delete Duplicates fails.

    ..ede

    FID 425 LINESTRING (631665.8649272032 2558604.8680511904, 631666.0269797787 2558604.4630413516, 631665.6760009764 2558603.3510072324, 631664.9830012557 2558603.0490333233, 631663.0189953792 2558602.989003342, 631661.4279374136 2558603.2310688705, 631660.747989451 2558603.763029351, 631659.3240102038 2558604.420927138, 631656.4880495989 2558605.3179720477, 631655.8960186038 2558606.2829509536, 631655.7389921712 2558606.840001615, 631655.8659428132 2558607.2649538917, 631656.4099417167 2558607.8440547287, 631657.2390120968 2558608.291016647, 631659.9309574938 2558607.8550392827, 631662.8010066971 2558607.258954947, 631664.1240172186 2558606.7460451694, 631665.8649272032 2558604.8680511904)

    FID 465 LINESTRING (631662.8010066971 2558607.258954947, 631664.1240172186 2558606.7460451694, 631665.8649272032 2558604.8680511904, 631666.0269797787 2558604.4630413516, 631665.6760009764 2558603.3510072324, 631664.9830012557 2558603.0490333233, 631663.0189953792 2558602.989003342, 631661.4279374136 2558603.2310688705, 631660.747989451 2558603.763029351, 631659.3240102038 2558604.420927138, 631656.4880495989 2558605.3179720477, 631655.8960186038 2558606.2829509536, 631655.7389921712 2558606.840001615, 631655.8659428132 2558607.2649538917, 631656.4099417167 2558607.8440547287, 631657.2390120968 2558608.291016647, 631659.9309574938 2558607.8550392827, 631662.8010066971 2558607.258954947)
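
Comparing the two WKT strings, FID 465 appears to trace the same closed ring as FID 425 but starts three vertices earlier, so a vertex-by-vertex comparison sees two different lines. One way to catch such cases is to normalize each closed ring before comparing. The sketch below is an illustration only, not what Delete Duplicate Geometries does; `canonical_ring` is a hypothetical helper, the sample coordinates are made up, and it assumes the smallest vertex occurs only once in the ring.

```python
def canonical_ring(coords):
    """Return a start-point- and direction-independent form of a closed ring.

    coords is a list of (x, y) tuples whose first and last points are equal.
    """
    if len(coords) < 2 or coords[0] != coords[-1]:
        raise ValueError("not a closed ring")
    open_ring = coords[:-1]                    # drop the repeated closing point
    candidates = []
    for ring in (open_ring, open_ring[::-1]):  # try both drawing directions
        start = ring.index(min(ring))          # rotate to the smallest vertex
        candidates.append(ring[start:] + ring[:start])
    best = min(candidates)                     # pick one direction consistently
    return tuple(best) + (best[0],)            # close the ring again

# Hypothetical rings: same shape, different start vertex.
a = [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]
b = [(4, 3), (0, 3), (0, 0), (4, 0), (4, 3)]

print(a == b)                                  # False
print(canonical_ring(a) == canonical_ring(b))  # True
```

Ignoring direction is a design choice: it treats a ring drawn clockwise and the same ring drawn counter-clockwise as duplicates, which is closer to topological equality than to strict equality.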

     
  • OK, I thought you were comparing lines with polygons.
    You just want to find duplicates in the line layer.

    Try with the matching extension which has all the options to do that.
    With the following parameters (see attachment), you'll find the pair of geometries you marked as duplicate (note that it will output the two matching geometries, not only one).

    Here are other methods which use only core features (but are not as straightforward as the previous one):
    Auto-assign an identifier attribute "id" to your line features (Tools > Edit Attribute > Auto Assign Attribute)
    Use Tools > Analysis > Spatial join (157line equals 157line)
    or
    Use Tools > Analysis > Overlay
    Duplicate features are the only ones that have two different attributes A_id and B_id (you'll need a formula to find them if you have thousands of features)
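
The core-tools recipe above amounts to a self-join on geometry equality followed by a filter on A_id != B_id. A minimal Python sketch of that logic, outside OpenJUMP, could look like the following. It groups features by a hashable key instead of comparing every pair, which matters with millions of features. The key used here (the set of undirected segments) is an assumption chosen to ignore start vertex and direction; the function names and coordinates are made up, with 425 and 465 reused only as example ids.

```python
from collections import defaultdict
from itertools import combinations

def segment_set(coords):
    """Hashable key: the set of undirected segments making up the line."""
    return frozenset(tuple(sorted(pair)) for pair in zip(coords, coords[1:]))

def duplicate_id_pairs(features):
    """features maps id -> list of (x, y) tuples.

    Returns sorted (A_id, B_id) pairs, A_id < B_id, whose geometries share a key."""
    groups = defaultdict(list)
    for fid, coords in features.items():
        groups[segment_set(coords)].append(fid)
    pairs = []
    for ids in groups.values():
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)

# Hypothetical features: 425 and 465 are the same ring with a different
# start vertex; 500 is a different line.
features = {
    425: [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)],
    465: [(4, 3), (0, 3), (0, 0), (4, 0), (4, 3)],
    500: [(0, 0), (4, 0), (4, 5), (0, 0)],
}

print(duplicate_id_pairs(features))  # [(425, 465)]
```

A line that runs over the same segment twice collapses to the same key as one that runs over it once, so this key is only a reasonable choice for simple lines such as contour rings.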

     
    • Eric Jarvies
      2013-10-15

      Michael,

      I was half asleep the other night when I wrote this post, and reading it now I see it was not very clear, so my apologies.

      I will try what you've suggested and see how that compares to how I am doing it now.

      Thank you for the advice,

      Eric