I have quite large polygon datasets, on the order of 1e6 polygons, which can be clipped in one go but then take (too much) time. Multi-threading is possible, but does not help enough, even with 16 threads on a Ryzen 2700X.
Working with numbers this large, the expected N^2 complexity would have to be, well, clipped, not to say reduced.
Is this assumption of mine correct?
My reason for believing that a serious speedup for large datasets is possible comes from running the desired dissolves in QGIS and, more to the point, ArcGIS in the latest versions (Pro 2.4.2, Desktop 10.7.1). The latter are one, if not two, orders of magnitude faster than what I did with the Clipper library, simply dumping all the rings in as paths. After quite a bit of work, I felt, well, silly.
If my idea about a binning/two-step approach is correct, the basic approach could be to use a spatial index to group subsets, clip those, and subsequently test for overlaps to do a second round, while leaving the non-overlapping results well alone.
I've not quite thought this out and wonder if someone has, and/or more people have this need.
When filling the paths, binning could be done without the need for a spatial index, perhaps through a service provided by the library.
Please let me know whether you agree, have more developed ideas, etc.
Thanks,
Jan
Dear,
What rendering do you require? Winding numbers? Parallel edge bundling?
There are specialised versions of Vatti for odd/even (alternate) winding which run hundreds of times faster if many, many overlapping edges are input for clipping.
Last edit: Ignorant 2019-11-02
Hi,
Thanks for the attention! I require winding number dissolving for this specific case.
My basic code simply dumps all the rings in as paths and unions them in one go (parts being polygons which may have holes etc., as in a shapefile). Should I google up on specialised Vatti algorithms?
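For illustration only (this is not the original listing): a minimal sketch of that one-shot dissolve, assuming the Clipper 6.x C++ API, with every ring added as a closed subject and unioned under the NonZero (winding-number) fill rule; the function name is made up.

```cpp
#include "clipper.hpp"   // ClipperLib, Clipper 6.x
using namespace ClipperLib;

// Dump all rings (outer rings and holes alike) in as closed subject paths and
// union them in a single Execute call with NonZero (winding-number) filling.
Paths DissolveAllAtOnce(const Paths& rings)
{
    Clipper c;
    c.AddPaths(rings, ptSubject, true);                    // true = closed paths
    Paths solution;
    c.Execute(ctUnion, solution, pftNonZero, pftNonZero);  // winding-number dissolve
    return solution;
}
```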
The only easily available one (with source code) is Angus's Clipper, if you want windings other than alternate.
Specialised Vatti includes Angus's code. The other one (with overfed bugs) is Murta's General Polygon Clipper; an attempt to remove some noticed bugs is available on GitHub. Code for Vatti is also available from NCAR, which was the first Vatti implementation available with proper horizontal edge handling. Alan Murta combined all four operations in a single source and called it the General Polygon Clipper -- he added 3 bugs in the process. That code has a couple of innovations which might help you. It has horrible input polygon partitioning code, so rewriting is necessary for a large number of input vertices.
Angus's code, Clipper v2, is much neater but does NOT handle overlapping bundles, which GPC does.
Hi Jan.
No. IIRC Clipper's algorithm complexity is worse than N^2, hence the problem you're encountering with very large datasets. (For performance comparisons with other clipping libraries see https://rogue-modron.blogspot.com/2011/04/polygon-clipping-wrapper-benchmark.html )
The simplest way to address this might be to union subsets of your polygons before performing a final union of your subset results.
There are other, much more complex ways to address this, including streaming polygons as they're encountered in the sweep algorithm, but if the above suggestion doesn't work for you, I'd suggest using another library that's perhaps better suited to very large datasets.
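A minimal sketch of that subset-then-final union, again assuming the Clipper 6.x C++ API; GroupedUnion and the default group size are illustrative, not library code.

```cpp
#include <algorithm>
#include <cstddef>
#include "clipper.hpp"
using namespace ClipperLib;

// Union the input in fixed-size groups first, then union the (far smaller)
// group results in one final pass.
Paths GroupedUnion(const Paths& rings, std::size_t groupSize = 1000)
{
    Paths partials;
    for (std::size_t i = 0; i < rings.size(); i += groupSize)
    {
        Clipper c;
        std::size_t end = std::min(rings.size(), i + groupSize);
        for (std::size_t j = i; j < end; ++j)
            c.AddPath(rings[j], ptSubject, true);
        Paths sub;
        c.Execute(ctUnion, sub, pftNonZero, pftNonZero);
        partials.insert(partials.end(), sub.begin(), sub.end());
    }
    Clipper finalClip;                                  // final union of the group results
    finalClip.AddPaths(partials, ptSubject, true);
    Paths solution;
    finalClip.Execute(ctUnion, solution, pftNonZero, pftNonZero);
    return solution;
}
```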
Cheers
You guys helped me on my way, thanks.
I have one remaining question because the code improvements and the speed increase with preview/beta version 2.0 are attractive. Do you have any reason to believe that the preview version is less correct than the preceding version 4.6.2? (I'm not asking for guarantees, mind you)
I'll certainly get to work on pre-grouping/input partitioning, probably using the fact that my polygons have calculated stats by the time of clipping, so I can at least use the extents to determine potential proximity, which should not be that hard if I use a tiling approach. Maybe I'll try to look for concentrations; I'll have to try out what will work without overly complicating things and inviting hard-to-find bugs :) It's usually easier said than done, of course.
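For what such an extent-based tiling approach might look like (purely a sketch; Extent, TileGroups and tileSize are hypothetical names, not anything from the library): bucket each polygon by the grid tile containing the centre of its extent, so nearby polygons tend to land in the same group.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical extent-based tiling: group polygons by the grid tile that
// contains the centre of their bounding box (extent).
struct Extent { double xmin, ymin, xmax, ymax; };

std::map<std::pair<long, long>, std::vector<std::size_t>>
TileGroups(const std::vector<Extent>& extents, double tileSize)
{
    std::map<std::pair<long, long>, std::vector<std::size_t>> groups;
    for (std::size_t i = 0; i < extents.size(); ++i)
    {
        const Extent& e = extents[i];
        long tx = static_cast<long>(std::floor(0.5 * (e.xmin + e.xmax) / tileSize));
        long ty = static_cast<long>(std::floor(0.5 * (e.ymin + e.ymax) / tileSize));
        groups[std::make_pair(tx, ty)].push_back(i);   // i indexes the original polygon set
    }
    return groups;
}
```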
In the meantime I collected some papers. Probably you have them already, but anyway: https://www.dropbox.com/sh/h8le08p5rlnq3wn/AABjpwtOdEo8JkloCB8j2rZka?dl=0
GPC is also in there. Nice old-style C code, many capitals; it reminds me just a little of Fortran :)
GPC does not appear to admit as many bugs as I expected.
Cheers!
Do you have any reason to believe that the preview version is less correct than the preceding version 4.6.2? (I'm not asking for guarantees, mind you)
I'm confident it's still just as robust as version 4.6.2. However, it's still only beta because there are additional features I hope to address such as removing micro self-intersections related to rounding (which also occur in the 'stable' version) and a more complete solution to joining polygons in solutions that have touching edges. (Neither of these things are easy to fix, and I've needed a break from Clipper to focus on other things, hence the prolonged delay there.)
so I can at least use the extents to determine potential proximity
I don't think that's necessary and would probably only slow things down. Just arbitrarily divide your polygons into smaller groups (eg 1,000) and union these before a final union of the results.
Edit: I'd also be very interested to know how well this works for you.
Last edit: Angus Johnson 2019-11-02
Also, given that performance is critical, are you sure you need to use the PolyTree structure instead of Paths?
Hi Angus,
Was caught up in some socialities ;)
I used the PolyTree approach years ago because it gave me the holes.
If you say I shouldn't need to do this, I'll have another look.
Now I'm converting to 2.0 and have a few loose ends:
ReverseSolution(), SimplifyPolygon(), CleanPolygon() and StrictlySimple() are no longer there;
I'm trying to find out whether I really need them, but I'm not comfortable scrapping them.
Less important, but potentially so: ArcTolerance is no longer available, and tuning variables like delta and stepsperrad may need exposing (too).
I've implemented subdivision and tweaked it a bit. In my task the effect is substantial, from about 8 hours to 40 minutes (a factor of 12). Subdivision is by a factor of the number of items (per class), which varies quite a bit. That worked better than using fixed numbers of polys per subtask, say 1000. I have not preordered the polygons, but they have some ordering stemming from a set of shapefiles covering regular tiles of the whole area. That may also explain why a fractional size worked better than a fixed size.
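For concreteness, a tiny sketch of such fractional sizing (the function name and the divisor are purely illustrative; the actual factor isn't stated above):

```cpp
#include <algorithm>
#include <cstddef>

// Group size as a fraction of the class's item count rather than a fixed 1000;
// the divisor of 16 is illustrative only.
std::size_t GroupSize(std::size_t itemsInClass, std::size_t divisor = 16)
{
    return std::max<std::size_t>(1, itemsInClass / divisor);
}
```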
I first implemented the Clipper 2.0 version in my library, but reverted when I did not get correct results, because I need the library to be functional. When I have time, I'll try to replay my scenarios directly against the V2.0 objects to find out whether I missed something or not.
I've implemented subdivision and tweaked it a bit. In my task the effect is substantial, from about 8 hours to 40 minutes
Excellent. Also, I'm wondering now if groups of 100 or 200 may even be a little faster. It'd be interesting to find the sweet spot.
I first implemented the Clipper 2.0 version in my library, but reverted when I did not get correct results ... When I have time, I'll try to replay my scenarios directly against the V2.0 objects to find out whether I missed something or not.
Obviously I'd be very interested to see examples of any problems (preferably with smallish datasets :)).
Hi Angus,
I tried smaller groups, but they gave significantly worse results. 1000 was a good start but, also to my surprise, higher was better; 5000 was too much.
My multithreading does not cover this part; it would have to be nested multithreading. I'm looking into that, as it would help with the largest datasets.
Also, I haven't spent energy finding out whether the clipping of the groups individually or the clipping of the group results takes more time.
I hope I can select a smallish sample, but first I have to set up the test without using my own library as a potentially confusing influence.
Thanks,
Jan
Curiosity ...!
Is the speedup because of multithreading?
Please, can you share your polygon subgrouping method?
On the advice of Angus I did not (yet) put any effort into grouping spatially. My data are already organised (spatially) fairly well. So, I'll give a few details first which have to do with my library/organisation. I use a format for a collection of polygons with holes.
In my lingo that is a MultiComplexGon.
The data comes as shapefiles, which are read into a single object as before, and then gets sent to the dissolve routine, which puts the data into Path objects, calculates, and puts the result back. dissolveloadfactor is an estimate of the average complexity of the records in the shapefile.
Translating twice is a bit of a hassle, but it is hardly significant timewise.
tshpi is a struct that helps handling shapefile input.
The multithreading is done using OpenMP (I have to get around to using a more C++-style thread pool), which explains the #pragma omp critical for the load.
So hoping it's not too disappointing:
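(Not the original listing, which isn't shown here: only a rough sketch of the approach described, assuming the Clipper 6.x C++ API and OpenMP. ParallelDissolve, the pre-split groups and the placement of the critical section are all illustrative; as noted above, the actual multithreading sits at a higher level, over classes/shapefiles.)

```cpp
#include <cstddef>
#include <vector>
#include "clipper.hpp"
using namespace ClipperLib;

// Illustrative only: union each pre-split group of paths on its own thread,
// guard the shared result list with a critical section, then union the
// group results in a final pass.
Paths ParallelDissolve(const std::vector<Paths>& groups)
{
    Paths partials;
    #pragma omp parallel for
    for (int g = 0; g < static_cast<int>(groups.size()); ++g)
    {
        Clipper c;
        c.AddPaths(groups[g], ptSubject, true);
        Paths sub;
        c.Execute(ctUnion, sub, pftNonZero, pftNonZero);
        #pragma omp critical
        partials.insert(partials.end(), sub.begin(), sub.end());
    }
    Clipper finalClip;
    finalClip.AddPaths(partials, ptSubject, true);
    Paths solution;
    finalClip.Execute(ctUnion, solution, pftNonZero, pftNonZero);
    return solution;
}
```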
I could use a spatial index or merely use the centroids/extents for more organisation of the feed, but having a fairly useful result, it's not a priority for me right now. I suspect that I would need to 'grow' the groups instead of simply hacking the input into parts.
Last edit: Jan 2019-11-07