usefulness of binning a large set into spatially related sets to speed up clipping

  • Jan

    Jan - 2019-11-01

    I have quite large polygon datasets, on the order of 1e6 polygons, which can be clipped in one go but then take (too much) time. Multicore processing is possible, but does not help enough, even with 16 threads on a Ryzen 2700X.

    Working with large numbers, the expected N^2 complexity would have to be, well, clipped, not to say reduced.
    Is this assumption of mine correct?

    My reason for believing that a serious speedup for large datasets is possible comes from running the desired dissolves in QGIS and, more to the point, ArcGIS in the latest versions (Pro 2.4.2, Desktop 10.7.1). The latter are one, if not two, orders of magnitude faster than what I achieved using the Clipper library by simply dumping all the rings as paths. After quite a bit of work, I felt, well, silly.

    If my idea about a binning/two-step approach is correct, the basic approach could be to use a spatial index to group subsets, clip these, and subsequently test for overlaps to do a second round, while leaving the non-overlapping results well alone (a rough sketch of the grouping step follows at the end of this post).
    I've not quite thought this out, and wonder if someone has, and/or whether more people have this need.
    When filling the paths, binning could be done without the need for a spatial index, perhaps through a service provided by the library.

    Please let me know whether you agree, have more developed ideas, etc.
    Thanks,
    Jan
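
    A rough sketch of that grouping step, assuming each polygon exposes an extent (axis-aligned bounding box) and using a plain uniform grid keyed by the extent's centre. Extent, binByExtent, and cellSize are illustrative names, not part of Clipper or any library mentioned in this thread:

        #include <cmath>
        #include <map>
        #include <utility>
        #include <vector>

        struct Extent { double minx, miny, maxx, maxy; };  // hypothetical bbox type

        // Group polygon indices into uniform grid cells keyed by the centre of
        // each polygon's extent. Each cell's members can be clipped independently
        // in round one; only results overlapping a cell boundary need round two.
        std::map<std::pair<long, long>, std::vector<size_t>>
        binByExtent(const std::vector<Extent>& extents, double cellSize)
        {
            std::map<std::pair<long, long>, std::vector<size_t>> bins;
            for (size_t i = 0; i < extents.size(); ++i)
            {
                const double cx = (extents[i].minx + extents[i].maxx) * 0.5;
                const double cy = (extents[i].miny + extents[i].maxy) * 0.5;
                bins[{ long(std::floor(cx / cellSize)),
                       long(std::floor(cy / cellSize)) }].push_back(i);
            }
            return bins;
        }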

  • Ignorant

    Ignorant - 2019-11-02

    Dear Jan,
    What rendering do you require? Winding numbers? Parallel edge bundling?
    There are specialised versions of Vatti for odd/even (alternate) winding which run hundreds of times faster when many, many overlapping edges are input for clipping.

  • Jan

    Jan - 2019-11-02

    Hi,
    Thanks for the attention! I require winding-number dissolving for this specific case.
    My basic code is as follows (parts being polygons which may have holes etc., as in a shapefile):

        // scale the double coordinates into Clipper's integer space and add
        // every ring (outer and inner) of every polygon as a subject path
        PolyTree pt;
        Clipper cl;
        cl.StrictlySimple(strictlysimple);
        cl.ReverseSolution(true);
        for (const auto &p : parts)
        {
            // p.npoints() counts all points of the polygon, so one reserve
            // covers the outer ring and every inner ring reusing the buffer
            Path subj;
            subj.reserve(p.npoints());
            for (const auto &pp : p.get_outer().vertices())
                subj.emplace_back(IRound(pp.x * multiplier), IRound(pp.y * multiplier));
            cl.AddPath(subj, ptSubject, true); subj.clear();
            for (const auto &inn : p.get_innerrings())
            {
                for (const auto &pp : inn.vertices())
                    subj.emplace_back(IRound(pp.x * multiplier), IRound(pp.y * multiplier));
                cl.AddPath(subj, ptSubject, true); subj.clear();
            }
        }
        // dissolve all rings in one go under the non-zero fill rule
        cl.Execute(ctUnion, pt, pftNonZero);
    

    Should I read up on specialised Vatti algorithms?
    Thanks,
    Jan

    • Ignorant

      Ignorant - 2019-11-02

      The only easily available one (with source code) is Angus's Clipper, if you want windings other than alternate. Specialised Vatti implementations include Angus's code. The other one (with overfed bugs) is Murta's General Polygon Clipper; an attempt to remove some of the noticed bugs is available on GitHub. Code for Vatti is also available from NCAR, which was the first Vatti implementation with proper horizontal edge handling. Alan Murta combined all four operations in a single source and called it the General Polygon Clipper; he added three bugs in the process. That code has a couple of innovations which might help you. It has horrible input polygon partitioning code, so rewriting is necessary for a large number of input vertices.
      Angus's Clipper V2 code is much neater but does NOT handle overlapping bundles, which GPC does.

  • Angus Johnson

    Angus Johnson - 2019-11-02

    Hi Jan.

    Working with large numbers, the expected N^2 complexity would have to be, well, clipped, not to say reduced.
    Is this assumption of mine correct?

    No. IIRC Clipper's algorithm complexity is worse than N^2, hence the problem you're encountering with very large datasets. (For performance comparisons with other clipping libraries see https://rogue-modron.blogspot.com/2011/04/polygon-clipping-wrapper-benchmark.html )

    The simplest way to address this might be to union subsets of your polygons before performing a final union of the subset results.

    There are other, much more complex ways to address this, including streaming polygons as they're encountered in the sweep algorithm, but if the above suggestion doesn't work for you, I'd suggest using another library that's perhaps better suited to very large datasets.

    Cheers

  • Jan

    Jan - 2019-11-02

    You guys helped me on my way, thanks.
    I have one remaining question, because the code improvements and the speed increase with the preview/beta version 2.0 are attractive. Do you have any reason to believe that the preview version is less correct than the preceding version 4.6.2? (I'm not asking for guarantees, mind you.)
    I'll certainly get to work on pre-grouping/input partitioning, probably using the fact that my polygons have calculated stats by the time of clipping, so I can at least use the extents to determine potential proximity, which should not be that hard if I use a tiling approach. Maybe I'll also look for concentrations; I'll have to try out what works without overly complicating things and inviting hard-to-find bugs :) It's usually easier said than done, of course.
    In the meantime I collected some papers. You probably have them, but anyway:
    https://www.dropbox.com/sh/h8le08p5rlnq3wn/AABjpwtOdEo8JkloCB8j2rZka?dl=0
    GPC is also in it. Nice old-style C code, many capitals; it reminds me just a little of Fortran code :)
    GPC does not appear to admit as many bugs as I expected.
    Cheers!

  • Angus Johnson

    Angus Johnson - 2019-11-02

    Do you have any reason to believe that the preview version is less correct than the preceding version 4.6.2? (I'm not asking for guarantees, mind you)

    I'm confident it's still just as robust as version 4.6.2. However, it's still only beta because there are additional features I hope to address, such as removing micro self-intersections related to rounding (which also occur in the 'stable' version) and a more complete solution to joining polygons in solutions that have touching edges. (Neither of these things is easy to fix, and I've needed a break from Clipper to focus on other things, hence the prolonged delay there.)

    so I can at least use the extents to determine potential proximity

    I don't think that's necessary, and it would probably only slow things down. Just arbitrarily divide your polygons into smaller groups (e.g. 1,000) and union these before a final union of the results.

    Edit: I'd also be very interested to know how well this works for you.
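
    A minimal sketch of this two-stage union against the Clipper 6.x C++ API used elsewhere in this thread; unionInGroups and groupSize are illustrative names, and error handling is omitted:

        #include <algorithm>
        #include "clipper.hpp"
        using namespace ClipperLib;

        // Stage 1: union each batch of ~groupSize rings on its own.
        // Stage 2: union the batch results in one final pass.
        Paths unionInGroups(const Paths &rings, size_t groupSize = 1000)
        {
            Paths groupResults;
            for (size_t i = 0; i < rings.size(); i += groupSize)
            {
                Clipper c;
                const size_t end = std::min(i + groupSize, rings.size());
                for (size_t j = i; j < end; ++j)
                    c.AddPath(rings[j], ptSubject, true);
                Paths sub;
                c.Execute(ctUnion, sub, pftNonZero, pftNonZero);
                groupResults.insert(groupResults.end(), sub.begin(), sub.end());
            }
            Clipper finalPass;
            finalPass.AddPaths(groupResults, ptSubject, true);
            Paths solution;
            finalPass.Execute(ctUnion, solution, pftNonZero, pftNonZero);
            return solution;
        }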

    • Jan

      Jan - 2019-11-02

      Thanks!

  • Angus Johnson

    Angus Johnson - 2019-11-03

    Also, given that performance is critical, are you sure you need to use the PolyTree structure instead of Paths?
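
    If hole information is the main reason for PolyTree, one hedged alternative (not spelled out in this thread) is to Execute into Paths and infer holes from ring orientation afterwards; note that the meaning of ClipperLib::Orientation flips when ReverseSolution(true) is set, as in Jan's code above:

        // execute into Paths instead of a PolyTree; with the default
        // ReverseSolution(false), outer rings come back with
        // Orientation(ring) == true and holes with false
        Paths solution;
        cl.Execute(ctUnion, solution, pftNonZero, pftNonZero);
        for (const Path &ring : solution)
        {
            const bool isOuter = Orientation(ring);
            // ... route outer rings and holes to the shapefile writer
        }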

  • Jan

    Jan - 2019-11-03

    Hi Angus,

    Was caught up in some social engagements ;)

    I used the PolyTree approach years ago because it gave me the holes.
    If you say I shouldn't need to do this, I'll have another look.

    Now I'm converting to 2.0 and have a few loose ends:
    ReverseSolution(), SimplifyPolygon(), CleanPolygon(), and StrictlySimple() are no longer there.
    I'm trying to find out whether I really need them, but I'm not comfortable scrapping them.
    Less important, but potentially so: ArcTolerance is no longer available, and tuning variables like delta and stepsperrad may need exposing too.

  • Jan

    Jan - 2019-11-06

    I've implemented subdivision and tweaked it a bit. In my task the effect is substantial: from about 8 hours to 40 minutes (a factor of 12). Subdivision is by a factor of the number of items (per class), which varies quite a bit. That worked better than using a fixed number of polygons per subtask, say 1,000. I have not preordered the polygons, but they have some ordering stemming from a set of shapefiles covering regular tiles of the whole area. That may also explain why a fractional size worked better than a fixed size.
    I first implemented the 2.0 Clipper version in my library, but reverted when I did not get correct results, because I need the library to be functional. When I have time, I'll try to replay my scenarios directly against the V2.0 objects to find out whether I missed something or not.

  • Angus Johnson

    Angus Johnson - 2019-11-06

    I've implemented subdivision and tweaked it a bit. In my task the effect is substantial, from about 8 hours to 40 minutes

    Excellent. Also, I'm wondering now if groups of 100 or 200 may even be a little faster. It'd be interesting to find the sweet spot.

    I first implemented the 2.0 Clipper version in my library, but reverted when I did not get correct results ... When I have time, I'll try to replay my scenarios directly against the V2.0 objects to find out whether I missed something or not.

    Obviously I'd be very interested to see examples of any problems (preferably with smallish datasets :)).

    • Jan

      Jan - 2019-11-06

      Hi Angus,
      I tried smaller groups, but they gave significantly worse results. 1,000 was a good start but, also to my surprise, higher was better; 5,000 was too much.
      My multithreading does not cover this part; it would have to be nested multithreading. I'm looking into that, as it would help with the largest datasets.
      Also, I haven't spent energy finding out whether the clipping of the individual groups or the clipping of the group results takes more time.
      I hope I can select a smallish sample, but first I have to set up the test without using my own library as a potentially confusing influence.
      Thanks,
      Jan

  • Ignorant

    Ignorant - 2019-11-07

    Curiosity...!
    Is the speedup because of multithreading?
    Please, can you share your polygon subgrouping method?

    • Jan

      Jan - 2019-11-07

      On the advice of Angus I did not (yet) put any effort into grouping spatially. My data are already organised (spatially) fairly well. So I'll first give a few details which have to do with my library/organisation. I use a format for a collection of polygons with holes; in my lingo that is a MultiComplexGon.
      The data comes as shapefiles, which are read into a single object as before, and then sent to the dissolve routine, which puts the data into Path objects, calculates, and puts the result back. dissolveloadfactor is an estimate of the average complexity of the records in the shapefile. Translating twice is a bit of a hassle, but it is hardly significant timewise.
      tshpi is a struct that helps with handling shapefile input.
      The multithreading is done using OpenMP (I have to get around to using a more C++-style thread pool), which explains the #pragma omp critical for the load.
      So, hoping it's not too disappointing:

          // batch the incoming records into groups of ~dissolveload parts,
          // dissolve each group, then dissolve the combined group results
          MultiComplexGonVector mcgv;
          const int dissolveload = max(1000, tshpi.nRecords / 20);
          const double dissolveloadfactor = 1.1 / dissolveload;  // ~groups per record
          mcgv.reserve(max(1, int(dissolveloadfactor * tshpi.nRecords)));
          int nparts = 0;
          for (int inrec = 0; inrec < tshpi.nRecords; ++inrec)
          {
              MultiComplexGon mcg2;
              // the shapefile reader is shared across threads, hence critical
          #pragma omp critical
              { mcg2 = MultiComplexGon(&tshpi, inrec); }
              for (auto& cg : mcg2)
              {
                  // start a new group every dissolveload parts
                  if (nparts++ % dissolveload == 0) mcgv.emplace_back();
                  mcgv.back() += move(cg);
              }
          }
          // first pass: dissolve each group on its own
          for (int ip = 0; ip < (int)mcgv.size(); ++ip)
          {
              auto& mcg = mcgv[ip];
              mcg.dissolve();
              for (auto& cg : mcg)
                  mcgfinal.insert(move(cg));
          }
          // second pass: dissolve the merged group results
          mcgfinal.dissolve();
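
      A possible next step for the nested multithreading Jan mentions above would be to run the per-group dissolves concurrently. A minimal sketch, assuming each group's dissolve() is independent and that merging into mcgfinal stays serial because it mutates shared state:

          // hypothetical parallel variant of the first pass above
          #pragma omp parallel for schedule(dynamic)
          for (int ip = 0; ip < (int)mcgv.size(); ++ip)
              mcgv[ip].dissolve();

          // merging mutates mcgfinal, so keep it serial (or guard it)
          for (auto& mcg : mcgv)
              for (auto& cg : mcg)
                  mcgfinal.insert(move(cg));
          mcgfinal.dissolve();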
      

      I could use a spatial index, or merely use the centroids/extents for more organisation of the feed, but having a fairly useful result, it's not a priority for me right now. I suspect that I would need to 'grow' the groups instead of simply hacking the input into parts.
