
#22 Import runs out of memory on some pbf files

main
closed-fixed
import (1)
7
2016-11-21
2016-11-13
No

Current version (commit b23a6d) crashes while importing 'bigger' osm.pbf files. I just tried the latest Slovenia PBF (http://data.osm-hr.org/slovenia/archive/20161031-slovenia.osm.pbf), and it crashes on step #9 (MergeAreasGenerator) because it runs out of memory in MergeAreasGenerator::IndexAreasByNodeIds at idAreaMap[id].insert(area);

Import of older data (http://data.osm-hr.org/slovenia/archive/20160131-slovenia.osm.pbf) actually succeeded, but these are hardly HUGE datasets (the working one is 82M, while the crashing one is 130M). I also tried to run the import with an older build of libosmscout (approximately a year old), and that worked flawlessly.

Discussion

  • Tim Teulings

    Tim Teulings - 2016-11-13

    Hello Toni,

    yes, there are various reasons why memory consumption has increased. This can be caused either by changes in libosmscout or by the data itself.

    I was able to import the data you referenced without problems:

    ../Import/src/Import --rawCoordBlockSize 120000000 --coordDataMemoryMaped true --coordIndexCacheSize 1000000 --rawNodeDataMemoryMaped true --rawWayDataMemoryMaped true --rawWayIndexCacheSize 100000 --rawWayBlockSize 2500000 --wayDataMemoryMaped true --areaDataMemoryMaped true --routeNodeBlockSize 2000000 --typefile ../stylesheets/map.ost --destinationDirectory slovenia slovenia.osm.pbf

    [tim:~/projects/OSMScout/maps] master(+32/-25,59)+* ± grep -e " => " -e "+ Step #" slovenia.txt

    • Step #1 - TypeDataGenerator...
      => 0.000s, RSS 7,9 MiB, VM 50,7 MiB
    • Step #2 - Preprocess...
      => 6.760s, RSS 83,1 MiB, VM 784,8 MiB
    • Step #3 - CoordDataGenerator...
      => 16.612s, RSS 1,1 GiB, VM 1,8 GiB
    • Step #4 - RawWayIndexGenerator...
      => 0.675s, RSS 95,0 MiB, VM 800,2 MiB
    • Step #5 - RawRelationIndexGenerator...
      => 0.023s, RSS 24,9 MiB, VM 722,9 MiB
    • Step #6 - RelAreaDataGenerator...
      => 10.491s, RSS 443,6 MiB, VM 1,2 GiB
    • Step #7 - WayAreaDataGenerator...
      => 11.249s, RSS 461,4 MiB, VM 1,2 GiB
    • Step #8 - MergeAreaDataGenerator...
      => 7.365s, RSS 97,6 MiB, VM 798,5 MiB
    • Step #9 - MergeAreasGenerator...
      => 88.567s, RSS 1,7 GiB, VM 2,3 GiB
    • Step #10 - WayWayDataGenerator...
      => 5.884s, RSS 743,3 MiB, VM 1,5 GiB
    • Step #11 - OptimizeAreaWayIdsGenerator...
      => 17.484s, RSS 264,1 MiB, VM 986,4 MiB
    • Step #12 - NodeDataGenerator...
      => 0.126s, RSS 38,7 MiB, VM 758,8 MiB
    • Step #13 - SortNodeDataGenerator...
      => 0.175s, RSS 38,7 MiB, VM 761,3 MiB
    • Step #14 - SortWayDataGenerator...
      => 1.478s, RSS 55,1 MiB, VM 776,0 MiB
    • Step #15 - AreaNodeIndexGenerator...
      => 0.494s, RSS 39,7 MiB, VM 761,1 MiB
    • Step #16 - AreaWayIndexGenerator...
      => 1.124s, RSS 51,9 MiB, VM 773,9 MiB
    • Step #17 - AreaAreaIndexGenerator...
      => 9.841s, RSS 172,4 MiB, VM 893,2 MiB
    • Step #18 - WaterIndexGenerator...
      => 2.130s, RSS 92,5 MiB, VM 814,9 MiB
    • Step #19 - OptimizeAreasLowZoomGenerator...
      => 21.276s, RSS 509,3 MiB, VM 1,2 GiB
    • Step #20 - OptimizeWaysLowZoomGenerator...
      => 1.268s, RSS 92,7 MiB, VM 814,9 MiB
    • Step #21 - LocationIndexGenerator...
      => 11.607s, RSS 158,7 MiB, VM 883,1 MiB
    • Step #22 - RouteDataGenerator...
      => 11.350s, RSS 291,1 MiB, VM 1,1 GiB
    • Step #23 - IntersectionIndexGenerator...
      => 0.039s, RSS 80,6 MiB, VM 800,8 MiB
    • Step #24 - TextIndexGenerator...
      => 1.290s, RSS 80,6 MiB, VM 800,8 MiB
      Overall 227.342s, RSS 1,7 GiB, VM 2,3 GiB
      => 117,3 MiB
      => 2,4 MiB

    Step #9 has the maximum memory requirements, though. So the next question is:
    How much memory do you have? Which OS are you using? Is it 32 bit or 64 bit?

    I was able to reduce memory consumption by setting:
    --rawWayDataMemoryMaped false

     
  • Toni Rutar Lokar

    I built/ran this on Win 10 x64 (VS 2015), but with a 32bit build of libosmscout. If you check your step #9, you'll see that you already went over the 32bit limit

    • Step #9 - MergeAreasGenerator...
      => 88.567s, RSS 1,7 GiB, VM 2,3 GiB

    as 2,3 GiB is over the 32bit limit without /LARGEADDRESSAWARE, which I'm a bit reluctant to use. Still, even with /LARGEADDRESSAWARE a bigger dataset would crash the import. So I'd still suggest trying to lower the memory usage if possible.
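To make the arithmetic behind this limit concrete, here is a minimal sketch. It assumes the usual Windows figures of 2 GiB of user address space for a 32-bit process by default and up to 4 GiB with /LARGEADDRESSAWARE on 64-bit Windows. The names `FitsInAddressSpace`, `kStep9PeakVm` and the other constants are hypothetical and not part of libosmscout, and the 2,3 GiB figure is taken from the 64-bit log above, so a 32-bit run may behave somewhat differently.

```cpp
#include <cstdint>

constexpr std::uint64_t kMiB = 1024ull * 1024ull;
constexpr std::uint64_t kGiB = 1024ull * kMiB;

// User-mode address space available to a 32-bit Windows process (assumed values).
constexpr std::uint64_t kDefaultUserSpace  = 2 * kGiB; // without /LARGEADDRESSAWARE
constexpr std::uint64_t kLargeAddressSpace = 4 * kGiB; // with the flag, on 64-bit Windows

// Returns true if `bytes` of virtual memory fit into `limit` bytes of address space.
constexpr bool FitsInAddressSpace(std::uint64_t bytes, std::uint64_t limit)
{
  return bytes <= limit;
}

// Peak of step #9 in the log above: "VM 2,3 GiB", approximated here as 2355 MiB.
constexpr std::uint64_t kStep9PeakVm = 2355 * kMiB;

// Checked at compile time: the peak does not fit by default, but fits with the flag.
static_assert(!FitsInAddressSpace(kStep9PeakVm, kDefaultUserSpace),
              "step #9 exceeds the default 2 GiB user address space");
static_assert(FitsInAddressSpace(kStep9PeakVm, kLargeAddressSpace),
              "step #9 fits once the process is large-address-aware");
```

The headroom with the flag is less than a factor of two, which matches the concern that a somewhat bigger dataset would still fail on a 32-bit build.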

    ... now, with /LARGEADDRESSAWARE, I got to step #10 (WayWayDataGenerator), but there I ran into some new trouble (at first look it looks like some data 'corruption') with outputs like:

    !! Cannot resolve node with id 426732420028174592 for way 438399017, splitting
    !! Cannot resolve node with id 5429394624 for way 438399017, splitting
    !! Cannot resolve node with id 1011276438357722672 for way 438399017, splitting
    !! Cannot resolve node with id 4530422800 for way 438399017, splitting

    !! Cannot resolve node with id -9223365438774739741 for way 438399017, splitting
    !! Cannot resolve node with id -6917521330205724080 for way 438399017, splitting
    !! Cannot resolve node with id -4683734815544872575 for way 438399017, splitting
    !! Cannot resolve node with id 13089995 for way 438399017, splitting
    !! Cannot resolve node with id 99931219 for way 438399017, splitting
    !! Cannot resolve node with id 99931221 for way 438399017, splitting

    ... before crashing. Still have to check it out properly, though.

    BTW: do you test libosmscout on Visual Studio/Windows builds? And which versions of LibXML, libProtoBuf, libIconV and ZLib do you use?

     
  • Toni Rutar Lokar

    Did a bit more analysis, and the crash seems to be x86 related. I did an Ubuntu x64 build, which ran the import without any problems, but a Debian x86 build crashed (same as Windows) on step #11.

    Then I compared the created data and found the first differences already in step #1 (generation of types.dat). Those seem optimization-related (group names are written in reverse order when comparing VS2015 and Linux created data).

    But then, in step #2, the real problems arise: the x64 builds of rawways.dat and rawcoastline.dat differ from the x86 builds (but are the same between Linux and Windows x86 builds).

    So the bug report should actually be: x86 builds of Import crash.

     
  • Toni Rutar Lokar

    Found the first bug, which is responsible for rawways.dat and rawcoastline.dat corruption. In PreprocessPBF.cpp, function void PreprocessPBF::ReadWays(...) uses:
    unsigned long ref=0;

    ... which is 32bit on an x86 build, but the values exceed the 32bit number range. Changing it to
    OSMId ref=0;
    (as you use an OSMId list for data.wayData) fixes this problem. The rest of the data is still not the same, so I guess there are more 'unsigned long's around the sources which should be unsigned long long or something similar.
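The effect of the accumulator width can be sketched as follows. This is only an illustration, not the code from PreprocessPBF.cpp: it assumes that `ref` is used as a running sum over the delta-coded node references of a way (the PBF format stores way refs as deltas), that OSMId is a 64-bit signed integer, and it uses `std::uint32_t` as a stand-in for a 32-bit `unsigned long`. `DecodeDeltas`, `SampleDeltas` and the sample values are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Rebuilds absolute ids from delta-coded values by keeping a running sum.
// Accumulator is the type of the running sum (the role of 'ref' in the post).
template <typename Accumulator>
std::vector<std::int64_t> DecodeDeltas(const std::vector<std::int64_t>& deltas)
{
  std::vector<std::int64_t> ids;
  Accumulator               ref = 0;

  ids.reserve(deltas.size());

  for (std::int64_t delta : deltas) {
    // With a 32-bit unsigned accumulator this conversion wraps modulo 2^32,
    // so every id above 4294967295 is silently replaced by a wrong value.
    ref = static_cast<Accumulator>(static_cast<std::int64_t>(ref) + delta);
    ids.push_back(static_cast<std::int64_t>(ref));
  }

  return ids;
}

// Hypothetical sample: the first id lies above 2^32, the others are small deltas.
inline std::vector<std::int64_t> SampleDeltas()
{
  return {4500000000LL, -100, 7};
}
```

Calling `DecodeDeltas<std::int64_t>(SampleDeltas())` keeps the ids intact, while `DecodeDeltas<std::uint32_t>(SampleDeltas())` yields ids that are off by 2^32, which is the kind of reference that later cannot be resolved. Using a fixed-width 64-bit type for the sum avoids depending on the platform's width of `unsigned long`.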

     
  • Tim Teulings

    Tim Teulings - 2016-11-15
    • status: open --> accepted
    • assigned_to: Tim Teulings
     
  • Toni Rutar Lokar

    I went through the import process and couldn't find more problems. The difference in data (at least in what I checked) is unimportant, as it comes from unordered lists of data, lists sorted by refs and slightly different merge results.

     
  • Tim Teulings

    Tim Teulings - 2016-11-16

    So is there any problem left to solve? Have you tested disabling mmap for files (this is also implemented under Windows)? Do you need further information on how to reduce memory requirements?

     
  • Toni Rutar Lokar

    Now, with the bugfix, the import goes through nicely. The data also seems OK at first look in OsmScout2 (will have to do a bit more checking, though). Enabling /LARGEADDRESSAWARE also solves the problem with the 'big' memory footprint for now (or until I try to import some much bigger files). So generally I could say that the problem is solved (with the bugfix I proposed, as otherwise the data gets corrupted). I would still have to check an import of a bigger data file under x86 with memory mapped files disabled before saying that all's well :) In any case, I think this ticket can be closed once the bugfix is applied. I'd also suggest enabling /LARGEADDRESSAWARE in the VS linker settings for the Import project on x86.

     
  • Tim Teulings

    Tim Teulings - 2016-11-18

    I had already pushed the fix to the github repository and have now also pushed the current development snapshot to the sourceforge repository.

    I'm therefore closing this issue.

    Regarding /LARGEADDRESSAWARE: Do you use the cmake files to create the Visual Studio project? In this case, could you make a pull request for adding the flag?

     
  • Tim Teulings

    Tim Teulings - 2016-11-18
    • status: accepted --> closed-fixed
     
  • Toni Rutar Lokar

    Regarding the VS project - I actually started off with what is in the 'windows' folder and tailored it until I got a usable solution/projects ;) That's also why I asked whether you make any Windows/VS builds, as those projects need to be adjusted quite a bit before you can actually build them.

    ... and last but not least - Thank you for a very nice library!

     

    Last edit: Toni Rutar Lokar 2016-11-18
  • Tim Teulings

    Tim Teulings - 2016-11-21

    I would suggest starting with the cmake project files and generating the Visual Studio files from them. My current plan is to drop the manually maintained Visual Studio project files soon. There are a number of automatic builds for Linux, Windows and Mac OS.

    See:

    If you supply pull requests via github (https://github.com/Framstag/libosmscout/pulls), these will be built automatically, too. I'm not sure if travis or appveyor still support 32 bit builds, but if they do, we can add them (for a subset of builds).

     
