
#9 segmentation fault with large data set

open-accepted
matbuild (2)
7
2014-07-14
2005-08-11
Anonymous
No

Hi,

I'm attempting to do a C205 SNFS factorization, but
matbuild crashes in pass 33. First, I get

ll_catFields(): memory reallocation error!
191888103 s32's requested.
Old size was L->maxDataSize=189664691
L->numFields = 25942739, maxShift = 2484826
L->index[25942739] = 189403267
numPairs=91165, numNewEntries=66

which is a bit puzzling, since the computer has 3GB of
memory and at that time it was "only" using about
1.4GB. The segfault followed almost immediately. Below
is the output of the successful passes 31 and 32 and of
the failing pass 33, followed by the gdb output showing
where it thinks the problem is. I don't know how to
diagnose this further, so hopefully someone can help!

Greg

pass 31...
Before sortByNumLP()... Doing ll_verify(P)...
ll_verify() reports that 'P' appears to be intact.
makePass:
There are 126896 relations with 0 large primes.
There are 618329 relations with 1 large primes.
There are 7390730 relations with 2 large primes.
There are 9751011 relations with 3 large primes.
There are 6628289 relations with 4 large primes.
There are 1865015 relations with 5 large primes.
There are 0 relations with 6 large primes.
After sortByNumLP()... Doing ll_verify(P)...
ll_verify() reports that 'P' appears to be intact.
Deleting 115465 singleton large primes.
Deleting 3880 singleton large primes.
Deleting 1763 singleton large primes.
Deleting 845 singleton large primes.
Deleting 355 singleton large primes.
Deleting 165 singleton large primes.
Deleting 75 singleton large primes.
Deleting 44 singleton large primes.
Deleting 25 singleton large primes.
Deleting 14 singleton large primes.
Deleting 8 singleton large primes.
Deleting 4 singleton large primes.
Deleting 2 singleton large primes.
Deleting 1 singleton large primes.
Deleting 1 singleton large primes.
Deleting 0 singleton large primes.
Total: 122647 singletons deleted.
makePass:
There are 126896 relations with 0 large primes.
There are 511369 relations with 1 large primes.
There are 7387788 relations with 2 large primes.
There are 9745366 relations with 3 large primes.
There are 6623078 relations with 4 large primes.
There are 1863126 relations with 5 large primes.
There are 0 relations with 6 large primes.
Doing merge on chunk 1/41 (P0=0, P1=1039761)...
Doing 299338 additions...
Doing merge on chunk 2/41 (P0=1039762, P1=2079524)...
Doing 100141 additions...
Doing merge on chunk 3/41 (P0=2079525, P1=3119287)...
Doing 24114 additions...
Doing merge on chunk 4/41 (P0=3119288, P1=4159050)...
Doing 15463 additions...
Doing merge on chunk 5/41 (P0=4159051, P1=5198813)...
Doing 10431 additions...
Doing merge on chunk 6/41 (P0=5198814, P1=6238576)...
Doing 8247 additions...
Doing merge on chunk 7/41 (P0=6238577, P1=7278339)...
Doing 6216 additions...
Doing merge on chunk 8/41 (P0=7278340, P1=8318102)...
Doing 5108 additions...
Doing merge on chunk 9/41 (P0=8318103, P1=9357865)...
Doing 4031 additions...
Doing merge on chunk 10/41 (P0=9357866, P1=10397628)...
Doing 3607 additions...
Doing merge on chunk 11/41 (P0=10397629, P1=11437391)...
Doing 3051 additions...
Doing merge on chunk 12/41 (P0=11437392, P1=12477154)...
Doing 2598 additions...
Doing merge on chunk 13/41 (P0=12477155, P1=13516917)...
Doing 2316 additions...
Doing merge on chunk 14/41 (P0=13516918, P1=14556680)...
Doing 2004 additions...
Doing merge on chunk 15/41 (P0=14556681, P1=15596443)...
Doing 1817 additions...
Doing merge on chunk 16/41 (P0=15596444, P1=16636206)...
Doing 1496 additions...
Doing merge on chunk 17/41 (P0=16636207, P1=17675969)...
Doing 1472 additions...
Doing merge on chunk 18/41 (P0=17675970, P1=18715732)...
Doing 1272 additions...
Doing merge on chunk 19/41 (P0=18715733, P1=19755495)...
Doing 1209 additions...
Doing merge on chunk 20/41 (P0=19755496, P1=20795258)...
Doing 1085 additions...
Doing merge on chunk 21/41 (P0=20795259, P1=21835021)...
Doing 30442 additions...
Doing merge on chunk 22/41 (P0=21835022, P1=22874784)...
Doing 50287 additions...
Doing merge on chunk 23/41 (P0=22874785, P1=23914547)...
Doing 24305 additions...
Doing merge on chunk 24/41 (P0=23914548, P1=24954310)...
Doing 14350 additions...
Doing merge on chunk 25/41 (P0=24954311, P1=25994073)...
Doing 9754 additions...
Doing merge on chunk 26/41 (P0=25994074, P1=27033836)...
Doing 6705 additions...
Doing merge on chunk 27/41 (P0=27033837, P1=28073599)...
Doing 5428 additions...
Doing merge on chunk 28/41 (P0=28073600, P1=29113362)...
Doing 4259 additions...
Doing merge on chunk 29/41 (P0=29113363, P1=30153125)...
Doing 3504 additions...
Doing merge on chunk 30/41 (P0=30153126, P1=31192888)...
Doing 2960 additions...
Doing merge on chunk 31/41 (P0=31192889, P1=32232651)...
Doing 2606 additions...
Doing merge on chunk 32/41 (P0=32232652, P1=33272414)...
Doing 2187 additions...
Doing merge on chunk 33/41 (P0=33272415, P1=34312177)...
Doing 1867 additions...
Doing merge on chunk 34/41 (P0=34312178, P1=35351940)...
Doing 1593 additions...
Doing merge on chunk 35/41 (P0=35351941, P1=36391703)...
Doing 1520 additions...
Doing merge on chunk 36/41 (P0=36391704, P1=37431466)...
Doing 1348 additions...
Doing merge on chunk 37/41 (P0=37431467, P1=38471229)...
Doing 1197 additions...
Doing merge on chunk 38/41 (P0=38471230, P1=39510992)...
Doing 1131 additions...
Doing merge on chunk 39/41 (P0=39510993, P1=40550755)...
Doing 977 additions...
Doing merge on chunk 40/41 (P0=40550756, P1=41590518)...
Doing 867 additions...
Doing merge on chunk 41/41 (P0=41590519, P1=42630223)...
Doing 818 additions...
* There are now 130856 full relations.
pass 32...
Before sortByNumLP()... Doing ll_verify(P)...
ll_verify() reports that 'P' appears to be intact.
makePass:
There are 130856 relations with 0 large primes.
There are 628469 relations with 1 large primes.
There are 7498707 relations with 2 large primes.
There are 9714844 relations with 3 large primes.
There are 6476826 relations with 4 large primes.
There are 1784065 relations with 5 large primes.
There are 0 relations with 6 large primes.
After sortByNumLP()... Doing ll_verify(P)...
ll_verify() reports that 'P' appears to be intact.
Deleting 117404 singleton large primes.
Deleting 4741 singleton large primes.
Deleting 2149 singleton large primes.
Deleting 1000 singleton large primes.
Deleting 463 singleton large primes.
Deleting 222 singleton large primes.
Deleting 110 singleton large primes.
Deleting 65 singleton large primes.
Deleting 30 singleton large primes.
Deleting 12 singleton large primes.
Deleting 3 singleton large primes.
Deleting 2 singleton large primes.
Deleting 2 singleton large primes.
Deleting 0 singleton large primes.
Total: 126203 singletons deleted.
makePass:
There are 130856 relations with 0 large primes.
There are 521504 relations with 1 large primes.
There are 7495042 relations with 2 large primes.
There are 9707658 relations with 3 large primes.
There are 6470500 relations with 4 large primes.
There are 1782004 relations with 5 large primes.
There are 0 relations with 6 large primes.
Doing merge on chunk 1/40 (P0=0, P1=1065755)...
Doing 299323 additions...
Doing merge on chunk 2/40 (P0=1065756, P1=2131512)...
Doing 91787 additions...
Doing merge on chunk 3/40 (P0=2131513, P1=3197269)...
Doing 24088 additions...
Doing merge on chunk 4/40 (P0=3197270, P1=4263026)...
Doing 14974 additions...
Doing merge on chunk 5/40 (P0=4263027, P1=5328783)...
Doing 11072 additions...
Doing merge on chunk 6/40 (P0=5328784, P1=6394540)...
Doing 7966 additions...
Doing merge on chunk 7/40 (P0=6394541, P1=7460297)...
Doing 6376 additions...
Doing merge on chunk 8/40 (P0=7460298, P1=8526054)...
Doing 5290 additions...
Doing merge on chunk 9/40 (P0=8526055, P1=9591811)...
Doing 4123 additions...
Doing merge on chunk 10/40 (P0=9591812, P1=10657568)...
Doing 3465 additions...
Doing merge on chunk 11/40 (P0=10657569, P1=11723325)...
Doing 2961 additions...
Doing merge on chunk 12/40 (P0=11723326, P1=12789082)...
Doing 2442 additions...
Doing merge on chunk 13/40 (P0=12789083, P1=13854839)...
Doing 2180 additions...
Doing merge on chunk 14/40 (P0=13854840, P1=14920596)...
Doing 1890 additions...
Doing merge on chunk 15/40 (P0=14920597, P1=15986353)...
Doing 1629 additions...
Doing merge on chunk 16/40 (P0=15986354, P1=17052110)...
Doing 1559 additions...
Doing merge on chunk 17/40 (P0=17052111, P1=18117867)...
Doing 1457 additions...
Doing merge on chunk 18/40 (P0=18117868, P1=19183624)...
Doing 1302 additions...
Doing merge on chunk 19/40 (P0=19183625, P1=20249381)...
Doing 1112 additions...
Doing merge on chunk 20/40 (P0=20249382, P1=21315138)...
Doing 1064 additions...
Doing merge on chunk 21/40 (P0=21315139, P1=22380895)...
Doing 60436 additions...
Doing merge on chunk 22/40 (P0=22380896, P1=23446652)...
Doing 33554 additions...
Doing merge on chunk 23/40 (P0=23446653, P1=24512409)...
Doing 18551 additions...
Doing merge on chunk 24/40 (P0=24512410, P1=25578166)...
Doing 11442 additions...
Doing merge on chunk 25/40 (P0=25578167, P1=26643923)...
Doing 8147 additions...
Doing merge on chunk 26/40 (P0=26643924, P1=27709680)...
Doing 5868 additions...
Doing merge on chunk 27/40 (P0=27709681, P1=28775437)...
Doing 4783 additions...
Doing merge on chunk 28/40 (P0=28775438, P1=29841194)...
Doing 3892 additions...
Doing merge on chunk 29/40 (P0=29841195, P1=30906951)...
Doing 3199 additions...
Doing merge on chunk 30/40 (P0=30906952, P1=31972708)...
Doing 2757 additions...
Doing merge on chunk 31/40 (P0=31972709, P1=33038465)...
Doing 2372 additions...
Doing merge on chunk 32/40 (P0=33038466, P1=34104222)...
Doing 1924 additions...
Doing merge on chunk 33/40 (P0=34104223, P1=35169979)...
Doing 1786 additions...
Doing merge on chunk 34/40 (P0=35169980, P1=36235736)...
Doing 1598 additions...
Doing merge on chunk 35/40 (P0=36235737, P1=37301493)...
Doing 1338 additions...
Doing merge on chunk 36/40 (P0=37301494, P1=38367250)...
Doing 1236 additions...
Doing merge on chunk 37/40 (P0=38367251, P1=39433007)...
Doing 1161 additions...
Doing merge on chunk 38/40 (P0=39433008, P1=40498764)...
Doing 1017 additions...
Doing merge on chunk 39/40 (P0=40498765, P1=41564521)...
Doing 945 additions...
Doing merge on chunk 40/40 (P0=41564522, P1=42630223)...
Doing 821 additions...
* There are now 135003 full relations.
pass 33...
Before sortByNumLP()... Doing ll_verify(P)...
ll_verify() reports that 'P' appears to be intact.
makePass:
There are 135003 relations with 0 large primes.
There are 638383 relations with 1 large primes.
There are 7603253 relations with 2 large primes.
There are 9671968 relations with 3 large primes.
There are 6325517 relations with 4 large primes.
There are 1706236 relations with 5 large primes.
There are 0 relations with 6 large primes.
After sortByNumLP()... Doing ll_verify(P)...
ll_verify() reports that 'P' appears to be intact.
Deleting 119095 singleton large primes.
Deleting 5523 singleton large primes.
Deleting 2580 singleton large primes.
Deleting 1162 singleton large primes.
Deleting 560 singleton large primes.
Deleting 250 singleton large primes.
Deleting 121 singleton large primes.
Deleting 54 singleton large primes.
Deleting 34 singleton large primes.
Deleting 12 singleton large primes.
Deleting 5 singleton large primes.
Deleting 1 singleton large primes.
Deleting 1 singleton large primes.
Deleting 0 singleton large primes.
Total: 129398 singletons deleted.
makePass:
There are 135003 relations with 0 large primes.
There are 531251 relations with 1 large primes.
There are 7598867 relations with 2 large primes.
There are 9663665 relations with 3 large primes.
There are 6318281 relations with 4 large primes.
There are 1703895 relations with 5 large primes.
There are 0 relations with 6 large primes.
Doing merge on chunk 1/40 (P0=0, P1=1065755)...
Doing 299332 additions...
Doing merge on chunk 2/40 (P0=1065756, P1=2131512)...
Doing 91165 additions...
ll_catFields(): memory reallocation error!
191888103 s32's requested.
Old size was L->maxDataSize=189664691
L->numFields = 25942739, maxShift = 2484826
L->index[25942739] = 189403267
numPairs=91165, numNewEntries=66

Program received signal SIGSEGV, Segmentation fault.
0x5564ac2e in free () from /lib32/libc.so.6

(gdb) where
#0 0x5564ac2e in free () from /lib32/libc.so.6
#1 0x757ed000 in ?? ()
#2 0x015fe000 in ?? ()
#3 0x5e98fc7c in ?? ()
#4 0x00000003 in ?? ()
#5 0x757ed008 in ?? ()
#6 0x0809226e in merge (R=0xffff83b0, P=0x813ca28, revP=0xffff8200, P0=1065756, P1=2131512, level=72) at combparts.c:424
#7 0x08092574 in makePass (R=0xffff83b0, P=0x813ca28) at combparts.c:513
#8 0x08093518 in combParts (R=0xffff83b0, P=0x813ca28, maxRelsInFF=48, minFF=1938702) at combparts.c:870
#9 0x0804c85d in doRowOps3 (P=0xffff83dc, R=0xffff83b0, prelF=0xffffb4a0, maxRelsInFF=48) at matbuild.c:753
#10 0x0804c91c in getCols (colName=0x80946b5 "cols", prelF=0xffffb4a0, lpF=0xffffb450, FB=0xffffb4f0, minFull=1938702, maxRelsInFF=48) at matbuild.c:780
#11 0x0804e8ab in main (argC=5, args=0xffffbcc4) at matbuild.c:1293
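The element counts in the ll_catFields() message are easier to judge in bytes. The sketch below does that conversion; it assumes matbuild's s32 is a 32-bit signed integer (4 bytes), as the name suggests, and the macro and function names are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Counts copied from the ll_catFields() error message. */
#define REQUESTED_S32S 191888103UL /* "191888103 s32's requested" */
#define OLD_S32S       189664691UL /* "L->maxDataSize=189664691"  */

/* Bytes occupied by n 32-bit signed integers (assumed size of an s32). */
static unsigned long long s32_bytes(unsigned long n)
{
    return (unsigned long long)n * sizeof(int32_t);
}

/* Convert a byte count to mebibytes (2^20 bytes). */
static double to_mib(unsigned long long bytes)
{
    return (double)bytes / (1024.0 * 1024.0);
}

/* Print the old and requested block sizes. If realloc() cannot grow the
   block in place, both blocks exist at the same time while it copies. */
void report(void)
{
    unsigned long long old = s32_bytes(OLD_S32S);
    unsigned long long req = s32_bytes(REQUESTED_S32S);
    printf("old block:       %llu bytes (%.1f MiB)\n", old, to_mib(old));
    printf("requested block: %llu bytes (%.1f MiB)\n", req, to_mib(req));
    printf("both during copy: %.1f MiB\n", to_mib(old + req));
}
```

Called from a main(), report() gives 723.5 MiB for the old block, 732.0 MiB for the request and 1455.5 MiB if both must coexist. The key point is that the request is a single contiguous block. The backtrace (/lib32/libc.so.6, 0x08... code addresses) indicates a 32-bit process, whose heap shares at most 4 GiB of address space with libraries and stacks, so a free gap of that size may not exist even when the total free memory is larger.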

Discussion

  • Anton Korobeynikov

    • status: open --> open-accepted
     
  • Anton Korobeynikov

    • priority: 5 --> 6
    • assigned_to: nobody --> asl__
     
  • Anton Korobeynikov

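    Yes, that's the case. The current llist implementation causes a
    lot of memory fragmentation, which is why large sets will often
    fail even when plenty of memory is available in total: each
    reallocation needs one contiguous block, and a fragmented heap
    may not have a free gap that large.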
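One way to test the fragmentation explanation is to ask the allocator directly how large a single block it can still hand out. This is a minimal sketch; largest_malloc() is a hypothetical helper, not part of GGNFS, and it assumes that once malloc(n) fails, larger requests fail too, which usually but not strictly holds.

```c
#include <stddef.h>
#include <stdlib.h>

/* Binary-search for the largest size in [lo, hi] that one malloc() call
   can satisfy right now. Returns 0 if no size in the range succeeds. */
size_t largest_malloc(size_t lo, size_t hi)
{
    size_t best = 0;
    while (lo <= hi) {
        size_t mid = lo + (hi - lo) / 2;
        void *p = malloc(mid);
        if (p != NULL) {
            free(p);            /* only probing: release immediately */
            best = mid;
            if (mid == hi)      /* done; also avoids overflow of mid + 1 */
                break;
            lo = mid + 1;
        } else {
            if (mid == 0)       /* avoid underflow of mid - 1 */
                break;
            hi = mid - 1;
        }
    }
    return best;
}
```

Calling largest_malloc(1, (size_t)-1) in a 32-bit matbuild build just before the failing merge, and comparing the result with the size in the ll_catFields() message, would show whether a block of that size still exists. On 64-bit Linux with memory overcommit the result mostly reflects the overcommit policy rather than free RAM, so the probe is only informative for 32-bit builds.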

    1. Try adding malloc_trim(0) after the realloc() call (this can
    easily be added to lx_realloc).
    2. Use matbuild-tpie, submitted today. It does everything on
    disk. But be prepared: it will be horribly slow compared to the
    original matbuild ;)
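Suggestion 1 could look roughly like the sketch below. lx_realloc_trim() and grow_s32_array() are hypothetical names, the real lx_realloc() may have a different signature, and malloc_trim() (declared in <malloc.h>) is glibc-specific. The sketch also keeps the old pointer when realloc() fails. The backtrace does not show how combparts.c handles the failed reallocation, but a crash in free() right after the error is consistent with the old block being lost or freed twice, so that part is a guess at the failure mode.

```c
#include <malloc.h>   /* malloc_trim(): glibc-specific extension */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for lx_realloc(): realloc() followed by
   malloc_trim(0). On failure returns NULL and leaves ptr untouched,
   so the caller still owns the old block. */
void *lx_realloc_trim(void *ptr, size_t size)
{
    void *q = realloc(ptr, size);
    if (q == NULL) {
        fprintf(stderr, "lx_realloc_trim: cannot resize block to %zu bytes\n", size);
        return NULL;
    }
    malloc_trim(0);   /* try to return free heap memory to the OS */
    return q;
}

/* Grow an array of 32-bit ints to new_cap elements. Returns 0 on
   success, -1 on failure; on failure *data and *cap are unchanged. */
int grow_s32_array(int32_t **data, size_t *cap, size_t new_cap)
{
    /* Reject 0 (realloc(p, 0) may free p) and byte-count overflow. */
    if (new_cap == 0 || new_cap > SIZE_MAX / sizeof(int32_t))
        return -1;
    int32_t *tmp = lx_realloc_trim(*data, new_cap * sizeof(int32_t));
    if (tmp == NULL)
        return -1;    /* old block still valid: clean up or abort */
    *data = tmp;
    *cap = new_cap;
    return 0;
}
```

malloc_trim(0) can only give back memory the allocator already considers free; it cannot move blocks that are still in use, so it may reduce fragmentation here but does not guarantee that a large enough contiguous block becomes available.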

     
  • Anton Korobeynikov

    • priority: 6 --> 9
     
  • Anton Korobeynikov

    • priority: 9 --> 7
     
