#183 Threaded Overloaded operator method segfaults on collect()

Status: open
Owner: nobody
Priority: 5
Updated: 2014-02-05
Created: 2014-01-31
Creator: Charles Evans
Private: No

With 10 threads each spending about 99% of their time in the same overloaded operator's method,
calling collect() from a different overloaded operator's method in another thread segfaults.

Discussion

  • Charles Evans
    2014-01-31

    Test case stressOVLD.icn is a rough mashup of stress.icn and ovld.icn.
    It requires concurrent threads and operator overloading.
    It intentionally avoids optimizations so that more time is spent in the methods.
    As of SVN on 20140130 it usually runs for a minute or two before segfaulting.

     
    Attachments
  • Charles Evans
    2014-02-01

    (on 4 core Athlon2)
    (Once it ran for a long time, dropped to 120% CPU, and ran much longer until I killed it and restarted.)
    Note: 3 runs are shown here. The best one is: run stressOVLD 10 1000000
    :::gdb
    ( with #define OVLD 1 and #define Concurrent 1 and CFLAGS= -g -O0 ...)
    make Unicon
    $UNICON stressOVLD
    gdb $ICONX
    (gdb) run stressOVLD
    Starting program: /unisvn/140130/3657/unicon-code/trunk/unicon/bin/iconx stressOVLD
    [Thread debugging using libthread_db enabled]
    10 thread(s) will sum 100000000 ones.
    ...(output, starting 10 threads, 3 threads exited)
    ...(~2min run @ 300% CPU)
    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0xa5e67b70 (LWP 19737)]
    markblock (dp=0xb7d4cef9) at rmemmgt.r:906
    906 else if ((unsigned int)BlkType(block) == T_Coexpr) {

    (gdb) backtrace
    #0 markblock (dp=0xb7d4cef9) at rmemmgt.r:906
    #1 0x080e55b6 in sweep (ce=0x9fff9008) at rmemmgt.r:1188
    #2 0x080e5029 in markblock (dp=0xb39cd0b0) at rmemmgt.r:919
    #3 0x080e527c in markblock (dp=0xbfffe4e0) at rmemmgt.r:1029
    #4 0x080e55b6 in sweep (ce=0xb2c48008) at rmemmgt.r:1188
    #5 0x080e5029 in markblock (dp=0xa5e66b54) at rmemmgt.r:919
    #6 0x080e5080 in markblock (dp=0xa7beb4a0) at rmemmgt.r:946
    #7 0x080e5790 in sweep_stk (ce=0xa7c3e008) at rmemmgt.r:1310
    #8 0x080e55d1 in sweep (ce=0xa7c3e008) at rmemmgt.r:1193
    #9 0x080e5029 in markblock (dp=0xa7c93054) at rmemmgt.r:919
    #10 0x080e527c in markblock (dp=0xa9426e70) at rmemmgt.r:1029
    #11 0x080e55b6 in sweep (ce=0xa9abe008) at rmemmgt.r:1188
    #12 0x080e5029 in markblock (dp=0xb2d36334) at rmemmgt.r:919
    #13 0x080e553f in markptr (ptr=0xb2d103b4) at rmemmgt.r:1166
    #14 0x080e54fb in markptr (ptr=0xb2d103a0) at rmemmgt.r:1155
    #15 0x080e5238 in markblock (dp=0xb2bf5470) at rmemmgt.r:1018
    #16 0x080e5790 in sweep_stk (ce=0xb2bf3008) at rmemmgt.r:1310
    #17 0x080e55d1 in sweep (ce=0xb2bf3008) at rmemmgt.r:1193
    #18 0x080e5029 in markblock (dp=0x81844fc) at rmemmgt.r:919
    #19 0x080e48e6 in markthread (tcp=0x8182208) at rmemmgt.r:691
    #20 0x080e4a77 in markprogram (pstate=0x815d160) at rmemmgt.r:720
    #21 0x080e44e9 in collect (region=0) at rmemmgt.r:540
    #22 0x08069581 in Zcollect (r_args=0xa8b80570) at fmisc.r:136
    #23 0x080a7bc7 in interp_0 (fsig=0, cargp=0x0) at interp.r:1322
    #24 0x080d55d6 in new_context (fsig=0, cargp=0x0) at rcoexpr.r:416
    #25 0x080d60d6 in nctramp (arg=0x819fa38) at rcoexpr.r:665
    #26 0xb7d47954 in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
    #27 0xb786373e in clone () from /lib/i386-linux-gnu/libc.so.6

    ----
    (gdb) run stressOVLD 10 1000000 # only add 1M ones, very short run.
    10 thread(s) will sum 1000000 ones.
    ...(output,started 10 threads, 3 threads exited.)
    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0xa78c7b70 (LWP 7640)]
    markblock (dp=0xac2b3b64) at rmemmgt.r:906
    906 else if ((unsigned int)BlkType(block) == T_Coexpr) {
    (gdb) bt
    #0 markblock (dp=0xac2b3b64) at rmemmgt.r:906
    ...(snip nearly identical bt)
    #19 0xb786373e in clone () from /lib/i386-linux-gnu/libc.so.6

    ----
    (gdb) run stressOVLD 1000000 # try 1M threads
    Starting program: /unisvn/140130/3657/unicon-code/trunk/unicon/bin/iconx stressOVLD
    [Thread debugging using libthread_db enabled]
    1000000 thread(s) will sum 100000000 ones.
    ...(output, starts 8 threads, 1 thread exits.)
    record Op__state_202(3)
    Thread 2 is done
    [New Thread 0xab6b4b70 (LWP 4969)]
    [Thread 0xad614b70 (LWP 4967) exited]
    [New Thread 0xaaeb4b70 (LWP 4970)]
    [New Thread 0xa9fc9b70 (LWP 4971)]

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0xb2bf2b70 (LWP 4961)]
    0x080716ac in Zspawn (r_args=0xb2bf5598) at fmisc.r:2835
    2835 cp->status = Ts_Posix | Ts_Async;
    (gdb) bt
    #0 0x080716ac in Zspawn (r_args=0xb2bf5598) at fmisc.r:2835
    #1 0x080a7bc7 in interp_0 (fsig=5, cargp=0xb2bf5550) at interp.r:1322
    #2 0x080c9f7c in Obang (r_args=0xb2bf5550) at oref.r:46
    #3 0x080a69b3 in interp_0 (fsig=0, cargp=0x0) at interp.r:258
    #4 0x080d55d6 in new_context (fsig=0, cargp=0x0) at rcoexpr.r:416
    #5 0x080d60d6 in nctramp (arg=0x8182048) at rcoexpr.r:665
    #6 0xb7d47954 in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
    #7 0xb786373e in clone () from /lib/i386-linux-gnu/libc.so.6

     
  • Charles Evans
    2014-02-01

    After converting the test from threaded to normal co-expressions, it works.

     
    Attachments
  • Charles Evans
    2014-02-01

    Threaded, run stressOVLD 10 100000: two more runs, with segfaults in interp.r and rstruct.r:

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0xa537bb70 (LWP 13712)]
    0x080a6b8b in interp_0 (fsig=5, cargp=0xa4493508) at interp.r:225
    225 fnum = ftabp[fieldnum * *records + rp2->recdesc->Proc.recnum - 1];
    (gdb) bt

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0xa30fab70 (LWP 16321)]
    0x080eb404 in memb (pb=0xa1b0a604, x=0xa1b0b6c8, hn=1100, res=0xa30f9a9c) at rstruct.r:671
    671 while ((pe = (struct b_selem *)*lp) != NULL && BlkType(pe) != T_Table) {

     
  • Jafar
    2014-02-02

    Charles,

    I will look into these issues in the next few days. FYI, when you have multiple threads running under gdb it helps to print the backtrace for all threads. "info threads" under gdb lists the threads and shows which one you are currently in. You can switch to any thread with "thread thread-id", where thread-id is a serial number that gdb assigns to each thread. For example, "thread 4" switches to thread number 4, and bt then prints the backtrace of thread number 4. That is very useful because, in my experience, the root of many GC-related problems is not in the thread doing the GC, but in a thread that was stopped at the "wrong" time. So knowing what the other threads were doing before the GC helps a lot.

     
  • Jafar
    2014-02-03

    Found a bug in collect(). One of the calls to the C function collect() was unprotected; all threads must be suspended before calling collect(). While the fix is definitely needed and not specific to overloading, it only shifted the place where the segfault occurs. The new behavior suggests that the new bug is related to overloading. I haven't yet looked closely at the overloading code to understand exactly what we are doing, or whether we are failing to tend some critical variables.
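
To illustrate the "suspend all threads before collect()" requirement in general terms, here is a minimal pthreads sketch of a stop-the-world handshake. It is not Unicon's runtime code: every name in it (`protected_collect`, `safepoint`, `park_if_requested`, `slots`, `stop_the_world_demo`) is hypothetical, and the "collection" is only a stand-in that reads per-thread counters twice to check that nothing changed while the other threads were parked.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define NWORKERS 4
#define STEPS    20000

/* All names here are illustrative assumptions, not Unicon runtime symbols. */
static pthread_mutex_t world_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  world_cond = PTHREAD_COND_INITIALIZER;
static bool stop_requested;      /* a collector wants the world stopped  */
static int  running;             /* threads not currently parked or done */
static long slots[NWORKERS];     /* per-thread state, written unlocked   */
static int  collections;
static int  inconsistent;        /* times state moved during a "GC"      */

/* Park until no collection is in progress. Caller holds world_lock. */
static void park_if_requested(void) {
    if (!stop_requested) return;
    running--;
    pthread_cond_broadcast(&world_cond);   /* let the collector re-check */
    while (stop_requested)
        pthread_cond_wait(&world_cond, &world_lock);
    running++;
}

/* Point at which a mutator thread agrees to be suspended. */
static void safepoint(void) {
    pthread_mutex_lock(&world_lock);
    park_if_requested();
    pthread_mutex_unlock(&world_lock);
}

/* "Protected" collect: waits until every other thread is parked or done. */
static void protected_collect(void) {
    long before = 0, after = 0;
    pthread_mutex_lock(&world_lock);
    park_if_requested();          /* another thread may be collecting */
    stop_requested = true;
    running--;                    /* the collector does not wait for itself */
    while (running > 0)
        pthread_cond_wait(&world_cond, &world_lock);
    /* World is stopped: shared state can be walked safely. */
    for (int i = 0; i < NWORKERS; i++) before += slots[i];
    for (int i = 0; i < NWORKERS; i++) after  += slots[i];
    if (before != after) inconsistent++;
    collections++;
    stop_requested = false;
    running++;
    pthread_cond_broadcast(&world_cond);   /* resume parked threads */
    pthread_mutex_unlock(&world_lock);
}

static void *worker(void *arg) {
    int id = (int)(intptr_t)arg;
    for (int step = 0; step < STEPS; step++) {
        slots[id]++;                       /* unlocked mutator work */
        if (step % 1000 == id) protected_collect();
        else safepoint();
    }
    pthread_mutex_lock(&world_lock);
    running--;                             /* an exited thread counts as stopped */
    pthread_cond_broadcast(&world_cond);
    pthread_mutex_unlock(&world_lock);
    return NULL;
}

/* Returns 1 if every collection saw a quiescent world, 0 otherwise. */
int stop_the_world_demo(void) {
    pthread_t tid[NWORKERS];
    long total = 0;
    stop_requested = false;
    collections = 0;
    inconsistent = 0;
    running = NWORKERS;                    /* counted before any thread starts */
    for (int i = 0; i < NWORKERS; i++) slots[i] = 0;
    for (int i = 0; i < NWORKERS; i++)
        if (pthread_create(&tid[i], NULL, worker, (void *)(intptr_t)i) != 0)
            return 0;
    for (int i = 0; i < NWORKERS; i++) pthread_join(tid[i], NULL);
    for (int i = 0; i < NWORKERS; i++) total += slots[i];
    return inconsistent == 0
        && collections == NWORKERS * (STEPS / 1000)
        && total == (long)NWORKERS * STEPS;
}
```

Calling `stop_the_world_demo()` from a `main` (built with `-pthread`) is expected to return 1. An "unprotected" call in this sketch would be one that walks `slots` without first waiting for `running` to reach zero, so another thread could still be modifying its state while the collector reads it.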

     
  • Charles Evans
    2014-02-05

    New test: stressCollect.icn.
    It requires concurrent threads and does no overloading,
    but it deadlocks often if OVLD is defined.
    It has worked so far without OVLD.

     
    Attachments