#183 Threaded Overloaded operator method segfaults on collect()

None
open
Jafar
5
2017-05-27
2014-01-31
No

With 10 threads each spending about 99% of its time in the same overloaded operator's method,
calling collect() from another overloaded operator's method in another thread segfaults.

Related

Bugs: #122
Bugs: #183
Bugs: #227

Discussion

  • Charles Evans

    Charles Evans - 2014-01-31

    Test case stressOVLD.icn, a rough mashup of stress.icn and ovld.icn.
    It requires concurrent threads and operator overloading, and it
    intentionally avoids optimizations so that more time is spent in the methods.
    As of SVN on 2014-01-30, it usually runs for a minute or two before segfaulting.

     
  • Charles Evans

    Charles Evans - 2014-02-01

    (On a 4-core Athlon II.)
    (Once it ran for a long time, dropped to 120% CPU, and ran much longer; I killed it and restarted.)
    Note: 3 runs are shown here. The best reproducer is: run stressOVLD 10 1000000

    (Built with #define OVLD 1, #define Concurrent 1, and CFLAGS= -g -O0 ...)
    make Unicon
    $UNICON stressOVLD
    gdb $ICONX
    (gdb) run stressOVLD
    Starting program: /unisvn/140130/3657/unicon-code/trunk/unicon/bin/iconx stressOVLD
    [Thread debugging using libthread_db enabled]
    10 thread(s) will sum 100000000 ones.
    ...(output, starting 10 threads, 3 threads exited)
    ...(~2min run @ 300% CPU)
    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0xa5e67b70 (LWP 19737)]
    markblock (dp=0xb7d4cef9) at rmemmgt.r:906
    906        else if ((unsigned int)BlkType(block) == T_Coexpr) {
    
    (gdb) backtrace
    #0  markblock (dp=0xb7d4cef9) at rmemmgt.r:906
    #1  0x080e55b6 in sweep (ce=0x9fff9008) at rmemmgt.r:1188
    #2  0x080e5029 in markblock (dp=0xb39cd0b0) at rmemmgt.r:919
    #3  0x080e527c in markblock (dp=0xbfffe4e0) at rmemmgt.r:1029
    #4  0x080e55b6 in sweep (ce=0xb2c48008) at rmemmgt.r:1188
    #5  0x080e5029 in markblock (dp=0xa5e66b54) at rmemmgt.r:919
    #6  0x080e5080 in markblock (dp=0xa7beb4a0) at rmemmgt.r:946
    #7  0x080e5790 in sweep_stk (ce=0xa7c3e008) at rmemmgt.r:1310
    #8  0x080e55d1 in sweep (ce=0xa7c3e008) at rmemmgt.r:1193
    #9  0x080e5029 in markblock (dp=0xa7c93054) at rmemmgt.r:919
    #10 0x080e527c in markblock (dp=0xa9426e70) at rmemmgt.r:1029
    #11 0x080e55b6 in sweep (ce=0xa9abe008) at rmemmgt.r:1188
    #12 0x080e5029 in markblock (dp=0xb2d36334) at rmemmgt.r:919
    #13 0x080e553f in markptr (ptr=0xb2d103b4) at rmemmgt.r:1166
    #14 0x080e54fb in markptr (ptr=0xb2d103a0) at rmemmgt.r:1155
    #15 0x080e5238 in markblock (dp=0xb2bf5470) at rmemmgt.r:1018
    #16 0x080e5790 in sweep_stk (ce=0xb2bf3008) at rmemmgt.r:1310
    #17 0x080e55d1 in sweep (ce=0xb2bf3008) at rmemmgt.r:1193
    #18 0x080e5029 in markblock (dp=0x81844fc) at rmemmgt.r:919
    #19 0x080e48e6 in markthread (tcp=0x8182208) at rmemmgt.r:691
    #20 0x080e4a77 in markprogram (pstate=0x815d160) at rmemmgt.r:720
    #21 0x080e44e9 in collect (region=0) at rmemmgt.r:540
    #22 0x08069581 in Zcollect (r_args=0xa8b80570) at fmisc.r:136
    #23 0x080a7bc7 in interp_0 (fsig=0, cargp=0x0) at interp.r:1322
    #24 0x080d55d6 in new_context (fsig=0, cargp=0x0) at rcoexpr.r:416
    #25 0x080d60d6 in nctramp (arg=0x819fa38) at rcoexpr.r:665
    #26 0xb7d47954 in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
    #27 0xb786373e in clone () from /lib/i386-linux-gnu/libc.so.6
    
    ----
    (gdb) run stressOVLD 10 1000000 # only add 1M ones, very short run. 
    10 thread(s) will sum 1000000 ones.
    ...(output,started 10 threads, 3 threads exited.)
    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0xa78c7b70 (LWP 7640)]
    markblock (dp=0xac2b3b64) at rmemmgt.r:906
    906        else if ((unsigned int)BlkType(block) == T_Coexpr) {
    (gdb) bt
    #0  markblock (dp=0xac2b3b64) at rmemmgt.r:906
    ...(snip nearly identical bt)
    #19 0xb786373e in clone () from /lib/i386-linux-gnu/libc.so.6
    
    ----
    (gdb) run stressOVLD 1000000 # try 1M threads
    Starting program: /unisvn/140130/3657/unicon-code/trunk/unicon/bin/iconx stressOVLD
    [Thread debugging using libthread_db enabled]
    1000000 thread(s) will sum 100000000 ones.
    ...(output, starts 8 threads, 1 thread exits.)
    record Op__state_202(3)
    Thread 2 is done
    [New Thread 0xab6b4b70 (LWP 4969)]
    [Thread 0xad614b70 (LWP 4967) exited]
    [New Thread 0xaaeb4b70 (LWP 4970)]
    [New Thread 0xa9fc9b70 (LWP 4971)]
    
    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0xb2bf2b70 (LWP 4961)]
    0x080716ac in Zspawn (r_args=0xb2bf5598) at fmisc.r:2835
    2835             cp->status = Ts_Posix | Ts_Async;
    (gdb) bt
    #0  0x080716ac in Zspawn (r_args=0xb2bf5598) at fmisc.r:2835
    #1  0x080a7bc7 in interp_0 (fsig=5, cargp=0xb2bf5550) at interp.r:1322
    #2  0x080c9f7c in Obang (r_args=0xb2bf5550) at oref.r:46
    #3  0x080a69b3 in interp_0 (fsig=0, cargp=0x0) at interp.r:258
    #4  0x080d55d6 in new_context (fsig=0, cargp=0x0) at rcoexpr.r:416
    #5  0x080d60d6 in nctramp (arg=0x8182048) at rcoexpr.r:665
    #6  0xb7d47954 in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
    #7  0xb786373e in clone () from /lib/i386-linux-gnu/libc.so.6
    
     
  • Charles Evans

    Charles Evans - 2014-02-01

    After converting the test from threads to normal co-expressions, it works.

     
  • Charles Evans

    Charles Evans - 2014-02-01

    Threaded, run stressOVLD 10 100000: two more runs, with segfaults in interp.r and rstruct.r:

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0xa537bb70 (LWP 13712)]
    0x080a6b8b in interp_0 (fsig=5, cargp=0xa4493508) at interp.r:225
    225 fnum = ftabp[fieldnum * *records + rp2->recdesc->Proc.recnum - 1];
    (gdb) bt

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0xa30fab70 (LWP 16321)]
    0x080eb404 in memb (pb=0xa1b0a604, x=0xa1b0b6c8, hn=1100, res=0xa30f9a9c) at rstruct.r:671
    671 while ((pe = (struct b_selem *)lp) != NULL && BlkType(pe) != T_Table) {

     
  • Jafar

    Jafar - 2014-02-02

    Charles,

    I will look into these issues in the next few days. FYI, when you have multiple threads running under gdb, it helps to print the backtrace for all threads. "info threads" under gdb lists the threads and marks the one you are currently in. You can switch to any thread with "thread thread-id", where thread-id is a serial number gdb assigns to each thread. For example, "thread 4" switches to thread number 4, and bt then prints the backtrace of thread 4. This is very useful because, in my experience, the root of many GC-related problems is not in the thread doing the GC, but in a thread that was stopped at the "wrong" time. So knowing what the other threads were doing before the GC helps a lot.

     
  • Jafar

    Jafar - 2014-02-03

    Found a bug in collect(). One of the calls to the C function collect() was unprotected: all threads must be suspended before collect() is called. While the fix is definitely needed and not specific to overloading, it only shifted where the segfault takes place. The new behavior suggests that the remaining bug is related to overloading. I haven't looked closely at the overloading code to understand exactly what we are doing, or whether we are failing to tend some critical variables.
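A minimal sketch of the invariant Jafar describes (every other thread suspended before the collector runs), written with POSIX threads. All names here (thread_enter, safe_point, collect_protected, fake_collect, demo) are hypothetical and are not the Unicon runtime's API; the real runtime has its own suspension mechanism. Mutators check in at safe points, and the thread that wants to collect waits until every other registered thread is parked before it calls the collector.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stop-the-world gate; none of these names are Unicon runtime APIs. */
static pthread_mutex_t gc_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gc_cond = PTHREAD_COND_INITIALIZER;
static int running = 0;           /* registered threads that are not parked */
static bool gc_requested = false; /* true while a collection is pending or running */
static int collections = 0;
static atomic_bool in_collect = false;

/* Register the calling thread as a mutator, waiting out any collection. */
static void thread_enter(void) {
    pthread_mutex_lock(&gc_lock);
    while (gc_requested)
        pthread_cond_wait(&gc_cond, &gc_lock);
    running++;
    pthread_mutex_unlock(&gc_lock);
}

/* Unregister; a waiting collector must stop waiting for this thread. */
static void thread_exit(void) {
    pthread_mutex_lock(&gc_lock);
    running--;
    pthread_cond_broadcast(&gc_cond);
    pthread_mutex_unlock(&gc_lock);
}

/* Park until the pending collection finishes. Caller must hold gc_lock. */
static void park_locked(void) {
    running--;
    pthread_cond_broadcast(&gc_cond); /* the collector may be waiting on us */
    while (gc_requested)
        pthread_cond_wait(&gc_cond, &gc_lock);
    running++;
}

/* Mutators call this at safe points, e.g. between VM instructions. */
static void safe_point(void) {
    pthread_mutex_lock(&gc_lock);
    if (gc_requested)
        park_locked();
    pthread_mutex_unlock(&gc_lock);
}

/* The only entry to the collector: suspend every other thread first. */
static void collect_protected(void (*collect)(void)) {
    pthread_mutex_lock(&gc_lock);
    if (gc_requested)                 /* another thread is collecting: wait */
        park_locked();
    gc_requested = true;
    running--;                        /* the collector does not wait for itself */
    while (running > 0)
        pthread_cond_wait(&gc_cond, &gc_lock);
    collect();                        /* the world is stopped here */
    collections++;
    running++;
    gc_requested = false;
    pthread_cond_broadcast(&gc_cond); /* release the parked threads */
    pthread_mutex_unlock(&gc_lock);
}

/* Stand-in for the real collector: checks that nobody else is running. */
static void fake_collect(void) {
    atomic_store(&in_collect, true);
    assert(running == 0);
    atomic_store(&in_collect, false);
}

static void *worker(void *arg) {
    (void)arg;
    thread_enter();
    for (int i = 0; i < 100000; i++) {
        assert(!atomic_load(&in_collect)); /* never run during a collection */
        safe_point();
    }
    thread_exit();
    return NULL;
}

/* Run nworkers mutators and ncollect protected collections; return the count. */
static int demo(int nworkers, int ncollect) {
    pthread_t t[16];
    assert(nworkers <= 16);
    thread_enter();
    for (int i = 0; i < nworkers; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < ncollect; i++)
        collect_protected(fake_collect);
    thread_exit(); /* do not hold up collectors while joining */
    for (int i = 0; i < nworkers; i++)
        pthread_join(t[i], NULL);
    return collections;
}
```

demo(4, 3) starts four mutator threads, performs three stop-the-world collections, and returns the number of collections run. Calling the collector directly, without the handshake, would let workers keep mutating the heap in the middle of a trace, which is the kind of unprotected call described above.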

     
  • Charles Evans

    Charles Evans - 2014-02-05

    New test stressCollect.icn.
    It requires concurrent threads and
    does no overloading, but it
    often deadlocks if OVLD is defined.
    It works so far without OVLD.

     
  • Clinton Jeffery

    Clinton Jeffery - 2017-01-12
    • assigned_to: Jafar
    • Group: -->
     
  • Clinton Jeffery

    Clinton Jeffery - 2017-01-12

    So what is the status on this?

     
  • Charles Evans

    Charles Evans - 2017-05-10

    Depends on bug #227.

     

    Related

    Bugs: #227

  • Charles Evans

    Charles Evans - 2017-05-16

    OpTbl is a global; it needs to be in
    curpstate.
    Testing a fix.

     
  • Charles Evans

    Charles Evans - 2017-05-16

    Removed all tended vars;
    no longer crashing.
    TODO: test coverage.

     
  • Charles Evans

    Charles Evans - 2017-05-16

    Help wanted:
    rmemmgt.c contains:

    #ifdef Concurrent
       markthreads();
    #else                    /* Concurrent */
       markthread(pstate->tstate);
    #endif                   /* Concurrent */

    but much code uses globals inside
    #ifndef Concurrent,
    e.g. in init.r:

    #ifndef Concurrent
    struct descrip maps2;    /* second cached argument of map */
    struct descrip maps3;    /* third cached argument of map */
    #endif                   /* Concurrent */

    Is curtstate becoming universal?

     
    • Jafar

      Jafar - 2017-05-16

      It's been universal since the inception of threads. It encapsulates data that needs to be thread-specific. The main thread has its own data regardless of whether we have concurrency or not.

      Globals are not always "global", because h/rmacros.h redefines the globals to be process/thread-specific.
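To make Jafar's point concrete, here is a hypothetical, heavily simplified sketch of the rmacros.h technique: the source keeps using a global-looking name, and a macro turns each use into a field access on the current thread state (curtstate) or program state (curpstate). The struct layouts, the field names Maps2 and Optbl, and the int-valued OpTbl are illustrative assumptions, not copies of the real headers.

```c
#include <assert.h>

/* Simplified, hypothetical stand-ins for the runtime's state structs. */
struct threadstate {
    long Maps2;                 /* per-thread: cached argument of map() */
};
struct progstate {
    const int *Optbl;           /* per-program: operator -> method field table */
    struct threadstate *tstate; /* the program's main thread */
};

static struct threadstate main_ts, other_ts;
static const int prog_optbl[] = { 2, 3 };
static struct progstate prog = { prog_optbl, &main_ts };

static struct progstate *curpstate = &prog;
static struct threadstate *curtstate = &main_ts;

/* rmacros.h-style redirection: source code still says "maps2" or "OpTbl",
   but every use is compiled as a field access on the current state. */
#define maps2 (curtstate->Maps2)
#define OpTbl (curpstate->Optbl)

/* Runtime code written as if maps2 were an ordinary global. */
static void cache_map_arg(long v) {
    maps2 = v;
}

/* Runtime code that reads the per-program operator table. */
static int method_field_for(int op) {
    return OpTbl[op];
}
```

Switching curtstate between main_ts and other_ts gives each thread its own maps2 while the calling code is unchanged. Whether a variable belongs behind curtstate or curpstate is the scope question raised later in this thread: per-thread caches such as maps2 go in the thread state, while per-program tables such as OpTbl go in the program state.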

       
  • Charles Evans

    Charles Evans - 2017-05-16

    AFAICT all curtstate macros in rmacros.h
    are inside
    #ifdef Concurrent
    #endif
    and all curpstate macros
    are inside
    #ifdef MultiThread
    #endif
    BUT some globals like maps2
    do NOT have a curpstate macro,
    just a curtstate macro.

    I am hoping that the use case for
    MultiThread but not Concurrent
    will go away once OVLD is fixed,
    as I am unsure about this 3rd state,
    which would seem to require 2 sets of
    macros for thread state vars when
    there are no threads.

    Should I go ahead and commit code that
    is tested on Concurrent, but not on
    MultiThread without Concurrent?
    It should work unless you are
    monitoring an overloaded op (likely)
    on a platform without Concurrent (less likely).

     
  • Clinton Jeffery

    Clinton Jeffery - 2017-05-16

    MultiThread without Concurrent is not a rare or exotic state, it is the normal default if Concurrent is not enabled. MultiThread is poorly named, but that was Ralph Griswold's call. It is a virtual machine that can load multiple programs in and execute them synchronously, possibly with one monitoring another as in the case of udb loading and monitoring a program that is being debugged.

    Jafar's thinking is that we should use curtstate whether or not we have Concurrent, and one could similarly decide to use curpstate whether or not one enabled MultiThread. The trade-off is that the extra layer of indirection in memory references costs performance. The cost is small enough that we could get away with it, but performance still matters. If we have made (or do make) curtstate universal, we should eventually simplify the ifdefs so that this becomes clearer. But I am glad that in some respects we have not been hasty.

    To complicate the discussion: COMPILER has traditionally not supported MultiThread, and has only recently somewhat supported Concurrent.

    You should probably not commit code that has been tested only under Concurrent. Instead, e-mail it to Jafar and me as an attachment, or devise some other means of testing, e.g. that it does not break non-Concurrent builds and does not break COMPILER.

    Cheers,
    Clint

     
  • Jafar

    Jafar - 2017-05-16

    The rationale for making curtstate universal in a multi-threaded program is to reduce ifdefs and code duplication, of which we already have a ton. We already pay the price of an indirect memory access in a multi-threaded application, so we are just changing where we fetch it from.

    Send me the patch. I will review it asap.

     
  • Charles Evans

    Charles Evans - 2017-05-16

    If I understand correctly, I should then put
    #ifdef OVLD
    #define ofound (curtstate->Ofound)
    #define fieldnum (curtstate->Fieldnum)
    #define lhs (curtstate->Lhs)
    #endif
    inside the

    #ifdef MultiThread

    section of rmacros.h, rather than the

    #ifdef Concurrent

    section, since multiple programs need them?

     
  • Jafar

    Jafar - 2017-05-16

    Charles,

    I don't remember the implementation details of OVLD, but if the change should have program scope, then it probably belongs in the program state (curpstate) rather than the thread state (curtstate). If you move it to curtstate, every thread (in a concurrent program) will/should have its own version of the OVLD functions.

    So, do OVLD functions have a program scope or a thread scope?

     
    Last edit: Jafar 2017-05-16
    • Clinton Jeffery

      Clinton Jeffery - 2017-05-16

      Here is some institutional memory. OVLD was added over ten years ago, prior
      to Jafar's arrival on the scene, by Sudarshan Gaikaiwari. As far as I know,
      it allows you to use objects as operands to various built-in operators,
      such as o1 + o2, which result in method calls to methods in the class of
      the left-hand operand, as in o1.add(o2) or some such. The method names like
      "add" get turned into field numbers in the generated code, so to implement
      o1+o2 there is an extra table added to the generated icode to map from
      virtual machine instructions for various operators onto record field names
      of corresponding methods. It means OVLD requires changes to icont and the
      icode version, not just rebuilding iconx with OVLD turned on.

      This overloading of operators onto method names was completed, but there
      were enough bugs and/or semantic difficulties that it was not accepted into
      the Unicon language canon as-is. It was Sudarshan's nature to build some
      brilliant code and then not persist with it to completion. OVLD code is
      still present, because it might still be accepted into the Unicon language
      if it gets finished enough and we have confidence of its robustness.

      From looking at the code now, I can say that it appears that global
      variables related to OVLD would mostly be tied to curpstate, not curtstate.
      For example the OpTab is per-program, not per-thread. That said, I would
      need to study it some more to tell whether it was thread-safe or would need
      further modification to work if Concurrent is turned on, since Concurrent
      did not exist at the time OVLD was written, and Concurrent was implemented
      without considering OVLD.

      Lastly, I will mention with interest that I am not spotting these variables
      you are specifically mentioning, Charles: ofound, fieldnum, and lhs sound
      familiar and all, but ofound and lhs do not appear in my code. fieldnum
      appears and it is a local in interp() and it is not clear that it should be
      moved into curpstate or curtstate. So maybe you are looking at different
      code than me?

      Clint
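A simplified, hypothetical sketch of the dispatch Clint describes: a per-program table (called OpTbl here) maps an operator to the field number of its method name, and the program's field table then gives that field's slot in the left operand's record type, or -1 if the class does not define it. The indexing follows the interp.r line 225 quoted earlier in this thread (ftabp[fieldnum * *records + recnum - 1]); the operator codes, the example record types Vec and Point, and the use of -1 for a missing field are assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical operator codes; the real icode uses VM opcodes. */
enum { OP_PLUS, OP_MINUS };

struct progstate {
    int nrecords;     /* number of record types (classes) in the program */
    const int *OpTbl; /* operator code -> field number of its method, or -1 */
    const int *ftab;  /* field table: one row per field, one column per record type */
};

/* Slot of the method implementing `op` in record type `recnum` (1-based),
   or -1 if the left operand's class does not overload `op`. */
static int ovld_slot(const struct progstate *ps, int op, int recnum) {
    int fieldnum = ps->OpTbl[op];
    if (fieldnum < 0)
        return -1;
    /* Same shape as interp.r: ftabp[fieldnum * *records + recnum - 1] */
    return ps->ftab[fieldnum * ps->nrecords + recnum - 1];
}

/* Example program: record 1 (Vec) has x, y, add; record 2 (Point) has x, y.
   Field numbers: x = 0, y = 1, add = 2. No class defines a method for "-". */
static const int demo_ftab[] = {
    /* Vec  Point */
       0,    0,    /* x   */
       1,    1,    /* y   */
       2,   -1,    /* add */
};
static const int demo_optbl[] = { 2 /* + -> add */, -1 /* - : none */ };
static const struct progstate demo_prog = { 2, demo_optbl, demo_ftab };
```

A result of -1 means the left operand's class does not overload the operator, so a dispatcher would need some other path (for example the built-in operator); this sketch does not model what OVLD actually does in that case. Both tables live in the program state, matching the observation above that the table is per-program rather than per-thread.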


      Status: open
      Group:
      Labels: OVLD thread
      Created: Fri Jan 31, 2014 05:05 PM UTC by Charles Evans
      Last Updated: Tue May 16, 2017 05:19 PM UTC
      Owner: Jafar


  • Charles Evans

    Charles Evans - 2017-05-16

    The vars that were tended are not vital;
    I assume they were for possible
    future program monitoring or debugging.
    They are certainly thread-scoped,
    but they should exist even in a NoConcurrent build. They can't be global,
    or program monitoring will clobber them.

     
  • Charles Evans

    Charles Evans - 2017-05-16

    If we don't care about a NoConcurrent build
    clobbering the unused monitoring vars,
    we are good to go; I can commit a fix.

     
  • Charles Evans

    Charles Evans - 2017-05-16

    Thanks for the feedback!
    I neglected to mention that in fixing
    the OVLD code to be thread-safe,
    I had to rename his tracking vars that collided with other code in order to put them in curtstate:
    found is now ofound, and x is now lhs.
    OpTbl is now in curpstate.
    OVLD now works in the Concurrent build,
    and all thread tests now pass.
    The code I quoted is working code
    that I have not committed, because I did not
    see a practical way to make his tracking code work in all 3 builds: old plain, Multiple Programs, and Concurrent.
    I expect that they will all build,
    but I expect his tracking vars to get
    trashed, so I reset them before each GC.
    OVLD works fine without them.
    I hope everyone will hammer on OVLD; we do need more tests, hopefully even some
    program monitoring, which I have not
    tested at all.
    If anything breaks,
    I may just move all his formerly tended tracking vars inside his
    OVLD_DEBUG #ifdefs.

    There is more to do to improve performance and reduce code size,
    as time permits.

     
  • Charles Evans

    Charles Evans - 2017-05-17

    fieldnum can be left a local, as
    he had it.
    He saved x (now lhs), which is not used.
    It looks useful for debugging, but
    without fieldnum it is inconclusive.
    I moved fieldnum to curtstate to
    complete the debug picture.

     
  • Charles Evans

    Charles Evans - 2017-05-17

    To clarify my earlier question,
    the existing code is clear in
    the common cases, Concurrent
    and Plain (NoConcurrent, No MultiThread).
    The now uncommon case of
    NoConcurrent + MultiThread
    is still a puzzle to me.
    curpstate is clear, but
    where do thread vars go if NoConcurrent?

    NoC+MT was used for OVLD heretofore,
    but OVLD did not really work with a real
    MultiThread. It now should work even
    here, but his formerly tended vars,
    which did not actually need to be saved,
    will be reset by a thread switch in this
    rare case (Concurrent is fine).
    Jafar mentioned that curtstate now
    exists even in NoConcurrent MultiThread.
    This is essential, as he said,
    to limit complexity,
    but I am unsure how to use it properly
    in this specific build state.
    Do I still put thread vars in curtstate?

    I can commit working code now,
    if you like, and if anyone needs
    NoC+MT+OVLD we can discuss this then.
    Thanks for your help.

     