#127 R backend hangs using 100% when after trying to stop operation

CRASH
open
nobody
NEEDINFO (2)
5
2015-02-26
2013-04-10
No

Sometimes, when I try to cancel an operation, rkward.rbackend hangs eating all CPU. I'm not sure how to reproduce it. Below are the stacktraces of rkward.rbackend and rkward.frontend during the hang.

This is with RKWard 0.6.0z+0.6.1+pre1 on Fedora 18 64-bit.

rkward.rbackend:
(gdb) t a a bt

Thread 2 (Thread 0x7f0b2d874700 (LWP 10719)):
#0  0x0000003e820011b5 in tls_get_addr_tail () from /lib64/ld-linux-x86-64.so.2
#1  0x0000003e8f87a66f in get_thread_data () at thread/qthread_unix.cpp:182
#2  QThreadData::current () at thread/qthread_unix.cpp:208
#3  0x0000003e8f878519 in QThread::currentThread () at thread/qthread.cpp:401
#4  0x0000000000416479 in RKRBackendProtocolBackend::inRThread ()
    at /home/milan/Dev/rkward/rkward/rkward/rbackend/rkrbackendprotocol_backend.cpp:213
#5  0x000000000040e8f1 in RKRBackend::tryToDoEmergencySave ()
    at /home/milan/Dev/rkward/rkward/rkward/rbackend/rkrbackend.cpp:552
#6  0x0000000000411c15 in RKSignalSupportPrivate::signal_proxy (signum=11, 
    info=0x7f0b2d8737b0, context=0x7f0b2d873680)
    at /home/milan/Dev/rkward/rkward/rkward/rbackend/rksignalsupport.cpp:72
#7  <signal handler called>
#8  0x00007f0b3563c773 in g_main_context_check (context=context@entry=
    0x7f0b280009a0, max_priority=2147483647, fds=fds@entry=0x7f0b2801c0a0, 
    n_fds=n_fds@entry=2) at gmain.c:3144
#9  0x00007f0b3563ccc2 in g_main_context_iterate (context=context@entry=
    0x7f0b280009a0, block=block@entry=1, dispatch=dispatch@entry=1, 
    self=<optimized out>) at gmain.c:3287
#10 0x00007f0b3563ce44 in g_main_context_iteration (context=0x7f0b280009a0, 
    may_block=1) at gmain.c:3351
#11 0x0000003e8f9a6106 in QEventDispatcherGlib::processEvents (this=
    0x7f0b280008c0, flags=...) at kernel/qeventdispatcher_glib.cpp:424
#12 0x0000003e8f97680f in QEventLoop::processEvents (this=this@entry=
    0x7f0b2d873ca0, flags=...) at kernel/qeventloop.cpp:149
#13 0x0000003e8f976a98 in QEventLoop::exec (this=0x7f0b2d873ca0, flags=...)
    at kernel/qeventloop.cpp:204
#14 0x0000003e8f878980 in QThread::exec (this=<optimized out>)
    at thread/qthread.cpp:542
#15 0x00000000004184c6 in RKRBackendTransmitter::run (this=0x7fff04c7fcf0)
    at /home/milan/Dev/rkward/rkward/rkward/rbackend/rkbackendtransmitter.cpp:69
#16 0x0000003e8f87b95c in QThreadPrivate::start (arg=0x7fff04c7fcf0)
    at thread/qthread_unix.cpp:338
#17 0x0000003e82c07d15 in start_thread (arg=0x7f0b2d874700)
    at pthread_create.c:308
#18 0x0000003e824f248d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f0b33e8f880 (LWP 10717)):
#0  0x0000003e824eebf9 in syscall () from /lib64/libc.so.6
#1  0x0000003e8f87a404 in _q_futex (addr2=0x0, val2=0, timeout=0x0, val=2, op=
    0, addr=0x24bc080) at thread/qmutex_unix.cpp:99
#2  QMutexPrivate::wait (this=0x24bc080, timeout=<optimized out>)
    at thread/qmutex_unix.cpp:113
#3  0x0000003e8f876615 in QMutex::lockInternal (this=<optimized out>)
    at thread/qmutex.cpp:450
#4  0x000000000041612d in RKDebugMessageOutput (type=QtDebugMsg, msg=
    0x25302e0 "Assert 'false' failed at /home/milan/Dev/rkward/rkward/rkward/rbackend/rkrbackend.cpp - function tryToDoEmergencySave line 572")
    at /home/milan/Dev/rkward/rkward/rkward/rbackend/rkrbackendprotocol_backend.cpp:93
#5  0x000000000041624d in RKDebug (flags=<optimized out>, 
    level=<optimized out>, fmt=<optimized out>)
    at /home/milan/Dev/rkward/rkward/rkward/rbackend/rkrbackendprotocol_backend.cpp:115
#6  0x000000000040ea04 in RKRBackend::tryToDoEmergencySave ()
    at /home/milan/Dev/rkward/rkward/rkward/rbackend/rkrbackend.cpp:572
#7  0x0000000000411c15 in RKSignalSupportPrivate::signal_proxy (signum=11, 
    info=0x2532530, context=0x2532400)
    at /home/milan/Dev/rkward/rkward/rkward/rbackend/rksignalsupport.cpp:72
#8  <signal handler called>
#9  0x0000003e8254cb83 in __memmove_ssse3_back () from /lib64/libc.so.6
#10 0x0000003e8f89983a in memmove (__len=18446744073709551608, 
    __src=<optimized out>, __dest=<optimized out>)
    at /usr/include/bits/string3.h:57
#11 QListData::remove (this=this@entry=0x24db5a8, i=<optimized out>, i@entry=
    -1) at tools/qlist.cpp:343
#12 0x0000003e8f899b2c in QListData::erase (this=0x24db5a8, xi=<optimized out>)
    at tools/qlist.cpp:412
#13 0x0000000000409381 in erase (it=..., this=0x24db5a8)
    at /usr/include/QtCore/qlist.h:466
#14 removeLast (this=0x24db5a8) at /usr/include/QtCore/qlist.h:287
#15 pop_back (this=0x24db5a8) at /usr/include/QtCore/qlist.h:303
#16 RKRBackend::commandFinished (this=0x24db580, 
    check_object_updates_needed=check_object_updates_needed@entry=true)
    at /home/milan/Dev/rkward/rkward/rkward/rbackend/rkrbackend.cpp:1419
#17 0x000000000040f576 in RReadConsole (prompt=0x250d1a0 "\002", buf=
    0x7fff04c7eb7c "", buflen=4096, hist=1)
    at /home/milan/Dev/rkward/rkward/rkward/rbackend/rkrbackend.cpp:365
#18 0x0000003e849131e2 in Rf_ReplIteration () from /usr/lib64/R/lib/libR.so
#19 0x0000003e84913580 in R_ReplConsole () from /usr/lib64/R/lib/libR.so
#20 0x0000003e84913aaa in run_Rmainloop () from /usr/lib64/R/lib/libR.so
#21 0x000000000040f29b in RKRBackend::run (this=0x24db580, locale_dir=...)
    at /home/milan/Dev/rkward/rkward/rkward/rbackend/rkrbackend.cpp:1385
#22 0x0000000000407830 in main (argc=5, argv=<optimized out>)
    at /home/milan/Dev/rkward/rkward/rkward/rbackend/rkrbackendprotocol_backend.cpp:159

rkward.frontend trace:

(gdb) t a a bt

Thread 6 (Thread 0x7f01f6a6b700 (LWP 10715)):
#0  0x0000003e824e99ad in poll () from /lib64/libc.so.6
#1  0x0000003e84847d24 in g_main_context_poll (priority=2147483647, n_fds=3, 
    fds=0x7f01f0012b80, timeout=-1, context=0x7f01f00009a0) at gmain.c:3584
#2  g_main_context_iterate (context=context@entry=0x7f01f00009a0, 
    block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>)
    at gmain.c:3285
#3  0x0000003e84847e44 in g_main_context_iteration (context=0x7f01f00009a0, 
    may_block=1) at gmain.c:3351
#4  0x0000003e8f9a6126 in QEventDispatcherGlib::processEvents (this=
    0x7f01f00008c0, flags=...) at kernel/qeventdispatcher_glib.cpp:426
#5  0x0000003e8f97680f in QEventLoop::processEvents (this=this@entry=
    0x7f01f6a6ac10, flags=...) at kernel/qeventloop.cpp:149
#6  0x0000003e8f976a98 in QEventLoop::exec (this=0x7f01f6a6ac10, flags=...)
    at kernel/qeventloop.cpp:204
#7  0x0000003e8f878980 in QThread::exec (this=<optimized out>)
    at thread/qthread.cpp:542
#8  0x0000000000530f73 in RKFrontendTransmitter::run (this=0x25e7610)
    at /home/milan/Dev/rkward/rkward/rkward/rbackend/rkfrontendtransmitter.cpp:88
#9  0x0000003e8f87b95c in QThreadPrivate::start (arg=0x25e7610)
    at thread/qthread_unix.cpp:338
#10 0x0000003e82c07d15 in start_thread (arg=0x7f01f6a6b700)
    at pthread_create.c:308
#11 0x0000003e824f248d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f01f626a700 (LWP 10716)):
#0  0x0000003e824e99ad in poll () from /lib64/libc.so.6
#1  0x0000003e8f9a3480 in poll (__timeout=-1, __nfds=1, __fds=0x7f01f6269d00)
    at /usr/include/bits/poll2.h:46
#2  qt_safe_poll (fds=0x7f01f6269d00, nfds=1, timeout_ms=<optimized out>, 
    retry_eintr=false) at kernel/qcore_unix.cpp:121
#3  0x0000003e8f955540 in QProcessManager::run (this=
    0x3e8fcd5520 <processManager()::processManager>)
    at io/qprocess_unix.cpp:238
#4  0x0000003e8f87b95c in QThreadPrivate::start (arg=
    0x3e8fcd5520 <processManager()::processManager>)
    at thread/qthread_unix.cpp:338
#5  0x0000003e82c07d15 in start_thread (arg=0x7f01f626a700)
    at pthread_create.c:308
#6  0x0000003e824f248d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f01f4ade700 (LWP 10720)):

#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165
#1  0x0000003e9a1868c7 in QTWTF::TCMalloc_PageHeap::scavengerThread (this=
    0x3e9a494660 <QTWTF::pageheap_memory>)
    at ../3rdparty/javascriptcore/JavaScriptCore/wtf/FastMalloc.cpp:2359
#2  0x0000003e9a1868f9 in QTWTF::TCMalloc_PageHeap::runScavengerThread (
    context=<optimized out>)
    at ../3rdparty/javascriptcore/JavaScriptCore/wtf/FastMalloc.cpp:1464
#3  0x0000003e82c07d15 in start_thread (arg=0x7f01f4ade700)
    at pthread_create.c:308
#4  0x0000003e824f248d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f0160857700 (LWP 10734)):
#0  0x0000003e824e99ad in poll () from /lib64/libc.so.6
#1  0x0000003e84847d24 in g_main_context_poll (priority=2147483647, n_fds=3, 
    fds=0x7f01580010c0, timeout=-1, context=0x3320680) at gmain.c:3584
#2  g_main_context_iterate (context=0x3320680, block=block@entry=1, 
    dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3285
#3  0x0000003e84848182 in g_main_loop_run (loop=0x3320610) at gmain.c:3484
#4  0x00007f0201964546 in gdbus_shared_thread_func (user_data=0x3320650)
    at gdbusprivate.c:277
#5  0x0000003e8486b605 in g_thread_proxy (data=0x3311ed0) at gthread.c:797
#6  0x0000003e82c07d15 in start_thread (arg=0x7f0160857700)
    at pthread_create.c:308
#7  0x0000003e824f248d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f01ecb6f700 (LWP 10827)):
#0  0x0000003e824e99ad in poll () from /lib64/libc.so.6
#1  0x0000003e84847d24 in g_main_context_poll (priority=2147483647, n_fds=2, 
    fds=0x7f01640012c0, timeout=-1, context=0x7f0164001700) at gmain.c:3584
#2  g_main_context_iterate (context=context@entry=0x7f0164001700, 
    block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>)
    at gmain.c:3285
#3  0x0000003e84847e44 in g_main_context_iteration (context=0x7f0164001700, 
    may_block=1) at gmain.c:3351
#4  0x0000003e8f9a6126 in QEventDispatcherGlib::processEvents (this=
    0x7f016400e320, flags=...) at kernel/qeventdispatcher_glib.cpp:426
#5  0x0000003e8f97680f in QEventLoop::processEvents (this=this@entry=
    0x7f01ecb6ecd0, flags=...) at kernel/qeventloop.cpp:149
#6  0x0000003e8f976a98 in QEventLoop::exec (this=0x7f01ecb6ecd0, flags=...)
    at kernel/qeventloop.cpp:204
#7  0x0000003e8f878980 in QThread::exec (this=<optimized out>)
    at thread/qthread.cpp:542
#8  0x0000003e8f95703f in QInotifyFileSystemWatcherEngine::run (this=0x2f33470)
    at io/qfilesystemwatcher_inotify.cpp:256
#9  0x0000003e8f87b95c in QThreadPrivate::start (arg=0x2f33470)
    at thread/qthread_unix.cpp:338
#10 0x0000003e82c07d15 in start_thread (arg=0x7f01ecb6f700)
    at pthread_create.c:308
#11 0x0000003e824f248d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f020a17f880 (LWP 10696)):
#0  0x0000003e824e99ad in poll () from /lib64/libc.so.6
#1  0x0000003e84847d24 in g_main_context_poll (priority=2147483647, n_fds=11, 
    fds=0x38066a0, timeout=999, context=0x229dd20) at gmain.c:3584
#2  g_main_context_iterate (context=context@entry=0x229dd20, block=block@entry=
    1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3285
#3  0x0000003e84847e44 in g_main_context_iteration (context=0x229dd20, 
    may_block=1) at gmain.c:3351
#4  0x0000003e8f9a6106 in QEventDispatcherGlib::processEvents (this=0x229a9c0, 
    flags=...) at kernel/qeventdispatcher_glib.cpp:424
#5  0x0000003e9266a73e in QGuiEventDispatcherGlib::processEvents (
    this=<optimized out>, flags=...) at kernel/qguieventdispatcher_glib.cpp:207
#6  0x0000003e8f97680f in QEventLoop::processEvents (this=this@entry=
    0x7fff9b510920, flags=...) at kernel/qeventloop.cpp:149
#7  0x0000003e8f976a98 in QEventLoop::exec (this=0x7fff9b510920, flags=...)
    at kernel/qeventloop.cpp:204
#8  0x0000003e8f97b888 in QCoreApplication::exec ()
    at kernel/qcoreapplication.cpp:1218
#9  0x00000000004356db in main (argc=<optimized out>, argv=0x7fff9b510c28)
    at /home/milan/Dev/rkward/rkward/rkward/main.cpp:194

Discussion

    • labels: --> NEEDINFO
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -2,6 +2,7 @@
    
     This is with RKWard 0.6.0z+0.6.1+pre1 on Fedora 18 64-bit.
    
    +~~~~~
     rkward.rbackend:
     (gdb) t a a bt
    
    @@ -207,3 +208,4 @@
         at kernel/qcoreapplication.cpp:1218
     #9  0x00000000004356db in main (argc=<optimized out>, argv=0x7fff9b510c28)
         at /home/milan/Dev/rkward/rkward/rkward/main.cpp:194
    +~~~~~
    
     
  • Hi!

    As usual, thanks for reporting! Whatever is going on, here, at the point of the backtrace, a SIGSEGV has been raised in the backend, and then we do a poor job at handling that. However, the root problem must have occurred some time earlier, and it is not directly visible in the backtrace.

    Conceivably this might be the same problem that I have fixed in trunk around a week ago (how would that be for coincidental timing?). On my machine this had different symptoms, but would also lead to a SIGSEGV when canceling a command under some conditions, also from RKRBackend::commandFinished() (thread 1, #16 in the backend's bt).

    Thus, please let me know, if you ever see this with a recent checkout from trunk (revision 4687 or later). Of course, if you do find a way to reproduce the problem, that would be useful, too.

    Can you give an estimate on how often you see this hang (percentage of canceled operations)?