
#814 Segmentation fault when reading attribute

Status: closed
Owner: nobody
Milestone: Tango9.2.0
Labels: C++ API
Priority: 5
Updated: 2017-01-06
Created: 2016-08-16
Private: No

Another segmentation fault we found while upgrading to TANGO 9:

If a device reads an attribute from a device in the same server (i.e. the same process) from within a separate thread, the server segfaults. It does not happen for devices in other servers, nor if the read is not done from a thread.

The problem is easy to reproduce and has been observed in both a python device and a C++ device (included), so the issue should lie in libtango.

We are using TANGO 9.2.2. We have not yet tried to reproduce with older v9 versions but this behavior was not present in v8.

Thread 10 "ThreadReadSegfa" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffeb7fe700 (LWP 4848)]
0x00007ffff6b33c22 in omni_thread::get_value(unsigned int) () from /usr/lib/libomnithread.so.3
(gdb) bt
#0  0x00007ffff6b33c22 in omni_thread::get_value(unsigned int) () from /usr/lib/libomnithread.so.3
#1  0x00007ffff78b3138 in Tango::BlackBox::insert_attr (this=0x69b490, names=..., cl_id=..., vers=vers@entry=5, 
    sour=sour@entry=Tango::CACHE_DEV) at blackbox.cpp:643
#2  0x00007ffff7916447 in Tango::Device_5Impl::read_attributes_5 (this=0x69db60, names=..., source=Tango::CACHE_DEV, 
    cl_id=...) at device_5.cpp:114
#3  0x00007ffff7aadcfb in _0RL_lcfn_6fe2f94a21a10053_84000000 (cd=0x7fffeb7fd080, svnt=<optimized out>)
    at tangoSK.cpp:6272
#4  0x00007ffff6e1a94f in omni::omniOrbPOA::dispatch(omniCallDescriptor&, omniLocalIdentity*) ()
   from /usr/lib/libomniORB4.so.1
#5  0x00007ffff6dff939 in omniLocalIdentity::dispatch(omniCallDescriptor&) () from /usr/lib/libomniORB4.so.1
#6  0x00007ffff6e0da65 in omniObjRef::_invoke(omniCallDescriptor&, bool) () from /usr/lib/libomniORB4.so.1
#7  0x00007ffff7aadfbf in Tango::_objref_Device_5::read_attributes_5 (this=this@entry=0x7fffd8000d10, names=..., 
    source=<optimized out>, cl_ident=...) at tangoSK.cpp:6298
#8  0x00007ffff779a6d0 in Tango::DeviceProxy::read_attribute (this=0x7fffeb7fda40, attr_string="something")
    at devapi_base.cpp:5592
#9  0x000000000040dbf4 in Tango::DeviceProxy::read_attribute (this=0x7fffeb7fda40, att_name=0x41bc08 "something")
    at /usr/local/include/tango/DeviceProxy.h:665
#10 0x000000000040d21d in ThreadReadSegfaultTest_ns::ThreadReadSegfaultTest::_read_attribute (this=0x69db60)
    at ThreadReadSegfaultTest.cpp:387
#11 0x00000000004129f5 in std::_Mem_fn_base<void (ThreadReadSegfaultTest_ns::ThreadReadSegfaultTest::*)(), true>::operator()<, void>(ThreadReadSegfaultTest_ns::ThreadReadSegfaultTest*) const (this=0x7fffd4005418, __object=0x69db60)
    at /usr/include/c++/5/functional:600
#12 0x0000000000412949 in std::_Bind<std::_Mem_fn<void (ThreadReadSegfaultTest_ns::ThreadReadSegfaultTest::*)()> (ThreadReadSegfaultTest_ns::ThreadReadSegfaultTest*)>::__call<void, , 0ul>(std::tuple<>&&, std::_Index_tuple<0ul>) (
    this=0x7fffd4005418, 
    __args=<unknown type in /home/johfor/DeviceServers/ThreadReadSegfaultTest, CU 0x0, DIE 0x3d8b0>)
    at /usr/include/c++/5/functional:1074
#13 0x0000000000412899 in std::_Bind<std::_Mem_fn<void (ThreadReadSegfaultTest_ns::ThreadReadSegfaultTest::*)()> (ThreadReadSegfaultTest_ns::ThreadReadSegfaultTest*)>::operator()<, void>() (this=0x7fffd4005418)
    at /usr/include/c++/5/functional:1133
#14 0x000000000041285e in std::_Bind_simple<std::_Bind<std::_Mem_fn<void (ThreadReadSegfaultTest_ns::ThreadReadSegfaultTest::*)()> (ThreadReadSegfaultTest_ns::ThreadReadSegfaultTest*)> ()>::_M_invoke<>(std::_Index_tuple<>) (
    this=0x7fffd4005418) at /usr/include/c++/5/functional:1531
#15 0x00000000004127b4 in std::_Bind_simple<std::_Bind<std::_Mem_fn<void (ThreadReadSegfaultTest_ns::ThreadReadSegfaultTest::*)()> (ThreadReadSegfaultTest_ns::ThreadReadSegfaultTest*)> ()>::operator()() (this=0x7fffd4005418)
    at /usr/include/c++/5/functional:1520
#16 0x0000000000412744 in std::thread::_Impl<std::_Bind_simple<std::_Bind<std::_Mem_fn<void (ThreadReadSegfaultTest_ns::ThreadReadSegfaultTest::*)()> (ThreadReadSegfaultTest_ns::ThreadReadSegfaultTest*)> ()> >::_M_run() (
    this=0x7fffd4005400) at /usr/include/c++/5/thread:115
#17 0x00007ffff6430c80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#18 0x00007ffff67016fa in start_thread (arg=0x7fffeb7fe700) at pthread_create.c:333
#19 0x00007ffff5e9fb5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) 
1 Attachment

Discussion

  • Bourtembourg Reynald

    Hi Johan,

    Thank you for the bug report.
    I think this problem comes from the following change introduced in blackbox.cpp (r28946):

    https://sourceforge.net/p/tango-cs/code/28946/tree//api/cpp/cppapi/branches/Tango_900/server/blackbox.cpp?diff=28853

    I tried to remove the omni_thread::release_dummy() part and the example you provided didn't crash.

    It didn't give the same result as when you execute ReadAttribute, though.
    I got an exception (Not able to acquire serialization (dev, class or process) monitor), which produces a timeout at the client level.
    I tried with Tango 8 and got the same behaviour.

    In your real device where you are doing something similar, are you using the default serialization model (by device, as in your example) or did you change it?

    To me, it looks logical to get such an exception: the monitor is already held by the thread executing the ReadAttributeFromThread command when the newly created thread invokes read_attribute(), which then tries to acquire the same monitor from another thread, hence the exception.
    So unless you are changing the serialization model, I don't see the point of a device reading its own attribute from a thread started by a command and waiting for that thread to end before returning the result... But you are probably doing something different in your real-world example...

    Removing the omni_thread::release_dummy() part should solve your issue, but it means going back to a version with the memory leak that revision r28946 was trying to fix.
    There might be other locations in the code, and some edge cases, where methods are called directly on omni_thread::self() without being in an omni_thread or having created a dummy omni_thread first, leading to a crash similar to the one you reported.

    Cheers,
    Reynald

     
  • Bourtembourg Reynald

    It might be a good idea to use the omni_thread::ensure_self class, which takes care of creating a dummy omni_thread in cases like this, where the code invokes methods directly on omni_thread::self(). It creates a dummy thread record when needed and releases it automatically in its destructor.
    It seems that this class is provided by omni_thread for exactly this purpose.

     
  • Bourtembourg Reynald

    Please note that in Tango 8 and in Tango 9 without the omni_thread::release_dummy() part, the blackbox is reporting the attribute read from the thread as being read from polling, which is not correct in that case.

     
  • Emmanuel Taurel

    Emmanuel Taurel - 2016-09-02

    Bug fixed in SVN repo

     
  • Johan Forsberg

    Johan Forsberg - 2016-11-08

    I saw that #823 was closed as a duplicate of this bug. But is it really the same? I don't think the fix mentioned above also solves the issue with commands invoked from another thread. It looks like a similar omnithread issue, but it does not happen in the black box.

    0x00007ffff4014d02 in omni_thread::get_value(unsigned int) () from /usr/local/lib/libomnithread.so.4
    (gdb) bt
    #0  0x00007ffff4014d02 in omni_thread::get_value(unsigned int) () from /usr/local/lib/libomnithread.so.4
    #1  0x00007ffff504098a in Tango::DeviceImpl::get_client_ident (this=<optimized out>) at device.cpp:4416
    #2  0x00007ffff504df15 in Tango::DeviceImpl::check_lock (this=this@entry=0x120ba30, 
        meth=meth@entry=0x7ffff52401c5 "command_inout4", cmd=cmd@entry=0x7fffe57f8680 "DummyCommand") at device.cpp:4820
    #3  0x00007ffff5077792 in Tango::Device_4Impl::command_inout_4 (this=0x120ba30, in_cmd=0x7fffe57f8680 "DummyCommand", 
        in_data=..., source=Tango::CACHE_DEV, cl_id=...) at device_4.cpp:467
    #4  0x00007ffff51f6162 in _0RL_lcfn_6fe2f94a21a10053_a3000000 (cd=0x7fffe57f7f20, svnt=<optimized out>)
        at tangoSK.cpp:5383
    #5  0x00007ffff430917f in omni::omniOrbPOA::dispatch(omniCallDescriptor&, omniLocalIdentity*) ()
       from /usr/local/lib/libomniORB4.so.2
    #6  0x00007ffff42e9cae in omniLocalIdentity::dispatch(omniCallDescriptor&) () from /usr/local/lib/libomniORB4.so.2
    #7  0x00007ffff42f8ebe in omniObjRef::_invoke(omniCallDescriptor&, bool) () from /usr/local/lib/libomniORB4.so.2
    #8  0x00007ffff51ff71c in Tango::_objref_Device_4::command_inout_4 (this=this@entry=0x7fffd4001be0, 
        command=<optimized out>, argin=..., source=<optimized out>, cl_ident=...) at tangoSK.cpp:5410
    #9  0x00007ffff4f1108e in Tango::Connection::command_inout (this=0x7fffd4001440, command="DummyCommand", data_in=...)
        at devapi_base.cpp:1280
    #10 0x00007ffff59bb6a3 in PyConnection::command_inout (self=..., cmd_name="DummyCommand", argin=...)
        at /tmp/pip-k4qsc2hy-build/ext/connection.cpp:25
    ...
    
     
    • Marius Elvert

      Marius Elvert - 2016-11-08

      My crash in 823 was indeed caused by the blackbox, but we're getting your bug too in another case. I just posted it in #827.

       
  • Bourtembourg Reynald

    • Status: open --> closed
     

