
#103 Numpy Array Data corruption in taurus cache

unassigned
waiting
nobody
None
bug
2015-03-30
2014-08-06
No

This is a worrying bug:

Under certain conditions (see below), reading a Tango SPECTRUM attribute with Taurus (which uses the cache by default) returns an array full of garbage instead of the right values. It looks as if the pointer to the numpy array buffer has been moved.

To reproduce it, all of the following are necessary:
a) the device pushes events for that attribute
b) a listener is added to that attribute
c) the read is done soon after an event has been emitted

We observe it only on Suse11 machines with Tango 8, Python 2.6 and Numpy 1.3.
It does not seem to affect

I created an automated test that shows it. It takes care of registering a device, trying the reads and cleaning the DB when it finishes, so you just need to copy the attached files into some directory and run "python test_arraycorruption.py".

This is an example of the output on an affected system:

sicilia@ct32suse11:/siciliarep/projects/ctgensoft/issues/sprint4/5525/test_arraycorruption> python -c 'import numpy,PyTango,taurus,sardana; print (numpy.__version__, PyTango.__version__, taurus.Release.version, sardana.Release.version)'
('1.3.0', '8.1.1', '3.2.0', '1.3.1')
sicilia@ct32suse11:/siciliarep/projects/ctgensoft/issues/sprint4/5525/test_arraycorruption> python test_arraycorruption.py
/homelocal/sicilia/lib/python/site-packages/PyTango/__init__.pyc
MainThread     INFO     2014-08-06 12:22:10,897 Starter: Starting server ArrayCorruptor/arraycorruptor...
MainThread     INFO     2014-08-06 12:22:11,903 Starter: Server ArrayCorruptor/arraycorruptor has been started
FMainThread     INFO     2014-08-06 12:22:15,086 Starter: Deleted device unittests/arraycorruptor/temp-1
MainThread     INFO     2014-08-06 12:22:15,086 Starter: Hard killing server ArrayCorruptor/arraycorruptor...
MainThread     INFO     2014-08-06 12:22:18,086 Starter: Server ArrayCorruptor/arraycorruptor has been stopped
MainThread     INFO     2014-08-06 12:22:18,098 Starter: Deleted Server ArrayCorruptor/arraycorruptor

======================================================================
FAIL: testArrayCorruption (__main__.ArrayCorruptionTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_arraycorruption.py", line 52, in testArrayCorruption
    self.assertTrue(all(v==self._firstRead), msg=msg)
AssertionError: False is not True : Corrupted (on try 5): array([ -4.22608972e-38,  -7.70346396e-44,   0.00000000e+00,
         0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
        -4.10454254e+77,  -9.00021178e-42]) != array([ 123.4,  123.4,  123.4,  123.4,  123.4,  123.4,  123.4,  123.4])

----------------------------------------------------------------------
Ran 1 test in 7.235s

FAILED (failures=1)

And this is an example of the same test on a system that is not affected:

sicilia@ct64suse121:/siciliarep/projects/ctgensoft/issues/sprint4/5525/test_arraycorruption> python test_arraycorruption.py
/homelocal/sicilia/lib/python/site-packages/PyTango/__init__.pyc
MainThread     INFO     2014-08-06 12:40:56,664 Starter: Starting server ArrayCorruptor/arraycorruptor...
MainThread     INFO     2014-08-06 12:40:57,677 Starter: Server ArrayCorruptor/arraycorruptor has been started
MainThread     INFO     2014-08-06 12:41:01,396 Starter: Deleted device unittests/arraycorruptor/temp-1
MainThread     INFO     2014-08-06 12:41:01,396 Starter: Hard killing server ArrayCorruptor/arraycorruptor...
MainThread     INFO     2014-08-06 12:41:04,407 Starter: Server ArrayCorruptor/arraycorruptor has been stopped
MainThread     INFO     2014-08-06 12:41:04,418 Starter: Deleted Server ArrayCorruptor/arraycorruptor
.
----------------------------------------------------------------------
Ran 1 test in 7.787s

OK

We tested several combinations of machine/OS/library versions. This table summarizes the tests so far:

OS             numpy  PyTango  Taurus  Affected?
suse11.1, 64b  1.3    7.2.3    3.2     No
suse11.1, 32b  1.3    8.1.1    3.2     Yes
suse11.1, 64b  1.3    8.1.1    3.2     Yes
suse11.1, 64b  1.8.1  8.1.1    3.2     Yes
suse12.1, 32b  1.6.1  8.1.1    3.2     No
suse12.1, 64b  1.6.1  8.1.1    3.2     No
Debian 8, 64b  1.8.1  8.1.1    3.2     No

Note: The problem may take a varying number of reads to be triggered, and my test "only" reads 1000 times, so it sometimes passes even if the system is affected. If the test passes, re-run it 3-4 times to be sure.
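
The read-and-compare loop described in the note can be sketched as below. This is only an illustration of the idea, not the attached test: the helper names and the fake reader are hypothetical, and in a real run `read` would be whatever callable returns the attribute value (for example something like `lambda: taurus.Attribute(name).read().value`, which is an assumption here).

```python
import itertools


def find_corrupted_read(read, n_tries=1000):
    """Return (try_index, value) for the first read that differs from the
    initial read, or None if all n_tries reads agree with it."""
    first = list(read())  # copy, so the reference cannot change underneath us
    for i in range(n_tries):
        value = list(read())
        if value != first:
            return i, value
    return None


def find_corruption_with_reruns(read, n_runs=4, n_tries=1000):
    """Repeat the whole check n_runs times, since one run may miss the bug."""
    for run in range(n_runs):
        hit = find_corrupted_read(read, n_tries)
        if hit is not None:
            return run, hit
    return None


def make_fake_reader(bad_call):
    """Hypothetical stand-in for an attribute read: garbage on one call only."""
    calls = itertools.count()

    def read():
        n = next(calls)
        return [0.0] * 8 if n == bad_call else [123.4] * 8

    return read


if __name__ == "__main__":
    print(find_corrupted_read(make_fake_reader(6)))
    print(find_corruption_with_reruns(make_fake_reader(1500)))
```

With the second fake reader the first run of 1000 reads passes and only the second run reports the corruption, which is why a single passing run is not conclusive.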

2 Attachments

Discussion

  • Carlos Pascual

    Carlos Pascual - 2014-08-06

    Some remarks, thoughts...:

    • The problem looks related to an outdated buffer pointer or a wrong reference to a numpy array, so we suspected numpy... but installing a newer numpy did not solve it (maybe the problem is in some dependency of numpy?)

    • The newer systems do not seem to be affected at all

    • We could not reproduce the problem with Tango7, even on the same machines that are affected when using Tango8. Considering that we need events to reproduce the bug, this may indicate some relation to zmq (???)
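
The "outdated buffer pointer" hypothesis can be illustrated with plain numpy. This is a minimal sketch of the general failure mode only, under the assumption that the array handed to the reader is a view on memory owned by something else; it does not show where (or whether) this happens in PyTango or Taurus. The `bytearray` stands in for that foreign memory.

```python
import numpy as np

# Memory owned by "someone else" (a bytearray here, purely for illustration).
buf = bytearray(np.full(8, 123.4).tobytes())

view = np.frombuffer(buf, dtype=np.float64)  # shares memory with buf
snapshot = view.copy()                       # owns its own memory

# The owner reuses the memory for something else.
buf[:] = bytes(len(buf))

print(view)      # reflects the overwritten buffer
print(snapshot)  # still holds the values seen at read time
```

A cached value that is such a view would change under the reader in the same way, whereas a copy taken at read time would not.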

     
  • Carlos Falcon

    Carlos Falcon - 2014-08-07

    Another dependency that differs between the two systems is Boost.
    suse12.1 64b has Boost 1.46 (from system packages), while suse11.1 (32b and 64b) uses Boost 1.42 (compiled by hand).

    So we compiled Boost 1.46 for Suse11.1 64b and recompiled PyTango against the new Boost, to minimize the differences.

    When we repeated the test, we saw the same problem.

    OS             numpy   PyTango  Taurus  Boost  Affected?
    suse11.1, 64b  1.8.1*  8.1.1    3.2     1.46   Yes

    PyTango.so suse11.1 64b

    ldd _PyTango.so
    linux-vdso.so.1 => (0x00007fffe09fd000)
    libtango.so.8 => /siciliarep/jenkins/ct64suse11/env/TANGO_RELEASE/8.1.2/lib/libtango.so.8 (0x00007fcfd788f000)
    libomniORB4.so.1 => /siciliarep/jenkins/ct64suse11/env/TANGO_RELEASE/8.1.2/lib/libomniORB4.so.1 (0x00007fcfd74d2000)
    libboost_python.so.1.46.1 => /siciliarep/build/lib/boost/boost_1_46_1.suse111.64/lib/libboost_python.so.1.46.1 (0x00007fcfd727d000)
    libpython2.6.so.1.0 => /usr/lib64/libpython2.6.so.1.0 (0x00007fcfd6ec8000)
    libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007fcfd6bbb000)
    libm.so.6 => /lib64/libm.so.6 (0x00007fcfd6965000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fcfd674d000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fcfd6530000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fcfd61d7000)
    liblog4tango.so.5 => /siciliarep/jenkins/ct64suse11/env/TANGO_RELEASE/8.1.2/lib/liblog4tango.so.5 (0x00007fcfd5fbd000)
    libzmq.so.3 => /siciliarep/jenkins/ct64suse11/env/TANGO_RELEASE/8.1.2/lib/libzmq.so.3 (0x00007fcfd5d6e000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007fcfd5b6a000)
    libomniDynamic4.so.1 => /siciliarep/jenkins/ct64suse11/env/TANGO_RELEASE/8.1.2/lib/libomniDynamic4.so.1 (0x00007fcfd5665000)
    libCOS4.so.1 => /siciliarep/jenkins/ct64suse11/env/TANGO_RELEASE/8.1.2/lib/libCOS4.so.1 (0x00007fcfd522b000)
    libnsl.so.1 => /lib64/libnsl.so.1 (0x00007fcfd5013000)
    libomnithread.so.3 => /siciliarep/jenkins/ct64suse11/env/TANGO_RELEASE/8.1.2/lib/libomnithread.so.3 (0x00007fcfd4e0b000)
    libutil.so.1 => /lib64/libutil.so.1 (0x00007fcfd4c08000)
    librt.so.1 => /lib64/librt.so.1 (0x00007fcfd49ff000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fcfd8948000)

    PyTango.so suse12.1 64b

    ldd _PyTango.so
    linux-vdso.so.1 => (0x00007fffe91ff000)
    libtango.so.8 => /homelocal/sicilia/lib/libtango.so.8 (0x00007fe32f119000)
    liblog4tango.so.5 => /homelocal/sicilia/lib/liblog4tango.so.5 (0x00007fe32eeff000)
    libzmq.so.3 => /homelocal/sicilia/lib/libzmq.so.3 (0x00007fe32ecb7000)
    libomniDynamic4.so.1 => /homelocal/sicilia/lib/libomniDynamic4.so.1 (0x00007fe32e7bb000)
    libCOS4.so.1 => /homelocal/sicilia/lib/libCOS4.so.1 (0x00007fe32e37b000)
    libomniORB4.so.1 => /homelocal/sicilia/lib/libomniORB4.so.1 (0x00007fe32dfc7000)
    libomnithread.so.3 => /homelocal/sicilia/lib/libomnithread.so.3 (0x00007fe32ddc1000)
    libboost_python.so.1.46.1 => /usr/lib64/libboost_python.so.1.46.1 (0x00007fe32db43000)
    libpython2.7.so.1.0 => /usr/lib64/libpython2.7.so.1.0 (0x00007fe32d796000)
    libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007fe32d48c000)
    libm.so.6 => /lib64/libm.so.6 (0x00007fe32d234000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fe32d01e000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe32ce01000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fe32ca70000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007fe32c86c000)
    libnsl.so.1 => /lib64/libnsl.so.1 (0x00007fe32c653000)
    librt.so.1 => /lib64/librt.so.1 (0x00007fe32c44b000)
    libutil.so.1 => /lib64/libutil.so.1 (0x00007fe32c2
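
Comparing the two listings by eye is error-prone, so the comparison can be sketched in a few lines. This is a hypothetical helper (the names `parse_ldd` and `diff_ldd` are not part of any tool), and the sample input uses only three lines copied from each listing above.

```python
import re

# Matches lines such as "libzmq.so.3 => /path/libzmq.so.3 (0x...)".
# Lines without a resolved path (e.g. linux-vdso) do not match and are skipped.
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)")


def parse_ldd(text):
    """Map each library name in ldd output to its resolved path."""
    libs = {}
    for line in text.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            libs[match.group(1)] = match.group(2)
    return libs


def diff_ldd(a, b):
    """Return (only in a, only in b, in both but resolved to different paths)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    moved = sorted(name for name in set(a) & set(b) if a[name] != b[name])
    return only_a, only_b, moved


SUSE11 = """
libboost_python.so.1.46.1 => /siciliarep/build/lib/boost/boost_1_46_1.suse111.64/lib/libboost_python.so.1.46.1 (0x00007fcfd727d000)
libpython2.6.so.1.0 => /usr/lib64/libpython2.6.so.1.0 (0x00007fcfd6ec8000)
libc.so.6 => /lib64/libc.so.6 (0x00007fcfd61d7000)
"""

SUSE12 = """
libboost_python.so.1.46.1 => /usr/lib64/libboost_python.so.1.46.1 (0x00007fe32db43000)
libpython2.7.so.1.0 => /usr/lib64/libpython2.7.so.1.0 (0x00007fe32d796000)
libc.so.6 => /lib64/libc.so.6 (0x00007fe32ca70000)
"""

if __name__ == "__main__":
    print(diff_ldd(parse_ldd(SUSE11), parse_ldd(SUSE12)))
```

On this sample the helper reports the differing libpython entries and the different location of libboost_python, while libc resolves to the same path on both systems.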

     
  • Carlos Pascual

    Carlos Pascual - 2015-02-03
    • Milestone: Jan15 --> unassigned
     
  • Tiago Coutinho

    Tiago Coutinho - 2015-03-30

    Ticket moved from /p/sardana/tickets/230/

    Can't be converted:

    • _category: taurus