This is a worrying bug:
Under certain conditions (see below), reading a Tango SPECTRUM attribute with Taurus (which uses the cache by default) returns an array full of garbage instead of the correct values (it looks as if the pointer to the NumPy array's buffer has been moved).
To reproduce it, all of the following are necessary:
a) the device pushes events for that attribute
b) a listener is added to that attribute
c) the read is done soon after an event has been emitted
We observe it only on Suse11 machines with Tango 8, Python 2.6 and NumPy 1.3.
It does not seem to affect
I created an automated test that shows it. It takes care of registering a device, performing the reads and cleaning up the database when it finishes, so you just need to copy the attached files into a directory and run `python test_arraycorruption.py`:
This is an example of the output in an affected system:
```
sicilia@ct32suse11:/siciliarep/projects/ctgensoft/issues/sprint4/5525/test_arraycorruption> python -c 'import numpy,PyTango,taurus,sardana; print (numpy.__version__, PyTango.__version__, taurus.Release.version, sardana.Release.version)'
('1.3.0', '8.1.1', '3.2.0', '1.3.1')
sicilia@ct32suse11:/siciliarep/projects/ctgensoft/issues/sprint4/5525/test_arraycorruption> python test_arraycorruption.py
/homelocal/sicilia/lib/python/site-packages/PyTango/__init__.pyc
MainThread INFO 2014-08-06 12:22:10,897 Starter: Starting server ArrayCorruptor/arraycorruptor...
MainThread INFO 2014-08-06 12:22:11,903 Starter: Server ArrayCorruptor/arraycorruptor has been started
FMainThread INFO 2014-08-06 12:22:15,086 Starter: Deleted device unittests/arraycorruptor/temp-1
MainThread INFO 2014-08-06 12:22:15,086 Starter: Hard killing server ArrayCorruptor/arraycorruptor...
MainThread INFO 2014-08-06 12:22:18,086 Starter: Server ArrayCorruptor/arraycorruptor has been stopped
MainThread INFO 2014-08-06 12:22:18,098 Starter: Deleted Server ArrayCorruptor/arraycorruptor
======================================================================
FAIL: testArrayCorruption (__main__.ArrayCorruptionTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_arraycorruption.py", line 52, in testArrayCorruption
    self.assertTrue(all(v==self._firstRead), msg=msg)
AssertionError: False is not True : Corrupted (on try 5): array([ -4.22608972e-38, -7.70346396e-44, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, -4.10454254e+77, -9.00021178e-42]) != array([ 123.4, 123.4, 123.4, 123.4, 123.4, 123.4, 123.4, 123.4])

----------------------------------------------------------------------
Ran 1 test in 7.235s

FAILED (failures=1)
```
And this is an example of the same test on a system that is not affected:
```
sicilia@ct64suse121:/siciliarep/projects/ctgensoft/issues/sprint4/5525/test_arraycorruption> python test_arraycorruption.py
/homelocal/sicilia/lib/python/site-packages/PyTango/__init__.pyc
MainThread INFO 2014-08-06 12:40:56,664 Starter: Starting server ArrayCorruptor/arraycorruptor...
MainThread INFO 2014-08-06 12:40:57,677 Starter: Server ArrayCorruptor/arraycorruptor has been started
MainThread INFO 2014-08-06 12:41:01,396 Starter: Deleted device unittests/arraycorruptor/temp-1
MainThread INFO 2014-08-06 12:41:01,396 Starter: Hard killing server ArrayCorruptor/arraycorruptor...
MainThread INFO 2014-08-06 12:41:04,407 Starter: Server ArrayCorruptor/arraycorruptor has been stopped
MainThread INFO 2014-08-06 12:41:04,418 Starter: Deleted Server ArrayCorruptor/arraycorruptor
.
----------------------------------------------------------------------
Ran 1 test in 7.787s

OK
```
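Judging from the assertion in the failing traceback, the core of such a test is comparing every cached read against the first one. A minimal sketch of that loop, under stated assumptions: `find_corruption` and `make_flaky_reader` are hypothetical names, and `read` stands in for the Taurus read performed by the attached test, which is not reproduced here.

```python
import itertools


def find_corruption(read, n_tries=1000):
    """Return (try_number, value) for the first read that differs from the
    first read, or None if all reads agree.

    `read` is any zero-argument callable returning a sequence of numbers
    (a hypothetical stand-in for the cached Taurus attribute read).
    Each result is copied into a plain list, so the stored reference value
    cannot change later even if the underlying buffer is reused.
    """
    first = list(read())
    for attempt in range(1, n_tries + 1):
        value = list(read())
        if value != first:
            return attempt, value
    return None


def make_flaky_reader(good, bad, bad_call):
    """Hypothetical fake reader: returns `good` on every call except the
    `bad_call`-th one (1-based), which returns `bad`."""
    calls = itertools.count(1)

    def read():
        return bad if next(calls) == bad_call else good

    return read


# The 6th call is bad: 1 first read + 5 tries, so it is reported as try 5.
reader = make_flaky_reader([123.4] * 8, [0.0] * 8, bad_call=6)
result = find_corruption(reader)
if result is not None:
    attempt, value = result
    print("Corrupted (on try %d): %r != %r" % (attempt, value, [123.4] * 8))
```

Copying each read before storing it is deliberate here: if the reference value were itself a view on a reused buffer, a corrupted reference could mask or fake a failure.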
We tested several combinations of machines, OS and library versions. The following table summarizes the results so far:
OS | numpy | PyTango | Taurus | Affected? |
---|---|---|---|---|
suse11.1, 64b | 1.3 | 7.2.3 | 3.2 | No |
suse11.1, 32b | 1.3 | 8.1.1 | 3.2 | Yes |
suse11.1, 64b | 1.3 | 8.1.1 | 3.2 | Yes |
suse11.1, 64b | 1.8.1 | 8.1.1 | 3.2 | Yes |
suse12.1, 32b | 1.6.1 | 8.1.1 | 3.2 | No |
suse12.1, 64b | 1.6.1 | 8.1.1 | 3.2 | No |
Debian 8, 64b | 1.8.1 | 8.1.1 | 3.2 | No |
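As a quick consistency check of the table, the sketch below looks for the smallest set of columns whose values never coincide between an affected and an unaffected row. The rows are copied from the table (Taurus is left out because it is 3.2 everywhere); `separating_columns` is a hypothetical helper, and with only seven rows this can rule hypotheses out but cannot prove a cause.

```python
from itertools import combinations

# Rows copied from the table: (OS, arch, numpy, PyTango, affected).
ROWS = [
    ("suse11.1", "64b", "1.3", "7.2.3", False),
    ("suse11.1", "32b", "1.3", "8.1.1", True),
    ("suse11.1", "64b", "1.3", "8.1.1", True),
    ("suse11.1", "64b", "1.8.1", "8.1.1", True),
    ("suse12.1", "32b", "1.6.1", "8.1.1", False),
    ("suse12.1", "64b", "1.6.1", "8.1.1", False),
    ("Debian 8", "64b", "1.8.1", "8.1.1", False),
]
COLUMNS = ("OS", "arch", "numpy", "PyTango")


def separating_columns(rows, columns, max_size=2):
    """Return the smallest column combinations whose value tuples never
    appear in both an affected and an unaffected row."""
    for size in range(1, max_size + 1):
        found = []
        for combo in combinations(range(len(columns)), size):
            affected = {tuple(r[i] for i in combo) for r in rows if r[-1]}
            unaffected = {tuple(r[i] for i in combo) for r in rows if not r[-1]}
            if not affected & unaffected:
                found.append(tuple(columns[i] for i in combo))
        if found:
            return found
    return []


print(separating_columns(ROWS, COLUMNS))
```

On these rows no single column separates the two groups, and the only pair that does is `('OS', 'PyTango')`, which is consistent with only suse11.1 combined with PyTango 8.1.1 being affected.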
Note: the number of reads needed to trigger the problem varies, and the test "only" reads 1000 times, so it sometimes passes even on an affected system. If the test passes, re-run it 3-4 times to be sure.
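The re-runs can be scripted. A minimal sketch, assuming the test script exits with a non-zero status when it fails (as `unittest.main()` does); `rerun` is a hypothetical helper that runs the script with the current interpreter and stops at the first failure.

```python
import subprocess
import sys


def rerun(args, times=4):
    """Run `python <args>` up to `times` times, stopping at the first
    failing run. Returns the list of exit codes collected."""
    codes = []
    for _ in range(times):
        # sys.executable: run with the same interpreter as this script.
        code = subprocess.call([sys.executable] + list(args))
        codes.append(code)
        if code != 0:
            break
    return codes


# Usage, from the directory holding the attached files:
#   codes = rerun(["test_arraycorruption.py"])
#   print("affected" if codes[-1] != 0 else "not reproduced in %d runs" % len(codes))
```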
Some remarks and thoughts:
The problem looks related to a stale buffer pointer or a wrong reference to a NumPy array, so we suspected NumPy... but installing a newer NumPy (1.8.1, see the table) did not solve it (maybe the cause is in some NumPy dependency?)
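To make the suspected failure mode concrete, the sketch below shows how a NumPy array that does not own its data follows whatever its owner later writes into the buffer, while a copy does not. It only illustrates the hypothesis above and is not a diagnosis of what PyTango or Taurus actually do; the "producer" is a hypothetical stand-in for the C++ layer. If the owner freed the memory instead of overwriting it, a view without a proper reference to it would read arbitrary bytes, much like the garbage in the failing test.

```python
import numpy as np

# Hypothetical "producer" standing in for the C++ layer that owns the read
# buffer: it fills raw bytes with 8 doubles equal to 123.4 (native byte order).
buf = bytearray(np.full(8, 123.4, dtype=np.float64).tobytes())

# Zero-copy view: this array does NOT own its data, it points into `buf`.
view = np.frombuffer(buf, dtype=np.float64)
# Defensive copy: this array owns a private copy of the data.
snapshot = view.copy()

print(view.flags["OWNDATA"], snapshot.flags["OWNDATA"])  # False True

# The producer later reuses its buffer (here it simply zeroes it byte by
# byte, which does not resize the bytearray).
for i in range(len(buf)):
    buf[i] = 0

print(view)      # reflects the new buffer contents: all zeros
print(snapshot)  # still 123.4 everywhere
```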
The newer systems do not seem to be affected at all.
We could not reproduce the problem with Tango 7, even on the same machines that are affected when using Tango 8. Since events are needed to reproduce the bug, this may indicate some relation to zmq, which Tango 8 uses to transport events instead of the CORBA notification service used by Tango 7.
Another dependency that differs between the two systems is Boost.
On suse12.1, 64b, Boost 1.46 comes from the system packages, while suse11.1 (32b and 64b) uses Boost 1.42, compiled by hand.
So we compiled Boost 1.46 for suse11.1, 64b, and recompiled PyTango against it to minimize the differences.
When we repeated the test, we saw the same problem.
`_PyTango.so` on suse11.1, 64b:

```
ldd _PyTango.so
linux-vdso.so.1 => (0x00007fffe09fd000)
libtango.so.8 => /siciliarep/jenkins/ct64suse11/env/TANGO_RELEASE/8.1.2/lib/libtango.so.8 (0x00007fcfd788f000)
libomniORB4.so.1 => /siciliarep/jenkins/ct64suse11/env/TANGO_RELEASE/8.1.2/lib/libomniORB4.so.1 (0x00007fcfd74d2000)
libboost_python.so.1.46.1 => /siciliarep/build/lib/boost/boost_1_46_1.suse111.64/lib/libboost_python.so.1.46.1 (0x00007fcfd727d000)
libpython2.6.so.1.0 => /usr/lib64/libpython2.6.so.1.0 (0x00007fcfd6ec8000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007fcfd6bbb000)
libm.so.6 => /lib64/libm.so.6 (0x00007fcfd6965000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fcfd674d000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fcfd6530000)
libc.so.6 => /lib64/libc.so.6 (0x00007fcfd61d7000)
liblog4tango.so.5 => /siciliarep/jenkins/ct64suse11/env/TANGO_RELEASE/8.1.2/lib/liblog4tango.so.5 (0x00007fcfd5fbd000)
libzmq.so.3 => /siciliarep/jenkins/ct64suse11/env/TANGO_RELEASE/8.1.2/lib/libzmq.so.3 (0x00007fcfd5d6e000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fcfd5b6a000)
libomniDynamic4.so.1 => /siciliarep/jenkins/ct64suse11/env/TANGO_RELEASE/8.1.2/lib/libomniDynamic4.so.1 (0x00007fcfd5665000)
libCOS4.so.1 => /siciliarep/jenkins/ct64suse11/env/TANGO_RELEASE/8.1.2/lib/libCOS4.so.1 (0x00007fcfd522b000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x00007fcfd5013000)
libomnithread.so.3 => /siciliarep/jenkins/ct64suse11/env/TANGO_RELEASE/8.1.2/lib/libomnithread.so.3 (0x00007fcfd4e0b000)
libutil.so.1 => /lib64/libutil.so.1 (0x00007fcfd4c08000)
librt.so.1 => /lib64/librt.so.1 (0x00007fcfd49ff000)
/lib64/ld-linux-x86-64.so.2 (0x00007fcfd8948000)
```
`_PyTango.so` on suse12.1, 64b:

```
ldd _PyTango.so
linux-vdso.so.1 => (0x00007fffe91ff000)
libtango.so.8 => /homelocal/sicilia/lib/libtango.so.8 (0x00007fe32f119000)
liblog4tango.so.5 => /homelocal/sicilia/lib/liblog4tango.so.5 (0x00007fe32eeff000)
libzmq.so.3 => /homelocal/sicilia/lib/libzmq.so.3 (0x00007fe32ecb7000)
libomniDynamic4.so.1 => /homelocal/sicilia/lib/libomniDynamic4.so.1 (0x00007fe32e7bb000)
libCOS4.so.1 => /homelocal/sicilia/lib/libCOS4.so.1 (0x00007fe32e37b000)
libomniORB4.so.1 => /homelocal/sicilia/lib/libomniORB4.so.1 (0x00007fe32dfc7000)
libomnithread.so.3 => /homelocal/sicilia/lib/libomnithread.so.3 (0x00007fe32ddc1000)
libboost_python.so.1.46.1 => /usr/lib64/libboost_python.so.1.46.1 (0x00007fe32db43000)
libpython2.7.so.1.0 => /usr/lib64/libpython2.7.so.1.0 (0x00007fe32d796000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007fe32d48c000)
libm.so.6 => /lib64/libm.so.6 (0x00007fe32d234000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fe32d01e000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe32ce01000)
libc.so.6 => /lib64/libc.so.6 (0x00007fe32ca70000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fe32c86c000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x00007fe32c653000)
librt.so.1 => /lib64/librt.so.1 (0x00007fe32c44b000)
libutil.so.1 => /lib64/libutil.so.1 (0x00007fe32c2
```
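To compare the two `ldd` listings mechanically instead of by eye, a small sketch that parses `ldd` output and reports the sonames present in only one build. `parse_ldd` and `diff_sonames` are hypothetical helpers; the sample lines are copied from the listings above.

```python
import re

# Matches "soname => /path/to/lib (0xaddress)". Lines without a resolved
# path (linux-vdso) or without "=>" (the dynamic loader) are skipped.
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(/\S+)")


def parse_ldd(text):
    """Map each soname in `ldd` output to the path it resolves to."""
    libs = {}
    for line in text.splitlines():
        match = LDD_LINE.match(line)
        if match:
            libs[match.group(1)] = match.group(2)
    return libs


def diff_sonames(a, b):
    """Return (only_in_a, only_in_b) as sorted lists of sonames."""
    return sorted(set(a) - set(b)), sorted(set(b) - set(a))


suse11 = parse_ldd("""
libboost_python.so.1.46.1 => /siciliarep/build/lib/boost/boost_1_46_1.suse111.64/lib/libboost_python.so.1.46.1 (0x00007fcfd727d000)
libpython2.6.so.1.0 => /usr/lib64/libpython2.6.so.1.0 (0x00007fcfd6ec8000)
""")
suse12 = parse_ldd("""
libboost_python.so.1.46.1 => /usr/lib64/libboost_python.so.1.46.1 (0x00007fe32db43000)
libpython2.7.so.1.0 => /usr/lib64/libpython2.7.so.1.0 (0x00007fe32d796000)
""")
print(diff_sonames(suse11, suse12))
```

On these sample lines it reports the libpython 2.6 vs 2.7 difference while the Boost soname matches. Full paths are not compared because each build uses a different install prefix.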
Ticket moved from /p/sardana/tickets/230/