From: Andrew R. <and...@us...> - 2011-09-05 21:58:59
On Mon, Sep 05, 2011 at 12:33:19PM -0700, Alan Irwin wrote:
> Hi Andrew:
>
> On 2011-09-05 09:55+0100 Andrew Ross wrote:
>
> > I'll try to put a patched copy of UseQt4.cmake
> > into the plplot cmake/modules directory and see if that helps/
>
> I saw in your later post that made no difference. That's too bad, but
> it was worth a try.

Indeed!

> > The gdb error message I get exactly matches the one Orion reported
> > for Fedora so it's not a Debian specific problem. Interestingly
> > the missing symbol is actually present in libQtCore.so which
> > libQtGui.so is linked against.
> >
> > Alan, do you see any issues with Qt 4.7.3 on Linux?
>
> I just use the system Qt-4.6.3 libraries for Debian Squeeze which work
> fine for me when I run the comprehensive testing script.
>
> On your platform with the most disk space (it doesn't matter whether
> it is Debian or Ubuntu), have you tried downloading vanilla 4.6.3 and
> 4.7.3 directly from Nokia, building the svn version of PLplot against
> either of those (by putting the appropriate version of qmake on your
> PATH) and comparing PLplot results from the two? I have had good
> success before with the Nokia versions so that is the first thing I
> suggest you try.
>
> ftp://ftp.qt.nokia.com/qtsdk has a historical record of all releases
> since the start of 2009. You can play with the md5sums.txt file there
> to see what is available.
>
> irwin@raven> grep linux md5sums.txt |grep _64 |cut --delimiter=" " --fields=3
> qt-sdk-linux-x86_64-opensource-2009.01.bin
> qt-sdk-linux-x86_64-opensource-2009.02.bin
> qt-sdk-linux-x86_64-opensource-2009.03.1.bin
> qt-sdk-linux-x86_64-opensource-2009.03.bin
> qt-sdk-linux-x86_64-opensource-2009.04.bin
> qt-sdk-linux-x86_64-opensource-2009.04.1.bin
> qt-sdk-linux-x86_64-opensource-2009.05-rc1.bin
> qt-sdk-linux-x86_64-opensource-2009.05.bin
> qt-sdk-linux-x86_64-opensource-2010.01.bin
> qt-sdk-linux-x86_64-opensource-2010.02.bin
> qt-sdk-linux-x86_64-opensource-2010.03-setup.bin
> qt-sdk-linux-x86_64-opensource-2010.04.bin
> qt-sdk-linux-x86_64-opensource-2010.05-rc1.bin
> qt-sdk-linux-x86_64-opensource-2010.05.bin
> qt-sdk-linux-x86_64-opensource-2010.05.1.bin
>
> irwin@raven> grep Lin64_offline_v1_1_[0-3] md5sums.txt |cut --delimiter=" " --fields=3
> Qt_SDK_Lin64_offline_v1_1_1_en.run
> Qt_SDK_Lin64_offline_v1_1_2_en.run
> Qt_SDK_Lin64_offline_v1_1_3_en.run
>
> v1_1_3 sdk corresponds to the latest release (Qt-4.7.4) which implies
> v1_1_2 sdk corresponds to 4.7.3. Also, from a previous download I know
> that 2010.01 corresponds to 4.6.1 which implies 2010.03 corresponds to
> 4.6.3.
>
> If both vanilla 4.6.3 and 4.7.3 from Nokia work fine, then there is
> likely to be a bug in the Debian 4.7.3 packaging. If 4.6.3 works
> (like the Debian packaged version of it does for me), but 4.7.3 does
> not, perhaps there is an upstream bug in the 4.7.x series, and you
> should also try 4.7.4 to see if that continues or whether it is
> specific to 4.7.3.
>
> Whatever the 4.7.3 issue is, I would doubt it was anything done on
> purpose by the Qt-4 development team since they make such a strong
> effort to be backwards compatible, and the early trend in the Qt4
> series of releases was the PLplot qt driver got more and more reliable
> until qt with Qt-4.6.3 has been just as reliable for me with
> comprehensive testing as the cairo device driver.
>
> I suppose it is also possible there are new PLplot build-system or
> CMake-2.8.5 issues with Qt-4.7.x that did not occur for Qt-4.6.x, but
> I view those two possibilities as fairly unlikely since both the
> PLplot build system and CMake-2.8.5 try to be as independent as
> possible of Qt version. But if Nokia 4.6.3 succeeds for you, but both
> 4.7.3 and 4.7.4 from Nokia fail, then you would have to look into
> those possibilities.

Alan,

Thanks for the suggestion. I'll bear this in mind, but I think I am now
making progress. I don't think it is a Debian-specific error since Orion
also sees the problem. The fact that this occurs in the dynamic loader
handling code makes me suspect a linkage bug somewhere along the line.

I tried a minimal build inside the Debian testing chroot (no fancy
packaging stuff) and lo and behold it all works fine. So there must be
something different about what the Debian tools do to the library. My
first thought was the rpath, since Debian doesn't use any rpath
information for libraries, but this seems to make no difference.
(Incidentally, I discovered that -DUSE_RPATH=OFF does not disable all
rpath information. You need to use the CMake option
-DCMAKE_SKIP_RPATH=ON to do this.)

My minimal testing build just used the qtwidget and epsqt devices. I
noticed the only difference in "ldd qt.so" between the Debian build and
the test build was the absence of libQtSvg in the latter. I tried again,
this time including the svgqt device, and suddenly the crashes started
again. It looks like either something included in plplot with the svgqt
driver or simply having libQtSvg.so linked in causes the problem.

Since the problem seemed to be related to dynamic loading I also tried
building without dynamic drivers, but with the svgqt driver. This also
fixed the problem. Similarly, just calling plend1 rather than plend at
the end of the example prevented the error.

A quick Google search turns up the following thread. It seems that this
may be exactly the problem I am encountering.
http://lists.gnu.org/archive/html/libtool/2009-12/msg00115.html

So it looks like somewhere qt is setting an exit handler which is called
when the program calls exit(), but the qt libraries have already been
unloaded at that point so the handler no longer exists. I still don't
understand precisely why the error occurs, but I can at least work
around it by marking the qt driver as resident. This stops it being
unloaded by lt_dlexit() and so prevents the crash. This is a bit ugly
though, but at least it works for now. An alternative approach for
packaging is just to disable the svgqt driver for now. Also not ideal.

Andrew

P.S. For information the relevant part of the valgrind message is

==22259== Invalid write of size 1
==22259==    at 0x400DA5A: _dl_signal_error (dl-error.c:101)
==22259==    by 0x400DBD9: _dl_signal_cerror (dl-error.c:152)
==22259==    by 0x400A156: _dl_lookup_symbol_x (dl-lookup.c:772)
==22259==    by 0x400D721: _dl_fixup (dl-runtime.c:119)
==22259==    by 0x4013594: _dl_runtime_resolve (dl-trampoline.S:41)
==22259==    by 0x7654623: QGlobalStaticDeleter<QThreadStorage<QFontCache*> >::~QGlobalStaticDeleter() (qthreadstorage.h:137)
==22259==    by 0x625AD81: __run_exit_handlers (exit.c:78)
==22259==    by 0x625ADD4: exit (exit.c:100)
==22259==    by 0x401552: main (in /usr/share/doc/libplplot11/examples/c/x01c)
==22259==  Address 0x7fefffab0 is just below the stack ptr. To suppress, use: --workaround-gcc296-bugs=yes
==22259==
==22259== Warning: client switching stacks? SP change: 0x7fefffbe8 --> 0x1e95891cc5213c14
==22259==          to suppress, use: --max-stackframe=2203818314984144940 or greater
==22259== Jump to the invalid address stated on the next line
==22259==    at 0xBCA5891CC51A9E8C: ???
==22259==  Address 0xbca5891cc51a9e8c is not stack'd, malloc'd or (recently) free'd
==22259==
==22259==
==22259== Process terminating with default action of signal 11 (SIGSEGV)
==22259==  Bad permissions for mapped region at address 0xBCA5891CC51A9E8C
==22259==    at 0xBCA5891CC51A9E8C: ???
==22259== Invalid write of size 8
==22259==    at 0x4A225B0: _vgnU_freeres (in /usr/lib/valgrind/vgpreload_core-amd64-linux.so)
==22259==  Address 0x1e95891cc5213c0c is not stack'd, malloc'd or (recently) free'd
==22259==
==22259==
==22259== Process terminating with default action of signal 11 (SIGSEGV)
==22259==  General Protection Fault
==22259==    at 0x4A225B0: _vgnU_freeres (in /usr/lib/valgrind/vgpreload_core-amd64-linux.so)
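
For readers following the thread, the suspected sequence (driver loaded, exit handler registered, driver unloaded, handler run at exit()) can be sketched as a toy model. This is only an illustration in Python under stated assumptions: ToyLoader, simulate and the "qt_driver" name are hypothetical and are not the libltdl, Qt or PLplot APIs. The model assumes, as the message suggests, that the loader shutdown happens before the process's exit handlers run, and that marking a module resident exempts it from unloading.

```python
class ToyLoader:
    """Toy stand-in for a plugin loader (hypothetical; not the libltdl API)."""

    def __init__(self):
        self.loaded = {}         # module name -> dict of callable symbols
        self.resident = set()    # modules that shutdown() must not unload
        self.exit_handlers = []  # (module, symbol) pairs to run at "exit"

    def open(self, name, symbols):
        self.loaded[name] = dict(symbols)

    def register_exit_handler(self, module, symbol):
        self.exit_handlers.append((module, symbol))

    def make_resident(self, name):
        self.resident.add(name)

    def shutdown(self):
        """Unload every module that has not been marked resident."""
        for name in list(self.loaded):
            if name not in self.resident:
                del self.loaded[name]

    def run_exit_handlers(self):
        """Run handlers in reverse registration order, as exit() does."""
        results = []
        for module, symbol in reversed(self.exit_handlers):
            if module not in self.loaded:
                # In the real process this is a call into unmapped code.
                results.append(f"{module}.{symbol}: CRASH (module unloaded)")
            else:
                results.append(self.loaded[module][symbol]())
        return results


def simulate(mark_resident):
    """Model: load driver, register handler, shut loader down, then exit."""
    loader = ToyLoader()
    loader.open("qt_driver", {"cleanup": lambda: "qt_driver.cleanup: ok"})
    loader.register_exit_handler("qt_driver", "cleanup")
    if mark_resident:
        loader.make_resident("qt_driver")
    loader.shutdown()                  # assumed to happen before exit()
    return loader.run_exit_handlers()  # handlers run afterwards


if __name__ == "__main__":
    print(simulate(mark_resident=False))
    print(simulate(mark_resident=True))
```

In this model the non-resident run reports the handler as pointing into an unloaded module, while the resident run completes normally. That matches the workaround described above, but it says nothing about why Qt registers the handler in the first place.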