Qt Terminal: make sure that all Qt objects are only initialized after main()
This is necessary to avoid the Static Initialization Order Fiasco, where
Qt objects are initialized before the Qt library. This problem lead to
crashes when trying to run gnuplot with a Qt terminal on Windows (note
that Windows support for Qt Terminal needs some more patches, but this is
the first critical piece).
The initialization is delayed by wrapping all objects in a struct and
accessing the static struct via an accessor function. This makes sure
that the struct is initialized when the function is called first, not
before.
I'd like to understand better how one would diagnose or debug the problem this patch tries to fix. Can you provide a pointer to discussion somewhere?
Meanwhile...
The patch applies cleanly to current CVS, but when I try to compile the revised source I get the following:
Also, are you working with Qt4 or Qt5? Does it matter? I ask partly because when I build the current source with Qt5 there is a problem (segfault) at the time the program exits. This doesn't happen with Qt4, and so far as I can tell the program operates correctly under Qt5 right up until the time it tries to exit. I have been unable to pin down the cause, but now you have me wondering if this patch is related.
Last edit: Ethan Merritt 2013-10-08
So I sprinkled in a few more declarations of
QtGnuplotState & qt = getQtGnuplotState();
to get past the compiler errors.
Now it runs but spews out messages like the following:
QThreadStorage: Thread 0x8f35fb8 exited after QThreadStorage 30 destroyed
QThreadStorage: Thread 0x8f35fb8 exited after QThreadStorage 29 destroyed
What is this telling me?
First off, I'm very sorry for the incomplete patch. I forward-ported the patch from an older CVS version of Gnuplot, checked that the patch applied cleanly, but forgot to test it actually compiled :(
As for the actual issue the patch tries to solve: The general issue is best described in the C++ FAQ: http://www.parashift.com/c++-faq/static-init-order.html
Whenever there are static initializations (at the file level) in several translation units, it is undefined which of these run first.
As a simple example, qapplication.cpp in Qt 4.8.4 contains a static integer variable drag_time, which can be queried by calling QApplication::startDragTime(). This variable is initialized in qapplication.cpp as
However, if there is code in another translation unit which does
the behaviour is undefined, because the static variable drag_time may not have been initialized yet. Initialization is guaranteed to be completed by the time main() is called. Additionally, function scoped static variables are initialized the first time control flow passes over them. That's why I introduced getQtGnuplotState(), with a static variable containing all the state. As long as getQtGnuplotState() is only called after main() (i.e. not from other static initializations), all should be OK because by that time the Qt library is fully initialized.
I encountered this issue when trying to port the Qt terminal to Windows. On Linux it seems to work (maybe the libraries are initialized first), but this is in no way guaranteed.
I still use Qt4 and haven't seen the QThreadStorage error messages you mention. I'm currently upgrading my build system and will see if I find anything.
But the Static Initialization Order Fiasco arises when references in one compilation unit depend on static initialization in a different compilation unit. Your patch touches only qt_term.cpp, which is a single compilation unit. So far as I can see, none of the variables being initialized are referenced even indirectly by other files. So what exactly is triggering the Fiasco? Can you pin it down to a specific external reference that runs amok? I really like to understand the problem better before accepting that this is the proper fix.
The variables in qt_term.cpp indirectly reference static variables in the Qt library. Consider the following example:
This code is currently in qt_term.cpp. qt_codec is initialized before main(). Looking at src/corelib/codecs/qtextcodec.cpp (from Qt 4.8.4), we see that codecForLocale() accesses the static variable localeMapper. At this stage of the program execution, however, localeMapper may have any value, so the function potentially returns an arbitrary invalid pointer, which will lead to crash when the pointer is used later. Note that this problem also encompasses constructor calls, so the following simple line may crash:
The solution implemented in the patch avoids this by not calling any Qt function (not even constructors) until after main().
I investigated the code a bit more and suspect that the two examples cited are actually OK, because they only reference statically initialized POD variables, no statically initialized classes which have constructors. I'm not 100% sure what the standard says here, but I think PODs are OK because they are probably initialized before any constructor runs.
The QLocalSocket code however looks quite suspicious on Windows. I'll look at it again tomorrow and report my findings.
FWIW, your patch cures the segfault-on-exit problem I've been having with Qt5. So that's good. But I'd still like to understand exactly why. Both Qt4 and Qt5 emit the "Thread 0x8f35fb8 exited after QThreadStorage 29 destroyed" messages.
Can you set a breakpoint on where this message happens and send me the backtrace? If we see which class uses QThreadStorage here, we might be able to pinpoint the problem.
In Qt4, this is line 199 of src/corelib/thread/qthreadstorage.cpp, function QThreadStorageData::finish.
backtrace is here:
This is triggered inside the qt_atexit routine in qt_term.cpp. I would guess that the issue really arises at some earlier point when some per-thread data structures are released. I suspect that the release is not really premature in terms of program function, but apparently the garbage collection system thinks it is. Do you know how to trap this hypothetical early release?
The other clue I have is from valgrind. Valgrind has in the past reported a ton of "lost" memory on exit from gnuplot when the qt terminal is active. Your patch reduces this by about 90%. Most of the remaining 10% is flagged as resulting from execution of lines qt_term.cpp:417-418 (reproduced below)
Last edit: Ethan Merritt 2013-10-10
Correction: The backtrace is from after execution of qt_atexit(). If I bypass qt_exit() the same error messages are printed.
Wild guess: maybe QApplication is deleted too late. Does a "delete qt.gnuplot_qt_application" at the end of qt_atexit() help?
Unfortunately I cannot reproduce the issue at the moment. I'll be away till monday; then I will be able to look into this a bit further.
Thomas - to the extent that I understand the error message and valgrind output, I don't think any action taken at exit will fix it. It is [I think] reporting that some memory chunk allocated early on has been lost and cannot be properly freed. In fact commenting out lines 418-419 gets rid of both the valgrind trace and the qt error message. Of course the program doesn't function properly in that case because it donesn't know the font size :-(
Since the qt error message only appeared after your patch that adds a deferred structure initialization, I'm wondering if there needs to be a matching deconstructor routine?
I could finally reproduce the issue. The problem seems to be that qt_atexit() runs too late. qt_atexit() is registered using atexit(3), so it runs after exit(3) has been called. But by setting a breakpoint at QThreadStorageData::~QThreadStorageData and QThreadStorageData::finish, one can see that the runtime library calls the static destructors of the Qt library before handlers registered using atexit(3). Specifically, variables declared using Q_GLOBAL_STATIC are destroyed before custom handlers run.
So it seems to me that calling any Qt destructors in atexit-handlers is dangerous, because there is no guaranteed order of destruction.
I have reworked the patch to simply never delete the Qt variables. This leaks some memory, of course, but this should not be a problem, as the program will exit shortly afterwards and free all memory.
The proper solution would probably be to register such cleanup functions in gnuplot internally and call them before calling exit(3). But this would probably be a larger change.
The attached patch works for me, on Linux using current CVS and Qt 4.8.
Unfortunately v2 of your patch triggers the original problem when linked with qt5 - it segfaults on exit. Using qt4 I see what I assume is the same as you do - the memory allocated by each call to qt_init() is never freed but there are no warnings or segfaults on exit.
The best I have come up with is to go back to your v1 patch add an explicit destructor:
For me that works equally well when built against either qt4 or qt5. Does it work for you?
Which version of Qt5 are you using (and on which platform)? I tried with Qt 5.0.1 on Ubuntu Raring, and am unable to trigger any segfault (i.e. it works fine).
Calling "exit(0)" sounds dangerous - the linux man page says that behaviour is undefined (so it could lead e.g. to an infinite loop). In my tests on Linux the exit() calls worked, but it clobbered any previous return value. This means that when using the qt terminal, gnuplot would always exit with return code 0. I'm not sure if this is relevant, as the return code is probably not very interesting in the interactive case - but still worth mentioning.
I also read up on atexit-handlers a bit. According to the standard, there is a defined order for atexit-handlers and global destructors - they are called in the reverse order they were added/the global constructor ran. So if the variables in the Qt library were already fully instantiated by the time GP_ATEXIT() was called, they should be destructed after qt_atexit(). This doesn't match what I saw in the debugger, so I'm not really sure what is going on there.
Name: Qt5 Core
Description: Qt Core module
Version: 5.0.2
Libs: -L${libdir} -lQt5Core
Libs.private: -lpthread -lz -licui18n -licuuc -lpcre16 -lm -ldl -pthread -lgthread-2.0 -lg
lib-2.0 -lrt
Cflags: -I${includedir}/QtCore -I${includedir}
New problem with v2 patch:
Attached is a new version of the patch. It tries to fix the problems encountered so far by running gnuplot specific exit handlers before exit(3). That way, Qt is still fully initialized when qt_atexit() runs.
This should avoid any problems with intermixed destruction handlers from various libaries. The downside is that the patch now touches more codepaths, some of which I cannot test.
I tested the patch with various terminals on Linux (pngcairo, wxt, qt, x11). The patch also fixes the bug described in the last comment (the code tried to access the qt data when it was not initialized. This was already incorrect in CVS, but didn't lead to a crash there).
What do you mean "this was already incorrect in CVS"? Is there a simple patch to fix just this issue that can be applied to the stable branch (4.6) even apart from the more major revision your patch is turning into?
I am not keen on touching all those other drivers. Modifying gplt_x11.c ?! That's not even in the same executable at anything dealing with Qt. If the program is trying to exit directly from a terminal driver then something has gone horribly wrong already. I don't think preserving qt exit handlers is a priority at that point.
Anyhow, it will take me a while to evaluate this latest round of patches.
Re "this was already incorrect in CVS": qt_set_cursor (and, I think, qt_put_tmptext) are called even if the terminal is not initialized. Both functions don't check if the qt terminal is initialized (in CVS this would be "if (!qt_initialized) return;") but happily write to qt_out and call qt_flushOutBuffer(). I don't think it has any real negative consequences in the CVS version, but I still think the code is not correct as written.
So you could certainly add the check for qt_initialized at the beginning of those two functions. But it is certainly no high priority for CVS.
(Sidenote: it would be great if it was documented which terminal functions may be called before the terminal is initialized; if it is documented, I haven't found it.)
Re gplt_x11.c: At first I didn't want to touch the file (seeing that it forms a separate executable), but getcolor.c is used both by gnuplot and gnuplot_x11. getcolor.c calls exit, so I replaced this function by gp_exit, to make the code consistent. But this meant adding stdfn.c to gnuplot_x11, and then it was only logical to use gp_exit() consistently in gnuplot_x11.
I agree that this makes the patch a bit large and touches many different drivers. It is certainly possible to trim it down a bit (for example: only call gp_exit() if the error happens after basic initialization). The benefits I see for the current patch:
Re: qt_set_cursor and qt_put_tmptext
The intent is that the only terminal function that may be called prior to term->init() is term->options(). I see that in practice term->waitforinput() is also called if present. I suppose that's harmless so long as it returns getc(stdin) if mousing is not yet set up. I have edited .../term/README to state this explicitly.
The pre-initialization calls to term->set_cursor and term->put_tmptext are a bug. I have now fixed that in cvs for both 4.6 and 4.7
Thanks for finding this.
Re: gp_exit(). There is a Feature Request that gnuplot be consistent about returning status to the shell when it exits. That isn't practical right now, but if we do go to the trouble of introducing a single exit path it might become so.
This is looking pretty good. I've given it a workout under qt4 and qt5 on both 32-bit and 64-bit builds and clang/llvm as well as gcc. All my testing used linux, though, so I can't say whether there are wrinkles that will show up elsewhere. I have access to OpenVMS and Sun machines, but they don't have qt installed (if there even is such a port) so there's not much I can test.
I'm going to break this out into 2 or 3 stages.
Stage 1 will be to modify the core code and outboard terminal drivers to use the private gp_exit() and gp_atexit() routines. I'll add this to CVS soon.
I am not happy extending this to the larger set of [antique? legacy? whatever you want to call them] drivers like next and openstep. I am dubious that anyone is using these, and I figure any changes to the code is as likely to break something as fix something.
The change to add a constructor to qt_term.cpp will be stage 2. It's looking good but I still see 1 memory leak in valgrind traces. It would be nice to either squash that or document that it is unavoidable.
I haven't looked at the 2 more recent patches yet.
I agree that patching old drivers is more risk than it's worth, considering that we cannot test the changes.
I looked into the Valgrind traces (with --leak-check=full) and see several errors. The most common ends in "QLibraryPrivate::findOrCreate". I don't think we can do anything about this one (short of opening a bug against Qt). As far as I understood the code, Qt loads various libraries dynamically (things like Xrandr and Xcursor), and does not ever unload them.
The other major offender seems to be glib, when Qt initializes its event loop. I don't know much about glib and its interaction with Qt, so I can't say much about that one. But there seems to be no cleanup function for QEventDispatcherGlibPrivate, which handles access to the glib context, so I suspect that we also cannot do anything about that.
If you see any other valgrind issues, please post them here - maybe I have overlooked some.
One think that would help here is not linking gnuplot against QtGui and not using QApplication. Theoretically it would be enough to link gnuplot_qt against QtGui. But then all font handling would have to move to gnuplot_qt and gnuplot would have to query the terminal for the font settings. This is certainly doable, but needs more time than I currently have for this issue.
Thank you very much for looking at the patches and integrating the first part of it. I greatly appreciate you continuing work on gnuplot!