With gnuplot 6.0.0 under Debian/unstable, using the FVWM window manager and its ManualPlacement configuration, and the wxt terminal, if I run
{ echo 'plot x' ; sleep 3 ; } | gnuplot -persist
there are no issues if I place the window before the 3 seconds, but if I place it after the 3 seconds have elapsed, gnuplot hangs and the window is never drawn (it permanently shows the contents of whatever is behind it).
Note that
{ echo 'plot x' ; exec >&- ; sleep 3 ; } | gnuplot -persist
triggers the problem even if I place the window before the 3 seconds.
If I run gnuplot via ssh (even ssh localhost) or if I use strace to try to debug it, the problem disappears. So this seems to be a race condition.
FVWM's ManualPlacement is described as
ManualPlacement (aka active placement). The user is required to
place every new window manually. The window only shows as a
rubber band until a place is selected manually. The window is
placed when a mouse button or any key except Escape is pressed.
Escape aborts manual placement which places the window in the
top left corner of the screen. If mouse button 2 is pressed
during the initial placement of a window (respectively Shift and
mouse button 1 in case Mwm emulation has been enabled with the
Emulate command), the user is asked to resize the window too.
When the problem occurs, using FVWM's Delete command to remove the window has no effect (so this is not just a display issue: the process is frozen in some state); FVWM's Destroy command works, but the "gnuplot -persist" process keeps running. These commands are described as follows:
Delete
Sends a message to a window asking that it remove itself,
frequently causing the application to exit.
Destroy
Destroys an application window, which usually causes the
application to crash and burn.
My Debian bug report and various information:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1064982
Actually, the problem also occurs with gnuplot 5. It appeared with Pango 1.52: downgrading the pango packages to 1.51 makes the problem disappear, with both gnuplot 5 and 6. Now, I don't know whether this is a bug in Pango or a bug in gnuplot that is triggered only by the new version of Pango (for instance, Pango 1.52 may be faster than 1.51, which would expose the problem, since it seems to be a race condition).
I've identified the "problematic" commit in Pango:
https://gitlab.gnome.org/GNOME/pango/-/commit/89442dae443eba2aa0f0a526b4d6d39c0c9b13c6
The commit message is
Thus it is possible that this change makes Pango faster, so that the problem now appears in gnuplot (as slowing gnuplot down makes the problem disappear). In this case, this would not be a bug in Pango, just that the old Pango code was hiding the bug in gnuplot.
Just in case, I've also opened an issue in Pango:
https://gitlab.gnome.org/GNOME/pango/-/issues/784
This problem actually appears even without FVWM's ManualPlacement, i.e. if the window is displayed immediately. So I suppose that it is likely to occur with other window managers.
So, a simple command to reproduce the issue:
echo 'set terminal wxt; plot x' | gnuplot -persist
My machines do not yet have pango 1.52 so I cannot directly test this.
Can you get a stack trace from gnuplot while it is in the hung state?
For instance, use ps to find the pid of the gnuplot process, connect to it with gdb -p <pid>, and type where to generate a stack trace.

I did not see this message, grrr... Here's the stack trace:
So it's hanging somewhere in the Pango library.
I think I have found the cause of the issue: gnuplot does an exit_group(0), which causes a Pango thread to terminate unexpectedly. Now, I don't know whether this is a bug in gnuplot, a bug in the Pango library, or an API breakage. If this is an API breakage from Pango, it is also an ABI breakage (as the behavior is modified in an incompatible way). As a reminder, the bug I opened on the Pango side is: https://gitlab.gnome.org/GNOME/pango/-/issues/784
To fix the bug in gnuplot,
pango_cairo_font_map_set_default(NULL);
needs to be called before the fork(), as advised in the Pango issue. See my patch for the gnuplot Debian package.

I am very impressed by your effort to track this down and happy to apply the fix you found.
If I understand correctly, this is specifically an issue with cleanup of the main process when it exits in a session where (a) the wxt terminal has been used and (b) the -persist option is in effect, so that calls into the wxt+cairo+pango libraries may be made by the daughter process that is forked to handle continued display of an open plot window.

But I may not understand correctly, so maybe you can clarify something:
Your patch calls pango_cairo_font_map_set_default(NULL) before the fork, so it affects both the parent process and the daughter process. Is this intended and necessary, or is it only the parent process that needs this call? I would have guessed that since the daughter process will continue to use fonts it might be unhappy about this call or at any rate it might suddenly change the font used to display labels etc when the plot window is next refreshed. Is it possible that it would be better to move the call into the
Is it possible that it would be better to move the call into the else section a few lines down, so that only the parent process releases its font maps? I.e.
Or alternatively, if both the parent and daughter process need this before exiting maybe the call should be moved into the separate routine wxt_cleanup()? That way the parent process would release its font maps immediately and the daughter process would eventually release its maps when the window is closed but not immediately.
For what it's worth, when I compare the valgrind memory tool output before and after applying your fix I can see that the fix prevents eight chunks of "lost" memory associated with fontconfig at program exit. That makes sense. However there are still two chunks of lost memory showing even after applying the fix so maybe further cleanup is possible.
The advice to release the font map before the fork was from Matthias Clasen (see the Pango bug). In any case, I think that this release should be done at least for the child process. Otherwise, the child process would need the thread attached to the parent process (for the current font map), but this thread is killed when the parent process exits.
Some additional information... The fork(2) man page says: "The child process is created with a single thread—the one that called fork()." This means that the font-map thread created by Pango remains a thread of the parent process, and there is no duplicate thread in the child process. In other words, the child process will communicate with the font-map thread (via a queue) of the parent process, which is probably fine... until this thread is terminated by the exit() of the parent process. Calling pango_cairo_font_map_set_default(NULL) in the child process ensures that the old font map will not be used; otherwise, the child process may wait for data in the queue, but if the thread has already been terminated, such data will never come, hence the frozen process I'm seeing. Calling pango_cairo_font_map_set_default(NULL) in the parent process (after the fork) is probably useless, except to ensure that memory is released for tools like valgrind. Anyway, calling it before the fork will completely release the associated memory (for both the parent and the future child), so this may be the best option.

BTW, the pthread_atfork(3) man page mentions issues with mutexes when using fork(), which is probably the issue that occurs here.

Got it. Thanks again for your hard work in tracking this down. Applied to the development branch and will be in the upcoming 6.0.2 release.