#339 Subscribing to some drivers causes FLTK/OpenGL badness

open
stage (111)
5
2012-12-16
2010-08-03
Anonymous
No

The problem is that, in a Stage simulation, subscribing to specific
drivers (Wavefront and ND in particular) causes a whole lot of pain:
X errors, lockups, or segfaults. The exact symptoms vary between
systems, so I'll go through the permutations one at a time.

I first saw the problem on Ubuntu 10.04 with the fglrx graphics driver,
where subscribing to a Wavefront device (which was in turn wired to the
Stage position2d driver) caused a segfault. Using gdb I captured a
backtrace of this segfault and saw that the call chain ended up inside
the fglrx binary (full backtrace attached as
bt-10.04-fglrx-sigsegv.log). That backtrace showed that the line making
the offending call into the graphics driver was canvas.cc::787. I
commented that line out, and instead of a segfault the program locked
solid. The backtrace of the lockup is attached as
bt-10.04-fglrx-lockup.log; it can be seen to be a mutex deadlock. I
therefore blamed the fglrx driver and installed 10.04 in a VirtualBox VM,
which of course uses a different, non-accelerated driver.

In that VM it no longer segfaulted, but Stage froze solid (with the
canvas in mid-render) and the screen was bombarded with errors from
within X. Once again using gdb I got a backtrace of the site of these
errors, attached as bt-10.04-vb.log, with a sample of the player/stage
output attached as output-10.04-vb.log. Commenting out canvas.cc::787
as above led to the same deadlock, though of course with a different
graphics driver underneath.

Wondering why there were so few reports of this, and noting that all the
reports I could see were on Ubuntu 10.04, I installed Ubuntu 9.04 in the
VM instead. This largely worked: the failing Wavefront driver came good
when connected straight to the Stage position2d device. When it was
connected through 'nd', though, the problem came back intermittently.
50% of the time the program would run fine, and the other 50% of the
time it would throw the same kinds of X errors as on 10.04. This output
and a backtrace of its source are attached as output-9.04-vb.log and
bt-9.04-vb.log respectively. Worth noting here is that if those errors
aren't thrown immediately upon subscription, they never will be: it
either works or it doesn't. To me this indicates some kind of race.
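
One way to check for a race like this, sketched here under assumptions
rather than taken from Stage, is to record which thread owns the GUI and
flag any call that arrives from a different thread. The names
ThreadOwnerCheck and workerIsFlagged are hypothetical and exist only for
this illustration; nothing here touches FLTK or Player.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper (not part of Stage, Player, or FLTK): remembers the
// thread that constructed it and reports whether the caller is that thread.
class ThreadOwnerCheck {
public:
    ThreadOwnerCheck() : owner_(std::this_thread::get_id()) {}

    // True only when called from the thread that created this object.
    bool calledFromOwner() const {
        return std::this_thread::get_id() == owner_;
    }

private:
    const std::thread::id owner_;
};

// Demonstration: create the check on the calling thread, then query it
// from a second thread. Returns true if the second thread is detected
// as a foreign caller.
bool workerIsFlagged() {
    ThreadOwnerCheck check;
    bool workerWasOwner = true;
    std::thread worker([&] { workerWasOwner = check.calledFromOwner(); });
    worker.join();  // join before reading the flag, so there is no data race
    return !workerWasOwner;
}
```

Placing a check like calledFromOwner() at the top of the suspect code
path and logging when it returns false would show whether the failing
runs are the ones where another thread got there first.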

It seems odd to me that this only happens when the Wavefront or ND
devices are subscribed to. I don't know in what way they're supposed to
be changing the display, but whatever that mechanism is could be the
source of the problem.

Discussion

  • Tarball containing all outputs and backtraces described in the body

     
  • Jan Schlüter
    2010-08-07

    I ran into the same problem, but did not find out nearly as much as you did. I installed Ubuntu 10.04 in a VirtualBox OSE VM including the guest additions (on an Ubuntu 10.04 host), then installed the latest SVN version of Player and the latest release of Stage. The simple demo world works perfectly, but with my own project I ran into segfaults, most reliably when I included the wavefront driver. As far as I remember I never experienced a freeze, only segfaults or sometimes regular crashes.
    I am now going to try to find the source of the problem using valgrind's memcheck.
    If you find out anything new about the source of the problem, or if you find an operating system on which it works, please let me know.

     
  • Jan Schlüter
    2010-08-08

    Update: I tried anew, this time using Ubuntu 10.04 in VirtualBox PUEL on a Windows XP host.
    When I don't install the guest additions, Stage works, but rendering is done in software. To achieve realtime simulation, I have to resize the Stage window to a very small size (either via the .world file or *after* connecting the client, otherwise I get a bunch of X errors and no rendering). Very often, Stage crashes or segfaults when I disconnect the client. Using valgrind to investigate the problem didn't work, as it slowed down player so much that it could not react to client connection attempts in time any more.

    When installing the guest additions but disabling 3D acceleration in the VM settings, I get the same behavior, except that Player/Stage outputs the warning "OpenGL Warning: Failed to connect to host. Make sure 3D acceleration is enabled for this VM.", which is not surprising.

    When installing the guest additions and enabling 3D acceleration, Player/Stage displays the warning "OpenGL Warning: No pincher, please call crStateSetCurrentPointers() in your SPU". I can achieve realtime simulation at any window size, but Player/Stage segfaults as soon as the client subscribes to the wavefront driver. Valgrind gives out a bunch of warnings about uninitialized values and leaked memory, and locates the segmentation fault in the VirtualBox guest additions.

    As I don't think I could easily fix the segfault, I will just go with the second setup for now. I can cope with the small Stage window, and Stage crashing on disconnecting the client even saves me a few clicks.

     
  • Jan Schlüter
    2010-09-06

    Update for those not following the discussion on playerstage-users@sf.net:
    Daniel Dube found that at least one of the segfaults is caused by race conditions in FLTK when Fl::wait() in stage/libstageplugin/p_driver.cc:581 is called by a different thread than the Stage thread. This happens, for example, when wavefront calls Device::Request with the parameter threaded set to false, explaining why the problem occurred especially when using wavefront. To fix that, change player/server/drivers/planner/wavefront/wavefront.cc, lines 1372 and 1383 from "[...] NULL, false) [...]" to "[...] NULL, true) [...]".
    This improved the situation a lot, but still gave segfaults or X errors from time to time. As a more drastic change, I changed the "if(threaded)" check in player/libplayercore/device.cc:282 to "if(1)", effectively forcing all calls to Device::Request to threaded=true. As far as I understand, this may cause deadlocks for non-threaded drivers. However, I did not encounter any deadlocks or segfaults on startup with this hack, suggesting that there are further threaded drivers apart from wavefront that do not set the threaded parameter correctly. A real fix would mean either finding and patching all those drivers or implementing a dispatcher within Stage that manages access to FLTK, as Daniel Dube suggested.

    On a side note, I set up a minimal Ubuntu 10.04 machine with LXDE and no 3D accelerated drivers, i.e. using the Mesa Software Rasterizer, and with this setup Player/Stage works perfectly. It does not even segfault when I close the Stage window or press CTRL+C in the Stage terminal, which otherwise happened consistently. This suggests that all the stability issues encountered are related to FLTK/OpenGL, and possibly due to race conditions as well, but rooted in other parts of the code than mentioned above.
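
To make the dispatcher idea a bit more concrete, here is a minimal sketch of what "managing access" could look like: other threads only queue work, and the thread that owns the GUI is the only one that ever runs it. This is an assumption-laden illustration, not Stage or FLTK code; GuiDispatcher, post, drain and taskRunsOnDrainingThread are hypothetical names, and how such a queue would hook into Stage's update loop is left open.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical sketch (not Stage or FLTK code): work is queued from any
// thread and executed only by whichever thread calls drain().
class GuiDispatcher {
public:
    // Safe to call from any thread: only enqueues the task.
    void post(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push(std::move(task));
    }

    // Intended to be called only from the GUI thread, for example once per
    // update cycle. Returns the number of tasks that were run.
    std::size_t drain() {
        std::queue<std::function<void()>> pending;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(pending, tasks_);
        }
        std::size_t count = 0;
        while (!pending.empty()) {
            pending.front()();  // run outside the lock so tasks may post()
            pending.pop();
            ++count;
        }
        return count;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> tasks_;
};

// Demonstration: a worker thread posts a task, and the calling thread
// (standing in for the GUI thread) drains it. Returns true if the task
// ran on the calling thread rather than on the worker.
bool taskRunsOnDrainingThread() {
    GuiDispatcher dispatcher;
    const std::thread::id guiThread = std::this_thread::get_id();
    std::thread::id ranOn;
    std::thread worker([&] {
        dispatcher.post([&] { ranOn = std::this_thread::get_id(); });
    });
    worker.join();
    dispatcher.drain();
    return ranOn == guiThread;
}
```

The point of the design is that a driver thread never calls into the GUI toolkit directly; it hands over a closure, and the toolkit is only ever entered from one thread. Whether this removes all of the crashes described above is untested.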

     