From: Nathan K. <nat...@sp...> - 2014-11-20 23:28:47
|
A couple quick and rough notes:

1. Deadlock

I was recently playing with the new XCB support for Qt5 and found an issue.

The current xcb event handler tries to query an atom on the same connection the event was polled from.

Modern libX11 (at least what was on my Ubuntu 14.04 box) implements XPending like:

  lock connection
  xcb_poll_event_etc.
  vgl event handler tries to query atom, which attempts to get the connection mutex, and deadlocks.

How I reproduced it:
- run fluxbox or Unity (probably any WM would do)
- run the Qt5 hellogl example app
- click on the GL window

I worked around this by querying the relevant atoms at XOpenDisplay time (storing them along with the xcb_connection_t* -> Display * hash map).

I don't attach a patch because I'd backported the XCB code to my pre-refactor-everything 2.3.4 branch and made the patch there.

If interested I can post it. Otherwise I'll try and get time to make a patch for trunk eventually, if I'm not beaten to it.

2. Dynamic (link) XCB support

I really hated the idea of shipping two versions of the gl faker (one with XCB support, one without), so I implemented XCB support dynamically, so it can build and run on e.g. RHEL5 or earlier. (RHEL5 can yum install xcb libs, but they aren't installed by default, in my observation, and I don't want to break existing installations.)

If interested I can post that patch also. Ideally this would wait to load the xcb libs till it sees an xcb symbol that needs faking used. Unfortunately we need the xcb_connection_t->Display hash map built from the beginning, so for now I'm unconditionally loading the xcb symbols into the app symbol namespace. Hmmm, I suppose that could be worked around by building a Display hash list that's used to generate the xcb conn hash on first xcb use.

I'm curious if you had any thoughts on this when you implemented.

-Nathan |
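The dynamic-loading idea in point 2 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch mentioned in the message: the `xcb_syms` struct and `load_xcb()` helper are hypothetical names, only one symbol is resolved, and `RTLD_LOCAL` is used here even though the message describes loading the symbols into the application's namespace.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical holder for a lazily loaded libxcb (not taken from VirtualGL). */
typedef struct {
    void *handle;          /* result of dlopen(), or NULL if unavailable */
    void *poll_for_event;  /* address of xcb_poll_for_event, or NULL */
} xcb_syms;

/* Try to load libxcb at run time so the faker carries no link-time
 * dependency on it. Returns 1 on success, 0 if the library or the symbol
 * is missing, in which case the caller would simply skip XCB interposition. */
static int load_xcb(xcb_syms *s, const char *soname)
{
    s->poll_for_event = NULL;
    s->handle = dlopen(soname, RTLD_LAZY | RTLD_LOCAL);
    if (s->handle == NULL)
        return 0;

    s->poll_for_event = dlsym(s->handle, "xcb_poll_for_event");
    if (s->poll_for_event == NULL) {
        dlclose(s->handle);
        s->handle = NULL;
        return 0;
    }
    return 1;
}
```

A caller would typically pass a soname such as "libxcb.so.1" (an assumption about the target system) and cast the stored address to the proper function-pointer type before use. On older glibc versions the program may need to be linked with -ldl.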
From: DRC <dco...@us...> - 2014-11-21 04:34:48
|
Thoughts, yes, but this was a situation in which a company sponsored the work in order to fix a specific application issue. There was a limited budget, so I was attempting to solve the problem in the most expedient way possible. I also didn't want to introduce a dynamic loading mechanism because VGL 2.4 is already in beta. Thus, this feature needs to remain isolated for the moment in order to avoid regression. I would be interested in introducing that mechanism in 2.5, however. I am definitely also interested in fixing the deadlock. Please post both patches to the patch tracker.

> _______________________________________________
> VirtualGL-Devel mailing list
> Vir...@li...
> https://lists.sourceforge.net/lists/listinfo/virtualgl-devel |
From: Nathan K. <nat...@sp...> - 2014-11-21 16:20:59
|
On 20/11/14 11:34 PM, DRC wrote:
> Thoughts, yes, but this was a situation in which a company sponsored the work in order to fix a specific application issue. There was a limited budget, so I was attempting to solve the problem in the most expedient way possible.

I can certainly identify with time-perfection tradeoff constraints. Think charitably of me being under similar pressures when you look at those patches. :)

> I also didn't want to introduce a dynamic loading mechanism because VGL 2.4 is already in beta. Thus, this feature needs to remain isolated for the moment in order to avoid regression. I would be interested in introducing that mechanism in 2.5, however. I am definitely also interested in fixing the deadlock.
>
> Please post both patches to the patch tracker.

Done. Those work for me. They have not gone through QA at all yet.

-Nathan |
From: DRC <dco...@us...> - 2014-11-23 18:58:32
|
I reproduced the deadlock using my build of Qt 5.3.1 on RHEL 6 and checked in a patch to trunk that addresses the issue, but I'm still tripping up on exactly how that deadlock is occurring. Qt5 doesn't ever call XPending(). It uses the XCB event handling functions directly. |
From: DRC <dco...@us...> - 2014-11-23 19:45:14
|
Never mind. I put it in a debugger and figured out exactly what was happening. XNextEvent() is being called by glib, within the body of QEventDispatcherGlib::processEvents(). XNextEvent() locks the display, then calls xcb_wait_for_event(), which was previously querying the atom on the same display connection and causing the deadlock. |
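The self-deadlock described in this message (one thread takes the display lock, then the interposed event function issues a request that needs the same lock) can be modelled with a plain POSIX mutex. The sketch below is a simplified stand-in, not libX11's actual locking code: `display_lock` and `relock_result()` are hypothetical names, and an error-checking mutex is used so that the second lock attempt reports the problem instead of hanging the way a default mutex would.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Returns the error code produced when a thread tries to re-acquire a
 * non-recursive lock it already holds. With PTHREAD_MUTEX_ERRORCHECK the
 * attempt fails with EDEADLK; with a default mutex it would block forever. */
static int relock_result(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t display_lock;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&display_lock, &attr);
    pthread_mutexattr_destroy(&attr);

    /* Stands in for Xlib taking the display lock before polling events. */
    pthread_mutex_lock(&display_lock);

    /* Stands in for the atom query issued from inside the event hook,
     * which needs the same lock on the same thread. */
    rc = pthread_mutex_lock(&display_lock);

    pthread_mutex_unlock(&display_lock);
    pthread_mutex_destroy(&display_lock);
    return rc;
}
```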
From: Nathan K. <nat...@sp...> - 2014-11-24 15:25:32
|
On 23/11/14 02:45 PM, DRC wrote:
> Never mind. I put it in a debugger and figured out exactly what was
> happening. XNextEvent() is being called by glib, within the body of
> QEventDispatcherGlib::processEvents(). XNextEvent() locks the display,
> then calls xcb_wait_for_event(), which was previously querying the atom
> on the same display connection and causing the deadlock.

On my system it really is XPending -> xcb_intern_atom that locks[1], which merely gives weight to your notion of "be cautious about xcb".

-Nathan

[1]
#2  0x00007ffff5bd4480 in __GI___pthread_mutex_lock (mutex=0x6b8be0) at ../nptl/pthread_mutex_lock.c:79
#3  0x00007ffff4ad5b7a in _XInternalLockDisplay (dpy=0x6d0a00, wskip=0) at ../../src/locking.c:480
#4  0x00007ffff4ae7018 in return_socket (closure=0x6d0a00) at ../../src/xcb_io.c:52
#5  0x00007ffff02f6ef7 in get_socket_back (c=c@entry=0x6d1c50) at ../../src/xcb_out.c:96
#6  0x00007ffff02f74df in xcb_send_request (c=0x6d1c50, flags=flags@entry=1, vector=vector@entry=0x7fffffffdfb0, req=req@entry=0x7ffff03043a0 <xcb_req>) at ../../src/xcb_out.c:242
#7  0x00007ffff02fbe4c in xcb_intern_atom (c=<optimized out>, only_if_exists=<optimized out>, name_len=<optimized out>, name=<optimized out>) at xproto.c:3338
#8  0x00007ffff78cf386 in _xcb_intern_atom.constprop.54 () from libopentextglfaker.so.1
#9  0x00007ffff78cf469 in handleXCBEvent () from libopentextglfaker.so.1
#10 0x00007ffff78e13a3 in xcb_poll_for_event () from libopentextglfaker.so.1
#11 0x00007ffff4ae6bb8 in poll_for_event (dpy=dpy@entry=0x6d0a00) at ../../src/xcb_io.c:257
#12 0x00007ffff4ae6cfc in poll_for_response (dpy=dpy@entry=0x6d0a00) at ../../src/xcb_io.c:289
#13 0x00007ffff4ae6fcd in _XEventsQueued (dpy=dpy@entry=0x6d0a00, mode=mode@entry=2) at ../../src/xcb_io.c:363
#14 0x00007ffff4ad912d in XPending (dpy=0x6d0a00) at ../../src/Pending.c:55
#15 0x00007fffe86c8626 in ?? () from /usr/lib/x86_64-linux-gnu/libgdk-x11-2.0.so.0
#16 0x00007ffff438868d in g_main_context_prepare () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#17 0x00007ffff4388f03 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#18 0x00007ffff43890ec in g_main_context_iteration () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#19 0x00007ffff63e898c in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#20 0x00007ffff639a96b in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#21 0x00007ffff63a10e1 in QCoreApplication::exec() () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#22 0x00000000004051a1 in main () |
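The workaround mentioned at the start of the thread (intern the atoms once at XOpenDisplay time and store them next to the connection-to-Display mapping) avoids frames #7 to #9 of this trace, because the event hook then only reads cached values. A rough sketch follows. Everything in it is hypothetical: the types are stand-ins for `xcb_connection_t*`, `Display*` and `xcb_atom_t`, a small linear table replaces the hash map, the two atom names are merely a guess at what a window-close handler would need, and no locking of the table itself is shown.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t atom_t;         /* stand-in for xcb_atom_t */

typedef struct {
    const void *conn;            /* stand-in for xcb_connection_t* */
    void *dpy;                   /* stand-in for Display* */
    atom_t wm_protocols;         /* interned once, at open time */
    atom_t wm_delete_window;
} conn_entry;

#define MAX_CONNS 16
static conn_entry conn_table[MAX_CONNS];
static size_t conn_count;

/* Called once when the display is opened, i.e. before any event polling
 * holds the display lock. The caller has already interned the atoms.
 * Returns 1 on success, 0 if the table is full. */
static int register_conn(const void *conn, void *dpy,
                         atom_t protocols, atom_t delete_window)
{
    if (conn_count == MAX_CONNS)
        return 0;
    conn_table[conn_count].conn = conn;
    conn_table[conn_count].dpy = dpy;
    conn_table[conn_count].wm_protocols = protocols;
    conn_table[conn_count].wm_delete_window = delete_window;
    conn_count++;
    return 1;
}

/* Called from the event hook: a pure table lookup that sends no request
 * to the X server, so it never needs the connection lock. */
static const conn_entry *lookup_conn(const void *conn)
{
    for (size_t i = 0; i < conn_count; i++)
        if (conn_table[i].conn == conn)
            return &conn_table[i];
    return NULL;
}
```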
From: DRC <dco...@us...> - 2014-11-24 17:23:53
|
I think it might be prudent to enable it only with a 'vglrun +xcb' switch or something like that. I just predict that this is not the last we've heard of issues like this. |
From: Nathan K. <nat...@sp...> - 2014-11-26 03:31:52
|
On 24/11/14 12:23 PM, DRC wrote:
> I think it might be prudent to enable it only with a 'vglrun +xcb'
> switch or something like that. I just predict that this is not the
> last we've heard of issues like this.

Oooh, here's a nice one:

xcb_get_extension_data doesn't just say "Yes I have GLX". It also returns the extension major opcode that xcb_send_request uses for the io vector index. So when the VGL_PROBEGLX code does its thing we get:

#0  XGetXCBConnection (dpy=0x608620) at x11_xcb.c:9
#1  0x00007ffff78d23b6 in _XGetXCBConnection (dpy=0x608620) at /home/nathan/src/virtualgl/server/faker-sym.h:529
#2  0x00007ffff78e97e9 in xcb_get_extension_data (conn=0x61a3a0, ext=0x7ffff5b90050 <xcb_glx_id>) at /home/nathan/src/virtualgl/server/faker-xcb.cpp:149
#3  0x00007ffff55607d8 in xcb_send_request (c=c@entry=0x61a3a0, flags=flags@entry=1, vector=vector@entry=0x7fffffffda50, req=req@entry=0x7ffff5b8f950 <xcb_req.12371>) at xcb_out.c:177
#4  0x00007ffff59850b2 in xcb_glx_query_server_string (c=c@entry=0x61a3a0, screen=screen@entry=0, name=name@entry=2) at glx.c:2375
#5  0x00007ffff76a057e in __glXQueryServerString (dpy=dpy@entry=0x619150, opcode=<optimized out>, screen=screen@entry=0, name=name@entry=2) at glx_query.c:47
#6  0x00007ffff7682dd6 in AllocAndFetchScreenConfigs (priv=0x679dc0, dpy=0x619150) at glxext.c:764
#7  __glXInitialize (dpy=dpy@entry=0x619150) at glxext.c:879
#8  0x00007ffff767f71b in GetGLXPrivScreenConfig (dpy=0x619150, scrn=0, ppriv=ppriv@entry=0x7fffffffdb20, ppsc=ppsc@entry=0x7fffffffdb28) at glxcmds.c:174
#9  0x00007ffff767f7ab in GetGLXPrivScreenConfig (ppsc=0x7fffffffdb28, ppriv=0x7fffffffdb20, scrn=<optimized out>, dpy=<optimized out>) at glxcmds.c:170
#10 glXGetConfig (dpy=<optimized out>, vis=0x631630, attribute=5, value_return=0x63a1b8) at glxcmds.c:880
#11 0x00007ffff78fd498 in _glXGetConfig (dpy=0x619150, vis=0x631630, attrib=5, value=0x63a1b8) at /home/nathan/src/virtualgl/server/faker-sym.h:178
#12 0x00007ffff78fddf8 in buildVisAttribTable (dpy=0x619150, screen=0) at /home/nathan/src/virtualgl/server/glxvisual.cpp:131
#13 0x00007ffff78fee96 in __vglMatchVisual (dpy=0x619150, screen=0, depth=24, c_class=4, level=0, stereo=0, trans=0) at /home/nathan/src/virtualgl/server/glxvisual.cpp:354
#14 0x00007ffff78dc484 in glXChooseVisual (dpy=0x619150, screen=0, attrib_list=0x7fffffffdf00) at /home/nathan/src/virtualgl/server/faker-glx.cpp:424
#15 0x0000000000405909 in main (argc=2, argv=0x7fffffffe078) at /home/nathan/src/virtualgl/glxdemos/glxspheres.c:602
(gdb) c
Continuing.
[VGL] !!! Replaced xcb connection 0x61a3a0 with 0x609950 from xcb_get_extension_data

Breakpoint 7, xcb_glx_query_server_string_string_length (R=R@entry=0x0) at glx.c:2448
2448    glx.c: No such file or directory.
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
xcb_glx_query_server_string_string_length (R=R@entry=0x0) at glx.c:2448

I.e. xcb_glx_query_server_string fails because the data is sent on the wrong iovector, and glXQueryServerString isn't prepared to handle a failure.

This is on an OpenSuSE 13.1 box with Mesa 9.x + Nouveau, and doesn't happen on modern Mesa or with NVIDIA's binary, but it demonstrates the pitfalls of hooking an API that is used by the underlying implementation, not just applications.

Ways I can think of to work around this:

1) Just disallow any non-3D-server GLX traffic, and lose the VGL_PROBEGLX benefits. Hmmm, this looks like it kills the possibility of stereo support, though I've never seen a stereo user.

2) Introduce a global (TLS-based should work) don't-fake-xcb-for-now flag that the VGL_PROBEGLX code sets and the XCB hooks honour.

-Nathan |
From: DRC <dco...@us...> - 2014-11-26 03:46:44
|
TLS support is spotty on non-Linux platforms, unfortunately. Why couldn't we just use the existing XCB connection hash to maintain an "enabled" state for the XCB faker? Whenever an interposed GLX or X11 function is called, VGL will attempt to find the XCB connection corresponding to the Display* argument. If successful, VGL will set a "disable" bit for that connection, then re-enable it before returning from the interposed function. Then the interposed XCB functions can look up the connection in the hash and make sure it has interposing enabled before proceeding. |
From: Nathan K. <nat...@sp...> - 2014-11-26 15:21:04
|
On 25/11/14 10:46 PM, DRC wrote:
> TLS support is spotty on non-Linux platforms, unfortunately.

What platforms/issues do you have in mind?

> Why couldn't we just use the existing XCB connection hash to maintain an
> "enabled" state for the XCB faker? Whenever an interposed GLX or X11
> function is called, VGL will attempt to find the XCB connection
> corresponding to the Display* argument. If successful, VGL will set a
> "disable" bit for that connection, then re-enable it before returning
> from the interposed function. Then the interposed XCB functions can
> look up the connection in the hash and make sure it has interposing
> enabled before proceeding.

Consider this sequence:

thread1: xcb_get_extension_data(conn1)
           get_already_faked(conn1) != true
           set_already_faked(conn1, true)
           run on faked connection

thread2: xcb_get_extension_data(conn1)
           get_already_faked(conn1) == true
           run on original connection

thread1:
           set_already_faked(conn1, false)
           return

Without VGL the connection mutex inside the real xcb_get_extension_data would block, so this code would work. But the above method in VGL with a per-connection instead of per-thread flag will give incorrect results, right? (Both expect to be faked, but only one is.)

If we added a mutex to the faker function so we get the same blocking behaviour that the underlying implementation relies on, then we'd have to store the current thread id along with the "already faked" flag and use e.g. pthread_mutex_trylock first so reentrant calls on the same thread wouldn't block. That's so ugly I don't want to think about it. TLS seems really tempting.

-Nathan |
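The interleaving in this message can be replayed deterministically to see why a per-connection flag misroutes the second caller. The sketch below is single-threaded on purpose: it simply performs the steps in the order listed, with hypothetical names (`conn_state`, `hook_enter`, `hook_leave`) that do not come from VirtualGL.

```c
#include <stdbool.h>

/* Hypothetical per-connection record holding the shared flag. */
typedef struct {
    bool already_faked;
} conn_state;

typedef enum { RAN_ON_FAKED, RAN_ON_ORIGINAL } route;

/* First half of a hooked call: choose the routing, and claim the flag if
 * this caller is the one doing the redirect. */
static route hook_enter(conn_state *c)
{
    if (c->already_faked)
        return RAN_ON_ORIGINAL;   /* looks like a reentrant call */
    c->already_faked = true;
    return RAN_ON_FAKED;
}

/* Second half: only the caller that set the flag clears it. */
static void hook_leave(conn_state *c, route r)
{
    if (r == RAN_ON_FAKED)
        c->already_faked = false;
}

/* Replays the sequence from the message and returns the routing that
 * "thread 2" received. */
static route replay_interleaving(void)
{
    conn_state conn1 = { false };

    route t1 = hook_enter(&conn1);  /* thread1 enters, sets the flag     */
    route t2 = hook_enter(&conn1);  /* thread2 enters before thread1 leaves */
    hook_leave(&conn1, t2);         /* thread2 returns                   */
    hook_leave(&conn1, t1);         /* thread1 clears the flag, returns  */
    return t2;
}
```

In this replay the second caller is routed to the original connection even though it is an ordinary application call that expected to be faked, which is the incorrect result the message describes.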
From: DRC <dco...@us...> - 2014-11-26 18:44:13
|
On 11/26/14 9:20 AM, Nathan Kidd wrote:
> On 25/11/14 10:46 PM, DRC wrote:
>> TLS support is spotty on non-Linux platforms, unfortunately.
>
> What platforms/issues do you have in mind?

Traditionally, TLS wasn't reliable on non-x86 BSD implementations and Solaris, but that may be old information. Solaris 10 is still supported by VGL and will be as long as there is driver support for it, so that's the platform I would have to test to make sure TLS works properly. Bear in mind that, until January of 2009, I had to support Solaris 8 and SPARC platforms as well, so it's likely that whenever my brain jumps to "something doesn't work on non-Linux platforms", it's because it didn't work on Solaris 8 or 9.

>> Why couldn't we just use the existing XCB connection hash to maintain an
>> "enabled" state for the XCB faker? Whenever an interposed GLX or X11
>> function is called, VGL will attempt to find the XCB connection
>> corresponding to the Display* argument. If successful, VGL will set a
>> "disable" bit for that connection, then re-enable it before returning
>> from the interposed function. Then the interposed XCB functions can
>> look up the connection in the hash and make sure it has interposing
>> enabled before proceeding.
>
> Consider this sequence:
>
> thread1: xcb_get_extension_data(conn1)
>            get_already_faked(conn1) != true
>            set_already_faked(conn1, true)
>            run on faked connection
>
> thread2: xcb_get_extension_data(conn1)
>            get_already_faked(conn1) == true
>            run on original connection
>
> thread1:
>            set_already_faked(conn1, false)
>            return

I see your point. set_already_faked() would only be called from within Xlib and GLX functions, so your example above is partially bogus, but you could have something like this:

thread1: XlibFunction(dpy, ...)
         {
           conn=XGetXCBConnection(dpy);
           disable_xcb_faker(conn);
           _XlibFunction(dpy, ...);
           xcb_function(conn, ...);
           enable_xcb_faker(conn);
         }

thread2: XlibFunction(dpy, ...)
         {
           conn=XGetXCBConnection(dpy);
           disable_xcb_faker(conn);
           _XlibFunction(dpy, ...);
           xcb_function(conn, ...);
           enable_xcb_faker(conn);
         }

disable_xcb_faker()/enable_xcb_faker() would be atomic, but it's theoretically possible that the entire sequence from thread2 could execute through enable_xcb_faker() between the time that thread1 calls disable_xcb_faker() and _XlibFunction() -- i.e. a race condition.

So I concur that TLS is probably the answer. |
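A per-thread flag of the kind both posters settle on might look roughly like this. It is a sketch under assumptions rather than anything from the VirtualGL tree: it uses C11 `_Thread_local`, keeps a nesting counter instead of a boolean so that nested interposed calls unwind correctly, and reuses the names `disable_xcb_faker()`/`enable_xcb_faker()` from the message but without the connection argument, since the state here belongs to the thread. `xcb_faker_active()`, `probe()` and `other_thread_unaffected()` are hypothetical helpers added for the demonstration.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-thread nesting depth; 0 means the XCB hooks should interpose. */
static _Thread_local int xcb_faker_disabled;

static void disable_xcb_faker(void) { xcb_faker_disabled++; }
static void enable_xcb_faker(void)  { xcb_faker_disabled--; }
static bool xcb_faker_active(void)  { return xcb_faker_disabled == 0; }

/* Thread body: report what this thread sees. */
static void *probe(void *out)
{
    *(bool *)out = xcb_faker_active();
    return NULL;
}

/* Disables the faker on the calling thread, then checks from a second
 * thread that the faker still appears active there. Returns true if the
 * other thread was unaffected. */
static bool other_thread_unaffected(void)
{
    pthread_t t;
    bool seen = false;

    disable_xcb_faker();
    if (pthread_create(&t, NULL, probe, &seen) != 0) {
        enable_xcb_faker();
        return false;
    }
    pthread_join(t, NULL);
    enable_xcb_faker();
    return seen;
}
```

On platforms where compiler-level TLS is a concern, the same shape could presumably be built on `pthread_key_create()` and `pthread_getspecific()` instead; that variant is not shown here.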
From: Nathan K. <nat...@sp...> - 2014-11-27 02:39:03
|
On 26/11/14 01:43 PM, DRC wrote:
> On 11/26/14 9:20 AM, Nathan Kidd wrote:
>> Consider this sequence:
>>
>> thread1: xcb_get_extension_data(conn1)
>>            get_already_faked(conn1) != true
>>            set_already_faked(conn1, true)
>>            run on faked connection
>>
>> thread2: xcb_get_extension_data(conn1)
>>            get_already_faked(conn1) == true
>>            run on original connection
>>
>> thread1:
>>            set_already_faked(conn1, false)
>>            return
>
> I see your point. set_already_faked() would only be called from within
> Xlib and GLX functions, so your example above is partially bogus, but
> you could have something like this:

Ok, xcb_get_extension_data wasn't a good choice, but replace it with xcb_glx_query_server_string() (which will eventually call xcb_get_extension_data via xcb_send_request); the same issue could occur. Eventually any xcb API that fakes the connection will also need set_already_faked().

Incidentally, my current code works around the whole xcb_get_extension_data issue by simply not faking it at all. Exceed onDemand and Exceed Turbo VA X always support GLX (unless you explicitly turn it off), so there's no pressing need to pretend it exists.

-Nathan |
From: Nathan K. <nat...@sp...> - 2014-11-27 02:52:44
|
On 26/11/14 09:38 PM, Nathan Kidd wrote:
> Incidentally, my current code works around the whole
> xcb_get_extension_data issue by simply not faking it at all. Exceed
> onDemand and Exceed Turbo VA X always support GLX (unless you
> explicitly turn it off), so there's no pressing need to pretend it exists.

A work-around for other proxies would be to make the faked function query the asked-for connection first, and only supply fake information if the extension is not found. All the 2D-Xserver GLX queries should be checking if GLX is available anyway, and would not occur if the extension isn't there. I think this change would push the "we need a TLS flag" issue into, if not theoretical, at least unnecessary-for-now territory.

-Nathan |
From: Nathan K. <nat...@sp...> - 2014-11-27 03:59:20
|
On 26/11/14 09:52 PM, Nathan Kidd wrote:
> On 26/11/14 09:38 PM, Nathan Kidd wrote:
>> Incidentally, my current code works around the whole
>> xcb_get_extension_data issue by simply not faking it at all. Exceed
>> onDemand and Exceed Turbo VA X always support GLX (unless you
>> explicitly turn it off), so there's no pressing need to pretend it exists.
>
> A work-around for other proxies would be to make the faked function
> query the asked-for connection first, and only supply fake information
> if the extension is not found. All the 2D-Xserver GLX queries should be
> checking if GLX is available anyway, and would not occur if the
> extension isn't there. I think this change would push the "we need a
> TLS flag" issue into, if not theoretical, at least unnecessary-for-now
> territory.

And the patch that does this is attached.

-Nathan |
From: DRC <dco...@us...> - 2014-11-27 05:32:44
|
On 11/26/14 8:52 PM, Nathan Kidd wrote:
> A work-around for other proxies would be to make the faked function
> query the asked-for connection first, and only supply fake information
> if the extension is not found. All the 2D-Xserver GLX queries should be
> checking if GLX is available anyway, and would not occur if the
> extension isn't there. I think this change would push the "we need a
> TLS flag" issue into, if not theoretical, at least unnecessary-for-now
> territory.

OK, now you lost me. How would your patch prevent the issue from occurring in an X proxy like TurboVNC that lacks GLX? It's different with EoD, because your GLX implementation actually does something useful (implements client-side OpenGL rendering instead of server-side.) With the new (Xorg 7.7-based) TurboVNC implementation (the TurboVNC 2.0 evolving pre-release), it is technically feasible to add a GLX extension, but I still don't want to do it for the following reasons:

-- TurboVNC has always been designed to work with VirtualGL, so philosophically, I don't admit that a software-only GLX extension is very useful.

-- There is an easy workaround for the small percentage of users who might need to use TurboVNC with software OpenGL: http://www.turbovnc.org/Documentation/Mesa

-- Implementing and maintaining a Mesa-based GLX extension is a pain. It's less painful for TigerVNC, because they can leverage the pre-compiled Mesa libraries and DRI modules from a specific operating system (but of course, that prevents the build from working on other operating systems.) One of the underlying principles of The VirtualGL Project has always been to avoid maintaining an OpenGL implementation. It is one of the few things that has kept me sane over the last 10 years.

-- Not having a GLX extension serves as an important failsafe for VirtualGL. That way, if VGL fails, it's pretty obvious that it is failing, because OpenGL simply won't work. Otherwise, it can produce some hard-to-diagnose issues in which VGL appears to be working but things are just running really slowly. The problem is: performance issues are less likely to be reported as bugs, so this could cause some serious issues to remain unreported, whereas otherwise they would manifest as obvious bugs. |
From: Nathan K. <nat...@sp...> - 2014-11-27 15:33:42
|
On 27/11/14 12:32 AM, DRC wrote:
> On 11/26/14 8:52 PM, Nathan Kidd wrote:
>> A work-around for other proxies would be to make the faked function
>> query the asked-for connection first, and only supply fake information
>> if the extension is not found. All the 2D-Xserver GLX queries should be
>> checking if GLX is available anyway, and would not occur if the
>> extension isn't there. I think this change would push the "we need a
>> TLS flag" issue into, if not theoretical, at least unnecessary-for-now
>> territory.
>
> OK, now you lost me. How would your patch prevent the issue from
> occurring in an X proxy like TurboVNC that lacks GLX?

Because proxies without GLX will never make the failing queries in the first place.

static void buildVisAttribTable(Display *dpy, int screen)
  _XQueryExtension(dpy, "GLX"...)

would have failed, so no __glXGetConfig calls. And in my observation,

  if((atom=XInternAtom(dpy, "SERVER_OVERLAY_VISUALS", True))!=None)

would also fail, so no pass through overlays are possible. (Though I think it would be at least theoretically safer to condition it on GLX also being available.)

Because an unknown number of xcb functions that may already be hooked may be called from any other glX/X11/xcb functions, causing the "double fake" situation, I think it is still wise in the long term to pursue the TLS-based flag we discussed before. I don't have a huge fire under me to do it right now, however, since the issue is back to theoretical.

> -- Not having a GLX extension serves as an important failsafe for
> VirtualGL. That way, if VGL fails, it's pretty obvious that it is
> failing, because OpenGL simply won't work. Otherwise, it can produce
> some hard-to-diagnose issues in which VGL appears to be working but
> things are just running really slowly. The problem is: performance
> issues are less likely to be reported as bugs, so this could cause some
> serious issues to remain unreported, whereas otherwise they would
> manifest as obvious bugs.

Tell me about it. +logo has been a watch word around here for a long time.

-Nathan |
From: Nathan K. <nat...@sp...> - 2014-11-27 21:53:54
|
On 27/11/14 10:33 AM, Nathan Kidd wrote:
> On 27/11/14 12:32 AM, DRC wrote:
>> On 11/26/14 8:52 PM, Nathan Kidd wrote:
>>> A work-around for other proxies would be to make the faked function
>>> query the asked-for connection first, and only supply fake information
>>> if the extension is not found. All the 2D-Xserver GLX queries should be
>>> checking if GLX is available anyway, and would not occur if the
>>> extension isn't there. I think this change would push the "we need a
>>> TLS flag" issue into, if not theoretical, at least unnecessary-for-now
>>> territory.
>>
>> OK, now you lost me. How would your patch prevent the issue from
>> occurring in an X proxy like TurboVNC that lacks GLX?
>
> Because proxies without GLX will never make the failing queries in the
> first place.
>
> static void buildVisAttribTable(Display *dpy, int screen)
>   _XQueryExtension(dpy, "GLX"...)

Ugh, doh. The first thing _XQueryExtension is going to do is call xcb_get_extension_data().... so I lost you because I was in the weeds.

Please disregard this idea; it won't work for TurboVNC or any sans-GLX proxy.

-Nathan |
From: Nathan K. <nat...@sp...> - 2014-11-26 15:47:10
|
On 25/11/14 10:31 PM, Nathan Kidd wrote:
> Ways I can think of to work around this:
>
> 1) just disallow any non-3D-server GLX traffic, and lose the
> VGL_PROBEGLX benefits. Hmmm, this looks like it kills the possibility
> of stereo support, though I've never seen a stereo user.
>
> 2) Introduce a global (TLS-based should work) don't-fake-xcb-for-now
> flag that the VGL_PROBEGLX code sets and the XCB hooks honour.

Simplifying a little, if VGL is preloaded then there is no way for an application to ever use the 2D server for GLX, apart from rolling their own protocol which isn't supported anyway.

There is just one specific instance, the VGL_PROBEGLX code, that will ever use GLX on the 2D server. If we called buildVisAttribTable() from XOpenDisplay (and later, if more support is added, from xcb_connect()) then we could use a per-connection flag like you suggested, and we'd be guaranteed to be thread safe since we didn't return the connection/Display handle yet.

Did I miss something?

-Nathan |
From: Nathan K. <nat...@sp...> - 2014-11-26 16:21:39
|
On 26/11/14 10:47 AM, Nathan Kidd wrote:
> On 25/11/14 10:31 PM, Nathan Kidd wrote:
>> Ways I can think of to work around this:
>>
>> 1) just disallow any non-3D-server GLX traffic, and lose the
>> VGL_PROBEGLX benefits. Hmmm, this looks like it kills the possibility
>> of stereo support, though I've never seen a stereo user.
>>
>> 2) Introduce a global (TLS-based should work) don't-fake-xcb-for-now
>> flag that the VGL_PROBEGLX code sets and the XCB hooks honour.
>
> Simplifying a little, if VGL is preloaded then there is no way for an
> application to ever use the 2D server for GLX, apart from rolling their
> own protocol which isn't supported anyway.
>
> There is just one specific instance, the VGL_PROBEGLX code, that will
> ever use GLX on the 2D server. If we called buildVisAttribTable() from
> XOpenDisplay (and later, if more support is added, from xcb_connect())
> then we could use a per-connection flag like you suggested, and we'd be
> guaranteed to be thread safe since we didn't return the
> connection/Display handle yet.
>
> Did I miss something?

Ugh, overlays. Unless we were willing to postulate that the intersection between overlay-using apps and xcb-using apps is approaching nil, and say "VGL_XCB=0 is required for overlay support".

-Nathan

--
Nathan Kidd
OpenText Connectivity Solutions
nk...@op...
Software Developer
http://connectivity.opentext.com
+1 905-762-6001 |
From: DRC <dco...@us...> - 2014-11-26 18:48:16
|
On 11/26/14 9:47 AM, Nathan Kidd wrote:
>> 1) just disallow any non-3D-server GLX traffic, and lose the
>> VGL_PROBEGLX benefits. Hmmm, this looks like it kills the possibility
>> of stereo support, though I've never seen a stereo user.
>>
>> 2) Introduce a global (TLS-based should work) don't-fake-xcb-for-now
>> flag that the VGL_PROBEGLX code sets and the XCB hooks honour.
>
> Simplifying a little, if VGL is preloaded then there is no way for an
> application to ever use the 2D server for GLX, apart from rolling their
> own protocol which isn't supported anyway.
>
> There is just one specific instance, the VGL_PROBEGLX code, that will
> ever use GLX on the 2D server. If we called buildVisAttribTable() from
> XOpenDisplay (and later, if more support is added, from xcb_connect())
> then we could use a per-connection flag like you suggested, and we'd be
> guaranteed to be thread safe since we didn't return the
> connection/Display handle yet.
>
> Did I miss something?

Overlays, as you pointed out, but also, the table that buildVisAttribTable() builds is screen-dependent. If the application is switching screens, then the attribute table has to be rebuilt every time the switch happens. NOTE: This is really not a very clean solution. It would be better to cache a separate table for each active screen. But in either case, that's why it isn't prudent to build the table within the body of XOpenDisplay(). |
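The "separate table for each active screen" idea mentioned above could look roughly like the following C++ sketch. Everything here is hypothetical and only illustrates the caching pattern: `VisAttrib`, `VisAttribTable`, `VisAttribCache`, and `buildTable()` are made-up stand-ins, the display is represented by an opaque pointer, and the sketch is not thread-safe (real code would need a lock around the map).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins; a real table would hold per-visual GLX attributes.
struct VisAttrib { int visualID; int depth; };
using VisAttribTable = std::vector<VisAttrib>;

// Key: (display handle as an integer, screen number).
using ScreenKey = std::pair<std::uintptr_t, int>;

class VisAttribCache {
public:
    // Returns the cached table for (dpy, screen), building it on first use
    // instead of rebuilding it every time the application switches screens.
    template <typename Builder>
    const VisAttribTable& get(const void* dpy, int screen, Builder build) {
        ScreenKey key(reinterpret_cast<std::uintptr_t>(dpy), screen);
        auto it = tables_.find(key);
        if (it == tables_.end()) {
            ++builds_;
            it = tables_.emplace(key, build(dpy, screen)).first;
        }
        return it->second;
    }

    // Number of times a table actually had to be built.
    int builds() const { return builds_; }

private:
    std::map<ScreenKey, VisAttribTable> tables_;
    int builds_ = 0;
};

// Placeholder for the expensive per-screen query of the 2D X server.
inline VisAttribTable buildTable(const void* /*dpy*/, int screen) {
    VisAttribTable table;
    table.push_back(VisAttrib{0x20 + screen, 24});
    return table;
}
```

Because the table is built lazily on first lookup, this pattern would not require building anything inside XOpenDisplay(), which is the concern raised in the message.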
From: DRC <dco...@us...> - 2014-12-03 01:34:52
|
Please check out the code in SVN trunk and see if it corrects this issue. VGL now aggressively disables its XCB interposer whenever any "real" XCB, X11, or GLX symbol is called, so that should (unless I miss my guess) guard against any instances of "double interposing."

Within VirtualGL, there are basically three versions of an interposed function:

function() -- the interposed version
_function() -- a wrapper that verifies that the "real" function symbol exists and is loaded, then invokes the "real" function
__function() -- the "real" function

I modified the wrappers such that they increment a thread-local counter variable (fakerLevel) before calling the "real" function and decrement the same variable whenever the "real" function returns. All of the interposed XCB functions will now pass through to the "real" XCB function, without modification, whenever fakerLevel is non-zero. Unless I miss my guess, this should prevent the XCB interposer from activating unless an application explicitly calls one of the interposed XCB functions.

Additionally, VGL has been modified such that you now have to explicitly enable XCB interposition by setting VGL_FAKEXCB=1 or passing +xcb to vglrun.

On 11/25/14 9:31 PM, Nathan Kidd wrote:
> On 24/11/14 12:23 PM, DRC wrote:
>> I think it might be prudent to enable it only with a 'vglrun +xcb'
>> switch or something like that. I just predict that this is not the
>> last we've heard of issues like this.
>
> Oooh, here's a nice one:
>
> xcb_get_extension_data doesn't just say "Yes I have GLX". It also
> returns the extension major opcode that xcb_send_request uses for the io
> vector index. So when the VGL_PROBEGLX code does its thing we get:
>
> #0 XGetXCBConnection (dpy=0x608620) at x11_xcb.c:9
> #1 0x00007ffff78d23b6 in _XGetXCBConnection (dpy=0x608620) at
>    /home/nathan/src/virtualgl/server/faker-sym.h:529
> #2 0x00007ffff78e97e9 in xcb_get_extension_data (conn=0x61a3a0,
>    ext=0x7ffff5b90050 <xcb_glx_id>)
>    at /home/nathan/src/virtualgl/server/faker-xcb.cpp:149
> #3 0x00007ffff55607d8 in xcb_send_request (c=c@entry=0x61a3a0,
>    flags=flags@entry=1,
>    vector=vector@entry=0x7fffffffda50, req=req@entry=0x7ffff5b8f950
>    <xcb_req.12371>) at xcb_out.c:177
> #4 0x00007ffff59850b2 in xcb_glx_query_server_string
>    (c=c@entry=0x61a3a0, screen=screen@entry=0,
>    name=name@entry=2) at glx.c:2375
> #5 0x00007ffff76a057e in __glXQueryServerString
>    (dpy=dpy@entry=0x619150, opcode=<optimized out>,
>    screen=screen@entry=0, name=name@entry=2) at glx_query.c:47
> #6 0x00007ffff7682dd6 in AllocAndFetchScreenConfigs (priv=0x679dc0,
>    dpy=0x619150) at glxext.c:764
> #7 __glXInitialize (dpy=dpy@entry=0x619150) at glxext.c:879
> #8 0x00007ffff767f71b in GetGLXPrivScreenConfig (dpy=0x619150, scrn=0,
>    ppriv=ppriv@entry=0x7fffffffdb20,
>    ppsc=ppsc@entry=0x7fffffffdb28) at glxcmds.c:174
> #9 0x00007ffff767f7ab in GetGLXPrivScreenConfig (ppsc=0x7fffffffdb28,
>    ppriv=0x7fffffffdb20,
>    scrn=<optimized out>, dpy=<optimized out>) at glxcmds.c:170
> #10 glXGetConfig (dpy=<optimized out>, vis=0x631630, attribute=5,
>    value_return=0x63a1b8) at glxcmds.c:880
> #11 0x00007ffff78fd498 in _glXGetConfig (dpy=0x619150, vis=0x631630,
>    attrib=5, value=0x63a1b8)
>    at /home/nathan/src/virtualgl/server/faker-sym.h:178
> #12 0x00007ffff78fddf8 in buildVisAttribTable (dpy=0x619150, screen=0)
>    at /home/nathan/src/virtualgl/server/glxvisual.cpp:131
> #13 0x00007ffff78fee96 in __vglMatchVisual (dpy=0x619150, screen=0,
>    depth=24, c_class=4, level=0, stereo=0,
>    trans=0) at /home/nathan/src/virtualgl/server/glxvisual.cpp:354
> #14 0x00007ffff78dc484 in glXChooseVisual (dpy=0x619150, screen=0,
>    attrib_list=0x7fffffffdf00)
>    at /home/nathan/src/virtualgl/server/faker-glx.cpp:424
> #15 0x0000000000405909 in main (argc=2, argv=0x7fffffffe078)
>    at /home/nathan/src/virtualgl/glxdemos/glxspheres.c:602
> (gdb) c
> Continuing.
> [VGL] !!! Replaced xcb connection 0x61a3a0 with 0x609950 from
> xcb_get_extension_data
> Breakpoint 7, xcb_glx_query_server_string_string_length (R=R@entry=0x0)
> at glx.c:2448
> 2448 glx.c: No such file or directory.
> (gdb) c
> Continuing.
> Program received signal SIGSEGV, Segmentation fault.
> xcb_glx_query_server_string_string_length (R=R@entry=0x0) at glx.c:2448
>
> I.e. xcb_glx_query_server_string fails because the data tries to send
> on wrong iovector, and glXQueryServerString isn't prepared to handle a
> failure.
>
> This is on an OpenSuSE 13.1 box with Mesa 9.x + Nouveau, and doesn't
> happen on modern Mesa or with NVIDIA's binary, but it demonstrates the
> pitfalls of hooking an API that is used by the underlying
> implementation, not just applications.
>
> Ways I can think of to work around this:
>
> 1) just disallow any non-3D-server GLX traffic, and lose the
> VGL_PROBEGLX benefits. Hmmm, this looks like it kills the possibility
> of stereo support, though I've never seen a stereo user.
>
> 2) Introduce a global (TLS-based should work) don't-fake-xcb-for-now
> flag that the VGL_PROBEGLX code sets and the XCB hooks honour.
>
>
> -Nathan |
From: Nathan K. <nat...@sp...> - 2014-12-03 04:08:07
|
On 02/12/14 08:34 PM, DRC wrote:
> Please check out the code in SVN trunk and see if it corrects this
> issue.

While that code looks like it will help the general problem (one faked function calls another) it doesn't fix the particular SEGV[1] I see here because at the point the XCB code that needs to go to 2D server is called it is still under the first level (glXChooseVisual) function and is not in a 2nd level (_glXChooseVisual).

I think the VGL_PROBEGLX code needs to be manually wrapped in
  vglfaker::alreadyInterposed=true;
  vglfaker::alreadyInterposed=false;
but at this hour my brain is not certified for thinking about this any more.

-Nathan

[1]
Breakpoint 4, 0x00007ffff7afaf30 in xcb_get_extension_data () from ./librrfaker.so
(gdb) c
Continuing.
FAKED GLX IN XCB DATA
Breakpoint 4, xcb_get_extension_data (c=0x509370, ext=0x7ffff67c3050 <xcb_glx_id>) at xcb_ext.c:89
89 xcb_ext.c: No such file or directory.
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
xcb_glx_query_server_string_string_length (R=R@entry=0x0) at glx.c:2448
2448 glx.c: No such file or directory.
(gdb) bt
#0 xcb_glx_query_server_string_string_length (R=R@entry=0x0) at glx.c:2448
#1 0x00007ffff7891595 in __glXQueryServerString (dpy=dpy@entry=0x518b70, opcode=<optimized out>, screen=screen@entry=0, name=name@entry=2) at glx_query.c:55
#2 0x00007ffff7873dd6 in AllocAndFetchScreenConfigs (priv=0x579550, dpy=0x518b70) at glxext.c:764
#3 __glXInitialize (dpy=dpy@entry=0x518b70) at glxext.c:879
#4 0x00007ffff787071b in GetGLXPrivScreenConfig (dpy=0x518b70, scrn=0, ppriv=ppriv@entry=0x7fffffffdc40, ppsc=ppsc@entry=0x7fffffffdc48) at glxcmds.c:174
#5 0x00007ffff78707ab in GetGLXPrivScreenConfig (ppsc=0x7fffffffdc48, ppriv=0x7fffffffdc40, scrn=<optimized out>, dpy=<optimized out>) at glxcmds.c:170
#6 glXGetConfig (dpy=<optimized out>, vis=0x530dc0, attribute=5, value_return=0x539948) at glxcmds.c:880
#7 0x00007ffff7aff4c1 in buildVisAttribTable(_XDisplay*, int) () from ./librrfaker.so
#8 0x00007ffff7b004d9 in glxvisual::matchVisual2D(_XDisplay*, int, int, int, int, int, int) () from ./librrfaker.so
#9 0x00007ffff7adb5b0 in glXChooseVisual () from ./librrfaker.so
#10 0x000000000040556c in main () |
From: DRC <dco...@us...> - 2014-12-03 06:51:09
|
svn update and try again. I had replaced the boolean alreadyInterposed variable with an integer counter by the time I sent out that message, so I think you must have missed the latest commit somehow. If it still doesn't work, then I don't understand why. The _glXGetConfig() wrapper will increment vglfaker::fakerLevel prior to calling the "real" glXGetConfig() function, so any XCB functions called by the "real" glXGetConfig() function should not be interposed.

The reason why I replaced the boolean variable was because of this scenario (this might have been what happened in your test):

glXFoo() [interposed]
{
  _glXFoo() [wrapper]
  {
    vglfaker::alreadyInterposed=true;
    __glXFoo() [real]
    {
      xcb_foo_1() [interposed]
      {
        _xcb_foo_1(unmodified args because alreadyInterposed==true) [wrapper]
        {
          vglfaker::alreadyInterposed=true;
          __xcb_foo_1(unmodified args); [real]
          vglfaker::alreadyInterposed=false;
        }
      }
      xcb_foo_2() [interposed]
      {
        _xcb_foo_2(modified args because alreadyInterposed==false) [wrapper]
        {
          vglfaker::alreadyInterposed=true;
          __xcb_foo_2(modified args); [real]
          vglfaker::alreadyInterposed=false;
        }
      }
    }
    vglfaker::alreadyInterposed=false;
  }
}

In English, if the "real" GLX function called two interposed XCB functions back-to-back, then the first interposed XCB function would leave alreadyInterposed set to false, making the second interposed XCB function think that the XCB interposer should be active (when in fact it shouldn't be.) With the new counter variable, this becomes:

glXFoo() [interposed]
{
  _glXFoo() [wrapper]
  {
    vglfaker::fakerLevel++; // == 1
    __glXFoo() [real]
    {
      xcb_foo_1() [interposed]
      {
        _xcb_foo_1(unmodified args because fakerLevel > 0) [wrapper]
        {
          vglfaker::fakerLevel++; // == 2
          __xcb_foo_1(unmodified args); [real]
          vglfaker::fakerLevel--; // == 1
        }
      }
      xcb_foo_2() [interposed]
      {
        _xcb_foo_2(unmodified args because fakerLevel > 0) [wrapper]
        {
          vglfaker::fakerLevel++; // == 2
          __xcb_foo_2(unmodified args); [real]
          vglfaker::fakerLevel--; // == 1
        }
      }
    }
    vglfaker::fakerLevel--; // == 0
  }
}

If it still doesn't work, then I'm not sure why.

On 12/2/14 10:07 PM, Nathan Kidd wrote:
> On 02/12/14 08:34 PM, DRC wrote:
>> Please check out the code in SVN trunk and see if it corrects this
>> issue.
>
> While that code looks like it will help the general problem (one faked
> function calls another) it doesn't fix the particular SEGV[1] I see here
> because at the point the XCB code that needs to go to 2D server is
> called it is still under the first level (glXChooseVisual) function and
> is not in a 2nd level (_glXChooseVisual).
>
> I think the VGL_PROBEGLX code needs to be manually wrapped in
> vglfaker::alreadyInterposed=true;
> vglfaker::alreadyInterposed=false;
> but at this hour my brain is not certified for thinking about this any more.
>
>
> -Nathan
>
>
> [1]
> Breakpoint 4, 0x00007ffff7afaf30 in xcb_get_extension_data () from
> ./librrfaker.so
> (gdb) c
> Continuing.
> FAKED GLX IN XCB DATA
> Breakpoint 4, xcb_get_extension_data (c=0x509370, ext=0x7ffff67c3050
> <xcb_glx_id>) at xcb_ext.c:89
> 89 xcb_ext.c: No such file or directory.
> (gdb) c
> Continuing.
>
> Program received signal SIGSEGV, Segmentation fault.
> xcb_glx_query_server_string_string_length (R=R@entry=0x0) at glx.c:2448
> 2448 glx.c: No such file or directory.
> (gdb) bt
> #0 xcb_glx_query_server_string_string_length (R=R@entry=0x0) at glx.c:2448
> #1 0x00007ffff7891595 in __glXQueryServerString
>    (dpy=dpy@entry=0x518b70, opcode=<optimized out>,
>    screen=screen@entry=0, name=name@entry=2) at glx_query.c:55
> #2 0x00007ffff7873dd6 in AllocAndFetchScreenConfigs (priv=0x579550,
>    dpy=0x518b70) at glxext.c:764
> #3 __glXInitialize (dpy=dpy@entry=0x518b70) at glxext.c:879
> #4 0x00007ffff787071b in GetGLXPrivScreenConfig (dpy=0x518b70, scrn=0,
>    ppriv=ppriv@entry=0x7fffffffdc40,
>    ppsc=ppsc@entry=0x7fffffffdc48) at glxcmds.c:174
> #5 0x00007ffff78707ab in GetGLXPrivScreenConfig (ppsc=0x7fffffffdc48,
>    ppriv=0x7fffffffdc40,
>    scrn=<optimized out>, dpy=<optimized out>) at glxcmds.c:170
> #6 glXGetConfig (dpy=<optimized out>, vis=0x530dc0, attribute=5,
>    value_return=0x539948) at glxcmds.c:880
> #7 0x00007ffff7aff4c1 in buildVisAttribTable(_XDisplay*, int) () from
>    ./librrfaker.so
> #8 0x00007ffff7b004d9 in glxvisual::matchVisual2D(_XDisplay*, int, int,
>    int, int, int, int) ()
>    from ./librrfaker.so
> #9 0x00007ffff7adb5b0 in glXChooseVisual () from ./librrfaker.so
> #10 0x000000000040556c in main () |
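The difference between the boolean and the counter in the message above can be checked with a small self-contained C++ model. This is a sketch under stated assumptions rather than VirtualGL's implementation: the variable names `alreadyInterposed` and `fakerLevel` come from the thread, but the functions (`xcbFooBoolean()`, `xcbFooCounter()`, `glXFooBoolean()`, `glXFooCounter()`) and the `NestedCalls` struct are hypothetical, and the "real" functions are reduced to comments.

```cpp
#include <cassert>

// Per-thread state, as in the thread-local design discussed above.
thread_local bool alreadyInterposed = false;  // boolean scheme
thread_local int fakerLevel = 0;              // counter scheme

// Models an interposed XCB function plus its wrapper under the boolean scheme.
// Returns true if the interposer would have modified the arguments.
bool xcbFooBoolean() {
    bool modifiesArgs = !alreadyInterposed;  // decision made on entry
    alreadyInterposed = true;                // wrapper: before the "real" call
    /* the "real" XCB function would run here */
    alreadyInterposed = false;               // wrapper: clobbers the outer state
    return modifiesArgs;
}

// Same model under the counter scheme.
bool xcbFooCounter() {
    bool modifiesArgs = (fakerLevel == 0);   // pass through when nested
    ++fakerLevel;                            // wrapper: before the "real" call
    /* the "real" XCB function would run here */
    --fakerLevel;                            // wrapper: restores the outer level
    assert(fakerLevel >= 0);
    return modifiesArgs;
}

struct NestedCalls {
    bool firstModified;
    bool secondModified;
};

// Models a GLX wrapper whose "real" function calls two interposed XCB
// functions back to back.
NestedCalls glXFooBoolean() {
    NestedCalls r;
    alreadyInterposed = true;
    r.firstModified = xcbFooBoolean();
    r.secondModified = xcbFooBoolean();
    alreadyInterposed = false;
    return r;
}

NestedCalls glXFooCounter() {
    NestedCalls r;
    ++fakerLevel;
    r.firstModified = xcbFooCounter();
    r.secondModified = xcbFooCounter();
    --fakerLevel;
    return r;
}
```

In this model the boolean scheme passes the first nested XCB call through but wrongly interposes the second one, while the counter scheme passes both through and returns to level 0 afterwards.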
From: Nathan K. <nat...@sp...> - 2014-12-03 16:50:12
|
On 03/12/14 01:50 AM, DRC wrote:
> svn update and try again. I had replaced the boolean alreadyInterposed
> variable with an integer counter by the time I sent out that message, so
> I think you must have missed the latest commit somehow.

Yes, you're right. Sorry about that. With the integer counter (tested r3596) it works. There is no SEGV.

-Nathan |
From: DRC <dco...@us...> - 2014-11-23 19:51:56
|
Another note on this-- it seems like we may be duplicating effort in some cases by interposing both XNextEvent() and xcb_wait_for_event(). At the moment, as long as no obvious problems are caused by this, I'm OK with it, but in the future, it may be necessary to somehow set a property on the display connection within all of our interposed Xlib functions that temporarily disables the XCB interposer. That would also make me more comfortable with enabling the XCB faker all the time.

On 11/23/14 12:58 PM, DRC wrote:
> I reproduced the deadlock using my build of Qt 5.3.1 on RHEL 6 and
> checked in a patch to trunk that addresses the issue, but I'm still
> tripping up on exactly how that deadlock is occurring. Qt5 doesn't ever
> call XPending(). It uses the XCB event handling functions directly.
>
>
> On 11/20/14 5:28 PM, Nathan Kidd wrote:
>> A couple quick and rough notes:
>>
>>
>> 1. Deadlock
>>
>> I was recently playing with the new XCB support for QT5 and found an
>> issue.
>>
>> The current xcb event handler tries to query an atom on the same
>> connection the event was polled from.
>>
>> Modern libX11 (at least what was on my Ubuntu 14.04 box) implements
>> XPending like:
>>
>> lock connection
>> xcb_poll_event_etc.
>> vgl event handler tries to query atom which attempts to get the
>> connection mutex, deadlocks.
>>
>> How I repoed:
>> - run fluxbox or Unity (probably any WM would do)
>> - run the QT5 hellogl example app
>> - click on the GL window
>>
>> I worked around this by querying the relevant atoms at XOpenDisplay time
>> (storing them along with the xcb_connection_t* -> Display * hash map).
>>
>> I don't attach a patch because I'd backported the XCB code to my
>> pre-refactor-everything 2.3.4 branch and made the patch there.
>>
>> If interested I can post it. Otherwise I'll try and get time to make a
>> patch for trunk eventually, if I'm not beaten to it.
>>
>> 2. Dynamic (link) XCB support
>>
>> I really hated the idea of shipping two versions of the gl faker. (One
>> with XCB support, one without), so I implemented XCB support
>> dynamically, so it can build and run on e.g. RHEL5 or earlier. (RHEL5
>> can yum install xcb libs, but they aren't installed by default, in my
>> observation, and I don't want to break existing installations)
>>
>> If interested I can post that patch also. Ideally this would wait to
>> load the xcb libs till it sees an xcb symbol that needs faking used.
>> Unfortunately we need the xcb_connection_t->Display hash map built from
>> the beginning so for now I'm unconditionally loading the xcb symbols
>> into the app symbol namespace. Hmmm, I suppose that could be worked
>> around by building a Display hash list that's used to generate the xcb
>> conn hash on first xcb use.
>>
>> I'm curious if you had any thoughts on this when you implemented.
>>
>>
>> -Nathan |