When using an application like Geomview under VirtualGL 2.1, I get a crash whenever more than one window is open and I resize any window other than the newest one. This is reproducible every time on various machines. It is a problem with the VirtualGL client. What appears to happen is that a socket belonging to one thread is shut down as a result of memory corruption in a different thread, and as soon as one thread has a socket problem, all threads abort. This is caused by a line in fbx.c (function fbx_term()). The problem line (526) is
if (s->shminfo.shmid!=-1) shmctl(s->shminfo.shmid,IPC_RMID,0);
I believe this calls into the Exceed XDK, but the problem is the same with both Exceed 2007 and Exceed 2008 (including the patch).
The fix is to comment out this line, or comment out the call to fbx_term in fbx_init (line 200).
My question is: VirtualGL seems very stable with this fix, and fbx_term even completes successfully when VirtualGL shuts down, so is this line really required? Is there some other negative symptom that I'm likely to see if I comment it out?
Any help would be greatly appreciated.
Wow. I think you're like the first person who's ever reported a bug and also provided the workaround. I think I may faint.
Hummingbird seems to come up with new and creative ways to break MIT-SHM with every new release, so it wouldn't surprise me at all if this is yet another failure mode. Sometimes I wonder if we're the only ones who use the Exceed MIT-SHM functionality.
As far as a potential downside to commenting out that line, there probably isn't one, although I still want to try and repro the problem before I make the modification. The MIT-SHM implementation in Exceed has a fake subset of the System V shared memory API that is designed such that you can (theoretically) compile, with few modifications, Unix applications that use the MIT-SHM API. However, the behavior of the extension behind the scenes is not strictly conformant. For instance, I've found that I can't generally rely on it to be thread safe, which is why every call into FBX in the Windows VGLclient is mutexed. This issue may be related to them establishing some sort of global structure and freeing it as soon as the first IPC_RMID command is issued. On a Unix system, failing to execute that command would leave a bunch of shared memory lying about, but I suspect that's not the case with Exceed.
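To illustrate what "mutexed" means here, the following is a minimal sketch of serializing every call into a library that is not thread safe. It is not the VGLclient code: it uses POSIX threads rather than Win32 primitives, and lib_lock, unsafe_lib_call, locked_lib_call and run_workers are hypothetical names invented for the example.

```c
#include <assert.h>
#include <pthread.h>

/* One process-wide lock that serializes every call into the library. */
static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for a library call that is not thread safe:
   an unsynchronized read-modify-write on shared state. */
static long lib_state = 0;

static void unsafe_lib_call(void)
{
    long v = lib_state;
    lib_state = v + 1;
}

/* Callers use only this wrapper, never the raw call. */
static void locked_lib_call(void)
{
    pthread_mutex_lock(&lib_lock);
    unsafe_lib_call();
    pthread_mutex_unlock(&lib_lock);
}

static void *worker(void *arg)
{
    int calls = *(int *)arg;
    for (int i = 0; i < calls; i++)
        locked_lib_call();
    return NULL;
}

/* Starts nthreads workers, waits for them, and returns the final state. */
static long run_workers(int nthreads, int calls_each)
{
    pthread_t tid[16];
    int started = 0;

    assert(nthreads > 0 && nthreads <= 16);
    lib_state = 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[i], NULL, worker, &calls_each) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return lib_state;
}
```

With the wrapper in place, run_workers(4, 10000) should count every call. The cost is that only one thread can be inside the library at a time, which is the price of not being able to trust its internal locking.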
The reason why we rely on Exceed is performance. It's the only X server that can come even close to the drawing performance we need. The next best contender is Cygwin/X, which, when configured to use MIT-SHM (in itself, not a straightforward thing to do) will perform about half as well as Exceed. Most other X servers for Windows don't implement MIT-SHM and thus perform maybe 1/10 as fast.
We have several ideas for how to get around this, such as implementing a special extension that will work more or less like MIT-SHM but use Win32 file mapping primitives instead of SysV shared memory. This could be easily implemented in X.org, but there is unfortunately no longer an actively maintained port of X.org for Windows except for Xming, whose build process is so obscure as to be completely unreproducible.
I just checked in a fix to VGL 2.1.1 (CVS tag stablebranch_2_1) and 2.2 (CVS head). It turns out that the real problem was that I was pre-emptively removing the shared memory segment as soon as it was created. You can get away with this on Unix, because the shared memory segment isn't actually destroyed until the last reference to it is removed. I had added the pre-emptive removal so that the shared memory segment doesn't stick around in the event of a program crash. But apparently Exceed doesn't like it if you remove a shared memory segment twice. So I #ifdef'd around the pre-emptive code, and it seems to work now -- for me, at least.
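For anyone unfamiliar with the idiom, here is a rough sketch of pre-emptive removal and of guarding it at compile time. It is not the actual fbx.c change: shm_seg, seg_create, seg_destroy and seg_selftest are hypothetical names, the _WIN32 test is only an assumption about which macro one might guard on, and the X server side (XShmAttach, which in real MIT-SHM code would typically have to happen before the removal) is left out entirely. Resetting shmid to -1 after the early removal is also a choice made for this sketch, so that the teardown path never issues IPC_RMID a second time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical bookkeeping for one segment (not the fbx.c struct). */
typedef struct {
    int shmid;   /* -1 once the segment has been marked for removal */
    char *addr;  /* NULL when not attached */
} shm_seg;

static int seg_create(shm_seg *s, size_t size)
{
    void *p;

    assert(s != NULL);
    s->addr = NULL;
    s->shmid = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (s->shmid == -1)
        return -1;

    p = shmat(s->shmid, NULL, 0);
    if (p == (void *)-1) {
        shmctl(s->shmid, IPC_RMID, NULL);
        s->shmid = -1;
        return -1;
    }
    s->addr = p;

#ifndef _WIN32  /* assumed guard; the emulated API objects to a second removal */
    /* On Unix the segment is only destroyed after the last detach, so
       marking it now means a crash cannot leave it lying about. */
    shmctl(s->shmid, IPC_RMID, NULL);
    s->shmid = -1;
#endif
    return 0;
}

static void seg_destroy(shm_seg *s)
{
    assert(s != NULL);
    if (s->addr != NULL) {
        shmdt(s->addr);
        s->addr = NULL;
    }
    if (s->shmid != -1) {  /* only reached where the early removal was skipped */
        shmctl(s->shmid, IPC_RMID, NULL);
        s->shmid = -1;
    }
}

/* Returns 0 if the mapping was still usable after creation. */
static int seg_selftest(void)
{
    shm_seg s;
    int ok;

    if (seg_create(&s, 4096) != 0)
        return -1;
    strcpy(s.addr, "still mapped");
    ok = strcmp(s.addr, "still mapped") == 0;
    seg_destroy(&s);
    return (ok && s.addr == NULL && s.shmid == -1) ? 0 : -1;
}
```

On a Unix system, seg_selftest() writes to the segment after it has already been marked for removal, which is exactly the behavior the pre-emptive code relies on. Where the guard compiles the early removal out, seg_destroy issues the one and only IPC_RMID.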