From: Felix <fx...@gm...> - 2004-10-12 23:44:29
|
While investigating rare Xserver (errno=3D22 in select) and X client (losing X connection) crashes that seem to be related to the new linux-core drm I found this in savage_dri.c and several other DDX drivers: ...driverSwapMethod =3D DRI_HIDE_X_CONTEXT; This seems to indicate that the Xserver is somehow involved in direct rendering buffer swap operations and the driver-independent DRI code installs a DRM signal handler for it. I changed it to DRI_KERNEL_SWAP in the savage driver and have not got any crashes since. I'm going to stress this system a bit more tomorrow. GL apps still work, as expected since in the savage driver the kernel is currently not involved in buffer swaps. It's all done client side. Can anybody tell me what DRI_HIDE_X_CONTEXT means exactly and why it's used in so many drivers? I could imagine it has something to do with page flipping in the radeon driver, but what about MGA for example? How does this influence the way the Xserver interacts with the DRM? I'm trying to understand why the changes in linux-core drm started causing Xserver hickups. Till now I'm lost in the Xserver's use of client connections' and device file handles. Thanks, Felix | Felix K=FChling <fx...@gm...> http://fxk.de.vu | | PGP Fingerprint: 6A3C 9566 5B30 DDED 73C3 B152 151C 5CC1 D888 E595 | |
From: Keith W. <ke...@tu...> - 2004-10-13 09:15:29
|
Felix K=FChling wrote: > While investigating rare Xserver (errno=3D22 in select) and X client > (losing X connection) crashes that seem to be related to the new > linux-core drm I found this in savage_dri.c and several other DDX > drivers: >=20 > ...driverSwapMethod =3D DRI_HIDE_X_CONTEXT; >=20 > This seems to indicate that the Xserver is somehow involved in direct > rendering buffer swap operations and the driver-independent DRI code > installs a DRM signal handler for it. I changed it to DRI_KERNEL_SWAP i= n > the savage driver and have not got any crashes since. I'm going to > stress this system a bit more tomorrow. GL apps still work, as expected > since in the savage driver the kernel is currently not involved in > buffer swaps. It's all done client side. This is all to do with the original drm design providing a number of diff= erent=20 ways to implement context switching. All the "modern" drivers use one, I= =20 believe that it was DRI_HIDE_X_CONTEXT, though the names are quite confus= ing.=20 Basically there were provisions to allow context switches to be handled= in=20 the kernel, in the 3d driver or in the X server. I think the reason that you are experiencing failures is because the X se= rver=20 is trying to read or poll the drm filehandle, but Jon has recently change= d the=20 behaviour of that filehandle to return an error on those operations, rath= er=20 than indicating that no bytes are pending. Let me stress that in most drivers there is no need for the X server to b= e=20 looking at this filehandle at all, but the code which does so is widely=20 distributed and this change to the DRM will break it. Keith |
From: Jon S. <jon...@gm...> - 2004-10-13 14:17:49
|
On Wed, 13 Oct 2004 10:13:50 +0100, Keith Whitwell <ke...@tu...> wrote: > Felix K=FChling wrote: > I think the reason that you are experiencing failures is because the X se= rver > is trying to read or poll the drm filehandle, but Jon has recently change= d the > behaviour of that filehandle to return an error on those operations, rath= er > than indicating that no bytes are pending. >=20 > Let me stress that in most drivers there is no need for the X server to b= e > looking at this filehandle at all, but the code which does so is widely > distributed and this change to the DRM will break it. Is it the poll() call that is the problem? I can set it back to returning zero. The kernel people are saying this is wrong and are pushing me to have it return (POLLIN | POLLOUT | POLLRDNORM | POLLWRNORM) which is the correct return for an unimplemented poll function. Should we increment the DRM version in the set_version call and then change the return conditionally and fix the X server? Or is there a way to make this work on existing X servers with the correct return? I don't use the function so I'm just implementing what people tell me to do= . --=20 Jon Smirl jon...@gm... |
From: Keith W. <ke...@tu...> - 2004-10-13 15:40:34
|
Jon Smirl wrote: > On Wed, 13 Oct 2004 10:13:50 +0100, Keith Whitwell > <ke...@tu...> wrote: >=20 >>Felix K=FChling wrote: >>I think the reason that you are experiencing failures is because the X = server >>is trying to read or poll the drm filehandle, but Jon has recently chan= ged the >>behaviour of that filehandle to return an error on those operations, ra= ther >>than indicating that no bytes are pending. >> >>Let me stress that in most drivers there is no need for the X server to= be >>looking at this filehandle at all, but the code which does so is widely >>distributed and this change to the DRM will break it. >=20 >=20 > Is it the poll() call that is the problem? I can set it back to > returning zero. The kernel people are saying this is wrong and are > pushing me to have it return (POLLIN | POLLOUT | POLLRDNORM | > POLLWRNORM) which is the correct return for an unimplemented poll > function. In the very beginning, shared code in the X server listened to this fd, a= nd=20 shared code in the DRM existed to write to this fd, at the possible reque= st of=20 device-dependent DRM code. The first DRM driver, the gamma, used this=20 mechanism for context switching. But no other DRM driver except perhaps = ffb=20 ever has. About 18 months ago, I recognized this and pushed the shared code into th= e=20 gamma driver, and removed the poll() and read() functions from drm_fops.h= .=20 Within the day I recognized that the X server continued to examine the fd= , and=20 that there was a need to restore empty poll() and read() functions. Now = that=20 the gamma driver is dead, maybe we can go about pulling that behaviour ou= t of=20 the X server shared code as well. But right now, you have a situation wh= ere=20 everybody running a DRI-enabled X server is dependent on the poll() (and = maybe=20 read()) function behaving in the way it always has. So, it's not really an unimplemented poll() function, but the=20 backwards-compatible ghost of a real communications channel which is stil= l=20 polled, but never written to. Keith |
From: Jon S. <jon...@gm...> - 2004-10-13 16:24:48
|
On Wed, 13 Oct 2004 16:39:27 +0100, Keith Whitwell <ke...@tu...> wrote: > So, it's not really an unimplemented poll() function, but the > backwards-compatible ghost of a real communications channel which is still > polled, but never written to. One way to fix this would be to alternately return (POLLIN | POLLOUT | POLLRDNORM | POLLWRNORM) and zero. For a normal poll function zero means that data is not ready and to keep on polling. If we alternate between the two a correct polling program would get the zero, and then poll again and get POLLIN.. and work. If X tries the poll function it will get POLLIN.., retry and get zero, so it should work too. Will the current X code retry on a non-zero return? Can we fix the X server to remove this code in the future? Or fix the X server to set a higher interface number and change behavior based on the interface number? Right now I removed the polling code from DRM and the OS defaults took over. -- Jon Smirl jon...@gm... |
From: Jon S. <jon...@gm...> - 2004-10-13 16:54:34
|
I just changed DRM to alternative between zero and POLLIN.... This will make the DRM poll() function work like the kernel expects it to and still work with existing X servers. Can someone please get this incorrect code out of the X server? On Wed, 13 Oct 2004 16:39:27 +0100, Keith Whitwell <ke...@tu...> wrote: > Jon Smirl wrote: > > On Wed, 13 Oct 2004 10:13:50 +0100, Keith Whitwell > > <ke...@tu...> wrote: > > > >>Felix K=FChling wrote: > >>I think the reason that you are experiencing failures is because the X = server > >>is trying to read or poll the drm filehandle, but Jon has recently chan= ged the > >>behaviour of that filehandle to return an error on those operations, ra= ther > >>than indicating that no bytes are pending. > >> > >>Let me stress that in most drivers there is no need for the X server to= be > >>looking at this filehandle at all, but the code which does so is widely > >>distributed and this change to the DRM will break it. > > > > > > Is it the poll() call that is the problem? I can set it back to > > returning zero. The kernel people are saying this is wrong and are > > pushing me to have it return (POLLIN | POLLOUT | POLLRDNORM | > > POLLWRNORM) which is the correct return for an unimplemented poll > > function. >=20 > In the very beginning, shared code in the X server listened to this fd, a= nd > shared code in the DRM existed to write to this fd, at the possible reque= st of > device-dependent DRM code. The first DRM driver, the gamma, used this > mechanism for context switching. But no other DRM driver except perhaps = ffb > ever has. >=20 > About 18 months ago, I recognized this and pushed the shared code into th= e > gamma driver, and removed the poll() and read() functions from drm_fops.h= . > Within the day I recognized that the X server continued to examine the fd= , and > that there was a need to restore empty poll() and read() functions. Now = that > the gamma driver is dead, maybe we can go about pulling that behaviour ou= t of > the X server shared code as well. But right now, you have a situation wh= ere > everybody running a DRI-enabled X server is dependent on the poll() (and = maybe > read()) function behaving in the way it always has. >=20 > So, it's not really an unimplemented poll() function, but the > backwards-compatible ghost of a real communications channel which is stil= l > polled, but never written to. >=20 > Keith >=20 --=20 Jon Smirl jon...@gm... |
From: Felix <fx...@gm...> - 2004-10-13 17:40:03
|
Am Mi, den 13.10.2004 schrieb Jon Smirl um 18:53: > I just changed DRM to alternative between zero and POLLIN.... This > will make the DRM poll() function work like the kernel expects it to > and still work with existing X servers. Can someone please get this > incorrect code out of the X server? As I mentioned earlier, setting driverSwapMethod to DRI_KERNEL_SWAP seems to work for the savage driver. I bet it works for many others too. AFAICT this tells the common DRI code not to use the DRM file handle. Once this is proven to work with all drivers we could axe the code for the other methods. Regards, Felix --=20 | Felix K=FChling <fx...@gm...> http://fxk.de.vu | | PGP Fingerprint: 6A3C 9566 5B30 DDED 73C3 B152 151C 5CC1 D888 E595 | |
From: Keith W. <ke...@tu...> - 2004-10-13 17:48:12
|
Felix K=FChling wrote: > Am Mi, den 13.10.2004 schrieb Jon Smirl um 18:53: >=20 >>I just changed DRM to alternative between zero and POLLIN.... This >>will make the DRM poll() function work like the kernel expects it to >>and still work with existing X servers. Can someone please get this >>incorrect code out of the X server? >=20 >=20 > As I mentioned earlier, setting driverSwapMethod to DRI_KERNEL_SWAP > seems to work for the savage driver. I bet it works for many others too= . I'd rather see the whole context-switching code in the shared code given = a=20 good shaking out & fixed properly. Ie. we should apply the same knife to= the=20 shared DRI code that has been cutting the dead flesh out of DRM. > AFAICT this tells the common DRI code not to use the DRM file handle. > Once this is proven to work with all drivers we could axe the code for > the other methods. But it won't fix the code that's out there now, which is the backwards=20 compatibility concern. Keith |
From: Michel <mi...@da...> - 2004-10-13 20:58:30
|
On Wed, 2004-10-13 at 19:42 +0200, Felix K=C3=BChling wrote: > Am Mi, den 13.10.2004 schrieb Jon Smirl um 18:53: > > I just changed DRM to alternative between zero and POLLIN.... This > > will make the DRM poll() function work like the kernel expects it to > > and still work with existing X servers. Can someone please get this > > incorrect code out of the X server? >=20 > As I mentioned earlier, setting driverSwapMethod to DRI_KERNEL_SWAP > seems to work for the savage driver. I bet it works for many others too. > AFAICT this tells the common DRI code not to use the DRM file handle. > Once this is proven to work with all drivers we could axe the code for > the other methods. I doubt it's that simple, e.g. the DDX driver's SwapContext() function is only called with DRI_HIDE_X_CONTEXT AFAICT. --=20 Earthling Michel D=C3=A4nzer | Debian (powerpc), X and DRI develop= er Libre software enthusiast | http://svcs.affero.net/rm.php?r=3Ddaenzer |
From: Felix <fx...@gm...> - 2004-10-16 14:32:58
|
Am Mi, den 13.10.2004 schrieb Michel D=E4nzer um 22:57: > On Wed, 2004-10-13 at 19:42 +0200, Felix K=FChling wrote: > > Am Mi, den 13.10.2004 schrieb Jon Smirl um 18:53: > > > I just changed DRM to alternative between zero and POLLIN.... This > > > will make the DRM poll() function work like the kernel expects it to > > > and still work with existing X servers. Can someone please get this > > > incorrect code out of the X server? > >=20 > > As I mentioned earlier, setting driverSwapMethod to DRI_KERNEL_SWAP > > seems to work for the savage driver. I bet it works for many others too= . > > AFAICT this tells the common DRI code not to use the DRM file handle. > > Once this is proven to work with all drivers we could axe the code for > > the other methods. >=20 > I doubt it's that simple, e.g. the DDX driver's SwapContext() function > is only called with DRI_HIDE_X_CONTEXT AFAICT. [slowly starting to make sense of all this ...] Right. The default Block and Wakeup handlers only call SwapContext if DRI_HIDE_X_CONTEXT is set. But a driver can install its own Block and Wakeup handlers. I think that's what I'm going to do for the Savage driver. I believe it should work for Radeon (for example) too. The custom handlers would lock/unlock and call RADEONEnterServer or RADEONLeaveServer respectively. If I understand Keith correctly then all the modern drivers don't use the SIGIO method (kernel notifies Xserver of context switches) any more. So there is really no need for the full HIDE_X_CONTEXT+SERVER_SWAP overhead. --=20 | Felix K=FChling <fx...@gm...> http://fxk.de.vu | | PGP Fingerprint: 6A3C 9566 5B30 DDED 73C3 B152 151C 5CC1 D888 E595 | |
From: Roland S. <rsc...@hi...> - 2004-10-28 13:11:18
|
Jon Smirl wrote: > I just changed DRM to alternative between zero and POLLIN.... This > will make the DRM poll() function work like the kernel expects it to > and still work with existing X servers. Can someone please get this > incorrect code out of the X server? I got some mysterious server crashes since some time (week or so), about once per day, so it looks like the alternate polling isn't working. Just as Felix I got the "WaitForSomething(): select: errno=22" error message. The corresponding bugzilla entry is here: http://freedesktop.org/bugzilla/show_bug.cgi?id=1505. This is with the radeon DDX (rv250), and Xorg cvs (10 days old currently). I've switched back to the non-core drm version for now and will see if this indeed fixes that crash. Roland |
From: Jon S. <jon...@gm...> - 2004-10-28 15:56:57
|
On Thu, 28 Oct 2004 15:11:26 +0200, Roland Scheidegger <rsc...@hi...> wrote: > Jon Smirl wrote: > > I just changed DRM to alternative between zero and POLLIN.... This > > will make the DRM poll() function work like the kernel expects it to > > and still work with existing X servers. Can someone please get this > > incorrect code out of the X server? > > I got some mysterious server crashes since some time (week or so), about > once per day, so it looks like the alternate polling isn't working. Just > as Felix I got the "WaitForSomething(): select: errno=22" error message. > The corresponding bugzilla entry is here: > http://freedesktop.org/bugzilla/show_bug.cgi?id=1505. > This is with the radeon DDX (rv250), and Xorg cvs (10 days old currently). > I've switched back to the non-core drm version for now and will see if > this indeed fixes that crash. I just changed this to always return the broken response of zero. When this gets fixed in the Xserver we can use a DRM interface version number to control returning the correct response. Meanwhile I guess the kernel developers are going to have to live with Xserver/DRM making up their own rules for how poll() works. -- Jon Smirl jon...@gm... |