From: Gareth H. <ga...@va...> - 2001-02-16 17:53:33
Ove Kaaven wrote:
>
> Well, I'd have helped nailing down which GL calls were the culprit, if
> there would ever appear a DRI version that didn't completely hang my
> system after a few seconds, requiring a hard reboot (on a r128 RF 32MB
> AGP).
>
> (Hmm, I also get a system hang in the opening menu of xracer)

I'll ask for forgiveness in advance. It's time for a rant.

I've seen a lot of bug reports like this. "My system locks up". "I
can't run any GL apps". You know what I mean.

Here's the thing: I have lots of video cards. The drivers for all of
these cards are basically rock solid on my system. If they're not, or
if people report problems that a) come with a small demo program that
demonstrates the problem, or b) are easy to reproduce with well-known
apps (games and such), then we can generally fix the problems quite
quickly. Or, at least investigate the problems quickly.

One thing that is almost impossible to address is a bug report like "my
system hangs". The core parts of pretty much all the drivers have been
banged on for a long, long time. Granted, I'm guilty of throwing out
lots of code and rewriting it from scratch to address problems with
performance, stability and robustness, but I'd love to argue with
anyone who thinks this is a bad idea.

The point I'm trying to make is that system hangs and the like could
come down to some specific set of circumstances only seen on your
machine. This includes the actual combination of hardware, as well as
versions of software including your kernel, compiler, binutils, libc
and so on.

Case in point: it sounds like RedHat want to disable the DRI on a lot
of cards for their next release, including the r128. I've heard claims
"it doesn't work". This is all well and good, and I do believe these
reports. Problem is, every time I fire it up here, it's rock solid...

So, my question is, what can we do about this? Are the people who claim
"it doesn't work" looking into why not? Do you have any idea where the
problem is?
For instance, have commands been sent to the hardware for rendering, or
does the system hang during initialization?

I'm not trying to pass the buck (honestly, I'm not -- I want to see
these problems resolved as much as the next guy), but I know there
isn't a hell of a lot I can do when I see a report of "my system hangs
when I use the r128 driver"...

Which leads us to...

Gareth's Tips and Tricks when Debugging DRI Drivers:

1) Be careful about lots of printk messages in the kernel modules. The
   internal buffers will fill up easily, causing messages to be lost
   and thus leaving what looks like a bug in the kernel module. Nice
   ways to avoid this generally involve slowing the client down, which
   may cause the problem to disappear. Try sticking something like
   system( "sync" ) in the client swapbuffers code to flush everything
   to disk if you really want to capture everything.

2) Stick lots of calls to wait_for_idle() (or equivalent -- every
   driver should have this) or udelay( 1 ) in the kernel code. If you
   track the problem down to a single function, especially on startup,
   this can help you pinpoint the lockup (with appropriate debugging
   output of course).

3) Make the kernel module synchronous. That is, don't return from an
   ioctl until the commands it generated have completed. You can often
   achieve this with a wait_for_idle() before exit. So, you've just
   flushed a DMA buffer? Wait until you know it's done rendering. This
   will avoid lots of nasty behaviour if the system is doing things it
   shouldn't be, like overwriting DMA buffers before they're done etc.

4) One of my favourites: Stick this at the end of a swapbuffers or
   vertex buffer flush function:

      static int __break_on_fastpath = 0;

      void fooDDFastPath( ... )
      {
         /* Do the fast path stuff */

         if ( __break_on_fastpath ) {
            FLUSH_BATCH( ... );          /* use this if needed */
            drmFooWaitForIdle( ... );    /* use this if needed */
            __asm__ __volatile__ ( "int $3" );
         }
      }

   I'm lazy, and I've had symbol problems with my debugger (maybe I
   shouldn't have been hacking on it), so this is a nice way to make
   sure you can break at an exact location with ease. See a problem in
   Quake3? Attach gdb to the process, hit Ctrl-C, set
   __break_on_fastpath = 1, and watch every DMA buffer get rendered.
   Of course, you'll have to hack things so it always renders to the
   front buffer and maybe turns depth testing off, but I've had a lot
   of fun with this...

5) Disable certain states in the 3D driver. Turn texturing off. Turn
   color buffer clears off. Turn depth testing off. Don't do the
   buffer swap on double-buffered apps. Set new_state bitfields to ~0
   all the time. Force all the state to be uploaded to the hardware
   each time. Experiment.

Big caveat: you're SOL if you only have one machine...

Okay, I'm done. Please help us help you.

-- Gareth
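
The "set new_state bitfields to ~0" idea from tip 5 can be sketched roughly as below. This is only an illustration, not code from any real driver: foo_context, the FOO_NEW_* bits, force_full_upload and foo_emit_state() are all hypothetical names, and the upload counters stand in for actual hardware register writes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dirty-state bits; a real driver defines its own set. */
#define FOO_NEW_TEXTURE (1u << 0)
#define FOO_NEW_DEPTH   (1u << 1)
#define FOO_NEW_CLEAR   (1u << 2)
#define FOO_NUM_GROUPS  3

struct foo_context {
    unsigned int new_state;        /* bits set whenever GL state changes */
    int force_full_upload;         /* debug switch: act as if new_state == ~0 */
    int uploads[FOO_NUM_GROUPS];   /* stand-in for hardware writes, per group */
};

/* Upload every state group whose dirty bit is set.
 * Returns the number of groups uploaded. */
static int foo_emit_state(struct foo_context *ctx)
{
    int i, emitted = 0;

    assert(ctx != NULL);

    if (ctx->force_full_upload)
        ctx->new_state = ~0u;      /* pretend everything changed */

    for (i = 0; i < FOO_NUM_GROUPS; i++) {
        if (ctx->new_state & (1u << i)) {
            ctx->uploads[i]++;     /* a real driver would write registers here */
            emitted++;
        }
    }

    ctx->new_state = 0;            /* hardware and software state now agree */
    return emitted;
}
```

If a rendering problem goes away with the hypothetical force_full_upload switch set, that would point at the dirty-bit tracking (some state change not being flagged) rather than at the state values being uploaded.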