Here's how I'd do it: set up nfsd on a box on your network and export a
directory read-write. Mount it on the box that locks up, then start your
debugger with its output going to a file in the NFS directory. Yes, NFS is
buggy, but you don't lose data when a box crashes (as long as it's not the
server).
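
If you are adding your own trace output rather than using a debugger, a
minimal sketch of the logging side might look like the following. It assumes
the export is already mounted on the crashing box; the names trace_open() and
trace_log() are made up for illustration, and the O_SYNC plus fsync() pairing
is my assumption about how to keep the client from holding lines in its own
cache when it dies.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* Open (or create) a trace file in append mode with synchronous writes.
 * Returns a file descriptor, or -1 on error. */
int trace_open(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
}

/* Format one line, append a newline, write it, and force it out.
 * Returns 0 on success, -1 on error. Long lines are truncated. */
int trace_log(int fd, const char *fmt, ...)
{
    char buf[256];
    va_list ap;
    int len;

    va_start(ap, fmt);
    len = vsnprintf(buf, sizeof(buf) - 1, fmt, ap);
    va_end(ap);
    if (len < 0)
        return -1;
    if ((size_t)len > sizeof(buf) - 2)
        len = (int)sizeof(buf) - 2;      /* output was truncated */
    buf[len++] = '\n';

    if (write(fd, buf, (size_t)len) != (ssize_t)len)
        return -1;
    return fsync(fd);                    /* do not return until flushed */
}
```

Usage would be something like trace_open("/mnt/nfslog/trace.log") once at
startup (the path is hypothetical) and trace_log(fd, "entering %s", name)
at each point of interest. Expect this to slow things down a lot, which may
itself change the timing of the lockup.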
Alexander Stohr wrote:
>Total lockups are generally hard to solve.
>Yes, I am searching for a reliable method to
>work on them myself, but I haven't found one yet.
>If you can tune features at a fine scale and trigger
>any positive effect, then you may have succeeded in locating
>the cause. But often fine tuning is not available.
>Rather, you end up throwing code with heavily different
>concepts at the same hardware, so "fine tuning" won't apply.
>The problem is that hard lockups are more likely
>a sign of hardware problems than of system corruption.
>That is to say, a bus lockup will surely freeze your box.
>Expect that to happen if you provide wrong data to your
>adapter via DMA or if there are sequences of actions
>that the hardware does not tolerate. Lack of locking in
>a multiprocessing/multithreading environment is a common
>cause. If you look at the X server's code, you will see
>multiple software- and hardware-related locks.
>You can only track how it gets to the hang, but nothing
>more. But as usual, as soon as you start adding some
>debug prints, the stall might no longer happen, or you
>will not get all the messages up to the point where
>the stall happens because of buffering in the system. And
>last but not least, such errors might not all be
>the same, and might not happen synchronously with the
>piece of software that raised the problem.
>My last-but-one approach was parallel port debugging
>(meaning, setting a bit pattern for each code component).
>Just a set of LEDs and resistors soldered onto a SUB-D
>connector. But in my case it wasn't delivering any hints.
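
For anyone who wants to try the LED approach, here is a rough sketch of what
"a bit pattern for each code component" could look like from a root process
on Linux/x86. Everything in it is an assumption for illustration: the
checkpoint names, the 0x378 address (the traditional LPT1 data port; check
your own machine), and the choice to put a small visit counter in the upper
four bits.

```c
#include <stdint.h>
#if defined(__linux__) && (defined(__i386__) || defined(__x86_64__))
#include <sys/io.h>                 /* ioperm(), outb() */
#define HAVE_PORT_IO 1
#endif

#define LPT1_DATA 0x378             /* assumed legacy LPT1 data port */

/* Hypothetical checkpoints; one ID per code component of interest. */
enum checkpoint { CP_INIT = 0, CP_DMA_SETUP, CP_DMA_KICK, CP_WAIT_IDLE, CP_DONE };

/* Low 4 bits: checkpoint ID. High 4 bits: visit counter, so the LEDs
 * still change when the same checkpoint is reached repeatedly. */
uint8_t led_pattern(enum checkpoint cp, unsigned seq)
{
    return (uint8_t)(((seq & 0x0fu) << 4) | ((unsigned)cp & 0x0fu));
}

/* Ask the kernel for access to the data port (needs root).
 * Returns 0 on success, -1 on failure or on unsupported platforms. */
int led_enable(void)
{
#ifdef HAVE_PORT_IO
    return ioperm(LPT1_DATA, 1, 1);
#else
    return -1;
#endif
}

/* Latch the pattern onto the port; the LEDs hold it after a freeze. */
void led_mark(enum checkpoint cp, unsigned seq)
{
#ifdef HAVE_PORT_IO
    outb(led_pattern(cp, seq), LPT1_DATA);
#else
    (void)cp;
    (void)seq;
#endif
}
```

Call led_enable() once, then led_mark(CP_DMA_KICK, n) and so on at each
point. The pattern left on the LEDs is the last checkpoint reached before
the hang, which, as noted above, is not necessarily where the fault lies.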
>Other methods would be duplicating/logging of DMA buffers
>to some external storage in the driver's code. But this
>again raises the timing and reproducibility problem.
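
A sketch of the buffer-logging idea, under heavy assumptions: it is written
as user-space code, the record layout (magic, sequence number, length,
payload) is invented here, and dma_log_append() is a hypothetical helper
rather than anything in the DRI tree. The point is only that each buffer is
copied out and flushed before it is handed to the hardware.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define DMA_LOG_MAGIC 0x444d4131u   /* arbitrary marker for record starts */

/* Fixed-size header written in front of every logged buffer. */
struct dma_log_hdr {
    uint32_t magic;
    uint32_t seq;                   /* submission order */
    uint32_t len;                   /* payload size in bytes */
};

/* Append one buffer to the log and force it to storage.
 * Returns 0 on success, -1 on error. */
int dma_log_append(FILE *f, uint32_t seq, const void *buf, uint32_t len)
{
    struct dma_log_hdr hdr = { DMA_LOG_MAGIC, seq, len };

    if (fwrite(&hdr, sizeof hdr, 1, f) != 1)
        return -1;
    if (len > 0 && fwrite(buf, 1, len, f) != len)
        return -1;
    if (fflush(f) != 0)             /* push stdio's buffer to the kernel */
        return -1;
    return fsync(fileno(f));        /* and the kernel's buffer to storage */
}
```

After a lockup, the record with the highest sequence number in the file is
the last buffer that was about to be submitted. The flush on every buffer
is exactly what causes the timing change mentioned above.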
>Logic analyzers? Not even those are that helpful.
>Besides being complex to apply, there are always
>limitations on what you can track and on determining
>what really goes on on the interface between system
>The simulator approach (big cubes that act like some
>hardware due to ASIC rule programming but e.g. at 1/100th
>the speed) isn't really an option, both because
>of their limited availability (including the ASIC rules)
>and because of the incompleteness of simulating any sort
>of glitches in a complex system like a PC.
>Concerning your description: the phenomenon sounds
>rather comparable to things I have seen myself recently.
>I am counting on you finding the solution and winning the
>Nobel prize in computer science... *just kidding*
>It's just a problem of information: the computer does
>not tell you what went wrong, even if it is obvious
>that something went wrong. And of course it takes
>ages to turn around, even with a journaling filesystem.
>I'd surely like other folks to join this discussion,
>but there might be only a few.
>PS: If I knew about any problems of that magnitude
> in common code, I wouldn't hesitate to tell
> anybody about it; it's even to my benefit if it's
> fixed in the upcoming releases of DRI and XFree86.
>>From: Pontus Hedman [mailto:rph@...]
>>Sent: Friday, August 31, 2001 02:41
>>Subject: [Dri-devel] How do I debug a lock-the-box crash?
>>I'm using the DRI X from recent CVS with a 2.4.6 kernel,
>>with a Rage Fury 128 card. Everything works just great,
>>except that many 3D apps cause the machine to lock up
>>solid, more or less at random. I'm talking keyboard
>>unresponsive (no numlock or alt-sysrq-b reaction)
>>and no response to pings.
>>The most reliable way to cause the lockup is to run
>>FlightGear and switch focus between its window and
>>some other window rapidly.
>>I'm at a loss as to how to even start to debug this,
>>since the box locks up without any hint about what's wrong.
>>Any suggestions? Or is it more likely that my
>>QDI Advance 9 motherboard is a flaky piece of junk?
>>Dri-devel mailing list