From: Steve C. <ste...@ya...> - 2004-12-06 09:34:43
> By the way: I am still interested in my original question. How would
> I use a debugger etc. to find the problem myself in such a situation?

I should know the answer because I created the glibc patch that fixes the problem, but it was back in February and I can't remember all the details.

It started when I thought, why should I run my AthlonXP in '386 emulator mode' when I can use 'gcc -march=athlon-xp' and actually benefit from the extra instructions my processor supports. This worked fine until I compiled numarray and it failed its own tests with a floating-point exception. But if I used the default gcc settings it worked OK.

I filed a numarray bug report (which I can no longer locate, perhaps they get deleted after a certain date); they looked at it and said it was probably a gcc bug. I filed a gcc bug report, and they closed it saying it was not a gcc bug. Then I thought it might be a bug in the way the kernel handles FP exceptions and started looking through the kernel sources, but did not make much progress. So I went back to the numarray source code and tried to narrow down where the problem was occurring.

Now to answer your question: Consider you are on a TV game show where you have to guess a number x in the range 1 to y and are told 'higher', 'lower' or 'correct' after each turn. You can use a binary search and always guess the midpoint of the range: each turn you are either correct or you eliminate half of the possibilities, so in ceil(log2(y)) turns or fewer you locate the correct number.

You can use a similar kind of binary search to locate bugs in software. You know the bug occurs on some line x of source code that has y lines. Use gdb to insert a breakpoint in the code (I think I just inserted printf() statements instead of using gdb) and see whether the error occurs before or after the breakpoint, then move the breakpoint and try again.
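The search described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the code is treated as a straight list of numbered lines executed in order, and `error_seen_by` is a hypothetical stand-in for "put a breakpoint or printf() just after line k, run the program, and see whether the error has already happened". Neither name comes from gdb, glibc or numarray.

```python
def locate_bug(num_lines, error_seen_by):
    """Return (line, probes) for the first line at which the error shows up.

    error_seen_by(k) is hypothetical: it represents one run of the program
    with a breakpoint just after line k, and returns True if the error has
    already occurred by then. Assumes exactly one faulty line in
    1..num_lines and strictly linear execution.
    """
    lo, hi = 1, num_lines          # the faulty line lies somewhere in lo..hi
    probes = 0                     # number of test runs performed so far
    while lo < hi:
        mid = (lo + hi) // 2       # place the breakpoint at the midpoint
        probes += 1
        if error_seen_by(mid):
            hi = mid               # error occurred at or before mid
        else:
            lo = mid + 1           # error occurs somewhere after mid
    return lo, probes


if __name__ == "__main__":
    faulty = 737                   # made-up line number, unknown to the search
    line, probes = locate_bug(1000, lambda k: k >= faulty)
    print("bug on line", line, "found after", probes, "runs")
```

For 1000 lines the loop never needs more than ceil(log2(1000)) = 10 runs, which is the same bound as in the guessing game.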
The problem is that source code is rarely a linear list of statements in one file that are executed in order; it is a set of procedures/functions in many files, and the execution order can vary. You can start at the main() function, split it in half, insert a breakpoint (or printf()), run it and see in which half the error occurs. Repeat the process, working your way down into other functions, until you pinpoint the error.

Hope that makes sense. You could now reinstall the old glibc, forget that you know that glibc is the problem and start again to locate the bug; it will be useful practice for the next bug that comes along!

Steve