On Mon, Jul 14, 2008 at 10:46 PM, Jeff Dike <jdike@addtoit.com> wrote:
On Mon, Jul 14, 2008 at 05:06:49PM +0800, Jiaying Zhang wrote:
> The 2.6.24 kernels are OK, but I have seen this problem with all of the
> 2.6.25 kernels I have tried. There have been a lot of changes between
> 2.6.24 kernels and 2.6.25 kernels. I am not sure which one may lead
> to this problem.

So bisect it.

The problem seems to be related to the getting rid of fastcall changes
introduced in 2.6.25 kernels. I found the problem started to happen from
commit 82f74e7159749cc511ebf5954a7b9ea6ad634949: x86: unify include/asm-x86/linkage_[32|64].h.
After that, several commits related to __down_interruptible had been
checked in, but they did not solve the crashing problem I saw.
In particular, I thought the d50efc6c40620b2e11648cac64ebf4a824e40382
x86: fix UML and -regparm=3 commit would solve the problem because it
adds the asmregparm macro that is the same as fastcall and uses the macro
for  __down_failed_interruptible declaration. Unfortunately, I tried that version
of git code and saw the same problem happened.


> Looks like the problem happens when __down_interruptible is called.
> I checked the semaphore passed to __down_interruptible under gdb
> and found it was corrupted:
> (gdb) f 18
> #18 __down_interruptible (sem=0x9f68d08) at include/linux/list.h:50
> 50              prev->next = new;
> (gdb) p sem
> $15 = (struct semaphore *) 0x9f68d08
> (gdb) p *sem
> $16 = {count = {counter = -268435295}, sleepers = 4, wait = {lock =
> {raw_lock = {<No data fields>}}, task_list = {
>       next = 0x9f68d5c, prev = 0x18124}}}
>
> But the semaphore looks correct before calling down_interruptible:

What's the problem with debugging this, then?  You step through the
code starting when the semaphore is good and see exactly when it gets
corrupted.

Yes. Looks like the corruption happens when __down_failed_interruptible()
calls __down_interruptible() and it has something to do with the 2.6.25's x86
gcc attribute changes.

Jiaying