The patch below solves the 2.6.25 uml crash problem for me. Looks like the
problem should be away in 2.6.26 kernel because down_interruptible has
changed to the C code since 2.6.26. But I got kernel panic while booting
the 2.6.26 kernel :(.

--- linux-2.6.25.4/lib/semaphore-sleepers.c     2008-05-15 23:00:12.000000000 +0800
+++ linux-2.6.25.4-new/lib/semaphore-sleepers.c 2008-07-17 12:20:47.000000000 +0800
@@ -48,12 +48,12 @@
  *    we cannot lose wakeup events.
  */

-void __up(struct semaphore *sem)
+asmregparm void __up(struct semaphore *sem)
 {
        wake_up(&sem->wait);
 }

-void __sched __down(struct semaphore *sem)
+asmregparm void __sched __down(struct semaphore *sem)
 {
        struct task_struct *tsk = current;
        DECLARE_WAITQUEUE(wait, tsk);
@@ -90,7 +90,7 @@ void __sched __down(struct semaphore *se
        tsk->state = TASK_RUNNING;
 }

-int __sched __down_interruptible(struct semaphore *sem)
+asmregparm int __sched __down_interruptible(struct semaphore *sem)
 {
        int retval = 0;
        struct task_struct *tsk = current;
@@ -153,7 +153,7 @@ int __sched __down_interruptible(struct
  * single "cmpxchg" without failure cases,
  * but then it wouldn't work on a 386.
  */
-int __down_trylock(struct semaphore *sem)
+asmregparm int __down_trylock(struct semaphore *sem)
 {
        int sleepers;
        unsigned long flags;

Jiaying

On Wed, Jul 16, 2008 at 5:52 PM, Jiaying Zhang <jiayingz@google.com> wrote:


On Mon, Jul 14, 2008 at 10:46 PM, Jeff Dike <jdike@addtoit.com> wrote:
On Mon, Jul 14, 2008 at 05:06:49PM +0800, Jiaying Zhang wrote:
> The 2.6.24 kernels are OK, but I have seen this problem with all of the
> 2.6.25 kernels I have tried. There have been a lot of changes between
> 2.6.24 kernels and 2.6.25 kernels. I am not sure which one may lead
> to this problem.

So bisect it.

The problem seems to be related to the getting rid of fastcall changes
introduced in 2.6.25 kernels. I found the problem started to happen from
commit 82f74e7159749cc511ebf5954a7b9ea6ad634949: x86: unify include/asm-x86/linkage_[32|64].h.
After that, several commits related to __down_interruptible had been
checked in, but they did not solve the crashing problem I saw.
In particular, I thought the d50efc6c40620b2e11648cac64ebf4a824e40382
x86: fix UML and -regparm=3 commit would solve the problem because it
adds the asmregparm macro that is the same as fastcall and uses the macro
for  __down_failed_interruptible declaration. Unfortunately, I tried that version
of git code and saw the same problem happened.


> Looks like the problem happens when __down_interruptible is called.
> I checked the semaphore passed to __down_interruptible under gdb
> and found it was corrupted:
> (gdb) f 18
> #18 __down_interruptible (sem=0x9f68d08) at include/linux/list.h:50
> 50              prev->next = new;
> (gdb) p sem
> $15 = (struct semaphore *) 0x9f68d08
> (gdb) p *sem
> $16 = {count = {counter = -268435295}, sleepers = 4, wait = {lock =
> {raw_lock = {<No data fields>}}, task_list = {
>       next = 0x9f68d5c, prev = 0x18124}}}
>
> But the semaphore looks correct before calling down_interruptible:

What's the problem with debugging this, then?  You step through the
code starting when the semaphore is good and see exactly when it gets
corrupted.

Yes. Looks like the corruption happens when __down_failed_interruptible()
calls __down_interruptible() and it has something to do with the 2.6.25's x86
gcc attribute changes.

Jiaying