[uml-devel] Re: Race condition in ptrace

Nick Piggin wrote:
> Bodo Stroesser wrote:
> 
>> Working with the new UML skas0 mode on my Xeon HT host, I sporadically
>> saw some UML processes segfault.
>>
>> In all cases, I could track this down to a gs segment register that
>> held the wrong contents.
>>
>> This in turn is caused by a problem in the host Linux: a ptraced child
>> that is about to stop and has woken up its parent saves some of its
>> registers (on i386, fs, gs and the FP registers) very late in switch_to.
>> The parent is granted access to the child's registers as soon as the
>> child is removed from the runqueue. Thus, in rare cases, the parent
>> might access the child's register save area before the registers have
>> actually been saved.
>>
>> This problem might also be the cause of the floating-point problems on
>> UML that were reported some time ago.
>>
>> I've written a test program that reproduces the problem on my 2.6.9
>> vanilla host quite quickly. Using SuSE kernel 2.6.5-7.97-smp, I can't
>> reproduce the problem, although the relevant parts seem to be unchanged.
>> Maybe unrelated changes modify the timing?
>>
>> I also created a patch that fixes the problem on my 2.6.9 host. This
>> probably isn't a sane patch, but it is enough to demonstrate where I
>> think the bug is. Both files are attached.
>>
>>        Bodo
>>
>>
>> ------------------------------------------------------------------------
>>
>> --- a/include/linux/sched.h    2005-02-02 22:15:51.000000000 +0100
>> +++ b/include/linux/sched.h    2005-02-02 22:22:54.000000000 +0100
>> @@ -584,6 +584,7 @@ struct task_struct {
>>        struct mempolicy *mempolicy;
>>        short il_next;        /* could be shared with used_math */
>>  #endif
>> +    volatile long saving;
>>  };
>>  
>>  static inline pid_t process_group(struct task_struct *tsk)
>> --- a/kernel/sched.c    2005-02-02 21:32:51.000000000 +0100
>> +++ b/kernel/sched.c    2005-02-02 22:12:14.000000000 +0100
>> @@ -2689,8 +2689,10 @@ need_resched:
>>          if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
>>                  unlikely(signal_pending(prev))))
>>              prev->state = TASK_RUNNING;
>> -        else
>> +        else {
>> +            prev->saving = 1;
>>              deactivate_task(prev, rq);
>> +        }
>>      }
>>  
>>      cpu = smp_processor_id();
>> --- a/kernel/ptrace.c    2005-02-02 22:12:33.000000000 +0100
>> +++ b/kernel/ptrace.c    2005-02-02 22:20:46.000000000 +0100
>> @@ -96,6 +96,7 @@ int ptrace_check_attach(struct task_stru
>>  
>>      if (!ret && !kill) {
>>          wait_task_inactive(child);
>> +        while ( child->saving ) ;
>>      }
>>  
>>      /* All systems go.. */
>> --- a/arch/i386/kernel/process.c    2005-02-02 22:18:29.000000000 +0100
>> +++ b/arch/i386/kernel/process.c    2005-02-02 22:19:22.000000000 +0100
>> @@ -577,6 +577,9 @@ struct task_struct fastcall * __switch_t
>>      asm volatile("movl %%fs,%0":"=m" (*(int *)&prev->fs));
>>      asm volatile("movl %%gs,%0":"=m" (*(int *)&prev->gs));
>>  
>> +    wmb();
>> +    prev_p->saving=0;
>> +
>>      /*
>>       * Restore %fs and %gs if needed.
>>       */
>>
> 
> I don't see how this could help because AFAIKS, child->saving is only
> set and cleared while the runqueue is locked. And the same runqueue lock
> is taken by wait_task_inactive.
> 

Sorry, that's not right. There are some routines called by schedule() that
release and reacquire the runqueue lock.
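
To make the intended ordering concrete, here is a minimal user-space sketch
of the handshake the patch attempts, using C11 atomics and pthreads rather
than kernel primitives. Everything in it (struct fake_task, child_stop,
parent_peek, run_handshake, the value 0x33) is made up for illustration; only
the idea of a "saving" flag that is set before the task leaves the runqueue
and cleared after the registers are stored comes from the patch.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical user-space stand-in for the task_struct fields involved. */
struct fake_task {
    atomic_int on_runqueue; /* 1 while "runnable", 0 once deactivated   */
    atomic_int saving;      /* mirrors the 'saving' flag from the patch */
    int saved_gs;           /* stands in for the register save area     */
};

/* The "child": leaves the runqueue first and saves its registers late. */
static void *child_stop(void *arg)
{
    struct fake_task *t = arg;

    atomic_store(&t->saving, 1);      /* set before deactivate_task()     */
    atomic_store(&t->on_runqueue, 0); /* the parent may now look at us    */
    t->saved_gs = 0x33;               /* late save, as in __switch_to()   */
    /* Release store plays the role of wmb() followed by saving = 0. */
    atomic_store_explicit(&t->saving, 0, memory_order_release);
    return NULL;
}

/* The "parent": waits for the child to be inactive, then for the save. */
static int parent_peek(struct fake_task *t)
{
    while (atomic_load(&t->on_runqueue)) {
        /* analogue of wait_task_inactive() */
    }
    while (atomic_load_explicit(&t->saving, memory_order_acquire)) {
        /* the extra wait added in ptrace_check_attach() */
    }
    return t->saved_gs;
}

/* Runs one child/parent round and returns what the parent observed. */
int run_handshake(void)
{
    struct fake_task t;
    pthread_t th;
    int gs;

    atomic_init(&t.on_runqueue, 1);
    atomic_init(&t.saving, 0);
    t.saved_gs = -1;

    pthread_create(&th, NULL, child_stop, &t);
    gs = parent_peek(&t);
    pthread_join(th, NULL);
    return gs;
}
```

Built with -pthread and called from a main(), run_handshake() should return
the value written by child_stop(). Without the second loop in parent_peek(),
nothing orders the read of saved_gs after the store, which is the window
described above.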

Bodo