Thread: RE: [Kgdb-bugreport] [discuss] 2.4 kgdb SMP fixes

Hi George,

> > The current code resolves this problem by sending an IPI to the other
> > CPUs to enter the gdb_wait() state. This is along similar lines to the
> > kdb code. When a CPU enters the master debugger it sends an IPI to the
> > other CPUs. The other CPUs, on receiving this IPI, enter the gdb_wait()
> > function until the master CPU has quit the debugger. A new function,
> > kgdb_smp_stop(), has been added to stop other CPUs when a CPU has
> > entered the debugger.
>
> IMHO an IPI is not strong enough unless it is an NMI IPI. Otherwise you
> are depending on the other cpu(s) being interruptible and if you are
> debugging the kernel, well, there just might be a problem where they are
> not interruptible and will not become so for some time, if ever.

  Right. The IPI is issued in the form of an NMI and is sent to all CPUs
except the one which entered the debugger. It should be reasonably safe.
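
To make the "all CPUs except the one which entered the debugger" rule concrete, here is a minimal user-space sketch. It is not the stub's code: stop_targets() and must_wait() are hypothetical names, and the 32-bit online mask is an assumption made only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: bitmask of CPUs that should receive the stop NMI.
 * Bit n of online_mask is set when CPU n is online; self (< 32) is the
 * CPU that entered the debugger and must not be stopped. */
static uint32_t stop_targets(uint32_t online_mask, unsigned int self)
{
    return online_mask & ~(UINT32_C(1) << self);
}

/* Hypothetical model of the slave side: a CPU stays in the wait loop
 * for as long as some other CPU holds the debugger. */
static int must_wait(int debugger_active, unsigned int cpu, unsigned int master)
{
    return debugger_active && cpu != master;
}
```

With four CPUs online (mask 0xF) and CPU 2 entering the debugger, stop_targets() selects CPUs 0, 1 and 3.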

> >
> > Fix for Instruction Pointer
> > ----------------------------
> > On hitting a breakpoint gdb does the following sequence.
> >  1. Restore the original value at the instruction address of the
> >     breakpoint.
> >  2. Decrement the instruction pointer by 1.
> >  3. Issue a single step to the kgdb stub. We enable a trap flag and
> >     after executing a single instruction the debug exception is hit.
> >  4. The remote gdb, on receipt of this debug exception, reinserts the
> >     breakpoint at the breakpoint address.
>
> I would expect most breakpoints to stop AT the instruction, not after
> it.  Why is this being done at all?
>
> If this is to be done, then the SS should be treated just like a gdb
> commanded SS, i.e. the other cpu(s) should be held while the SS is being
> done.  See the code in the mm-kgdb on this.

  We would stop at the breakpoint, but the EIP/RIP would point to the byte
after the breakpoint instruction (the INT3 instruction). When we set a
breakpoint at the gdb console, gdb saves the original value of the byte at
the breakpoint address and inserts the breakpoint instruction opcode there.
When we hit the breakpoint, gdb restores the original value at the
breakpoint address and decrements the instruction pointer so that the
original instruction is executed from its start.
  The double fault occurs when gdb doesn't do the above and continues from
where the EIP/RIP is pointing.
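
The save/patch/restore sequence can be sketched in user space on a plain byte buffer. This is only an illustration of the bookkeeping described above, not gdb's implementation; struct breakpoint, bp_insert() and bp_hit() are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>

#define INT3_OPCODE 0xCC  /* x86 one-byte breakpoint instruction */

struct breakpoint {
    size_t  addr;   /* offset of the patched byte in the code buffer */
    uint8_t saved;  /* original byte that INT3 replaced */
};

/* Save the original byte and patch in INT3, as done when a breakpoint
 * is set. */
static void bp_insert(uint8_t *code, struct breakpoint *bp, size_t addr)
{
    bp->addr = addr;
    bp->saved = code[addr];
    code[addr] = INT3_OPCODE;
}

/* After the trap the reported IP is one past the INT3 byte. Restore the
 * original byte and return the IP rewound to the breakpoint address. */
static size_t bp_hit(uint8_t *code, const struct breakpoint *bp, size_t trap_ip)
{
    code[bp->addr] = bp->saved;
    return trap_ip - 1;
}
```

If the rewind in bp_hit() is skipped, execution resumes one byte into the original instruction, which is the failure described here.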

> > The problem faced is that when two CPUs try to enter the debugger at
> > the same time on hitting a breakpoint, we experience a double fault.
> > The reason is that for CPU1 steps 1 and 2 are executed. The trap flag
> > is enabled and we return from the debugger. Now CPU2 enters and gdb
> > possibly thinks that this is the single-step debug exception and
> > doesn't perform steps 1 and 2.
>
> Clearly the first cpu should be completely in kgdb prior to doing any SS
> stuff.  By which time the other cpus should be captured in the wait loop.

  How does the kgdb-mm stub handle the situation where a breakpoint is hit
on both CPUs at the same time? Which one enters the debugger? In the 2.4
kgdb code it is possible for both of them to enter the debugger
(handle_exception()); one acquires kgdb_lock while the other waits on the
lock. The problem then arises as follows:
1. Both CPUs hit a breakpoint and enter the kgdb exception handler
(handle_exception()).
2. CPU0 acquires the lock and contacts gdb. We continue at the gdb prompt.
3. gdb issues the "step" command and expects the debug exception to be
raised, at which point it would reinsert all the breakpoints.
4. However, since CPU1 was waiting for the lock, it gets it, and when it
contacts gdb, gdb thinks this is the trap it was expecting. It reinserts
all breakpoints but doesn't decrement the instruction pointer of CPU1.
5. CPU1 continues at the address in its instruction pointer and we get the
double fault.
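
The misclassification in the sequence above can be written down as a small model. This is a simplified sketch of the behaviour as I describe it, not gdb's actual logic; enum trap_kind, gdb_view() and resume_ip() are hypothetical names.

```c
#include <stddef.h>

enum trap_kind { TRAP_BREAKPOINT, TRAP_SINGLE_STEP };

/* How gdb classifies a stop report in this scenario: while a step is
 * pending, the next report is taken to be that step completing, no
 * matter which CPU sent it. */
static enum trap_kind gdb_view(int step_pending)
{
    return step_pending ? TRAP_SINGLE_STEP : TRAP_BREAKPOINT;
}

/* IP that execution resumes at: rewound by one byte only when the stop
 * is believed to be an INT3 hit. */
static size_t resume_ip(enum trap_kind believed, size_t reported_ip)
{
    return believed == TRAP_BREAKPOINT ? reported_ip - 1 : reported_ip;
}
```

For a breakpoint at 0x100, CPU1 reports 0x101. With a step pending for CPU0 the model leaves CPU1 at 0x101, one byte past the start of the original instruction; without a pending step it is rewound to 0x100.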

> > 2. When we detach from the debugger we need to handle the situation
> >  where another CPU already entered the debugger code and was waiting
> >  for the kgdb lock. Similarly for the 'k' (quit from the debugger)
> >  packet.
>
> I am not sure what the given kgdb does here, but usually kgdb should
> treat the detach as just a "c" or continue.  It should then be in a state
> where it can handle a subsequent breakpoint and, at that point, attach
> again.  It is up to gdb to make sure all breakpoints are cleared at this
> point.

  Right. The detach functions as a continue. But the problem, as mentioned
above, is when multiple CPUs try to enter the debugger at the same time.
One of them waits for the lock while the other contacts gdb, receives the
"detach" command and quits the debugger. gdb has now quit too. The second
CPU then acquires the lock and waits for commands from gdb, which is
probably no longer present. However, this isn't a big problem, which is why
I haven't tried to fix it. I believe we can still reattach and then detach
again if "detach" is what we really want.

> I have not had any problems just aborting gdb on the host system in the
> middle of a kgdb session.  I, in fact, do this to test new gdbs on given
> issues.  With the mm-kgdb I can do this either when in kgdb (i.e. a
> breakpoint) or when not in kgdb, however, this latter case requires that
> I clear breakpoints first.  When in a kgdb session (i.e. a breakpoint)
> kgdb clears all code resident breakpoints prior to its first prompt after
> a breakpoint, so this is not an issue at that time.

  Some of these issues are probably not present in the mm-kgdb code. I
believe its code is quite different from the 2.4 stub's. I'll try it soon to
see if these problems are reproducible there. I haven't been following the
recent kgdb mails that well. Is the kgdb-mm code merged with the kgdb patch
available at sourceforge.net? If not, could you point me to the location I
can retrieve it from? Thanks in advance.

Best Regards,
Shivram U

> --
> George Anzinger   ge...@mv...
> High-res-timers:  http://sourceforge.net/projects/high-res-timers/
> Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
>
