Re: [Jfs-discussion] JFS and preemption with lock break patch causes freeze


Hi,

On Sun, Apr 28, 2002 at 03:28:56PM -0400, Robert Love <rm...@te...> wrote:
> On Sun, 2002-04-28 at 15:04, Christoph Hellwig wrote:
> 
> > Robert, have you also looked at the initial bug report?
> > 
> > It triggers a debug_lock_break(551), and my reading of the lock_break
> > stuff is that 551 is your magic constant for "should not happen"..
> 
> Not exactly - 551 is my value for "I have not received a value here
> yet".  I.e., I do not assume I know the lock depth ever.  Instead, I set
> all the debugs to 551 and filled them in appropriately.  I did this to
> make sure I was right - if I just filled in "1" or whatever I thought
> the depth was, I would see no debug message and thus not know if I was
> right or not hitting the code path.
 OK, but as I wrote in my first post, I still see strange things with
JFS+preemption+lockbreak:
kernel/exit.c reports in close_files() that the depth is three, not two.
fs/ext3/namei.c reports in ext3_find_entry() that the depth is two, not one
(I have an ext3 partition as well).
fs/jbd/commit.c, journal_commit_transaction()
I do not know what jbd is. :-( Btw, I also get a lot of
process-name-put-here-what-you-like[pid] exited with preempt_count 1
messages. Could it be that some kernel function, when JFS is in, takes a lock
or something similar and forgets to release it, and that is why the lock depth
is one too high? The examples above are not the only places where I see a
debug message saying the lock depth is one deeper than expected. I do not know
who or where makes this mistake, but I get it early:
INIT: version 2.84 booting
S01devfsd[pid] exited with preempt_count 1
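
To make the "forgot to release it" guess concrete, here is a small userspace
sketch of how one unbalanced disable/enable pair would leave the count at 1
when a process exits. It is not kernel code: every name in it
(preempt_count_sim, sim_preempt_disable(), sim_preempt_enable(), leaky_path(),
sim_exit_check()) is made up for illustration, and it does not claim to show
where the real imbalance is.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a per-task preemption counter. */
static int preempt_count_sim = 0;

static void sim_preempt_disable(void)
{
    preempt_count_sim++;
}

static void sim_preempt_enable(void)
{
    /* Enabling more often than disabling would be a different bug. */
    assert(preempt_count_sim > 0);
    preempt_count_sim--;
}

/* A correct path: every disable is paired with an enable. */
static void balanced_path(void)
{
    sim_preempt_disable();
    sim_preempt_enable();
}

/* A buggy path: the early return skips the matching enable. */
static void leaky_path(int take_early_exit)
{
    sim_preempt_disable();
    if (take_early_exit)
        return;                 /* the count stays one too high */
    sim_preempt_enable();
}

/* Imitates an exit-time check; returns the leftover count. */
static int sim_exit_check(const char *name, int pid)
{
    if (preempt_count_sim != 0)
        printf("%s[%d] exited with preempt_count %d\n",
               name, pid, preempt_count_sim);
    return preempt_count_sim;
}
```

Calling balanced_path() and then sim_exit_check() leaves the count at 0 and
prints nothing, while one call to leaky_path(1) leaves it at 1, which is the
same off-by-one pattern as in the messages above.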

> > GCS: could you please rerun your testcase after you replaced the occurrence
> > of debug_lock_break(551); in fsync_inode_buffers with debug_lock_break(1)
> > and retry?
 I reran my test, but with a different number: 686. That way I saw once that
execution hit that path; it reported that the depth is two, not 686. I could
not reproduce it the next time, but I think it is possible to hit it again.
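
For anyone following along, this is roughly how I understand the trick of
passing an impossible value such as 551 or 686. It is only a userspace sketch,
not the lock-break patch itself: lock_depth_sim and check_lock_depth() are
names I made up, and the message format is only modelled on what I see on the
console.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the current lock depth. */
static int lock_depth_sim = 0;

/*
 * Hypothetical imitation of a debug_lock_break()-style check.
 * Returns 1 and writes a message into buf when the observed depth
 * differs from the expected value; returns 0 and leaves buf empty
 * when they match.
 */
static int check_lock_depth(int expected, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    if (lock_depth_sim == expected)
        return 0;               /* a match stays silent */
    snprintf(buf, len, "lock depth is %d, not %d",
             lock_depth_sim, expected);
    return 1;
}
```

With lock_depth_sim set to 2, check_lock_depth(686, ...) always reports a
mismatch, so the message both proves the path was hit and shows the real
depth. check_lock_depth(2, ...) would stay silent, and then one cannot tell
a correct guess from a path that was never executed.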

> Well, if DEBUG is set and the code is 551 - you should see a message in
> syslog whenever you hit the path with the correct lock depth.  Is it 1?
 I do not use syslog for the test, as this bug locks up all the devices, and
I think the messages could not get flushed to disk. Instead I stop syslog
and watch the messages on the console. As I wrote above, the depth seems to
be two, but I think you are right and the real depth is one. Without JFS
everything is fine: I do not get '... exited with preempt_count 1' messages,
and the machine does not lock up when I write to loop devices.

Hope it helps bring us closer to the real bug,
GCS