[Jfs-discussion] JFS bug?

We got a kernel error in /var/log/messages when we cancelled an unmount of a
JFS filesystem with ctrl-c on kernel 2.4.24 + jfsutils 1.1.4 on Intel.  The
unmount had been hanging for a few minutes and there did not appear to be
any IO.  Should we have left the unmount alone instead of cancelling it?
Here is what is in the log:

Feb 18 09:19:09 rchs93hd kernel: BUG at jfs_logmgr.c:1483 assert(list_empty(&log->synclist))
Feb 18 09:19:09 rchs93hd kernel: kernel BUG at jfs_logmgr.c:1483!
Feb 18 09:19:09 rchs93hd kernel: invalid operand: 0000
Feb 18 09:19:09 rchs93hd kernel: CPU:    0
Feb 18 09:19:09 rchs93hd kernel: EIP:    0010:[<f0908dca>]    Not tainted
Feb 18 09:19:09 rchs93hd kernel: EFLAGS: 00010282
Feb 18 09:19:09 rchs93hd kernel: eax: 0000003f   ebx: 00000320   ecx: 00000000   edx: 00000001
Feb 18 09:19:09 rchs93hd kernel: esi: eccd2000   edi: ef05fd18   ebp: ef05fc80   esp: eccd3eec
Feb 18 09:19:09 rchs93hd kernel: ds: 0018   es: 0018   ss: 0018
Feb 18 09:19:09 rchs93hd kernel: Process umount (pid: 8954, stackpage=eccd3000)
Feb 18 09:19:09 rchs93hd kernel: Stack: f09165ee f09165e1 000005cb f09166ba c1912248 ef88cca0 ef88cca0 ef88cca0
Feb 18 09:19:09 rchs93hd kernel:        eccd3f50 c015edc1 efffb470 ef88cca0 eefb4520 efc17d60 ef05fc80 eefd5400
Feb 18 09:19:09 rchs93hd kernel:        f08ed3f7 ef05fc80 00000002 00000000 eefb4120 eefd5400 efc17d60 f0918f60
Feb 18 09:19:09 rchs93hd kernel: Call Trace:    [<f09165ee>] [<f09165e1>] [<f09166ba>] [<c015edc1>] [<f08ed3f7>]

Output of ksymoops:

>>EIP; f0908dca <[jfs]jfs_flush_journal+9a/250>   <=====
Trace; f09165ee <[jfs].rodata.end+496b/735d>
Trace; f09165e1 <[jfs].rodata.end+495e/735d>
Trace; f09166ba <[jfs].rodata.end+4a37/735d>
Trace; c015edc1 <destroy_inode+51/60>
Trace; f08ed3f7 <[jfs]jfs_umount+47/150>
Trace; f0918f60 <[jfs].rodata.end+72dd/735d>
Trace; f08e8240 <[jfs]jfs_put_super+0/80>
Trace; f08e8266 <[jfs]jfs_put_super+26/80>
Trace; c014d806 <kill_super+146/180>
Trace; c0162dff <sys_umount+3f/a0>
Trace; c01339c3 <sys_munmap+43/70>
Trace; c0162e77 <sys_oldumount+17/20>
Trace; c01093cf <system_call+33/38>
Code;  f0908dca <[jfs]jfs_flush_journal+9a/250>
00000000 <_EIP>:
Code;  f0908dca <[jfs]jfs_flush_journal+9a/250>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  f0908dcc <[jfs]jfs_flush_journal+9c/250>
   2:   cb                        lret
Code;  f0908dcd <[jfs]jfs_flush_journal+9d/250>
   3:   05 e1 65 91 f0            add    $0xf09165e1,%eax
Code;  f0908dd2 <[jfs]jfs_flush_journal+a2/250>
   8:   eb c0                     jmp    ffffffca <_EIP+0xffffffca>
Code;  f0908dd4 <[jfs]jfs_flush_journal+a4/250>
   a:   c7 44 24 0c d5 66 91      movl   $0xf09166d5,0xc(%esp,1)
Code;  f0908ddb <[jfs]jfs_flush_journal+ab/250>
  11:   f0
Code;  f0908ddc <[jfs]jfs_flush_journal+ac/250>
  12:   c7 44 00 00 00 00 00      movl   $0x0,0x0(%eax,%eax,1)
Code;  f0908de3 <[jfs]jfs_flush_journal+b3/250>
  19:   00
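
For anyone reading the ksymoops output above: each entry is printed as
address <[module]symbol+offset/size>, with offset and size in hex. The
following is only a minimal sketch for unpacking that notation; the regular
expression and the parse_entry helper are illustrative names of my own, not
part of ksymoops, and they assume the line layout shown in this message.

```python
import re

# Matches e.g. "f0908dca <[jfs]jfs_flush_journal+9a/250>".
# The "[module]" prefix is optional (core kernel symbols have none).
ENTRY = re.compile(
    r"(?P<addr>[0-9a-f]{8}) "
    r"<(?:\[(?P<mod>\w+)\])?(?P<sym>[\w.]+)"
    r"\+(?P<off>[0-9a-f]+)/(?P<size>[0-9a-f]+)>"
)


def parse_entry(line):
    """Split one ksymoops line into module, symbol, start, offset, size.

    Hypothetical helper: returns None when the line has no symbol entry.
    """
    m = ENTRY.search(line)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    off = int(m.group("off"), 16)
    return {
        "module": m.group("mod"),      # None for core kernel symbols
        "symbol": m.group("sym"),
        "start": addr - off,           # address where the function begins
        "offset": off,                 # bytes into the function
        "size": int(m.group("size"), 16),
    }


entry = parse_entry(">>EIP; f0908dca <[jfs]jfs_flush_journal+9a/250>   <=====")
print(entry["symbol"], hex(entry["start"]), entry["offset"], entry["size"])
```

Read this way, the failing instruction sits 0x9a bytes into
jfs_flush_journal, which is 0x250 bytes long in this build of the module.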

We were trying to unmount the filesystem to replay the log and remount
because we noticed that the filesystem had been remounted read-only the
night before.  The filesystem was 99% full (~4Gig free out of ~280Gig) when
we were looking into this on 2/18, so maybe the filesystem filled up?  I
don't see any indications of hardware problems on the raid array holding
this filesystem.
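
As a quick sanity check on the fullness figure, the rough sizes quoted above
can be turned into a percentage. This is only a sketch: both inputs are the
approximate values from this message, and percent_full is an illustrative
name.

```python
def percent_full(free_gb, total_gb):
    """Return how full a filesystem is, as a percentage of its total size."""
    return 100.0 * (total_gb - free_gb) / total_gb


# Approximate figures from the report: ~4 Gig free out of ~280 Gig.
print(round(percent_full(4, 280), 1))
```

With those rounded inputs the result is just under 99%, which agrees with
the 99% figure once rounded to a whole percent.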

Feb 17 19:32:50 rchs93hd kernel: ERROR: (device sd(8,17)): __get_metapage: mp->logical_size != size
Feb 17 19:32:50 rchs93hd kernel: ERROR: (device sd(8,17)): remounting filesystem as read-only
Feb 17 19:32:50 rchs93hd kernel:
Feb 17 19:32:50 rchs93hd kernel: ERROR: (device sd(8,17)): __get_metapage: mp->logical_size != size
Feb 17 19:32:50 rchs93hd last message repeated 73 times

Thanks,

John Janosik