I can reliably reproduce a kernel BUG by running
passemble -r all
The kernel is 2.4.21-pre5; OpenGFS is the CVS version of 07-03-2003.
The ksymoops output is given below:
kernel BUG at block_dev.c:382!
invalid operand: 0000
CPU: 1
EIP: 0010:[<c0141fc6>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: 00000001 ebx: f7cda1c8 ecx: 00000001 edx: 00000000
esi: f7cda1c0 edi: c0352340 ebp: 00000000 esp: c7535bac
ds: 0018 es: 0018 ss: 0018
Process passemble (pid: 19457, stackpage=c7535000)
Stack: ca3d8f98 00000000 00000000 f8939827 f7cda1c0 00000000 00000298 f893b598
       c20bc000 f7cdb480 00000000 f8939e98 00000802 ca3d8f98 00000000 c20bc248
       00000248 00000000 00000000 f893decc f893dfec 00000001 f893dd80 f893a8c8
Call Trace: [<f8939827>] [<f893b598>] [<f8939e98>] [<f893decc>] [<f893dfec>]
 [<f893dd80>] [<f893a8c8>] [<f893decc>] [<f8938d09>] [<c013d7f8>] [<c013d81f>]
 [<c013d7f8>] [<c013d81f>] [<c013da56>] [<c0211679>] [<c0176db4>] [<c0176dc8>]
 [<c0151994>] [<c0151ba7>] [<c0177711>] [<c014ef1c>] [<c0145d34>] [<c0146657>]
 [<c0135276>] [<c012e08b>] [<c012e14b>] [<c012dfd0>] [<c012a33d>] [<c0121b29>]
 [<c012a55e>] [<c01209be>] [<f893dd80>] [<f8938220>] [<f893dd80>] [<c01169d9>]
 [<c0142634>] [<c013ade2>] [<c013ac5e>] [<c01427c3>] [<c014a390>] [<c0108c73>]
Code: 0f 0b 7e 01 7c fc 29 c0 8b 06 8b 56 04 89 50 04 89 02 8b 5e
>>EIP; c0141fc6 <bdput+26/c0> <=====
Trace; f8939827 <[ogfs]drop_glock+d7/140>
Trace; f893b598 <[ogfs]ogfs_glock_cb+518/680>
Trace; f8939e98 <[ogfs]ogfs_glock+c8/5d0>
Trace; f893decc <[ogfs]ogfs_dealloc_inodes+6c/1b0>
Trace; f893dfec <[ogfs]ogfs_dealloc_inodes+18c/1b0>
Trace; f893dd80 <[ogfs]inode_dealloc+230/310>
Trace; f893a8c8 <[ogfs]ogfs_force_gl_drop+18/250>
Trace; f893decc <[ogfs]ogfs_dealloc_inodes+6c/1b0>
Trace; f8938d09 <[ogfs]ogfs_unhold_lvb+49/2d0>
Trace; c013d7f8 <getblk+28/60>
Trace; c013d81f <getblk+4f/60>
Trace; c013d7f8 <getblk+28/60>
Trace; c013d81f <getblk+4f/60>
Trace; c013da56 <bread+16/70>
Trace; c0211679 <__ide_dma_begin+29/30>
Trace; c0176db4 <ext2_read_inode+234/3f0>
Trace; c0176dc8 <ext2_read_inode+248/3f0>
Trace; c0151994 <get_new_inode+144/160>
Trace; c0151ba7 <iget4+e7/f0>
Trace; c0177711 <ext2_lookup+51/70>
Trace; c014ef1c <dput+1c/180>
Trace; c0145d34 <real_lookup+b4/110>
Trace; c0146657 <link_path_walk+797/a20>
Trace; c0135276 <__alloc_pages+46/190>
Trace; c012e08b <filemap_nopage+bb/210>
Trace; c012e14b <filemap_nopage+17b/210>
Trace; c012dfd0 <filemap_nopage+0/210>
Trace; c012a33d <do_no_page+3d/1a0>
Trace; c0121b29 <sysctl_string+89/160>
Trace; c012a55e <handle_mm_fault+be/d0>
Trace; c01209be <do_sysctl_strategy+be/140>
Trace; f893dd80 <[ogfs]inode_dealloc+230/310>
Trace; f8938220 <[ogfs]ogfs_get_glstruct+170/610>
Trace; f893dd80 <[ogfs]inode_dealloc+230/310>
Trace; c01169d9 <do_page_fault+2f9/4e7>
Trace; c0142634 <blkdev_open+24/30>
Trace; c013ade2 <dentry_open+172/190>
Trace; c013ac5e <filp_open+4e/60>
Trace; c01427c3 <blkdev_ioctl+23/30>
Trace; c014a390 <sys_ioctl+90/207>
Trace; c0108c73 <system_call+33/38>
Code; c0141fc6 <bdput+26/c0>
00000000 <_EIP>:
Code; c0141fc6 <bdput+26/c0> <=====
0: 0f 0b ud2a <=====
Code; c0141fc8 <bdput+28/c0>
2: 7e 01 jle 5 <_EIP+0x5> c0141fcb <bdput+2b/c0>
Code; c0141fca <bdput+2a/c0>
4: 7c fc jl 2 <_EIP+0x2> c0141fc8 <bdput+28/c0>
Code; c0141fcc <bdput+2c/c0>
6: 29 c0 sub %eax,%eax
Code; c0141fce <bdput+2e/c0>
8: 8b 06 mov (%esi),%eax
Code; c0141fd0 <bdput+30/c0>
a: 8b 56 04 mov 0x4(%esi),%edx
Code; c0141fd3 <bdput+33/c0>
d: 89 50 04 mov %edx,0x4(%eax)
Code; c0141fd6 <bdput+36/c0>
10: 89 02 mov %eax,(%edx)
Code; c0141fd8 <bdput+38/c0>
12: 8b 5e 00 mov 0x0(%esi),%ebx
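A note on reading the Code bytes above. As far as I know, on i386 2.4 kernels the BUG() macro emits a ud2 instruction (0f 0b) followed by a 16-bit line number and a 32-bit pointer to the file name, so the "jle" and "jl" lines in the disassembly are most likely that data decoded as instructions. The sketch below assumes exactly that layout; decode_bug_payload is a hypothetical helper name, not part of ksymoops.

```python
import struct

def decode_bug_payload(code_hex):
    """Split the bytes at EIP into (line number, file-name pointer).

    Assumes the 2.4 i386 BUG() layout: ud2, .word __LINE__, .long __FILE__.
    """
    raw = bytes.fromhex(code_hex.replace(" ", ""))
    if raw[:2] != b"\x0f\x0b":
        raise ValueError("Code bytes do not start with ud2 (0f 0b)")
    # Little-endian, no padding: 16-bit line number, then 32-bit pointer.
    line, file_ptr = struct.unpack_from("<HI", raw, 2)
    return line, file_ptr

# Code: line copied from the oops above.
line, file_ptr = decode_bug_payload(
    "0f 0b 7e 01 7c fc 29 c0 8b 06 8b 56 04 89 50 04 89 02 8b 5e"
)
print(line, hex(file_ptr))
```

If the assumption holds, the decoded line number should agree with the "kernel BUG at block_dev.c:382!" message, which is a quick sanity check that the oops was captured intact.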
Logged In: YES
user_id=365026
Did you unmount any and all OpenGFS file systems before
running passemble -r all?
Judging by the stack trace, the kernel is trying to use the
ogfs lock services, but these should not be invoked if the
filesystem is not mounted (true?), and the filesystem
*should not be mounted* if you are using passemble -r !
Logged In: YES
user_id=641943
Yes, I unmounted the OpenGFS filesystem; moreover, this BUG is
triggered even if the filesystem has never been mounted. IMHO,
running passemble -r on a mounted filesystem should not trigger
a kernel BUG in any case.
To be precise, I performed the following steps:
1) uramdisk -n 1048576
...
DISK: 1 logical units (1048576 blocks, 512 bytes/block)
DISK: LU 0: 512 MB ramdisk
2) insmod intel_iscsi.o
3) Create partitions:
cfdisk /dev/sda
/proc/partitions now contains two valid partitions:
/dev/sda1 and /dev/sda2
4) modprobe memexp
modprobe ogfs
5) ptool pool0.cf
ptool pool0cidev.cf
passemble
Up to this point everything works.
/proc/partitions now contains two pools, each with a size of 2 TB
(is that valid?)
At this point I can make a filesystem, mount and unmount it,
etc., but I skip all of that to keep the example clean.
6) passemble -r all
Segmentation fault
The log then contains a kernel BUG message similar to the one
already reported.
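On the question of whether a 2 TB pool size is plausible, a quick arithmetic check may help. The sketch below compares the ramdisk size given by uramdisk with the #blocks value that /proc/partitions reports for each pool elsewhere in this report (2147483647). It assumes /proc/partitions counts in 1 KiB blocks, which is my understanding for 2.4 kernels.

```python
RAMDISK_BLOCKS = 1048576       # from "uramdisk -n 1048576"
RAMDISK_BLOCK_SIZE = 512       # bytes/block, from the DISK: line
POOL_BLOCKS = 2147483647       # #blocks reported for each pool

ramdisk_bytes = RAMDISK_BLOCKS * RAMDISK_BLOCK_SIZE
pool_bytes = POOL_BLOCKS * 1024    # assumption: 1 KiB per block

print("backing store:", ramdisk_bytes // 2**20, "MiB")
print("reported pool size:", round(pool_bytes / 2**40, 3), "TiB")
print("pool/backing ratio:", pool_bytes // ramdisk_bytes)
print("is 2**31 - 1:", POOL_BLOCKS == 2**31 - 1)
```

The reported figure is far larger than the 512 MB device behind it and equals the largest signed 32-bit integer, so it looks more like a maximum or placeholder value than a measured size. That interpretation is a guess, not something confirmed in this report.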
Logged In: YES
user_id=365026
What do the pool device names look like in /proc/partitions?
If you put /dev in front of the pool device name showing in
/proc/partitions, would that match an actual /dev node on
your machine (e.g. /dev/pool/pool0)?
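The check asked for above can also be scripted. This is only a sketch: it assumes the node, if present, lives directly under /dev with the same name as in /proc/partitions, and the function names are made up for illustration.

```python
import os
import stat

def partition_names(text):
    """Yield (major, minor, name) from /proc/partitions-style text."""
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with numeric major and minor; skip header and blanks.
        if len(fields) >= 4 and fields[0].isdigit() and fields[1].isdigit():
            yield int(fields[0]), int(fields[1]), fields[3]

def check_dev_nodes(text, dev_dir="/dev"):
    """Return (name, path, ok); ok means a block node with matching numbers."""
    results = []
    for major, minor, name in partition_names(text):
        path = os.path.join(dev_dir, name)
        try:
            st = os.stat(path)
            ok = (stat.S_ISBLK(st.st_mode)
                  and os.major(st.st_rdev) == major
                  and os.minor(st.st_rdev) == minor)
        except OSError:
            ok = False  # node missing or not accessible
        results.append((name, path, ok))
    return results

if __name__ == "__main__" and os.path.exists("/proc/partitions"):
    with open("/proc/partitions") as f:
        for name, path, ok in check_dev_nodes(f.read()):
            print(name, path, "matches" if ok else "missing or different device")
```

Comparing major/minor numbers as well as names avoids a false match when a node with the right name points at some other device.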
Logged In: YES
user_id=611130
You may want to back down to 2.4.20. The 21-pre series has a bunch of
block layer/IDE changes. Some of those changes interfere with our patches,
and the only pre-series kernel I have tried so far showed lots of instability.
Logged In: NO
I tried 2.4.20 (with only the OpenGFS patch applied), on another
machine. The kernel BUG is still there, though the ksymoops output
differs (and it looks more reliable to me). The output follows:
ksymoops 2.4.7 on i686 2.4.20-opengfs. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.20-opengfs/ (default)
-m /boot/System.map-2.4.20-opengfs (specified)
kernel BUG at block_dev.c:382!
invalid operand: 0000
CPU: 0
EIP: 0010:[bdput+44/192] Not tainted
EFLAGS: 00010202
eax: 00000001 ebx: d6236bf8 ecx: df5aff28 edx: 00000000
esi: df5aff20 edi: 00000000 ebp: 00000000 esp: c3c73bd4
ds: 0018 es: 0018 ss: 0018
Process passemble (pid: 14865, stackpage=c3c73000)
Stack: df5aff28 c03966e0 d6236bf8 00000000 00000000 e0922833 df5aff20 00000000
       00000002 de3c9800 e0926b38 00000000 d10ba6a0 e0922e88 00000822 d6236bf8
       00000000 e09268cc e09269ec 00000001 00000000 e0923825 e09268cc 00000002
Call Trace: [<e0922833>] [<e0926b38>] [<e0922e88>] [<e09268cc>] [<e09269ec>]
 [<e0923825>] [<e09268cc>] [<e09268a0>] [<e0926780>] [<e0921c44>] [do_page_fault+0/1315]
Code: 0f 0b 7e 01 53 39 2e c0 8b 56 04 8d 7e 30 8b 06 89 02 8b 5e
Using defaults from ksymoops -t elf32-i386 -a i386
>>ebx; d6236bf8 <_end+15e41df4/204e225c>
>>ecx; df5aff28 <_end+1f1bb124/204e225c>
>>esi; df5aff20 <_end+1f1bb11c/204e225c>
>>esp; c3c73bd4 <_end+387edd0/204e225c>
Trace; e0922833 <[pool]bdclose+43/e0>
Trace; e0926b38 <[pool].bss.start+3b8/2980>
Trace; e0922e88 <[pool]close_devices+58/e0>
Trace; e09268cc <[pool].bss.start+14c/2980>
Trace; e09269ec <[pool].bss.start+26c/2980>
Trace; e0923825 <[pool]remove_pool+f5/100>
Trace; e09268cc <[pool].bss.start+14c/2980>
Trace; e09268a0 <[pool].bss.start+120/2980>
Trace; e0926780 <[pool]pools+0/0>
Trace; e0921c44 <[pool]pool_ioctl+734/8d0>
Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
0: 0f 0b ud2a
Code; 00000002 Before first symbol
2: 7e 01 jle 5 <_EIP+0x5> 00000005 Before first symbol
Code; 00000004 Before first symbol
4: 53 push %ebx
Code; 00000005 Before first symbol
5: 39 2e cmp %ebp,(%esi)
Code; 00000007 Before first symbol
7: c0 8b 56 04 8d 7e 30 rorb $0x30,0x7e8d0456(%ebx)
Code; 0000000e Before first symbol
e: 8b 06 mov (%esi),%eax
Code; 00000010 Before first symbol
10: 89 02 mov %eax,(%edx)
Code; 00000012 Before first symbol
12: 8b 5e 00 mov 0x0(%esi),%ebx
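To make the second trace easier to read, the Trace lines can be filtered down to entries that resolve to functions. The sketch below treats entries in .bss.start, and symbols with size 0 such as pools, as data addresses that happened to be on the stack; that classification is my assumption, and likely_call_chain is a hypothetical helper.

```python
import re

TRACE_RE = re.compile(
    r"^Trace; (?P<addr>[0-9a-f]{8}) "
    r"<(?:\[(?P<module>[^\]]+)\])?(?P<symbol>[^+>]+)"
    r"\+(?P<off>[0-9a-f]+)/(?P<size>[0-9a-f]+)>$"
)

def parse_trace(text):
    """Parse ksymoops 'Trace;' lines into dicts (offsets and sizes are hex)."""
    entries = []
    for line in text.splitlines():
        m = TRACE_RE.match(line.strip())
        if m:
            entries.append({
                "addr": int(m["addr"], 16),
                "module": m["module"] or "kernel",
                "symbol": m["symbol"],
                "offset": int(m["off"], 16),
                "size": int(m["size"], 16),
            })
    return entries

def likely_call_chain(entries):
    """Drop section symbols and zero-size symbols, keep the rest in order."""
    return [e["symbol"] for e in entries
            if not e["symbol"].startswith(".") and e["size"] > 0]

TRACE = """\
Trace; e0922833 <[pool]bdclose+43/e0>
Trace; e0926b38 <[pool].bss.start+3b8/2980>
Trace; e0922e88 <[pool]close_devices+58/e0>
Trace; e09268cc <[pool].bss.start+14c/2980>
Trace; e09269ec <[pool].bss.start+26c/2980>
Trace; e0923825 <[pool]remove_pool+f5/100>
Trace; e09268cc <[pool].bss.start+14c/2980>
Trace; e09268a0 <[pool].bss.start+120/2980>
Trace; e0926780 <[pool]pools+0/0>
Trace; e0921c44 <[pool]pool_ioctl+734/8d0>
"""
print(" <- ".join(likely_call_chain(parse_trace(TRACE))))
```

Read this way, the failing bdput call is reached from the pool module's ioctl path rather than from ogfs, which fits the observation that the BUG occurs without any filesystem mounted.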
Logged In: YES
user_id=641943
> What do the pool device names look like in /proc/partitions?
# cat /proc/partitions
major minor  #blocks    name  rio rmerge rsect ruse wio wmerge wsect wuse running use aveq
  121     1  2147483647 poolb 0 0 0 0 0 0 0 0 0 0 0
  121     2  2147483647 poolc 0 0 0 0 0 0 0 0 0 0 0
.....
> If you put /dev in front of the pool device name showing in
> /proc/partitions, would that match an actual /dev node on
> your machine (e.g. /dev/pool/pool0)
No.
# ls -l /proc/pool
total 0
brw------- 1 root root 121, 1 Mar 14 12:45 pool0
brw------- 1 root root 121, 2 Mar 14 12:45 pool0cidev
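Although the names differ, the two listings can be lined up by major/minor number. The sketch below does that using the rows pasted above as embedded sample data; the helper names are made up, and the ls parsing assumes the "ls -l" column layout shown here.

```python
import re

PROC_PARTITIONS = """\
121 1 2147483647 poolb 0 0 0 0 0 0 0 0 0 0 0
121 2 2147483647 poolc 0 0 0 0 0 0 0 0 0 0 0
"""
LS_OUTPUT = """\
brw------- 1 root root 121, 1 Mar 14 12:45 pool0
brw------- 1 root root 121, 2 Mar 14 12:45 pool0cidev
"""

def kernel_names(text):
    """Map (major, minor) -> name as listed in /proc/partitions."""
    names = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) >= 4 and f[0].isdigit() and f[1].isdigit():
            names[(int(f[0]), int(f[1]))] = f[3]
    return names

def node_names(text):
    """Map (major, minor) -> node name from 'ls -l' lines for block devices."""
    nodes = {}
    pattern = re.compile(
        r"^b\S+\s+\d+\s+\S+\s+\S+\s+(\d+),\s*(\d+)\s+.*\s(\S+)$")
    for line in text.splitlines():
        m = pattern.match(line)
        if m:
            nodes[(int(m.group(1)), int(m.group(2)))] = m.group(3)
    return nodes

kernel = kernel_names(PROC_PARTITIONS)
nodes = node_names(LS_OUTPUT)
for dev in sorted(kernel.keys() & nodes.keys()):
    print(dev, "node", nodes[dev], "is listed as", kernel[dev])
```

By device number, pool0 corresponds to poolb and pool0cidev to poolc, so the name in /proc/partitions is not the name of the node.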