I am running FC4 (2.6.11-1.1369_FC4) and have compiled
the linux-iscsi.4.0.2.2 driver for that kernel.
I know I should probably be using open-iscsi, but at
this point in the game it is not an option, or at least
not one that I really want to take.
The target is an Overland REO 4000 virtual tape
device. It has to be a tape device; the panic does not
happen with disk devices. When I use the sg_utils
sg_test_rwbuf command to issue a write/read greater than a
page (4k), the system panics. I can do <=4k sg I/O all
day without issue. This only happens when the iSCSI
connection is across a gigE link; it does not happen
when the connection is across a 10/100 link. The NICs, both
10/100 and gigE, are Intel motherboard-embedded adapters using
the e100 and e1000 drivers respectively. Attached is a capture
of the call trace. This one has me twisting.....ugh!
Any help would be greatly appreciated.
Regards
Godfrey
sg_utils call:
./sg_test_rwbuf --size=4096 /dev/sg0 count=5 => this works
./sg_test_rwbuf --size=8192 /dev/sg0 count=5 => this panics
Call trace of Kernel Panic
Logged In: YES
user_id=827494
What version of sg_utils is this with?
Logged In: NO
Does this bug only occur with sg_test_rwbuf? Are other
programs like sg_dd ok?
Logged In: NO
Where can I get linux-iscsi.4.0.2.2?
Thanks
Logged In: YES
user_id=40524
I'm pretty sure open-iscsi will have the same problem. I
tracked down a similar issue in open-iscsi, which turned
out to be a (bad) interaction between the sg driver using
higher-order kmalloc calls for indirect I/O buffers (to get
physically contiguous RAM), and the tcp_sendpage code doing
get_page/put_page. It may be caused by the scsi_lib/block
code not doing the right thing when ripping apart
scatter/gather lists and rebuilding them 1 page at a time,
since for higher-order allocations certain fields are only
set on the first struct page of the allocation.
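
To make the size threshold concrete, here is a small user-space sketch of the allocation order a buffer of a given size would need. It assumes 4 KiB pages, as in the original report; SKETCH_PAGE_SIZE and sketch_alloc_order are names made up for illustration and are not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, matching the "page (4k)" in the report. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper: smallest order such that
 * (SKETCH_PAGE_SIZE << order) >= size. Order 0 is a single page;
 * order 1 and above is a physically contiguous multi-page allocation.
 */
static unsigned int sketch_alloc_order(size_t size)
{
    unsigned int order = 0;
    size_t span = SKETCH_PAGE_SIZE;

    assert(size > 0);
    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}
```

Under that assumption a 4096-byte buffer fits in an order-0 allocation, while an 8192-byte buffer needs order 1, which matches the sizes at which sg_test_rwbuf works and panics.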
I haven't looked at linux-iscsi in a long time, so I'm not
sure what workaround/fix might be easiest. Some possible
workarounds/fixes:
1) Use SG_FLAG_DIRECT_IO to avoid indirect I/O.
2) Somehow disable the use of sendpage in the iscsi kernel
module (and add a data copy).
3) Change the SG driver to not use higher-order allocations
when the SCSI host has clustering disabled.
4) Figure out if the SCSI/block code is doing something
improper when it rebuilds scatter/gather lists, and make
sure every page has a non-zero page count before being
passed to the network stack.
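
For workaround 1, a minimal sketch of how a program using the SG_IO interface might request direct I/O. The flag as spelled in <scsi/sg.h> is SG_FLAG_DIRECT_IO. The helper names sketch_prepare_direct and sketch_used_direct_io and the 20 second timeout are assumptions for illustration; this is not taken from sg_test_rwbuf.

```c
#include <assert.h>
#include <string.h>
#include <scsi/sg.h>

/*
 * Hypothetical helper: fill in an SG_IO header that asks the sg driver
 * to transfer directly to/from the user buffer instead of going through
 * its own kernel buffer.
 */
static void sketch_prepare_direct(sg_io_hdr_t *hdr,
                                  unsigned char *cdb, unsigned char cdb_len,
                                  void *buf, unsigned int len, int direction,
                                  unsigned char *sense, unsigned char sense_len)
{
    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id = 'S';           /* required for the sg v3 interface */
    hdr->dxfer_direction = direction;  /* e.g. SG_DXFER_TO_DEV */
    hdr->cmdp = cdb;
    hdr->cmd_len = cdb_len;
    hdr->dxferp = buf;
    hdr->dxfer_len = len;
    hdr->sbp = sense;
    hdr->mx_sb_len = sense_len;
    hdr->timeout = 20000;              /* milliseconds; arbitrary choice */
    hdr->flags = SG_FLAG_DIRECT_IO;    /* request direct I/O */
}

/* After the ioctl returns, report whether direct I/O was really used. */
static int sketch_used_direct_io(const sg_io_hdr_t *hdr)
{
    return (hdr->info & SG_INFO_DIRECT_IO_MASK) == SG_INFO_DIRECT_IO;
}
```

The prepared header would then be passed to ioctl(fd, SG_IO, &hdr) on the open /dev/sg device. The driver can fall back to indirect I/O even when the flag is set, so it is worth checking the info field afterwards before concluding that the workaround helps.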