#32 Kernel Panic when doing sg IO >4k over iscsi


I am running FC4 (2.6.11-1.1369_FC4) and have compiled
the linux-iscsi. driver for that kernel.

I know I should probably be using open-iscsi, but at
this point in the game it is not an option, or at least
not one that I really want to take.

The target is an Overland REO 4000 virtual tape
device. It has to be a tape device; the panic does not
happen with disk devices. When I use the sg_utils
sg_test_rwbuf command to issue a write/read larger than a
page (4k), the system panics. I can do <=4k sg IO all
day without issue. This only happens when the iSCSI
connection is across a gigE link; it does not happen
when the connection is across a 10/100 link. The NICs, both
10/100 and gigE, are Intel MB embedded adapters using
the e100 and e1000 drivers respectively. Attached is a capture
of the call trace. This one has me twisting.....ugh!
Any help would be greatly appreciated.


sg_utils call:

./sg_test_rwbuf --size=4096 /dev/sg0 count=5 => this works
./sg_test_rwbuf --size=8192 /dev/sg0 count=5 => this panics


    Call trace of Kernel Panic

    Mike Christie

    What version of sg_utils is this with?

    Does this bug only occur with sg_test_rwbuf? Are other
    programs like sg_dd ok?

    Where can I get the linux-iscsi. ?

    I'm pretty sure open-iscsi will have the same problem. I
    tracked down a similar issue in open-iscsi, which turned
    out to be a (bad) interaction between the sg driver using
    higher-order kmalloc calls for indirect I/O buffers (to get
    physically contiguous RAM), and the tcp_sendpage code doing
    get_page/put_page. It may be caused by the scsi_lib/block
    code not doing the right thing when ripping apart
    scatter/gather lists and rebuilding them 1 page at a time,
    since for higher-order allocations certain fields are only
    set on the first struct page of the allocation.
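
The page-count interaction described above can be illustrated with a small userspace toy model. This is only a sketch, not kernel code: `toy_page`, `toy_alloc_pages`, `toy_get_page`, `toy_put_page` and `toy_send_each_page` are hypothetical names, and the model assumes (as the comment suggests) that a higher-order allocation sets a reference count only on its first page, while a sendpage-style path takes and drops a reference on each page separately.

```c
#include <stdio.h>

/* Toy stand-in for struct page: only a reference count and a "freed" marker. */
struct toy_page {
    int count;
    int freed;
};

/* Model of a higher-order allocation: only the first page gets a reference. */
static void toy_alloc_pages(struct toy_page *pages, int npages)
{
    for (int i = 0; i < npages; i++) {
        pages[i].count = 0;
        pages[i].freed = 0;
    }
    pages[0].count = 1;
}

static void toy_get_page(struct toy_page *p)
{
    p->count++;
}

/* Dropping the last reference hands the page back to the allocator. */
static void toy_put_page(struct toy_page *p)
{
    if (--p->count == 0)
        p->freed = 1;
}

/*
 * Model of a sender that handles the buffer one page at a time,
 * taking a reference before the send and dropping it afterwards.
 * Returns how many pages were freed while the owner still holds the buffer.
 */
static int toy_send_each_page(struct toy_page *pages, int npages)
{
    int wrongly_freed = 0;

    for (int i = 0; i < npages; i++) {
        toy_get_page(&pages[i]);
        toy_put_page(&pages[i]);
        if (pages[i].freed)
            wrongly_freed++;
    }
    return wrongly_freed;
}
```

In this model a one-page buffer (the 4096-byte case) survives, because its only page starts with a count of 1. A two-page buffer (the 8192-byte case) loses its second page: the count goes 0, 1, 0 and the page is treated as free while the sg driver still owns it.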

    I haven't looked at linux-iscsi in a long time, so I'm not
    sure what workaround/fix might be easiest. Some possible
    options:

    1) Use SG_FLAG_DIRECT_IO to avoid indirect I/O.
    2) Somehow disable the use of sendpage in the iscsi kernel
    module (and add a data copy).
    3) Change the SG driver to not use higher-order allocations
    when the SCSI host has clustering disabled.
    4) Figure out if the SCSI/block code is doing something
    improper when it rebuilds scatter/gather lists, and make
    sure every page has a non-zero page count before being
    passed to the network stack.
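
For option 1, a minimal sketch of requesting direct I/O through the sg v3 interface might look like the following. The flag is spelled `SG_FLAG_DIRECT_IO` in `<scsi/sg.h>`. The helper names `prepare_direct_write_buffer` and `submit_and_check_direct` are hypothetical, and the sketch assumes the transfer is a WRITE BUFFER (10) in data mode, which is what sg_test_rwbuf is understood to issue; it is not taken from sg_utils itself.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

#define WRITE_BUFFER_OPCODE    0x3B  /* SCSI WRITE BUFFER (10) */
#define WRITE_BUFFER_MODE_DATA 0x02  /* "data" mode */

/*
 * Hypothetical helper: fill in a 10-byte WRITE BUFFER CDB and an sg_io_hdr
 * that asks the sg driver for direct I/O on buf.
 * Returns 0 on success, -1 if len does not fit the 24-bit length field.
 */
static int prepare_direct_write_buffer(sg_io_hdr_t *hdr, unsigned char cdb[10],
                                       void *buf, unsigned int len,
                                       unsigned char *sense,
                                       unsigned char sense_len)
{
    if (len > 0xFFFFFF)
        return -1;

    memset(cdb, 0, 10);
    cdb[0] = WRITE_BUFFER_OPCODE;
    cdb[1] = WRITE_BUFFER_MODE_DATA;
    cdb[6] = (len >> 16) & 0xFF;   /* parameter list length, big-endian */
    cdb[7] = (len >> 8) & 0xFF;
    cdb[8] = len & 0xFF;

    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id = 'S';                 /* sg v3 interface */
    hdr->dxfer_direction = SG_DXFER_TO_DEV;
    hdr->cmd_len = 10;
    hdr->cmdp = cdb;
    hdr->dxfer_len = len;
    hdr->dxferp = buf;
    hdr->mx_sb_len = sense_len;
    hdr->sbp = sense;
    hdr->timeout = 20000;                    /* milliseconds */
    hdr->flags = SG_FLAG_DIRECT_IO;          /* request direct, not indirect, I/O */
    return 0;
}

/*
 * Hypothetical helper: submit the request and report what the driver did.
 * Returns 1 if direct I/O was used, 0 if the driver fell back to
 * indirect (or mixed) I/O, and -1 if the ioctl itself failed.
 */
static int submit_and_check_direct(int fd, sg_io_hdr_t *hdr)
{
    if (ioctl(fd, SG_IO, hdr) < 0)
        return -1;
    return (hdr->info & SG_INFO_DIRECT_IO_MASK) == SG_INFO_DIRECT_IO;
}
```

The flag is only a request, so checking `info` afterwards matters: the driver may quietly fall back to indirect I/O, for example when the buffer is not suitably aligned (allocating it with `posix_memalign` at page alignment is a reasonable precaution) or when direct I/O is disabled in the sg driver, which on kernels of this era is believed to be controlled by `/proc/scsi/sg/allow_dio`.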