Re: [Iscsitarget-devel] Kernel bug or?

Hi,

In parallel, I raised the same question on the QuakeNet #debian channel.
After some research into iscsi_trgt.ko, ix found:

16:19 <@ix> i checked on your iscsi problem and unfortunately it's not one of
            the easily fixed ones | the BUG is caused by disk_check_ua
17:11 <@ix> hard to tell. disk_check_ua tries to fetch a unit attention
            structure from a linked list and disk_check_ua does not expect
            that there is no such structure. which is why it's using the
            fetched ua without checking, hence NULL pointer deref when there
            is no such structure | i don't know much about the scsi
            subsystem, but my best guess would be that newer kernels also
            might not always deliver an ua | you could try to handle the case
            that there is no ua

I hope this information helps with fixing the bug.
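
To make ix's suggestion concrete, here is a minimal userspace sketch in C of
what "handling the case that there is no ua" could look like. It is not the
IET source: struct ua_entry, ua_fetch() and check_ua() are hypothetical names
I made up for illustration, and the real disk_check_ua in iscsi_trgt.ko may
be structured quite differently. It only shows the general guard, which is to
test the fetched pointer before using it.

```c
#include <stddef.h>

/* Hypothetical stand-in for a unit attention entry; NOT the IET structure. */
struct ua_entry {
    unsigned char asc;      /* additional sense code */
    unsigned char ascq;     /* additional sense code qualifier */
    struct ua_entry *next;
};

/* Unlink and return the first pending entry, or NULL if the list is empty. */
static struct ua_entry *ua_fetch(struct ua_entry **head)
{
    struct ua_entry *ua = *head;

    if (ua)
        *head = ua->next;
    return ua;
}

/*
 * Report one pending unit attention through asc/ascq.
 * Returns 1 if one was reported, 0 if none was pending.
 */
static int check_ua(struct ua_entry **head,
                    unsigned char *asc, unsigned char *ascq)
{
    struct ua_entry *ua = ua_fetch(head);

    if (!ua)
        return 0;   /* the guard: nothing fetched, so nothing to dereference */

    *asc = ua->asc;
    *ascq = ua->ascq;
    return 1;
}
```

With this shape, a caller that finds no pending unit attention gets 0 back
and carries on with the command, instead of reading fields through a NULL
pointer.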

Some time later I upgraded the kernel to 3.2.0-0.bpo.2-686-pae. The upgrade
itself went fine, but building the iscsitarget module through DKMS fails:

DKMS make.log for iscsitarget-1.4.20.2 for kernel 3.2.0-0.bpo.2-686-pae (i686)
make: Entering directory `/usr/src/linux-headers-3.2.0-0.bpo.2-686-pae'
  LD      /var/lib/dkms/iscsitarget/1.4.20.2/build/built-in.o
  LD      /var/lib/dkms/iscsitarget/1.4.20.2/build/kernel/built-in.o
  CC [M]  /var/lib/dkms/iscsitarget/1.4.20.2/build/kernel/tio.o
  CC [M]  /var/lib/dkms/iscsitarget/1.4.20.2/build/kernel/iscsi.o
  CC [M]  /var/lib/dkms/iscsitarget/1.4.20.2/build/kernel/nthread.o
  CC [M]  /var/lib/dkms/iscsitarget/1.4.20.2/build/kernel/wthread.o
/var/lib/dkms/iscsitarget/1.4.20.2/build/kernel/wthread.c: In function ‘worker_thread’:
/var/lib/dkms/iscsitarget/1.4.20.2/build/kernel/wthread.c:75: error: implicit declaration of function ‘copy_io_context’
make[4]: *** [/var/lib/dkms/iscsitarget/1.4.20.2/build/kernel/wthread.o] Error 1
make[3]: *** [/var/lib/dkms/iscsitarget/1.4.20.2/build/kernel] Error 2
make[2]: *** [_module_/var/lib/dkms/iscsitarget/1.4.20.2/build] Error 2
make[1]: *** [sub-make] Error 2
make: *** [all] Error 2
make: Leaving directory `/usr/src/linux-headers-3.2.0-0.bpo.2-686-pae'

Any ideas on how to solve this? I look forward to your feedback, thanks.

Regards,
Igor.

On Mon, Jul 16, 2012 at 3:34 PM, EpsiloN EpsiloN <ep...@gm...> wrote:

> Hi,
>
> I measured I/O for some time before and after tuning the HP Smart Array
> controller (packet elevator disabled, surface scan delay extended, write
> cache set to 100%, device write cache enabled). The measurements showed no
> I/O problems either before or after the tuning.
>
> Here are my results, taken after the Smart Array tuning:
>
> # sar
> 12:00:01 AM       CPU      %usr     %nice      %sys   %iowait    %steal      %irq     %soft    %guest     %idle
> Average:          all      0.01      0.00      1.64      0.00      0.00      0.02      0.52      0.00     97.81
>
> 12:00:01 AM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
> Average:        17.21   6512.16     72.98      0.00   1730.86      0.00      0.00      0.00      0.00
>
> 12:00:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
> Average:      2846544    258640      8.33    114553     77173    488793      13.6
>
> 12:00:01 AM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
> Average:           lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
> Average:         eth2    112.56      0.14     35.88      0.01      0.00      0.00      0.00
> Average:         eth3   4840.63      0.14     31.34      0.01      0.00      0.00      0.00
> Average:         eth0     15.56   2509.20      6.05     55.30      0.00      0.00      0.02
> Average:         eth1     12.15      0.14      0.96      0.01      0.00      0.00      0.02
>
> 12:00:01 AM     IFACE   rxerr/s   txerr/s    coll/s  rxdrop/s  txdrop/s  txcarr/s  rxfram/s  rxfifo/s  txfifo/s
> Average:           lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
> Average:         eth2      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
> Average:         eth3      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
> Average:         eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
> Average:         eth1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
>
> After this tuning the server ran for a week, maybe a little longer, and
> then hung again:
>
> # messages
> Jul 14 11:23:46 ISCSI kernel: [951754.937998] BUG: unable to handle kernel NULL pointer dereference at 00000010
>
> Any idea what can be done? Should I upgrade to the newest kernel? As a
> reminder, I am currently running Debian 6.0.5 with the 2.6.32-5-686 kernel.
>
>
>
>
> Another interesting thing is that Debian changes the MAC addresses of the
> interfaces:
>
> # daemon.log
> Jul 14 01:16:56 ISCSI ntpd[2355]: 192.168.10.3 interface 192.168.10.12 -> 192.168.10.9
> Jul 14 01:26:56 ISCSI ntpd[2355]: 192.168.10.3 interface 192.168.10.9 -> 192.168.10.8
> Jul 14 01:36:56 ISCSI ntpd[2355]: 192.168.10.3 interface 192.168.10.8 -> 192.168.10.12
> Jul 14 01:46:56 ISCSI ntpd[2355]: 192.168.10.3 interface 192.168.10.12 -> 192.168.10.9
> Jul 14 03:26:56 ISCSI ntpd[2355]: 192.168.10.3 interface 192.168.10.9 -> 192.168.10.8
> Jul 14 03:36:56 ISCSI ntpd[2355]: 192.168.10.3 interface 192.168.10.8 -> 192.168.10.12
> Jul 14 03:46:56 ISCSI ntpd[2355]: 192.168.10.3 interface 192.168.10.12 -> 192.168.10.12
> Jul 14 03:56:56 ISCSI ntpd[2355]: 192.168.10.3 interface 192.168.10.12 -> 192.168.10.9
> Jul 14 04:06:56 ISCSI ntpd[2355]: 192.168.10.3 interface 192.168.10.9 -> 192.168.10.12
> Jul 14 04:16:56 ISCSI ntpd[2355]: 192.168.10.3 interface 192.168.10.12 -> 192.168.10.9
> Jul 14 04:26:56 ISCSI ntpd[2355]: 192.168.10.3 interface 192.168.10.9 -> 192.168.10.8
> Jul 14 04:36:56 ISCSI ntpd[2355]: 192.168.10.3 interface 192.168.10.8 -> 192.168.10.12
> Jul 14 05:36:56 ISCSI ntpd[2355]: 192.168.10.3 interface 192.168.10.12 -> 192.168.10.12
>
> # dmesg
> [    8.700280] udev[591]: renamed network interface eth0 to eth0-eth2
> [    8.700754] udev[592]: renamed network interface eth1 to eth1-eth3
> [    8.708680] udev[562]: renamed network interface eth2 to eth0
> [    8.709146] udev[565]: renamed network interface eth3 to eth1
> [    8.750784] udev[591]: renamed network interface eth0-eth2 to eth2
> [    8.751307] udev[592]: renamed network interface eth1-eth3 to eth3
>
> This looks like some kind of "Layer 2 fail-over". My server has 4 physical
> NICs and 4 IPs assigned to the ethX interfaces. I unplugged three cables
> and was still able to ping all four IP addresses. My ARP table showed that
> all of the server's IP addresses now resolve to the same MAC, that of the
> remaining interface. Is this normal behavior?
>
> Regards,
> Igor.
>
>
>
> On Fri, Jun 29, 2012 at 2:45 PM, EpsiloN EpsiloN <ep...@gm...> wrote:
>
>> Hello,
>>
>> Thanks for the replies.
>> So I will measure the I/O that the filesystem requests and the I/O that I
>> get from the RAID controller, and create as many targets as possible. I
>> will let you know the results.
>>
>> Regards,
>> Igor.
>>
>>
>> On Fri, Jun 29, 2012 at 1:45 PM, Emmanuel Florac <ef...@in...> wrote:
>>
>>> On Thu, 28 Jun 2012 19:59:34 +0300
>>> Pasi Kärkkäinen <pa...@ik...> wrote:
>>>
>>> > IETD shouldn't crash anyway..
>>> >
>>>
>>> Sure. As I said, he triggered a bug. However, as he's using the standard
>>> Debian port, his best bet is a workaround rather than switching to
>>> trunk and recompiling (with no guarantee that this particular bug has
>>> been squashed).
>>>
>>> --
>>> ------------------------------------------------------------------------
>>> Emmanuel Florac     |   Direction technique
>>>                    |   Intellique
>>>                    |   <ef...@in...>
>>>                    |   +33 1 78 94 84 02
>>> ------------------------------------------------------------------------
>>>
>>
>>
>