From: rae l <cr...@gm...> - 2008-08-29 03:41:36
|
A simple nullio configuration:

Target iqn.2001-04.com.example:storage.disk2.sys1.xyzz
    Lun 0 Type=nullio

A Windows client initiator logs on to this target and uses Iometer to
test it. While Iometer is reading, this CPU load is observed on the
target server:

gektop@tux ~ $ dstat -M cpu,net 5
----total-cpu-usage---- -net/total-
usr sys idl wai hiq siq| recv  send
 30   4  66   0   0   0|   0     0
 23  41   0   0  17  19| 246k   10M
 22  36   0   0  21  21| 264k   11M
 24  38   0   0  17  21| 261k   11M
 24  36   0   0  21  19| 268k   11M

On the desktop computer the only NIC is a 100Mb network adapter, so the
network is near its limit. But the problem is: why does it consume so
much CPU?

On other server hardware with the same ietd.conf, a dual-core CPU and a
1000Mb NIC, it was observed that "sys,hiq,siq" consumes all the CPU
resources, too.

With SystemTap debugging, I found that sendpage (really tcp_sendpage)
would very likely return -EAGAIN; I think this loop is what consumes so
much CPU.

I have done a simple test: sendpage without MSG_DONTWAIT. In this
situation the target consumed very low CPU, on average under 1%, with
the same nullio performance, 110MB/s, nearly the 1000Mb limit.

I'm not very clear about the meaning of tcp_sendpage with or without
MSG_DONTWAIT, so please review this patch:

Index: iscsitarget-r168/kernel/nthread.c
===================================================================
--- iscsitarget-r168.orig/kernel/nthread.c
+++ iscsitarget-r168/kernel/nthread.c
@@ -294,7 +294,7 @@ static int write_data(struct iscsi_conn
 	struct iovec *iop;
 	int saved_size, size, sendsize;
 	int offset, idx;
-	int flags, res;
+	int res;
 
 	file = conn->file;
 	saved_size = size = conn->write_size;
@@ -351,12 +351,11 @@ static int write_data(struct iscsi_conn
 
 	sock = conn->sock;
 	sendpage = sock->ops->sendpage ? : sock_no_sendpage;
-	flags = MSG_DONTWAIT;
 
 	while (1) {
 		sendsize = PAGE_CACHE_SIZE - offset;
 		if (size <= sendsize) {
-			res = sendpage(sock, tio->pvec[idx], offset, size, flags);
+			res = sendpage(sock, tio->pvec[idx], offset, size, 0);
 			dprintk(D_DATA, "%s %#Lx:%u: %d(%lu,%u,%u)\n",
 				sock->ops->sendpage ? "sendpage" : "writepage",
 				(unsigned long long ) conn->session->sid, conn->cid,
@@ -377,7 +376,7 @@ static int write_data(struct iscsi_conn
 			continue;
 		}
 
-		res = sendpage(sock, tio->pvec[idx], offset, sendsize, flags | MSG_MORE);
+		res = sendpage(sock, tio->pvec[idx], offset, sendsize, MSG_MORE);
 		dprintk(D_DATA, "%s %#Lx:%u: %d(%lu,%u,%u)\n",
 			sock->ops->sendpage ? "sendpage" : "writepage",
 			(unsigned long long ) conn->session->sid, conn->cid,

Thanks.

-- 
Denis Cheng
Linux Application Developer

"One of my most productive days was throwing away 1000 lines of code."
- Ken Thompson.
From: Arne R. <ag...@po...> - 2008-08-29 15:55:45
|
On Friday, 29.08.2008 at 11:41 +0800, rae l wrote:
> With systemtap debugging, I found the sendpage (really tcp_sendpage)
> would very likely return -EAGAIN, I think this loop consumes so high
> CPU,
>
> I'm not very clear about the meangings of tcp_sendpage with or without
> MSG_DONTWAIT, so please review this patch:

Can you give more details about your H/W, in particular how much RAM the
target has and whether you're running x86 or x86_64 (kernel)?

As you observed, MSG_DONTWAIT will lead to tcp_sendpage() returning
errors if it cannot get hold of memory for data transmission. Removing
this flag will make tcp_sendpage() try a bit harder.

I'll take a closer look at the implications of your patch, but I'd also
like to get some more data points before making such modifications -
anyone else willing to repeat the above tests?

Thanks,
Arne
From: Arne R. <ag...@po...> - 2008-08-29 16:17:58
|
On Friday, 29.08.2008 at 17:55 +0200, Arne Redlich wrote:
> As you observed, MSG_DONTWAIT will lead to tcp_sendpage() returning
> errors if it cannot get hold of memory for data transmission. Removing
> this flag will make tcp_sendpage() try a bit harder.

... at the expense of taking longer (sleeping), I forgot to add.

You might also want to play with your tcp wmem settings and see if that
improves the situation.

Arne
From: rae l <cr...@gm...> - 2008-08-29 18:46:10
|
On Fri, Aug 29, 2008 at 11:55 PM, Arne Redlich <ag...@po...> wrote:
> Can you give more details about your H/W, in particular how much RAM the
> target has and whether you're running x86 or x86_64 (kernel)?

This "read iSCSI consumes high CPU" phenomenon is always reproducible in
our laboratory on several types of HW and SW:
1. x86 kernel 2.6.22.16 with iscsitarget-0.4.16, 2GB RAM;
2. x86_64 kernel 2.6.26.3 with iscsitarget-svn-r168, 4GB RAM;
3. desktop computer with x86_64 2.6.26.3, iscsitarget-r168, 1GB RAM.

On Sat, Aug 30, 2008 at 12:18 AM, Arne Redlich <ag...@po...> wrote:
> ... at the expense of taking longer (sleeping), I forgot to add.
>
> You might also want to play with your tcp wmem settings and see if that
> improves the situation.

TCP mem is set to the recommended values from the init script
(/etc/init.d/ietd):

# sysctl -a | grep 'net.*mem'
net.ipv4.tcp_mem = 1048576 1048576 1048576
net.ipv4.tcp_wmem = 1048576 1048576 2056192
net.ipv4.tcp_rmem = 1048576 1048576 2056192
net.core.wmem_max = 1048576
net.core.rmem_max = 1048576
net.core.wmem_default = 1048576
net.core.rmem_default = 1048576

All 1048576. We have also tried 2MB or more, but that does not seem to
help.

> I'll take a closer look at the implications of your patch, but I'd also
> like to get some more data points before making such modifications -
> anyone else willing to repeat the above tests?

This patch is very simple: just drop MSG_DONTWAIT when calling
tcp_sendpage; the flags variable is then unused, so it is removed, too.
From: Ross S. W. W. <RW...@me...> - 2008-08-29 18:57:22
|
rae l wrote:
> This "read iSCSI consumes high CPU" phenomenon is always reproducible
> in our laboratory on several types of HW and SW:
> 1. x86 kernel 2.6.22.16 with iscsitarget-0.4.16, 2GB RAM;
> 2. x86_64 kernel 2.6.26.3 with iscsitarget-svn-r168, 4GB RAM;
> 3. desktop computer with x86_64 2.6.26.3, iscsitarget-r168, 1GB RAM.

Which distro?

Are these hosts only running IET at the time?

What is the config of the nullio luns?

> TCP mem is set to the recommended values from the init script
> (/etc/init.d/ietd). All 1048576. We have also tried 2MB or more, but
> that does not seem to help.

We actually don't suggest pre-setting these at all any more. The 2.6
kernel's TCP stack has undergone major changes since these values were
first proposed, and now we suggest you let your stack self-tune to the
proper running values. So remove the sysctl settings from the init
scripts and let us know if anything changes.

> This patch is very simple: just drop MSG_DONTWAIT when calling
> tcp_sendpage; the flags variable is then unused, so it is removed, too.

I haven't seen this happen with nullio on the RHEL 2.6.18 kernels, but I
can try a newer kernel over the weekend and see if I can reproduce it on
FC8 or FC9.

-Ross

______________________________________________________________________
This e-mail, and any attachments thereto, is intended only for use by
the addressee(s) named herein and may contain legally privileged and/or
confidential information. If you are not the intended recipient of this
e-mail, you are hereby notified that any dissemination, distribution or
copying of this e-mail, and any attachments thereto, is strictly
prohibited. If you have received this e-mail in error, please
immediately notify the sender and permanently delete the original and
any copy or printout thereof.
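Ross's suggestion amounts to restoring the kernel's autotuning
behaviour. A hedged sketch of what that could look like (the exact
min/default/max values are assumptions and vary per kernel version; run
as root):

```shell
# Re-enable receive-buffer moderation and typical autotuning ranges
# instead of pinning every buffer to 1048576.
sysctl -w net.ipv4.tcp_moderate_rcvbuf=1
sysctl -w net.ipv4.tcp_wmem="4096 16384 4194304"
sysctl -w net.ipv4.tcp_rmem="4096 87380 4194304"

# Verify what the stack is actually using:
sysctl net.ipv4.tcp_wmem net.ipv4.tcp_rmem
```

Note that net.ipv4.tcp_wmem/tcp_rmem take three values (min, default,
max); pinning all three to the same large number, as in the ietd init
script, disables autotuning entirely.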
From: rae l <cr...@gm...> - 2008-08-31 14:44:51
|
On Sat, Aug 30, 2008 at 2:57 AM, Ross S. W. Walker <RW...@me...> wrote:
> Which distro?

Distro is not important here; in fact several distros were tested:
1. Gentoo 2008.0 x64
2. CentOS-5.1 x64
3. Our production server, a completely self-built system from scratch
   (LFS-like).

We tested 2.6.22-2.6.26, x86-32 and x86-64, not the default distro
kernels.

> Are these hosts only running IET at the time?

Yes. When we were testing iSCSI functionality, only the ietd server was
running on the server (plus sshd and so on, which proved to be no
problem).

> What is the config of the nullio luns?

As in the first mail, a simple nullio configuration:

Target iqn.2001-04.com.example:storage.disk2.sys1.xyzz
    Lun 0 Type=nullio

How the whole thing originated:
1. create a software RAID5 (md) with 11 SATA drives; the bandwidth is
   400+ MB/s;
2. create LVM on the RAID5;
3. create an iSCSI target with Lun 0 Type=fileio or blockio,
   Path=/dev/vg1/lv1;
4. an iSCSI initiator logs on to this target and reads its content.

Then 100% CPU consumption is observed on the target server. If the
client stops reading, the target server's load falls to 99% idle
immediately. So the problem must be in ietd and its kernel module,
right?

> We actually don't suggest pre-setting these at all any more. The 2.6
> kernel's TCP stack has undergone major changes since these values
> were first proposed, and now we suggest you let your stack self-tune
> to the proper running values. So remove the sysctl settings from the
> init scripts and let us know if anything changes.

This was also tested, but it doesn't help; we have tried all kinds of
tcp mem configurations, including no tcp mem sysctl configuration at
all.

> I haven't seen this happen with nullio on the RHEL 2.6.18 kernels, but
> I can try a newer kernel over the weekend and see if I can reproduce
> it on FC8 or FC9.

On Sat, Aug 30, 2008 at 3:03 AM, Arne Redlich <ag...@po...> wrote:
> It _looks_ simple, but the implications of it aren't: The function is
> called from the network thread (there is one per target), so if
> tcp_sendpage sends the thread to sleep because a connection's socket
> has no buffers, all other connections to this target also have to
> wait. With MSG_DONTWAIT set, another connection could be served
> meanwhile.
>
> So this needs to be addressed differently.

You mean there will be problems with multiple clients if MSG_DONTWAIT
is dropped? I'll test that.

> HTH,
> Arne

-- 
Denis ChengRq
Linux Application Developer

"One of my most productive days was throwing away 1000 lines of code."
- Ken Thompson.
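Step 3 above corresponds to an ietd.conf entry along these lines (a
sketch; the target name is reused from the nullio example and the LV
path from step 3):

```
Target iqn.2001-04.com.example:storage.disk2.sys1.xyzz
    Lun 0 Type=blockio,Path=/dev/vg1/lv1
```

Swapping Type=blockio for Type=fileio (or Type=nullio without a Path)
changes only the backing store; the network send path in nthread.c is
the same in all three cases, which is why nullio isolates the problem.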
From: Arne R. <ag...@po...> - 2008-09-01 05:44:15
|
On Sunday, 31.08.2008 at 22:45 +0800, rae l wrote:
> How the whole thing originated:
> 1. create a software RAID5 (md) with 11 SATA drives; the bandwidth is
>    400+ MB/s;

That's a bit oversized, no? Unless you have a very specific application
or a very small chunk size, you're very unlikely to write full stripes
with that many disks, leading to read-modify-write degrading your
performance. And I'd also be a bit nervous having only a single
redundant disk at this size.

So you're also seeing the 100% CPU load with this setup? Strange. Did
you test your components individually, e.g. what happens if you perform
I/O locally on the DM/MD devices? Which results does netperf yield
between your target and initiator boxes?

> > I haven't seen this happen with nullio on the RHEL 2.6.18 kernels,
> > but I can try a newer kernel over the weekend and see if I can
> > reproduce it on FC8 or FC9.

Thanks Ross, I cannot test it myself at the moment. Please keep me
updated.

> You mean there will be problems with multiple clients if MSG_DONTWAIT
> is dropped?

Yes.

Cheers,
Arne
From: Arne R. <ag...@po...> - 2008-08-29 19:03:00
|
On Saturday, 30.08.2008 at 02:46 +0800, rae l wrote:
> This patch is very simple: just drop MSG_DONTWAIT when calling
> tcp_sendpage; the flags variable is then unused, so it is removed,
> too.

It _looks_ simple, but the implications of it aren't: the function is
called from the network thread (there is one per target), so if
tcp_sendpage sends the thread to sleep because a connection's socket has
no buffers, all other connections to this target also have to wait. With
MSG_DONTWAIT set, another connection could be served meanwhile.

So this needs to be addressed differently.

HTH,
Arne
From: Arne R. <ag...@po...> - 2008-09-01 05:55:31
|
On Monday, 01.09.2008 at 07:44 +0200, Arne Redlich wrote:
> > Distro is not important here; in fact several distros were tested:
> > 1. Gentoo 2008.0 x64
> > 2. CentOS-5.1 x64
> > 3. Our production server, a completely self-built system from
> >    scratch (LFS-like).
> >
> > We tested 2.6.22-2.6.26, x86-32 and x86-64, not the default distro
> > kernels.

Forgot to ask: what NICs are you using?

Thanks,
Arne
From: Ross S. W. W. <RW...@me...> - 2008-09-01 22:03:46
|
Arne Redlich wrote:
> Forgot to ask: what NICs are you using?

Yes, yes, I just remember reading on one of the lists - CentOS or Xen -
where a poster had high CPU, and even CPU usage when there was no
activity on some NIC; I can't remember which, I'll look it up. Anyway,
the answer was to upgrade the NIC driver from the stock kernel driver to
the manufacturer's posted driver.

If you can, try that on one of your distros just to see if that is the
fix, and let us know the make/model of the card.

nullio really only stresses the NIC during usage, so it makes sense
that that should be one area to concentrate on.

-Ross
From: Ross S. W. W. <RW...@me...> - 2008-09-01 22:12:14
|
Ross S. W. Walker wrote:
> Anyway, the answer was to upgrade the NIC driver from the stock kernel
> driver to the manufacturer's posted driver.

The thread I was thinking of:

http://lists.centos.org/pipermail/centos/2008-July/061249.html

-Ross
From: rae l <cr...@gm...> - 2008-09-02 01:57:57
|
On Mon, Sep 1, 2008 at 1:55 PM, Arne Redlich <ag...@po...> wrote:
> Forgot to ask: what NICs are you using?

Two types of NICs have been tested:

04:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
	Subsystem: Inventec Corporation Device 0023
	Flags: bus master, fast devsel, latency 0, IRQ 381
	Memory at fcdc0000 (32-bit, non-prefetchable) [size=128K]
	I/O ports at ce80 [size=32]
	Capabilities: [c8] Power Management version 2
	Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
	Capabilities: [e0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting <?>
	Capabilities: [140] Device Serial Number 5c-fd-e7-ff-ff-d1-a0-00
	Kernel driver in use: e1000

02:09.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 02)
	Subsystem: Dell Device 01e5
	Flags: bus master, fast devsel, latency 64, IRQ 10
	Memory at dfcfe000 (32-bit, non-prefetchable) [size=8K]
	Capabilities: <access denied>
	Kernel driver in use: b44
	Kernel modules: b44

One gigabit and one 100-megabit; both show high CPU consumption during
iSCSI reads.

On Tue, Sep 2, 2008 at 6:03 AM, Ross S. W. Walker <RW...@me...> wrote:
> Yes, yes, I just remember reading on one of the lists - CentOS or Xen -
> where a poster had high CPU, and even CPU usage when there was no
> activity on some NIC; I can't remember which, I'll look it up. Anyway,
> the answer was to upgrade the NIC driver from the stock kernel driver
> to the manufacturer's posted driver.

But I don't think the NIC drivers have problems: all other network
applications (NFS, Samba, HTTP) perform well - 110MB/s on the gigabit
NIC and 12MB/s on the 100Mb NIC, both near the limit of the NIC.

On Tue, Sep 2, 2008 at 6:12 AM, Ross S. W. Walker <RW...@me...> wrote:
> The thread I was thinking of:
>
> http://lists.centos.org/pipermail/centos/2008-July/061249.html

The difference with that thread: in our scenario, if the iSCSI
initiators don't read, the target CPU falls to 99% idle immediately; if
the initiators begin to read, the target CPU climbs to 100%
(sys+hiq+siq), again immediately.
From: Ross S. W. W. <RW...@me...> - 2008-09-02 18:24:10
|
rae l wrote:
> One gigabit and one 100-megabit; both show high CPU consumption during
> iSCSI reads.

I haven't been able to reproduce this yet (haven't had access to the
test machine).

I can tell you though that for nullio targets all write data is
discarded and all read data is uninitialized memory, or basically
random data.

If you see high load on reading and not writing, I would check the
transmit path of the server. Comparing NFS/CIFS and iSCSI usage of the
network adapter isn't quite the same.

Can you give us a 'modinfo e1000' and a 'modinfo b44'?

Can you also run a 'vmstat 1' during the high CPU usage and send us a
screen's worth?

How are the initiators configured? What is the MRDSL set to?

-Ross
From: Ming Z. <bla...@gm...> - 2008-09-02 18:26:37
|
my 2c, a 5 second tcpdump from login to read might reveal a lot of things. ;)

On Tue, 2008-09-02 at 14:24 -0400, Ross S. W. Walker wrote:
> [full quote of Ross's message snipped]
|
From: Ross S. W. W. <RW...@me...> - 2008-09-02 19:02:47
|
Ming Zhang wrote:
>
> my 2c, a 5 second tcpdump from login to read might reveal a lot of
> things. ;)

Yes: specifically which iSCSI data-segment sizes were actually
negotiated, and the TCP MSS as well. If the negotiated MRDSL was too
small, say 256 bytes as another poster accidentally set, then the CPU
might be drowning in interrupt load, which a vmstat can also point out.

Ok, so starting with a system with no initiators connected, do a:

# tcpdump -c 750 -i <interface> -w iscsi.dmp tcp port 3260

Connect with 1 initiator, starting a read right away. Then send the
output to me, compressed if >100K, and I'll take a look at it.

-Ross
|
From: Arne R. <ag...@po...> - 2008-09-02 18:55:10
|
On Tuesday, 2008-09-02 at 14:26 -0400, Ming Zhang wrote:
> my 2c, a 5 second tcpdump from login to read might reveal a lot of
> things. ;)

Splendid idea, Ming :)

Denis, could you provide one?

Thanks,
Arne

> [remainder of the quoted thread snipped]
|
From: Ming Z. <bla...@gm...> - 2008-09-02 19:02:36
|
On Tue, 2008-09-02 at 20:55 +0200, Arne Redlich wrote:
> Splendid idea, Ming :)

Thanks.

Also, I know a lot of company technical-support groups have scripts
that can collect various information from servers without rounds and
rounds of emails. Can we have one? Including these could save a lot of
time, I believe:

uname -a
cat /proc/cpuinfo
cat /proc/meminfo
cat /proc/net/iet/*
lsmod
lspci
dmesg
lsscsi
ethtool ...
cat /etc/ietd.conf
...

> Denis, could you provide one?

> [remainder of the quoted thread snipped]
|
From: Ross S. W. W. <RW...@me...> - 2008-09-02 19:06:16
|
Ming Zhang wrote:
> also i know a lot of the company technical support group have scripts
> that can collect various information from servers without rounds and
> rounds of emails. can we have one?

Excellent! Can you throw one together for inclusion into the code?

Call it ietdiag or something of that ilk.

-Ross
|
From: Ming Z. <bla...@gm...> - 2008-09-02 19:11:34
|
On Tue, 2008-09-02 at 15:06 -0400, Ross S. W. Walker wrote:
> Excellent, can you throw one together for inclusion into the code?
>
> Call it ietdiag or something of that ilk.

Sorry, I don't think I have the time... I'm stuck in a boring meeting,
and replying to email is more fun...

http://solutions.qlogic.com/KanisaSupportSite/search.do?cmd=displayKC&docType=kc&externalId=10726&sliceId=&dialogID=17509767&stateId=0%200%2017497248

should be a good start for you.
|
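[A minimal sketch of the collector discussed above. The name "ietdiag" follows Ross's suggestion and the report format is invented here; the commands are a subset of Ming's list (glob entries such as `cat /proc/net/iet/*` would need `shell=True` and are left out). Nothing below is an existing IET tool.]

```python
"""Sketch of an "ietdiag"-style diagnostic collector.

Commands that are missing or hang are noted and skipped, so one bad
tool does not abort the whole report.
"""
import subprocess

COMMANDS = [
    "uname -a",
    "cat /proc/cpuinfo",
    "cat /proc/meminfo",
    "lsmod",
    "lspci",
    "dmesg",
    "cat /etc/ietd.conf",
]

def collect(commands=COMMANDS):
    """Run each command and return one concatenated report string."""
    sections = []
    for cmd in commands:
        try:
            proc = subprocess.run(cmd.split(), capture_output=True,
                                  text=True, timeout=30)
            body = proc.stdout or proc.stderr
        except (OSError, subprocess.TimeoutExpired):
            body = "(command unavailable)\n"
        sections.append("===== %s =====\n%s" % (cmd, body))
    return "\n".join(sections)

if __name__ == "__main__":
    print(collect())
```

Attaching the single resulting file to a bug report would replace most of the back-and-forth requests in this thread.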
From: rae l <cr...@gm...> - 2008-09-03 03:20:12
Attachments:
iscsi.dump
ietddiag.info
|
On Wed, Sep 3, 2008 at 3:02 AM, Ming Zhang <bla...@gm...> wrote:
> also i know a lot of the company technical support group have scripts
> that can collect various information from servers without rounds and
> rounds of emails. can we have one?
> [remainder snipped]
>
>> Denis, could you provide one?

Over the last several days I have done a lot more benchmarking, and on
a Dell PowerEdge 2950 the problem reproduced again. The attachment
iscsi.dump is the first 750 iSCSI packets, including the login phase;
ietddiag.info is the collected system information. While iSCSI is
reading, here's the dstat output:

root@uitnode1 ~/tmp/iet-r168 1 # dstat -M proc,cpu,mem,sys,net,disk,app -C 0,1 -D sda,sdb -N total,eth0,eth1,eth2 5
---procs--- -------cpu0-usage------ -------cpu1-usage------ ------memory-usage----- ---system-- -net/total- --net/eth0- --net/eth1- --dsk/sda-- --dsk/sdb-- --most-expensive--
run blk new|usr sys idl wai hiq siq:usr sys idl wai hiq siq|_used _buff _cach _free|_int_ _csw_|_recv _send:_recv _send:_recv _send|_read _writ:_read _writ|_____process______
  0   0   0|  0   0  99   0   0   0:  1   2  95   0   0   2| 261M  171M 1795M 1066M|1531  1574 |   0     0 :   0     0 :   0     0 |  15k   11k:  22B    0 |istd3           3
  3   0   0|  5   3  92   0   0   0:  0  46   0   0   2  52| 261M  171M 1795M 1066M|6988  4552 | 796k  116M: 796k  116M: 899B    0 |   0     0 :   0     0 |istd3         100
  2   0   0|  7   3  90   0   0   0:  0  51   0   0   1  49| 261M  171M 1795M 1066M|7024  4605 | 801k  116M: 801k  116M: 553B    0 |   0    33k:   0     0 |istd3         100
  2   0   0|  3   4  93   0   0   0:  0  48   0   0   1  51| 261M  171M 1795M 1066M|7035  4552 | 798k  116M: 797k  116M:1029B    0 |   0     0 :   0     0 |istd3         100
  2   0   0|  3   3  94   0   0   0:  0  47   0   0   2  51| 261M  171M 1795M 1066M|7012  4529 | 798k  116M: 796k  116M:1585B    0 |   0     0 :   0     0 |istd3         100
  1   0   0|  2   3  96   0   0   0:  0  48   0   0   1  51| 261M  171M 1795M 1066M|7030  4517 | 798k  116M: 797k  116M:1650B    0 |   0     0 :   0     0 |istd3         100
  2   0   0|  1   4  95   0   0   0:  0  48   0   0   1  52| 261M  171M 1795M 1066M|7021  4519 | 799k    - : 798k    - :1431B    0 |   0     0 :   0     0 |istd3         100
  2   0   0|  6   4  89   0   0   0:  0  48   0   0   0  52| 261M  171M 1795M 1066M|7023  4591 | 799k  116M: 798k  116M:1038B    0 |   0     0 :   0     0 |istd3         100
^C

cpu0 is mostly idle; cpu1 is constantly busy with sys+siq, and it is
the istd3 thread that pins cpu1 at 100%. |
From: rae l <cr...@gm...> - 2008-09-03 03:51:52
|
And here is some more dmesg output, captured with `insmod
kernel/iscsi_trgt.ko debug_enable_flags=8`. sendpage very frequently
returns -11 (-EAGAIN); I think this retry loop is what consumes so
much CPU. A representative excerpt (each burst of identical -11 lines
is condensed):

iscsi_trgt: write_data(318) 0x1000037010040:1: 48(48)
iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 4096(0,0,4096)
iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 3004(0,0,4096)
iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: -11(0,3004,1092)
    [... the same -11 line repeats dozens of times ...]
iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 1092(0,3004,1092)
iscsi_trgt: write_data(318) 0x1000037010040:1: 48(48)
iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 4096(0,0,4096)
iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 4096(4054807680,0,4096)
    [... several full 4096-byte sendpage calls succeed immediately ...]
iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 2164(0,0,4096)
iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: -11(0,2164,1932)
    [... the same -11 line repeats dozens of times ...]
iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 1932(0,2164,1932)
    [... more full-page sends succeed ...]
iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 1324(0,0,4096)
iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: -11(0,1324,2772)
    [... the same -11 line repeats dozens of times ...]
iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 2772(0,1324,2772)
    [... more full-page sends succeed ...]
iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 484(0,0,4096)
iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: -11(0,484,3612)
    [... the same -11 line repeats dozens of times ...]
iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 3612(0,484,3612)
    [... more full-page sends succeed ...]
iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 3740(4046466864,0,4096)
iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(4046466864,3740,356)
    [... the same -11 line repeats until the remaining 356 bytes fit ...]
write_data(384) sendpage 0x1000037010040:1: -11(4046466864,3740,356) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(4046466864,3740,356) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(4046466864,3740,356) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(4046466864,3740,356) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(4046466864,3740,356) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(4046466864,3740,356) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 356(4046466864,3740,356) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 4096(0,0,4096) iscsi_trgt: write_data(318) 0x1000037010040:1: 48(48) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 4096(0,0,4096) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 4096(4042311168,0,4096) iscsi_trgt: write_data(318) 0x1000037010040:1: 48(48) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 4096(0,0,4096) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 4096(3725065168,0,4096) iscsi_trgt: write_data(318) 0x1000037010040:1: 48(48) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 4096(0,0,4096) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 4096(0,0,4096) iscsi_trgt: write_data(318) 0x1000037010040:1: 48(48) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 2900(0,0,4096) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2900,1196) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2900,1196) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2900,1196) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2900,1196) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2900,1196) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2900,1196) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2900,1196) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2900,1196) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 
-11(0,2900,1196) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2900,1196) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2900,1196) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2900,1196) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2900,1196) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2900,1196) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2900,1196) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 1196(0,2900,1196) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 4096(0,0,4096) iscsi_trgt: write_data(318) 0x1000037010040:1: 48(48) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 4096(4047425328,0,4096) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 4096(3724873520,0,4096) iscsi_trgt: write_data(318) 0x1000037010040:1: 48(48) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 4096(0,0,4096) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 4096(4045610800,0,4096) iscsi_trgt: write_data(318) 0x1000037010040:1: 48(48) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 4096(0,0,4096) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 4096(4046139184,0,4096) iscsi_trgt: write_data(318) 0x1000037010040:1: 48(48) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 2060(0,0,4096) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2060,2036) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2060,2036) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2060,2036) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2060,2036) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2060,2036) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2060,2036) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2060,2036) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2060,2036) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 
-11(0,2060,2036) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2060,2036) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,2060,2036) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 2036(0,2060,2036) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 4096(4045766448,0,4096) iscsi_trgt: write_data(318) 0x1000037010040:1: 48(48) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 4096(3724906456,0,4096) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 4096(0,0,4096) iscsi_trgt: write_data(318) 0x1000037010040:1: 48(48) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 4096(0,0,4096) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 4096(0,0,4096) iscsi_trgt: write_data(318) 0x1000037010040:1: 48(48) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 4096(4054351664,0,4096) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 4096(0,0,4096) iscsi_trgt: write_data(318) 0x1000037010040:1: 48(48) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 1220(3724579536,0,4096) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(3724579536,1220,2876) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(3724579536,1220,2876) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(3724579536,1220,2876) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(3724579536,1220,2876) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(3724579536,1220,2876) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(3724579536,1220,2876) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(3724579536,1220,2876) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(3724579536,1220,2876) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(3724579536,1220,2876) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(3724579536,1220,2876) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(3724579536,1220,2876) iscsi_trgt: write_data(384) sendpage 
0x1000037010040:1: -11(3724579536,1220,2876) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(3724579536,1220,2876) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(3724579536,1220,2876) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(3724579536,1220,2876) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(3724579536,1220,2876) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(3724579536,1220,2876) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(3724579536,1220,2876) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(3724579536,1220,2876) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 2876(3724579536,1220,2876) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 4096(0,0,4096) iscsi_trgt: write_data(318) 0x1000037010040:1: 48(48) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 4096(4040776512,0,4096) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 4096(4053519240,0,4096) iscsi_trgt: write_data(318) 0x1000037010040:1: 48(48) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 4096(4040613680,0,4096) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 4096(4041436688,0,4096) iscsi_trgt: write_data(318) 0x1000037010040:1: 48(48) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 4096(4043788080,0,4096) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 4096(4057292592,0,4096) iscsi_trgt: write_data(318) 0x1000037010040:1: 48(48) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 380(0,0,4096) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,380,3716) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,380,3716) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,380,3716) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,380,3716) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,380,3716) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,380,3716) iscsi_trgt: write_data(384) 
sendpage 0x1000037010040:1: -11(0,380,3716) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,380,3716) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,380,3716) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,380,3716) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,380,3716) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,380,3716) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,380,3716) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,380,3716) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,380,3716) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,380,3716) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,380,3716) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,380,3716) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,380,3716) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,380,3716) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,380,3716) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: -11(0,380,3716) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 3716(0,380,3716) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 4096(0,0,4096) iscsi_trgt: write_data(318) 0x1000037010040:1: 48(48) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 4096(4053683936,0,4096) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 4096(0,0,4096) iscsi_trgt: write_data(318) 0x1000037010040:1: 48(48) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 4096(3724972896,0,4096) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 4096(4044635952,0,4096) iscsi_trgt: write_data(318) 0x1000037010040:1: 48(48) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 4096(4045478400,0,4096) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 3684(4044926768,0,4096) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: -11(4044926768,3684,412) iscsi_trgt: 
write_data(363) sendpage 0x1000037010040:1: -11(4044926768,3684,412) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: -11(4044926768,3684,412) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: -11(4044926768,3684,412) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: -11(4044926768,3684,412) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: -11(4044926768,3684,412) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: -11(4044926768,3684,412) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: -11(4044926768,3684,412) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: -11(4044926768,3684,412) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 412(4044926768,3684,412) iscsi_trgt: write_data(318) 0x1000037010040:1: 48(48) iscsi_trgt: write_data(384) sendpage 0x1000037010040:1: 4096(4039419152,0,4096) iscsi_trgt: write_data(363) sendpage 0x1000037010040:1: 4096(0,0,4096) -- Denis Cheng Linux Application Developer "One of my most productive days was throwing away 1000 lines of code." - Ken Thompson. |
From: Ross S. W. W. <RW...@me...> - 2008-09-03 13:57:52
|
rae l wrote:
> On Wed, Sep 3, 2008 at 3:02 AM, Ming Zhang <bla...@gm...> wrote:
> >
> > On Tue, 2008-09-02 at 20:55 +0200, Arne Redlich wrote:
> >> On Tuesday, 2008-09-02 at 14:26 -0400, Ming Zhang wrote:
> >> > my 2c, a 5 second tcpdump from login to read might reveal a lot of
> >> > things. ;)
> >>
> >> Splendid idea, Ming :)
> >
> > thanks.
> >
> > also i know a lot of company technical support groups have scripts
> > that can collect various information from servers without rounds and
> > rounds of emails. can we have one?
> >
> > including these can save a lot of time, i believe...
> >
> > uname -a
> > cat /proc/cpuinfo
> > cat /proc/meminfo
> > cat /proc/net/iet/*
> > lsmod
> > lspci
> > dmesg
> > lsscsi
> > ethtool ...
> > cat /etc/ietd.conf
> > ...
> >
> >> Denis, could you provide one?
>
> In recent days, I have done a lot more benchmarking.
>
> On a Dell PowerEdge 2950 the problem reproduced again; the attachment
> is the first 750 iSCSI packets including the logon phase, and
> ietddiag.info is a collection of diagnostic information.
>
> While iSCSI is reading, here's the dstat output:

Thanks for the diag info, I'm going to take a look at that, but can you
give us a 'vmstat 1' during the high CPU load against the nullio LUN so
we can get an idea of the interrupts being driven.

-Ross

______________________________________________________________________
This e-mail, and any attachments thereto, is intended only for use by the addressee(s) named herein and may contain legally privileged and/or confidential information. If you are not the intended recipient of this e-mail, you are hereby notified that any dissemination, distribution or copying of this e-mail, and any attachments thereto, is strictly prohibited. If you have received this e-mail in error, please immediately notify the sender and permanently delete the original and any copy or printout thereof. |
From: Ross S. W. W. <RW...@me...> - 2008-09-03 14:40:44
|
rae l wrote:
> On Wed, Sep 3, 2008 at 3:02 AM, Ming Zhang <bla...@gm...> wrote:
> [same quoted thread as in the previous message]

Rae,

My mistake on the tcpdump: I didn't tell you to use a long enough snap
length, so all the good iSCSI protocol information was truncated. I
naively thought a raw dump would capture the full packet, oh well.

Can you run:

# tcpdump -c 750 -s 1460 -w iscsi.dump -i <int> tcp port 3260

again, from zero initiators to one initiator doing a read? I was able
to determine that your MTU is 1500 and your MSS is 1460, so the snap
length above should be good. You will need to compress it this time,
though, so just gzip it.

-Ross |
From: Ross S. W. W. <RW...@me...> - 2008-09-03 14:55:25
|
rae l wrote:
>
> cpu0 is idle, cpu1 is always busy with sys+siq; the istd3 thread
> consumes 100% of cpu1,

I'm curious: what throughput were you getting on the initiator during
these tests?

It is possible that you were driving enough IO through the NIC to peg
the CPU. iSCSI is a processor-intensive protocol.

-Ross |
From: rae l <cr...@gm...> - 2008-09-04 02:29:30
Attachments:
iscsi.dump.bz2
|
On Wed, Sep 3, 2008 at 9:57 PM, Ross S. W. Walker <RW...@me...> wrote:
> Thanks for the diag info I'm going to take a look at that,
> but can you give us a 'vmstat 1' during the high CPU to
> nullio lun so we can get an idea of the interrupts being
> driven.

Did you notice the dstat output? dstat is generally a better vmstat: it
collects CPU load, network throughput, and the most expensive process at
the same time. Here is the test output, almost the same as the previous
one. It displays usr,sys,idl,wai,hiq,siq for every CPU; here cpu0 is
still idle, but cpu1 becomes busy with sys+siq while the net/eth0
throughput reaches 110MB/s:

root@uitnode1 ~/tmp/iet-r168 0 # dstat -M proc,cpu,mem,sys,net,disk,app -C 0,1 -D sda,sdb -N eth0 5
---procs--- -------cpu0-usage--------------cpu1-usage------ ------memory-usage----- ---system-- --net/eth0- --dsk/sda-----dsk/sdb-- --most-expensive--
run blk new|usr sys idl wai hiq siq:usr sys idl wai hiq siq|_used _buff _cach _free|_int_ _csw_|_recv _send|_read _writ:_read _writ|_____process______
0 0 0| 0 1 98 1 0 0: 0 0 98 1 0 0| 331M 139M 1189M 1634M| 246 102 |2548B 11k| 0 37k: 0 0 |gnome-terminal 1
0 0 0| 1 0 99 0 0 0: 0 0 100 0 0 0| 330M 139M 1190M 1634M| 297 162 |1612B 108k| 0 22k: 0 0 |istiod6 86
0 0 0| 1 0 99 0 0 0: 0 0 99 1 0 0| 330M 139M 1190M 1634M| 223 58 |1089B 0 | 0 14k: 0 0 |pcscd 0
2 0 0| 2 4 92 2 0 0: 0 37 22 0 1 40| 338M 139M 1190M 1626M| 11k 3550 |3850B 86M|4915B 188k: 0 0 |istd6 163
2 0 0| 2 7 91 0 0 0: 0 50 0 0 1 49| 338M 139M 1190M 1626M| 15k 4430 |3994B 113M| 0 0 : 0 0 |istd6 100
1 0 0| 2 2 97 0 0 0: 0 50 0 0 0 50| 338M 139M 1190M 1626M| 15k 4380 |1920B 112M| 0 0 : 0 0 |istd6 100
3 0 0| 2 2 96 0 0 0: 0 51 0 0 1 48| 338M 139M 1190M 1626M| 13k 4254 |2710B 109M| 0 0 : 0 0 |istd6 100
1 0 0| 1 3 96 0 0 0: 0 48 0 0 1 51| 338M 139M 1190M 1626M| 15k 4356 |1534B - | 0 3277B: 0 0 |istd6 100
2 0 0| 2 4 93 1 0 0: 0 51 0 0 1 48| 338M 139M 1190M 1626M| 14k 4304 |3689B 110M| 0 22k: 0 0 |istd6 100
4 0 0| 4 2 94 0 0 0: 0 51 0 0 1 48| 338M 139M 1190M 1626M| 14k 4340 |5716B 111M| 0 819B: 0 0 |istd6 100
5 0 0| 4 4 92 0 0 0: 0 45 0 0 1 54| 338M 139M 1190M 1626M| 16k 4491 |9003B 114M| 0 0 : 0 0 |istd6 100
2 0 0| 4 2 94 0 0 0: 0 49 0 0 0 51| 338M 139M 1190M 1626M| 15k 4488 |5306B 113M| 0 0 : 0 0 |istd6 100
1 0 0| 2 4 94 0 0 0: 0 49 2 0 0 48| 329M 139M 1190M 1634M| 14k 4144 |2974B 107M| 0 0 : 0 0 |istd6 97
0 0 0| 1 0 99 0 0 0: 2 0 98 0 0 0| 329M 139M 1190M 1634M| 274 127 |4262B 57k| 0 0 : 0 0 |gnome-terminal 2
^C

However, here is `vmstat 1` from when the iSCSI read began; the idle
percentage falls from 100 to about 50, meaning one of the two CPUs
became totally busy:

# vmstat 1
0 0 108 1673412 142312 1218136 0 0 0 0 222 56 1 0 100 0 0
0 0 108 1673412 142320 1218128 0 0 0 72 224 68 0 1 98 2 0
0 0 108 1673412 142320 1218136 0 0 0 0 219 54 0 0 100 0 0
0 0 108 1673412 142320 1218136 0 0 0 0 219 62 1 0 100 0 0
8 0 108 1664848 142352 1218844 0 0 24 940 9206 3270 1 28 66 5 0
2 0 108 1664980 142352 1219012 0 0 0 0 13172 4337 1 51 48 0 0
1 0 108 1664980 142352 1219012 0 0 0 0 13674 4307 2 52 47 0 0
1 0 108 1664980 142352 1219012 0 0 0 0 15305 4434 1 57 42 0 0
2 0 108 1664980 142352 1219012 0 0 0 0 12999 4208 1 54 46 0 0
1 0 108 1664980 142352 1219012 0 0 0 0 15573 4451 1 52 48 0 0
1 0 108 1664980 142352 1219012 0 0 0 0 17470 4566 2 56 43 0 0
1 0 108 1664980 142352 1219012 0 0 0 0 14819 4536 1 51 48 0 0
1 0 108 1664980 142352 1219012 0 0 0 0 14932 4395 0 59 42 0 0
1 0 108 1665104 142352 1219012 0 0 0 0 13233 4242 1 52 48 0 0
1 0 108 1665104 142352 1219012 0 0 0 0 15343 4411 1 51 49 0 0
1 0 108 1665104 142352 1219012 0 0 0 0 14956 4377 1 51 49 0 0
2 0 108 1664980 142352 1219012 0 0 0 0 16134 4457 1 51 49 0 0
1 0 108 1664980 142352 1219012 0 0 0 0 14888 4363 1 51 49 0 0
1 0 108 1664980 142352 1219012 0 0 0 0 17138 4552 2 50 48 0 0
1 0 108 1665048 142352 1219012 0 0 0 0 13799 4281 1 51 49 0 0
1 0 108 1665036 142352 1219012 0 0 0 0 12202 4153 1 51 48 0 0

On Wed, Sep 3, 2008 at 10:40 PM, Ross S. W. Walker <RW...@me...> wrote:
> My mistake on the tcpdump, I didn't tell you to give a long
> enough snap length on the data, so all the good iscsi protocol
> information was truncated. I naively thought a raw dump would
> do the full packet, oh well.
>
> Can you run a:
>
> # tcpdump -c 750 -s 1460 -w iscsi.dump -i <int> tcp port 3260

Done, the attachment is the new one.

On Wed, Sep 3, 2008 at 10:55 PM, Ross S. W. Walker <RW...@me...> wrote:
> rae l wrote:
>>
>> cpu0 is idle, cpu1 is always busy with sys+siq; istd3 thread consume
>> the 100% cpu1,
>
> I'm curious, what throughput were you getting on the initiator during
> these tests?

The initiator is the Microsoft iSCSI initiator; the iometer benchmark
got 105MB/s throughput.

> It is possible that you were driving enough IO through the NIC to peg
> the CPU. iSCSI is a processor intensive protocol.
>
> -Ross

Another interesting test: if I use the open-iscsi initiator on a Linux
client, the benchmark can also get 110MB/s bandwidth while IET on the
server doesn't consume high CPU. It seems IET cannot collaborate well
with the Microsoft iSCSI initiator, but it should be able to handle it.

-- 
Denis Cheng
Linux Application Developer

"One of my most productive days was throwing away 1000 lines of code."
- Ken Thompson. |