From: Evgeny Y. <Evg...@fr...> - 2008-09-30 21:52:35
All 18 targets share the same HW raid.
Iostat at the time when problem was noticed (approximately +-15 sec)

Sat Sep 27 05:40:26 2008
Linux 2.6.24.7-0.5.1.smp.gcc3.4.x86.i686 (san6.frontline.ca)   09/27/2008

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.78    0.00    2.02    0.02    0.00   97.17

Device:        tps   Blk_read/s   Blk_wrtn/s     Blk_read     Blk_wrtn
sda         344.15       827.45      1411.23   2101296916   3583803627
sda1        346.13       827.42      1411.22   2101240221   3583779901
sdb           1.58         0.09        74.20       223136    188437316
sdb1          0.00         0.00         0.00         8954         1348
sdb2          8.77         0.08        70.11       213966    178052688
sdc           1.58         0.09        74.20       234018    188437316
sdc1          0.00         0.00         0.00         8866         1348
sdc2          8.77         0.09        70.11       224936    178052688
md0           0.00         0.00         0.00         9860          452
md1           8.27         0.17        66.03       426970    167670304
dm-0          8.26         0.11        66.03       277506    167670304
dm-1          0.00         0.01         0.00        18896            0
dm-2         79.78       763.27       157.68   1938325852    400415242
dm-3          0.80        26.65        11.65     67679810     29573737
dm-4         93.04       954.00       741.73   2422681153   1883625473
dm-6          3.55         0.64        30.34      1636500     77059565
dm-7         14.00         0.03        35.27        68990     89577980
dm-8          0.39         8.82         9.63     22406870     24444417
dm-9          3.71        45.68        34.37    116010021     87278750
dm-10         0.06         0.63         1.91      1596596      4841294
dm-11         7.50        39.32        81.87     99855593    207896120
dm-12         0.15         1.92         6.30      4870921     16001121
dm-13         0.66         5.85         0.17     14864962       434157
dm-14        89.32       669.20       233.36   1699428858    592617160
dm-15         9.67         0.02        11.58        41426     29405087
dm-16         4.25         0.04         5.11        98797     12976217
dm-5         30.77         0.09        36.18       234425     91887872
dm-17         8.01         0.07         9.83       177552     24956500
dm-18         0.00         0.00         0.00         9472            8
dm-19         0.00         0.00         0.00         6888            8
dm-20         0.46         2.44         4.26      6185873     10812736

Eugene

-----Original Message-----
From: Ming Zhang [mailto:bla...@gm...]
Sent: September 30, 2008 5:33 PM
To: Evgeny Yurchenko
Cc: Roman Naumenko; isc...@li...
Subject: RE: [Iscsitarget-devel] kernel Error message

On Tue, 2008-09-30 at 17:26 -0400, Evgeny Yurchenko wrote:
> CORRECTION!
> Sorry, please read "we decided to migrate to blockio as... in item 3.
>
> Eugene
>
> -----Original Message-----
> From: Evgeny Yurchenko
> Sent: September 30, 2008 5:22 PM
> To: bla...@gm...
> Cc: Roman Naumenko; isc...@li...
> Subject: RE: [Iscsitarget-devel] kernel Error message
>
> RAID controller has 128MB on it.

not much. ;)

> 1.We export this raid5 as a whole, nothing else using these drives.
> 2.We have 18 targets. Some of them are used for sequential writing
> (logs), some of them have random access (like mail-boxes).

all 18 targets share this RAID? then i would see a huge seek cost here...
can u ssh into box and use iostat to check the iowait time?

> 3. You noticed very good point. Initially these target were created as
> fileio. But then during system upgrade when we stopped all servers we
> decided to migrate it to fileio as testing showed much better
> performance (of course we tested this upgrade and did not found any
> issues).

blockio bypass cache, so it standalone potentially has better
performance. but it is a write through mode and when u have seek here,
it might not be good.

> Can it be the issue? Should I move them back to fileio or is there any
> other way to migrate these targets to blockio mode?
> 4. I'd love to do tcpdump but this OpenFiler distribution does not have
> one and it is very hard to install anything on it, I will probably try
> to WireShark from initiator's side. On target linux-box we do not have
> any packet drops/errors. Definitely no routing issues as there is no
> routing, just L2 switching.

then your switch might not be a problem as well. wireshark is ok. but it
can only confirm the problem. not to solve it.

>
> Thanks.
> Eugene
>
> -----Original Message-----
> From: Ming Zhang [mailto:bla...@gm...]
> Sent: September 30, 2008 5:01 PM
> To: Evgeny Yurchenko
> Cc: Roman Naumenko; isc...@li...
> Subject: RE: [Iscsitarget-devel] kernel Error message
>
> On Tue, 2008-09-30 at 16:51 -0400, Evgeny Yurchenko wrote:
> >
> > Hi Ming!
> >
> > Thanks for quick response. All iscsi-target are configured in blockio
> > mode.
> > The server is IntelPentium4 CPU 3.00GHz, 2GB RAM and Adaptec
> > 2820SA RAID-card with 8 SATA 3.0MB/s drives configured in RAID5. Before
>
> i would say you disk array might be a bottleneck... how many NVRAM u
> have in RAID controller?
>
> and you export this raid5 as a whole? or some other io share the raid?
>
> is this target only for log (pretty sequential) or also for mail?
>
> > putting this server in production we burned it with much higher iscsi
> > load (measuring network load) and there were no errors. I just want you
> > to confirm the next:
>
> ic. did u test fileio mode before?
>
> > This error tells us that INITIATOR did reset connection because did not
> > receive confirmation from TARGET that read/write operation complete in
> > timely manner. Correct?
>
> yes
>
> > And two more questions if you do not mind:
> > 1) how can installing more memory help?
>
> i thought u use fileio, so memory is not issue here.
>
> > 2) is it possible that this is network issue?
>
> could be. but u need to check u switch to see if any routing issue,
> packet loss.... or run tcpdump to check packet timing, for a high
> traffic like this. maybe u have to grab couple GB logs...
>
> > Thanks!
> > Eugene.
> >
> > -----Original Message-----
> > From: Ming Zhang [mailto:bla...@gm...]
> > Sent: September 30, 2008 4:36 PM
> > To: Roman Naumenko
> > Cc: isc...@li...; Evgeny Yurchenko
> > Subject: Re: [Iscsitarget-devel] kernel Error message
> >
> > On Tue, 2008-09-30 at 16:00 -0400, Roman Naumenko wrote:
> > > Hello,
> > >
> > > We have a few errors per day in /var/log/messages on one of most
> > > loaded SAN servers:
> > >
> > > kernel: iscsi_trgt: Logical Unit Reset (05) issued on tid:1 lun:0 by
> > > sid:1971425271414848 (Function Complete)
> > >
> > > As we discovered this error appears when Exchange heavily writes logs
> > > on target device.
> > > We are still investigating disk loads on the server trying to figure
> > > out if error are actually related to disk load.
> >
> > yes. it is. ini sends a command to target and the command takes too long
> > to complete. and then ini issues a reset.
> >
> > using a faster disk if possible, if you use 64bit os, add more ram if
> > possible.
> >
> > and u might want to use blockio or async mode.
> >
> > but should not cause data loss, ...
> >
> > > The system is rPath Linux 2.3 and kernel is
> > > 2.6.24.7-0.5.1.smp.gcc3.4.x86.i686
> > >
> > > We use openfiler system on SANs. Now we are discussing with openfiler
> > > developers if NICs issues might be a problem - but there are no
> > > dropped packets on the server side. On the initiators side all
> > > additional options are disabled (like TCPOffload Engine)
> > >
> > > What is your opinion about this issue?
> > >
> > > Best regards,
> > > Roman Naumenko
> > >
> > > Network Administrator
> > > Frontline Technologies
> > >
> > > HelpDesk: (416) 637-3132
> > >
> > > rom...@fr...
> > > http://www.frontline.ca
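
The iostat device table posted at the top of this message can be ranked per device to see which dm-* volumes carry most of the I/O on the shared array. Below is a minimal Python sketch, assuming the classic six-column iostat device layout (Device, tps, Blk_read/s, Blk_wrtn/s, Blk_read, Blk_wrtn). The names parse_iostat_devices, busiest and SAMPLE are hypothetical helpers for illustration, not part of iostat or iscsitarget; the sample rows are copied from the posted output.

```python
from typing import List, NamedTuple


class DeviceStat(NamedTuple):
    name: str
    tps: float          # transfers per second
    blk_read_s: float   # blocks read per second
    blk_wrtn_s: float   # blocks written per second


def parse_iostat_devices(text: str) -> List[DeviceStat]:
    """Parse the rows that follow the 'Device:' header of iostat output.

    Assumption: the classic six-column layout; other lines are skipped.
    """
    stats = []
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Device:":
            in_table = True
            continue
        if in_table and len(fields) == 6:
            try:
                stats.append(DeviceStat(fields[0], float(fields[1]),
                                        float(fields[2]), float(fields[3])))
            except ValueError:
                continue  # not a numeric device row
    return stats


def busiest(stats: List[DeviceStat], prefix: str = "dm-", top: int = 3) -> List[DeviceStat]:
    """Return the `top` devices whose name starts with `prefix`, by tps."""
    subset = [s for s in stats if s.name.startswith(prefix)]
    return sorted(subset, key=lambda s: s.tps, reverse=True)[:top]


# A few rows copied from the iostat output in this message.
SAMPLE = """\
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.78    0.00    2.02    0.02    0.00   97.17

Device:        tps   Blk_read/s   Blk_wrtn/s     Blk_read     Blk_wrtn
sda         344.15       827.45      1411.23   2101296916   3583803627
dm-2         79.78       763.27       157.68   1938325852    400415242
dm-4         93.04       954.00       741.73   2422681153   1883625473
dm-5         30.77         0.09        36.18       234425     91887872
dm-14        89.32       669.20       233.36   1699428858    592617160
dm-18         0.00         0.00         0.00         9472            8
"""

if __name__ == "__main__":
    for dev in busiest(parse_iostat_devices(SAMPLE)):
        print(f"{dev.name:8s} tps={dev.tps:8.2f} "
              f"read/s={dev.blk_read_s:9.2f} write/s={dev.blk_wrtn_s:9.2f}")
```

One caveat on reading the numbers: a single iostat report with no interval argument shows averages since boot, so figures like these may not reflect the load at the moment of the reset. Running iostat with an interval (for example `iostat 5`) and feeding the later reports to the same parser would give a closer view.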