From: Vladislav B. <vs...@vl...> - 2006-05-24 11:04:12
|
Those messages on the initiator are expected. It issues them when it can't perform commands constantly receiving BUSY for them from the target, which can't allocate necessary memory in limited SG entries count for such huge data sizes. Brad, how were you managed the initiator to send such commands? I have seen such big data sizes only when sent them manually via sg for testing. Did you manually edit /sys/block/YOUR_DEVICE/queue/max_sectors_kb? What are the current values there and in /sys/block/YOUR_DEVICE/queue/max_hw_sectors_kb? You should set max_sectors_kb in something more sane like 1024. Vlad Brad Johnson wrote: > The initiator system log is full of these messages: > > sd 1:0:0:0: timing out command, waited 150s > sd 1:0:0:0: SCSI error: return code = 0x28 > end_request: I/O error, dev sda, sector 22334567 > sd 1:0:0:0: timing out command, waited 150s > sd 1:0:0:0: SCSI error: return code = 0x8 > end_request: I/O error, dev sda, sector 22382295 > > Here is the sgv file: > > Name Hit Total > sgv 0 0 > sgv-4K 0 0 > sgv-8K 0 0 > sgv-16K 0 0 > sgv-32K 0 0 > sgv-64K 0 0Name Hit Total > sgv 0 0 > sgv-4K 0 0 > sgv-8K 0 0 > sgv-16K 0 0 > sgv-32K 0 0 > sgv-64K 0 0 > sgv-128K 0 0 > sgv-256K 0 0 > sgv-512K 0 0 > sgv-1024K 0 0 > sgv-2048K 0 0 > > sgv-clust 2381747 2383002 > sgv-clust-4K 974 1040 > sgv-clust-8K 459 512 > sgv-clust-16K 510 543 > sgv-clust-32K 746 787 > sgv-clust-64K 8541 8629 > sgv-clust-128K 22 32 > sgv-clust-256K 61 76 > sgv-clust-512K 35599 35714 > sgv-clust-1024K 341439 341721 > sgv-clust-2048K 1993396 1993948 > > sgv-dma 0 0 > sgv-dma-4K 0 0 > sgv-dma-8K 0 0 > sgv-dma-16K 0 0 > sgv-dma-32K 0 0 > sgv-dma-64K 0 0 > sgv-dma-128K 0 0 > sgv-dma-256K 0 0 > sgv-dma-512K 0 0 > sgv-dma-1024K 0 0 > sgv-dma-2048K 0 0 > > big 7052121 > > sgv-128K 0 0 > sgv-256K 0 0 > sgv-512K 0 0 > sgv-1024K 0 0 > sgv-2048K 0 0 > > sgv-clust 2381747 2383002 > sgv-clust-4K 974 1040 > sgv-clust-8K 459 512 > sgv-clust-16K 510 543 > sgv-clust-32K 746 787 > sgv-clust-64K 8541 8629 > sgv-clust-128K 22 32 > sgv-clust-256K 61 76 > sgv-clust-512K 35599 35714 > sgv-clust-1024K 341439 341721 > sgv-clust-2048K 1993396 1993948 > > sgv-dma 0 0 > sgv-dma-4K 0 0 > sgv-dma-8K 0 0 > sgv-dma-16K 0 0 > sgv-dma-32K 0 0 > sgv-dma-64K 0 0 > sgv-dma-128K 0 0 > sgv-dma-256K 0 0 > sgv-dma-512K 0 0 > sgv-dma-1024K 0 0 > sgv-dma-2048K 0 0 > > big 7052121 > > > ...Brad > > > On Tue, 2006-05-23 at 11:41 +0400, Vladislav Bolkhovitin wrote: > >>OK, thanks. >> >>Relating the messages. What is your initiator? Does it have something in >>its logs? 3 Mb data size in commands is a bit too much, Qlogic cards >>usually have troubles to handle it, so you see those messages. Could you >>send me the content of /proc/scsi_tgt/sgv file, please? >> >>Vlad >> >>Brad Johnson wrote: >> >>>Vlad, >>> >>>The patched version has not yet crashed after running bonnie for an >>>hour. But now it doesn't seem to be progressing, with the following >>>messages appearing continuously in the system log: >>> >>>May 22 16:25:15 localhost kernel: [4614]: scst_set_busy:Sending BUSY >>>status to initiator 21:01:00:e0:8b:a6:54:d1 (cmds count 1, queue_type 0, >>>sess->init_phase 3), probably the system is overloaded >>>May 22 16:25:15 localhost kernel: [4615]: scst_prepare_space:Unable to >>>allocate or build requested buffer (size 3182592), sending BUSY status >>> >>> >>>The requested buffer size varies slightly, but is always in the 3 >>>million range. >>> >>>...Brad >>> >>> >>>On Mon, 2006-05-22 at 16:08 +0400, Vladislav Bolkhovitin wrote: >>> >>> >>>>Thanks. Could you try with the attached patch, please? (On top of the >>>>latest CVS) >>>> >>>>Brad Johnson wrote: >>>> >>>> >>>>>I tried it with the latest code from CVS (as of 8:00 AM CDT 05/19/2006). >>>>>It goes a while longer before crashing this time, and fails in a >>>>>different place. This test again used scst_disk handler. I will next try >>>>>using the scst_fileio module and will let you know the results. >>>>>Here is the GDB output for this failure: >>>>> >>>>>Program received signal SIGILL, Illegal instruction. >>>>>__scst_process_active_cmd (cmd=0xea340d14, context=<value optimized >>>>>out>, >>>>> pflags=0xf1488fd4, left_locked=1) >>>>> at /root/mid-level/cvs_version/src/scst_targ.c:2494 >>>>>2494 BUG(); >>>>>(gdb) bt >>>>>#0 __scst_process_active_cmd (cmd=0xea340d14, context=<value optimized >>>>>out>, >>>>> pflags=0xf1488fd4, left_locked=1) >>>>> at /root/mid-level/cvs_version/src/scst_targ.c:2494 >>>>>#1 0xf8d88b8c in scst_do_job_active (active_cmd_list=0xf8dab8f0, >>>>> pflags=0xf1488fd4, context=268435459) >>>>> at /root/mid-level/cvs_version/src/scst_targ.c:54 >>>>>#2 0xf8d88f2b in scst_cmd_thread (arg=<value optimized out>) >>>>> at /root/mid-level/cvs_version/src/scst_targ.c:2662 >>>>>#3 0xc01024d9 in kernel_thread_helper () at >>>>>arch/i386/kernel/process.c:298 >>>>> >>>>>(gdb) print *cmd >>>>>$2 = {cmd_list_entry = {next = 0xf8dab900, prev = 0xf8dab900}, >>>>> sess = 0xe92800a8, state = 9, sent_to_midlev = 0, ua_ignore = 0, >>>>> atomic = 0, non_atomic_only = 1, internal = 0, retry = 0, blocking = >>>>>0, >>>>> data_buf_alloced = 0, expected_values_set = 1, processible_env = 1, >>>>> cmd_flags = 0, tgtt = 0xf8c39f40, dev = 0xf5b1a340, lun = 0, >>>>> tgt_dev = 0xd1b2309c, scsi_req = 0x0, sn = 20668, >>>>>search_cmd_list_entry = { >>>>> next = 0xe92800d0, prev = 0xe92800d0}, >>>>> cdb = "(\000\003\uffff\000?\000\000\b\000\000\000\000\000\000", >>>>>cdb_len = 10, >>>>> queue_type = SCST_CMD_QUEUE_UNTAGGED, timeout = 15000, retries = 0, >>>>> data_direction = DMA_FROM_DEVICE, >>>>> expected_data_direction = DMA_FROM_DEVICE, expected_transfer_len = >>>>>4096, >>>>> data_len = 4096, scst_cmd_done = 0xf8d83080 <scst_cmd_done_local>, >>>>> sgv = 0xecc606c0, bufflen = 4096, buffer = 0xecc606d4, use_sg = 1, >>>>> get_sg_buf_entry_num = 0, status = 0 '\0', masked_status = 0 '\0', >>>>> msg_status = 0 '\0', host_status = 0, driver_status = 0, >>>>> sense_buffer = '\0' <repeats 95 times>, tgt_dev_saved = 0x0, >>>>> tgt_resp_flags = 2, resp_data_len = 4096, mgmt_cmd = 0x0, >>>>> extra_cmd_list_entry = {next = 0x100100, prev = 0x200200}, cmd_saved = >>>>>0x0, >>>>> tag = 18824, tgt_specific = 0xe32b09bc, tgt_dev_specific = 0x0} >>>>>(gdb) >>>>> >>>>> >>>>>...Brad >>>>> >>>>> >>>>>On Thu, 2006-05-18 at 18:38 -0400, Ming Zhang wrote: >>>>> >>>>> >>>>> >>>>>>1) can u try latest code from scst cvs >>>>>> >>>>>>2) can u try to export it via scst_fileio module? see if oops in same >>>>>>place. >>>>>> >>>>>>ming >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>On Thu, 2006-05-18 at 17:20 -0500, Brad Johnson wrote: >>>>>> >>>>>> >>>>>> >>>>>>>The system running scst crashes when doing I/O to target from remote >>>>>>>system. >>>>>>> >>>>>>>Here is my setup: >>>>>>>My target system has 2 Intel Xeon processors (3.2 MHz) and 1 GB RAM. >>>>>>>It is running Linux 2.6.15.7. >>>>>>>It has scst-0.9.4 and qla2x00-target-26-0.9.3.8 installed. >>>>>>>It has a Qlogic 2312 HBA connected to a switch. This is my FC target >>>>>>>host. (My FC Initiator is another x86 system with a Qlogic HBA also >>>>>>>connected to the switch.) >>>>>>>For back-end devices it has an LSI FC949X HBA connected to a Hitachi >>>>>>>Fibre-channel drive. >>>>>>> >>>>>>>Here is my start script: >>>>>>>-------------------------------------------------------- >>>>>>>modprobe -v qla2x00tgt >>>>>>>modprobe -v scst_disk >>>>>>>echo "add 2:0:3:0 0" >/proc/scsi_tgt/groups/Default/devices >>>>>>>echo "1" >/sys/class/scsi_host/host5/target_mode_enabled >>>>>>>-------------------------------------------------------- >>>>>>> >>>>>>>In the script, 2:0:3:0 refers to my Hitachi drive, host5 refers to my >>>>>>>Qlogic target-mode port. Everything starts successfully (including >>>>>>>scsi_tgt module since it is a dependency of scst_disk). >>>>>>> >>>>>>>>From my initiator system I see the one drive I have exposed. I >>>>>>>successfully partition that drive and do mkfs. At this point everything >>>>>>>is still fine. I then mount the file system and copy a big file to it. >>>>>>>The copy seems to work fine but at some point shortly after that my >>>>>>>target system crashes. There is no oops output to the system log. So I >>>>>>>did it again with a remote kgdb attached. Here is the gdb output: >>>>>>> >>>>>>> >>>>>>>Program received signal SIGILL, Illegal instruction. >>>>>>>__free_pages (page=0xc190a22c, order=0) at mm/page_alloc.c:1055 >>>>>>>1055 if (put_page_testzero(page)) { >>>>>>>(gdb) bt >>>>>>>#0 __free_pages (page=0xc190a22c, order=0) at mm/page_alloc.c:1055 >>>>>>>#1 0xf8d61efd in scst_release_space (cmd=0xf40a4e58) >>>>>>> at /root/mid-level/scst-0.9.4/src/scst_lib.c:1430 >>>>>>>#2 0xf8d60b2a in scst_free_cmd (cmd=0xf40a4e58, check_retry=1) >>>>>>> at /root/mid-level/scst-0.9.4/src/scst_lib.c:956 >>>>>>>#3 0xf8d599ee in scst_finish_cmd (cmd=0xf40a4e58) >>>>>>> at /root/mid-level/scst-0.9.4/src/scst_targ.c:2212 >>>>>>>#4 0xf8d5a7df in __scst_process_active_cmd (cmd=0xf40a4e58, >>>>>>> context=<value optimized out>, pflags=0xc046cfb8, >>>>>>> left_locked=<value optimized out>) >>>>>>> at /root/mid-level/scst-0.9.4/src/scst_targ.c:2461 >>>>>>>#5 0xf8d5aa81 in scst_do_job_active (active_cmd_list=0xf8d756d0, >>>>>>> pflags=0xc046cfb8, context=268435457) >>>>>>> at /root/mid-level/scst-0.9.4/src/scst_targ.c:54 >>>>>>>#6 0xf8d5af99 in scst_cmd_tasklet (p=<value optimized out>) >>>>>>> at /root/mid-level/scst-0.9.4/src/scst_targ.c:2672 >>>>>>>#7 0xc012d905 in tasklet_action (a=<value optimized out>) >>>>>>> at kernel/softirq.c:267 >>>>>>>#8 0xc012d552 in __do_softirq () at kernel/softirq.c:95 >>>>>>>#9 0xc010619e in do_softirq () at arch/i386/kernel/irq.c:187 >>>>>>>#10 0xc012d689 in irq_exit () at kernel/softirq.c:169 >>>>>>>#11 0xc010604e in do_IRQ (regs=0xc1cf4f48) at arch/i386/kernel/irq.c:110 >>>>>>>#12 0xc010499e in common_interrupt () at thread_info.h:91 >>>>>>>#13 0xc1cf4000 in ?? () >>>>>>>#14 0x00000000 in ?? () >>>>>>> >>>>>>> >>>>>>>I can reproduce this easily every time. Let me know if you want any >>>>>>>further information about this. >>>>>>> >>>>>>>...Brad Johnson >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>------------------------------------------------------- >>>>>>>Using Tomcat but need to do more? Need to support web services, security? >>>>>>>Get stuff done quickly with pre-integrated technology to make your job easier >>>>>>>Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo >>>>>>>http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 >>>>>>>_______________________________________________ >>>>>>>Scst-devel mailing list >>>>>>>Scs...@li... >>>>>>>https://lists.sourceforge.net/lists/listinfo/scst-devel >>>>>> >>>>>> >>>>> >>>>>------------------------------------------------------- >>>>>Using Tomcat but need to do more? Need to support web services, security? >>>>>Get stuff done quickly with pre-integrated technology to make your job easier >>>>>Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo >>>>>http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 >>>>>_______________________________________________ >>>>>Scst-devel mailing list >>>>>Scs...@li... >>>>>https://lists.sourceforge.net/lists/listinfo/scst-devel >>>>> >>>> >>> >>> >>>------------------------------------------------------- >>>Using Tomcat but need to do more? Need to support web services, security? >>>Get stuff done quickly with pre-integrated technology to make your job easier >>>Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo >>>http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 >>>_______________________________________________ >>>Scst-devel mailing list >>>Scs...@li... >>>https://lists.sourceforge.net/lists/listinfo/scst-devel >>> >> >> > > |