From: Brad J. <bjo...@pr...> - 2006-05-23 13:54:08
|
The initiator system log is full of these messages: sd 1:0:0:0: timing out command, waited 150s sd 1:0:0:0: SCSI error: return code = 0x28 end_request: I/O error, dev sda, sector 22334567 sd 1:0:0:0: timing out command, waited 150s sd 1:0:0:0: SCSI error: return code = 0x8 end_request: I/O error, dev sda, sector 22382295 Here is the sgv file: Name Hit Total sgv 0 0 sgv-4K 0 0 sgv-8K 0 0 sgv-16K 0 0 sgv-32K 0 0 sgv-64K 0 0Name Hit Total sgv 0 0 sgv-4K 0 0 sgv-8K 0 0 sgv-16K 0 0 sgv-32K 0 0 sgv-64K 0 0 sgv-128K 0 0 sgv-256K 0 0 sgv-512K 0 0 sgv-1024K 0 0 sgv-2048K 0 0 sgv-clust 2381747 2383002 sgv-clust-4K 974 1040 sgv-clust-8K 459 512 sgv-clust-16K 510 543 sgv-clust-32K 746 787 sgv-clust-64K 8541 8629 sgv-clust-128K 22 32 sgv-clust-256K 61 76 sgv-clust-512K 35599 35714 sgv-clust-1024K 341439 341721 sgv-clust-2048K 1993396 1993948 sgv-dma 0 0 sgv-dma-4K 0 0 sgv-dma-8K 0 0 sgv-dma-16K 0 0 sgv-dma-32K 0 0 sgv-dma-64K 0 0 sgv-dma-128K 0 0 sgv-dma-256K 0 0 sgv-dma-512K 0 0 sgv-dma-1024K 0 0 sgv-dma-2048K 0 0 big 7052121 sgv-128K 0 0 sgv-256K 0 0 sgv-512K 0 0 sgv-1024K 0 0 sgv-2048K 0 0 sgv-clust 2381747 2383002 sgv-clust-4K 974 1040 sgv-clust-8K 459 512 sgv-clust-16K 510 543 sgv-clust-32K 746 787 sgv-clust-64K 8541 8629 sgv-clust-128K 22 32 sgv-clust-256K 61 76 sgv-clust-512K 35599 35714 sgv-clust-1024K 341439 341721 sgv-clust-2048K 1993396 1993948 sgv-dma 0 0 sgv-dma-4K 0 0 sgv-dma-8K 0 0 sgv-dma-16K 0 0 sgv-dma-32K 0 0 sgv-dma-64K 0 0 sgv-dma-128K 0 0 sgv-dma-256K 0 0 sgv-dma-512K 0 0 sgv-dma-1024K 0 0 sgv-dma-2048K 0 0 big 7052121 ...Brad On Tue, 2006-05-23 at 11:41 +0400, Vladislav Bolkhovitin wrote: > OK, thanks. > > Relating the messages. What is your initiator? Does it have something in > its logs? 3 Mb data size in commands is a bit too much, Qlogic cards > usually have troubles to handle it, so you see those messages. Could you > send me the content of /proc/scsi_tgt/sgv file, please? > > Vlad > > Brad Johnson wrote: > > Vlad, > > > > The patched version has not yet crashed after running bonnie for an > > hour. But now it doesn't seem to be progressing, with the following > > messages appearing continuously in the system log: > > > > May 22 16:25:15 localhost kernel: [4614]: scst_set_busy:Sending BUSY > > status to initiator 21:01:00:e0:8b:a6:54:d1 (cmds count 1, queue_type 0, > > sess->init_phase 3), probably the system is overloaded > > May 22 16:25:15 localhost kernel: [4615]: scst_prepare_space:Unable to > > allocate or build requested buffer (size 3182592), sending BUSY status > > > > > > The requested buffer size varies slightly, but is always in the 3 > > million range. > > > > ...Brad > > > > > > On Mon, 2006-05-22 at 16:08 +0400, Vladislav Bolkhovitin wrote: > > > >>Thanks. Could you try with the attached patch, please? (On top of the > >>latest CVS) > >> > >>Brad Johnson wrote: > >> > >>>I tried it with the latest code from CVS (as of 8:00 AM CDT 05/19/2006). > >>>It goes a while longer before crashing this time, and fails in a > >>>different place. This test again used scst_disk handler. I will next try > >>>using the scst_fileio module and will let you know the results. > >>>Here is the GDB output for this failure: > >>> > >>>Program received signal SIGILL, Illegal instruction. > >>>__scst_process_active_cmd (cmd=0xea340d14, context=<value optimized > >>>out>, > >>> pflags=0xf1488fd4, left_locked=1) > >>> at /root/mid-level/cvs_version/src/scst_targ.c:2494 > >>>2494 BUG(); > >>>(gdb) bt > >>>#0 __scst_process_active_cmd (cmd=0xea340d14, context=<value optimized > >>>out>, > >>> pflags=0xf1488fd4, left_locked=1) > >>> at /root/mid-level/cvs_version/src/scst_targ.c:2494 > >>>#1 0xf8d88b8c in scst_do_job_active (active_cmd_list=0xf8dab8f0, > >>> pflags=0xf1488fd4, context=268435459) > >>> at /root/mid-level/cvs_version/src/scst_targ.c:54 > >>>#2 0xf8d88f2b in scst_cmd_thread (arg=<value optimized out>) > >>> at /root/mid-level/cvs_version/src/scst_targ.c:2662 > >>>#3 0xc01024d9 in kernel_thread_helper () at > >>>arch/i386/kernel/process.c:298 > >>> > >>>(gdb) print *cmd > >>>$2 = {cmd_list_entry = {next = 0xf8dab900, prev = 0xf8dab900}, > >>> sess = 0xe92800a8, state = 9, sent_to_midlev = 0, ua_ignore = 0, > >>> atomic = 0, non_atomic_only = 1, internal = 0, retry = 0, blocking = > >>>0, > >>> data_buf_alloced = 0, expected_values_set = 1, processible_env = 1, > >>> cmd_flags = 0, tgtt = 0xf8c39f40, dev = 0xf5b1a340, lun = 0, > >>> tgt_dev = 0xd1b2309c, scsi_req = 0x0, sn = 20668, > >>>search_cmd_list_entry = { > >>> next = 0xe92800d0, prev = 0xe92800d0}, > >>> cdb = "(\000\003\uffff\000?\000\000\b\000\000\000\000\000\000", > >>>cdb_len = 10, > >>> queue_type = SCST_CMD_QUEUE_UNTAGGED, timeout = 15000, retries = 0, > >>> data_direction = DMA_FROM_DEVICE, > >>> expected_data_direction = DMA_FROM_DEVICE, expected_transfer_len = > >>>4096, > >>> data_len = 4096, scst_cmd_done = 0xf8d83080 <scst_cmd_done_local>, > >>> sgv = 0xecc606c0, bufflen = 4096, buffer = 0xecc606d4, use_sg = 1, > >>> get_sg_buf_entry_num = 0, status = 0 '\0', masked_status = 0 '\0', > >>> msg_status = 0 '\0', host_status = 0, driver_status = 0, > >>> sense_buffer = '\0' <repeats 95 times>, tgt_dev_saved = 0x0, > >>> tgt_resp_flags = 2, resp_data_len = 4096, mgmt_cmd = 0x0, > >>> extra_cmd_list_entry = {next = 0x100100, prev = 0x200200}, cmd_saved = > >>>0x0, > >>> tag = 18824, tgt_specific = 0xe32b09bc, tgt_dev_specific = 0x0} > >>>(gdb) > >>> > >>> > >>>...Brad > >>> > >>> > >>>On Thu, 2006-05-18 at 18:38 -0400, Ming Zhang wrote: > >>> > >>> > >>>>1) can u try latest code from scst cvs > >>>> > >>>>2) can u try to export it via scst_fileio module? see if oops in same > >>>>place. > >>>> > >>>>ming > >>>> > >>>> > >>>> > >>>> > >>>>On Thu, 2006-05-18 at 17:20 -0500, Brad Johnson wrote: > >>>> > >>>> > >>>>>The system running scst crashes when doing I/O to target from remote > >>>>>system. > >>>>> > >>>>>Here is my setup: > >>>>>My target system has 2 Intel Xeon processors (3.2 MHz) and 1 GB RAM. > >>>>>It is running Linux 2.6.15.7. > >>>>>It has scst-0.9.4 and qla2x00-target-26-0.9.3.8 installed. > >>>>>It has a Qlogic 2312 HBA connected to a switch. This is my FC target > >>>>>host. (My FC Initiator is another x86 system with a Qlogic HBA also > >>>>>connected to the switch.) > >>>>>For back-end devices it has an LSI FC949X HBA connected to a Hitachi > >>>>>Fibre-channel drive. > >>>>> > >>>>>Here is my start script: > >>>>>-------------------------------------------------------- > >>>>>modprobe -v qla2x00tgt > >>>>>modprobe -v scst_disk > >>>>>echo "add 2:0:3:0 0" >/proc/scsi_tgt/groups/Default/devices > >>>>>echo "1" >/sys/class/scsi_host/host5/target_mode_enabled > >>>>>-------------------------------------------------------- > >>>>> > >>>>>In the script, 2:0:3:0 refers to my Hitachi drive, host5 refers to my > >>>>>Qlogic target-mode port. Everything starts successfully (including > >>>>>scsi_tgt module since it is a dependency of scst_disk). > >>>>> > >>>>>>From my initiator system I see the one drive I have exposed. I > >>>>>successfully partition that drive and do mkfs. At this point everything > >>>>>is still fine. I then mount the file system and copy a big file to it. > >>>>>The copy seems to work fine but at some point shortly after that my > >>>>>target system crashes. There is no oops output to the system log. So I > >>>>>did it again with a remote kgdb attached. Here is the gdb output: > >>>>> > >>>>> > >>>>>Program received signal SIGILL, Illegal instruction. > >>>>>__free_pages (page=0xc190a22c, order=0) at mm/page_alloc.c:1055 > >>>>>1055 if (put_page_testzero(page)) { > >>>>>(gdb) bt > >>>>>#0 __free_pages (page=0xc190a22c, order=0) at mm/page_alloc.c:1055 > >>>>>#1 0xf8d61efd in scst_release_space (cmd=0xf40a4e58) > >>>>> at /root/mid-level/scst-0.9.4/src/scst_lib.c:1430 > >>>>>#2 0xf8d60b2a in scst_free_cmd (cmd=0xf40a4e58, check_retry=1) > >>>>> at /root/mid-level/scst-0.9.4/src/scst_lib.c:956 > >>>>>#3 0xf8d599ee in scst_finish_cmd (cmd=0xf40a4e58) > >>>>> at /root/mid-level/scst-0.9.4/src/scst_targ.c:2212 > >>>>>#4 0xf8d5a7df in __scst_process_active_cmd (cmd=0xf40a4e58, > >>>>> context=<value optimized out>, pflags=0xc046cfb8, > >>>>> left_locked=<value optimized out>) > >>>>> at /root/mid-level/scst-0.9.4/src/scst_targ.c:2461 > >>>>>#5 0xf8d5aa81 in scst_do_job_active (active_cmd_list=0xf8d756d0, > >>>>> pflags=0xc046cfb8, context=268435457) > >>>>> at /root/mid-level/scst-0.9.4/src/scst_targ.c:54 > >>>>>#6 0xf8d5af99 in scst_cmd_tasklet (p=<value optimized out>) > >>>>> at /root/mid-level/scst-0.9.4/src/scst_targ.c:2672 > >>>>>#7 0xc012d905 in tasklet_action (a=<value optimized out>) > >>>>> at kernel/softirq.c:267 > >>>>>#8 0xc012d552 in __do_softirq () at kernel/softirq.c:95 > >>>>>#9 0xc010619e in do_softirq () at arch/i386/kernel/irq.c:187 > >>>>>#10 0xc012d689 in irq_exit () at kernel/softirq.c:169 > >>>>>#11 0xc010604e in do_IRQ (regs=0xc1cf4f48) at arch/i386/kernel/irq.c:110 > >>>>>#12 0xc010499e in common_interrupt () at thread_info.h:91 > >>>>>#13 0xc1cf4000 in ?? () > >>>>>#14 0x00000000 in ?? () > >>>>> > >>>>> > >>>>>I can reproduce this easily every time. Let me know if you want any > >>>>>further information about this. > >>>>> > >>>>>...Brad Johnson > >>>>> > >>>>> > >>>>> > >>>>> > >>>>>------------------------------------------------------- > >>>>>Using Tomcat but need to do more? Need to support web services, security? > >>>>>Get stuff done quickly with pre-integrated technology to make your job easier > >>>>>Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo > >>>>>http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 > >>>>>_______________________________________________ > >>>>>Scst-devel mailing list > >>>>>Scs...@li... > >>>>>https://lists.sourceforge.net/lists/listinfo/scst-devel > >>>> > >>>> > >>> > >>> > >>>------------------------------------------------------- > >>>Using Tomcat but need to do more? Need to support web services, security? > >>>Get stuff done quickly with pre-integrated technology to make your job easier > >>>Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo > >>>http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 > >>>_______________________________________________ > >>>Scst-devel mailing list > >>>Scs...@li... > >>>https://lists.sourceforge.net/lists/listinfo/scst-devel > >>> > >> > > > > > > > > ------------------------------------------------------- > > Using Tomcat but need to do more? Need to support web services, security? > > Get stuff done quickly with pre-integrated technology to make your job easier > > Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo > > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 > > _______________________________________________ > > Scst-devel mailing list > > Scs...@li... > > https://lists.sourceforge.net/lists/listinfo/scst-devel > > > > |