From: Vladislav B. <vs...@vl...> - 2006-05-26 16:43:48
Resent, because sf.net bounced the original.

====================================================================

Brad Johnson wrote:
> After setting max_sectors_kb = 1024 I got the same results (unable to
> allocate size 1 MB). Also the same results at 512. I had to lower it to
> 128 to get it to work.

Which means that there are some problems in the SCST memory allocator. It
must allocate memory up to 1 MB more or less seamlessly. I'll check it.

> Isn't this a parameter that the target and initiator should negotiate? I
> know with iSCSI the target negotiates the maximum write buffer size it
> can handle in a single command.

Maybe, but not in the current implementation, where this limit is set
statically (as in most SCSI drivers). AFAIK, among all SCSI transports
only iSCSI allows such negotiation.

> ...Brad
>
>
> On Wed, 2006-05-24 at 15:03 +0400, Vladislav Bolkhovitin wrote:
>
>>Those messages on the initiator are expected. It issues them when it
>>can't perform commands because it constantly receives BUSY for them from
>>the target, which can't allocate the necessary memory within the limited
>>SG entry count for such huge data sizes.
>>
>>Brad, how did you get the initiator to send such commands? I have
>>seen such big data sizes only when I sent them manually via sg for
>>testing. Did you manually edit
>>/sys/block/YOUR_DEVICE/queue/max_sectors_kb? What are the current values
>>there and in /sys/block/YOUR_DEVICE/queue/max_hw_sectors_kb? You should
>>set max_sectors_kb to something more sane, like 1024.
>>
>>Vlad
>>
>>Brad Johnson wrote:
>>
>>>The initiator system log is full of these messages:
>>>
>>>sd 1:0:0:0: timing out command, waited 150s
>>>sd 1:0:0:0: SCSI error: return code = 0x28
>>>end_request: I/O error, dev sda, sector 22334567
>>>sd 1:0:0:0: timing out command, waited 150s
>>>sd 1:0:0:0: SCSI error: return code = 0x8
>>>end_request: I/O error, dev sda, sector 22382295
>>>
>>>Here is the sgv file:
>>>
>>>Name               Hit       Total
>>>sgv                0         0
>>> sgv-4K            0         0
>>> sgv-8K            0         0
>>> sgv-16K           0         0
>>> sgv-32K           0         0
>>> sgv-64K           0         0
>>> sgv-128K          0         0
>>> sgv-256K          0         0
>>> sgv-512K          0         0
>>> sgv-1024K         0         0
>>> sgv-2048K         0         0
>>>
>>>sgv-clust          2381747   2383002
>>> sgv-clust-4K      974       1040
>>> sgv-clust-8K      459       512
>>> sgv-clust-16K     510       543
>>> sgv-clust-32K     746       787
>>> sgv-clust-64K     8541      8629
>>> sgv-clust-128K    22        32
>>> sgv-clust-256K    61        76
>>> sgv-clust-512K    35599     35714
>>> sgv-clust-1024K   341439    341721
>>> sgv-clust-2048K   1993396   1993948
>>>
>>>sgv-dma            0         0
>>> sgv-dma-4K        0         0
>>> sgv-dma-8K        0         0
>>> sgv-dma-16K       0         0
>>> sgv-dma-32K       0         0
>>> sgv-dma-64K       0         0
>>> sgv-dma-128K      0         0
>>> sgv-dma-256K      0         0
>>> sgv-dma-512K      0         0
>>> sgv-dma-1024K     0         0
>>> sgv-dma-2048K     0         0
>>>
>>>big 7052121
>>>
>>>
>>>...Brad
>>>
>>>
>>>On Tue, 2006-05-23 at 11:41 +0400, Vladislav
Bolkhovitin wrote:
>>>
>>>
>>>>OK, thanks.
>>>>
>>>>Regarding the messages: what is your initiator? Does it have something in
>>>>its logs? A 3 MB data size in commands is a bit too much; Qlogic cards
>>>>usually have trouble handling it, so you see those messages. Could you
>>>>send me the content of the /proc/scsi_tgt/sgv file, please?
>>>>
>>>>Vlad
>>>>
>>>>Brad Johnson wrote:
>>>>
>>>>
>>>>>Vlad,
>>>>>
>>>>>The patched version has not yet crashed after running bonnie for an
>>>>>hour. But now it doesn't seem to be progressing, with the following
>>>>>messages appearing continuously in the system log:
>>>>>
>>>>>May 22 16:25:15 localhost kernel: [4614]: scst_set_busy:Sending BUSY
>>>>>status to initiator 21:01:00:e0:8b:a6:54:d1 (cmds count 1, queue_type 0,
>>>>>sess->init_phase 3), probably the system is overloaded
>>>>>May 22 16:25:15 localhost kernel: [4615]: scst_prepare_space:Unable to
>>>>>allocate or build requested buffer (size 3182592), sending BUSY status
>>>>>
>>>>>
>>>>>The requested buffer size varies slightly, but is always in the 3
>>>>>million range.
>>>>>
>>>>>...Brad
>>>>>
>>>>>
>>>>>On Mon, 2006-05-22 at 16:08 +0400, Vladislav Bolkhovitin wrote:
>>>>>
>>>>>
>>>>>
>>>>>>Thanks. Could you try with the attached patch, please? (On top of the
>>>>>>latest CVS)
>>>>>>
>>>>>>Brad Johnson wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>>I tried it with the latest code from CVS (as of 8:00 AM CDT 05/19/2006).
>>>>>>>It goes a while longer before crashing this time, and fails in a
>>>>>>>different place. This test again used the scst_disk handler. I will next
>>>>>>>try using the scst_fileio module and will let you know the results.
>>>>>>>Here is the GDB output for this failure:
>>>>>>>
>>>>>>>Program received signal SIGILL, Illegal instruction.
>>>>>>>__scst_process_active_cmd (cmd=0xea340d14, context=<value optimized out>,
>>>>>>>    pflags=0xf1488fd4, left_locked=1)
>>>>>>>    at /root/mid-level/cvs_version/src/scst_targ.c:2494
>>>>>>>2494            BUG();
>>>>>>>(gdb) bt
>>>>>>>#0 __scst_process_active_cmd (cmd=0xea340d14, context=<value optimized out>,
>>>>>>>    pflags=0xf1488fd4, left_locked=1)
>>>>>>>    at /root/mid-level/cvs_version/src/scst_targ.c:2494
>>>>>>>#1 0xf8d88b8c in scst_do_job_active (active_cmd_list=0xf8dab8f0,
>>>>>>>    pflags=0xf1488fd4, context=268435459)
>>>>>>>    at /root/mid-level/cvs_version/src/scst_targ.c:54
>>>>>>>#2 0xf8d88f2b in scst_cmd_thread (arg=<value optimized out>)
>>>>>>>    at /root/mid-level/cvs_version/src/scst_targ.c:2662
>>>>>>>#3 0xc01024d9 in kernel_thread_helper () at arch/i386/kernel/process.c:298
>>>>>>>
>>>>>>>(gdb) print *cmd
>>>>>>>$2 = {cmd_list_entry = {next = 0xf8dab900, prev = 0xf8dab900},
>>>>>>>sess = 0xe92800a8, state = 9, sent_to_midlev = 0, ua_ignore = 0,
>>>>>>>atomic = 0, non_atomic_only = 1, internal = 0, retry = 0, blocking = 0,
>>>>>>>data_buf_alloced = 0, expected_values_set = 1, processible_env = 1,
>>>>>>>cmd_flags = 0, tgtt = 0xf8c39f40, dev = 0xf5b1a340, lun = 0,
>>>>>>>tgt_dev = 0xd1b2309c, scsi_req = 0x0, sn = 20668,
>>>>>>>search_cmd_list_entry = {
>>>>>>>    next = 0xe92800d0, prev = 0xe92800d0},
>>>>>>>cdb = "(\000\003\uffff\000?\000\000\b\000\000\000\000\000\000",
>>>>>>>cdb_len = 10,
>>>>>>>queue_type = SCST_CMD_QUEUE_UNTAGGED, timeout = 15000, retries = 0,
>>>>>>>data_direction = DMA_FROM_DEVICE,
>>>>>>>expected_data_direction = DMA_FROM_DEVICE, expected_transfer_len = 4096,
>>>>>>>data_len = 4096, scst_cmd_done = 0xf8d83080 <scst_cmd_done_local>,
>>>>>>>sgv = 0xecc606c0, bufflen = 4096, buffer = 0xecc606d4, use_sg = 1,
>>>>>>>get_sg_buf_entry_num = 0, status = 0 '\0', masked_status = 0 '\0',
>>>>>>>msg_status = 0 '\0', host_status = 0, driver_status = 0,
>>>>>>>sense_buffer = '\0' <repeats 95 times>, tgt_dev_saved = 0x0,
>>>>>>>tgt_resp_flags = 2, resp_data_len = 4096, mgmt_cmd = 0x0,
>>>>>>>extra_cmd_list_entry = {next = 0x100100, prev = 0x200200}, cmd_saved = 0x0,
>>>>>>>tag = 18824, tgt_specific = 0xe32b09bc, tgt_dev_specific = 0x0}
>>>>>>>(gdb)
>>>>>>>
>>>>>>>
>>>>>>>...Brad
>>>>>>>
>>>>>>>
>>>>>>>On Thu, 2006-05-18 at 18:38 -0400, Ming Zhang wrote:
>>>>>>>
>>>>>>>
>>>>>>>>1) can u try latest code from scst cvs
>>>>>>>>
>>>>>>>>2) can u try to export it via scst_fileio module? see if oops in same
>>>>>>>>place.
>>>>>>>>
>>>>>>>>ming
>>>>>>>>
>>>>>>>>
>>>>>>>>On Thu, 2006-05-18 at 17:20 -0500, Brad Johnson wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>>The system running scst crashes when doing I/O to the target from a
>>>>>>>>>remote system.
>>>>>>>>>
>>>>>>>>>Here is my setup:
>>>>>>>>>My target system has 2 Intel Xeon processors (3.2 GHz) and 1 GB RAM.
>>>>>>>>>It is running Linux 2.6.15.7.
>>>>>>>>>It has scst-0.9.4 and qla2x00-target-26-0.9.3.8 installed.
>>>>>>>>>It has a Qlogic 2312 HBA connected to a switch. This is my FC target
>>>>>>>>>host. (My FC Initiator is another x86 system with a Qlogic HBA also
>>>>>>>>>connected to the switch.)
>>>>>>>>>For back-end devices it has an LSI FC949X HBA connected to a Hitachi
>>>>>>>>>Fibre-channel drive.
>>>>>>>>>
>>>>>>>>>Here is my start script:
>>>>>>>>>--------------------------------------------------------
>>>>>>>>>modprobe -v qla2x00tgt
>>>>>>>>>modprobe -v scst_disk
>>>>>>>>>echo "add 2:0:3:0 0" >/proc/scsi_tgt/groups/Default/devices
>>>>>>>>>echo "1" >/sys/class/scsi_host/host5/target_mode_enabled
>>>>>>>>>--------------------------------------------------------
>>>>>>>>>
>>>>>>>>>In the script, 2:0:3:0 refers to my Hitachi drive, host5 refers to my
>>>>>>>>>Qlogic target-mode port. Everything starts successfully (including the
>>>>>>>>>scsi_tgt module since it is a dependency of scst_disk).
>>>>>>>>>
>>>>>>>>>From my initiator system I see the one drive I have exposed. I
>>>>>>>>>successfully partition that drive and do mkfs. At this point everything
>>>>>>>>>is still fine. I then mount the file system and copy a big file to it.
>>>>>>>>>The copy seems to work fine but at some point shortly after that my
>>>>>>>>>target system crashes. There is no oops output to the system log. So I
>>>>>>>>>did it again with a remote kgdb attached. Here is the gdb output:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>Program received signal SIGILL, Illegal instruction.
>>>>>>>>>__free_pages (page=0xc190a22c, order=0) at mm/page_alloc.c:1055
>>>>>>>>>1055            if (put_page_testzero(page)) {
>>>>>>>>>(gdb) bt
>>>>>>>>>#0 __free_pages (page=0xc190a22c, order=0) at mm/page_alloc.c:1055
>>>>>>>>>#1 0xf8d61efd in scst_release_space (cmd=0xf40a4e58)
>>>>>>>>>    at /root/mid-level/scst-0.9.4/src/scst_lib.c:1430
>>>>>>>>>#2 0xf8d60b2a in scst_free_cmd (cmd=0xf40a4e58, check_retry=1)
>>>>>>>>>    at /root/mid-level/scst-0.9.4/src/scst_lib.c:956
>>>>>>>>>#3 0xf8d599ee in scst_finish_cmd (cmd=0xf40a4e58)
>>>>>>>>>    at /root/mid-level/scst-0.9.4/src/scst_targ.c:2212
>>>>>>>>>#4 0xf8d5a7df in __scst_process_active_cmd (cmd=0xf40a4e58,
>>>>>>>>>    context=<value optimized out>, pflags=0xc046cfb8,
>>>>>>>>>    left_locked=<value optimized out>)
>>>>>>>>>    at /root/mid-level/scst-0.9.4/src/scst_targ.c:2461
>>>>>>>>>#5 0xf8d5aa81 in scst_do_job_active (active_cmd_list=0xf8d756d0,
>>>>>>>>>    pflags=0xc046cfb8, context=268435457)
>>>>>>>>>    at /root/mid-level/scst-0.9.4/src/scst_targ.c:54
>>>>>>>>>#6 0xf8d5af99 in scst_cmd_tasklet (p=<value optimized out>)
>>>>>>>>>    at /root/mid-level/scst-0.9.4/src/scst_targ.c:2672
>>>>>>>>>#7 0xc012d905 in tasklet_action (a=<value optimized out>)
>>>>>>>>>    at kernel/softirq.c:267
>>>>>>>>>#8 0xc012d552 in __do_softirq () at kernel/softirq.c:95
>>>>>>>>>#9 0xc010619e in do_softirq () at arch/i386/kernel/irq.c:187
>>>>>>>>>#10 0xc012d689 in irq_exit () at kernel/softirq.c:169
>>>>>>>>>#11 0xc010604e in do_IRQ (regs=0xc1cf4f48) at arch/i386/kernel/irq.c:110
>>>>>>>>>#12 0xc010499e in common_interrupt () at thread_info.h:91
>>>>>>>>>#13 0xc1cf4000 in ?? ()
>>>>>>>>>#14 0x00000000 in ?? ()
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>I can reproduce this easily every time. Let me know if you want any
>>>>>>>>>further information about this.
>>>>>>>>>
>>>>>>>>>...Brad Johnson
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>-------------------------------------------------------
>>>>>>>>>Using Tomcat but need to do more? Need to support web services, security?
>>>>>>>>>Get stuff done quickly with pre-integrated technology to make your job easier
>>>>>>>>>Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
>>>>>>>>>http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
>>>>>>>>>_______________________________________________
>>>>>>>>>Scst-devel mailing list
>>>>>>>>>Scs...@li...
>>>>>>>>>https://lists.sourceforge.net/lists/listinfo/scst-devel
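
For anyone hitting the same BUSY storm on the initiator, the max_sectors_kb workaround discussed at the top of this thread can be sketched as a small shell helper. This is only an illustration, not something posted in the thread: the function name clamp_max_sectors_kb and the SYSFS_ROOT override are made up here (the override exists only so the helper can be tried against a scratch directory), and it assumes, as far as I know correctly, that the kernel refuses values above max_hw_sectors_kb.

```shell
#!/bin/sh
# Hypothetical helper: lower max_sectors_kb on the initiator so that it stops
# sending commands larger than the target can currently allocate.
# SYSFS_ROOT is an assumption made for this sketch; it defaults to /sys.
clamp_max_sectors_kb() {
    dev=$1
    limit_kb=$2
    queue="${SYSFS_ROOT:-/sys}/block/$dev/queue"

    cur_kb=$(cat "$queue/max_sectors_kb") || return 1
    hw_kb=$(cat "$queue/max_hw_sectors_kb") || return 1

    # Never ask for more than the hardware limit reported by the driver.
    if [ "$limit_kb" -gt "$hw_kb" ]; then
        limit_kb=$hw_kb
    fi

    # Only lower the value; leave it alone if it is already small enough.
    if [ "$cur_kb" -gt "$limit_kb" ]; then
        echo "$limit_kb" > "$queue/max_sectors_kb" || return 1
    fi

    echo "$dev: max_sectors_kb=$(cat "$queue/max_sectors_kb") (hw limit $hw_kb)"
}
```

Run as root on the initiator, `clamp_max_sectors_kb sda 128` would correspond to the value Brad reported as working; 1024 is the value Vlad expects to work once the allocator problem is fixed. The setting does not survive a reboot.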