From: Pascal B. <pas...@fr...> - 2012-07-14 12:27:04
Hi all! I'm currently facing weird problems with SCST, and after days of various experiments and observations, trying to isolate the problem as precisely as possible, I conclude that I now need a hand. Could somebody help me a bit with this?

Basically, we're running a two-node, single-primary DRBD/Pacemaker cluster (kernel version 2.6.32-71.7.1, based on the Openfiler 2.99 distro) hosting 4 DRBD resources, each presented to 4 VMware hosts (ESXi 4.1) using two SCST (vdisk_fileio) and iSCSI-SCST targets (version 2.0.0.1 at first, now 2.2.0, but the problem persists) per resource. Resources are spread over the 2 nodes, 3 active TB per node overall. The DRBD replication link is a dual 10GbE link bonded in LACP (mode 4). Volumes are hardware RAID5, made up of nine 15krpm 146 or 300 GB SAS drives (I mean, disk I/O performance doesn't seem to be the cause).

Basically, the issue is: the cluster starts the resources, the 4 DRBD primaries go up, then the 4 pairs of virtual IPs, then the SCST services, and things run fine. Until you try to migrate resources back and forth. When you do that, it works once, twice, sometimes even 3 times, but then you can see DRBD promoted correctly, then the IPs wake up, but the SCST resource remains stuck down, running into timeout after the configured 60 s. At that moment, everything fails back to its former place, as it should. If you try again, same story. In the end, you obtain a cluster where the resource is stuck on a node, unable to fail over either manually or, more embarrassing, following a node crash (which we inevitably faced recently, thanks Mr. Murphy).

After digging through the various logs, what I see is:
- DRBD does its job 100% correctly.
- Pacemaker seems to do its job, with the resources it has, in the state they are in.
  (I mean, the errors it mentions look like normal errors in the global failing context.)
- SCST starts its job, but hangs in the device handling section. (BTW, my RA uses the sysfs interface and is based on Patrick Zwahlen's implementation, which I customized a bit, mostly to add friendlier tracing, and also to invert the order of activation: iSCSI target first, then the backend device, instead of the reverse, although I now doubt it has a real impact.) Basically, all the iSCSI target setup stuff runs fine, but then:
  o Either it hangs on backend device creation,
  o Or it hangs on LUN 0 assignment.

From that point on, it hangs until the configured start timeout, and then everybody goes back home. However, the backend device that refused to get created correctly has nevertheless been created and remains, possibly along with the target directory, and sometimes even the LUN 0 directory inside it. From that point on it turns into a good mess; in fact, the real problems start here. After that, any migration attempt is doomed to failure! If I reboot the node, it will accept a couple of migrations again, and then fail again in the same manner.
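For reference, the sysfs sequence my customized RA performs boils down to something like the sketch below. This is a simplified illustration, not the actual agent: the IQN and device name are taken from one of our resources, the filename path is what I'd use for a DRBD backing device, and error handling/tracing are stripped out.

```shell
#!/bin/sh
# Simplified sketch of the RA "start" sequence (SCST 2.x sysfs interface).
# Order is inverted vs. Zwahlen's original: iSCSI target first, then device.
SCST=/sys/kernel/scst_tgt
IQN=iqn.2003-08.fr.eurialys:rdmBackupExec   # example target from our config
DEV=drbd2                                   # example backend device name

# 1. Create and enable the iSCSI target (this part always runs fine)
echo "add_target $IQN" > $SCST/targets/iscsi/mgmt
echo 1 > $SCST/targets/iscsi/$IQN/enabled

# 2. Register the backend vdisk_fileio device -- first possible hang point
#    (the path below is illustrative; NV_CACHE matches what the logs show)
echo "add_device $DEV filename=/dev/$DEV; nv_cache=1" \
    > $SCST/handlers/vdisk_fileio/mgmt

# 3. Assign the device to LUN 0 of the target -- second possible hang point
echo "add $DEV 0" > $SCST/targets/iscsi/$IQN/luns/mgmt
```

Both hang points correspond to writes into SCST's mgmt files, which is consistent with the scst_suspend_activity() messages below, since SCST suspends command processing while it reconfigures.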
Now the facts: when it hangs, I can see a whole bunch of error messages like this in the logs:

-------------------------------------------------8<-------------------------------------------------
Jul 14 05:10:41 ipstore21 kernel: [2061]: scst_translate_lun:3891:FLAG SUSPENDED set, skipping
Jul 14 05:10:41 ipstore21 kernel: [2062]: scst_translate_lun:3891:FLAG SUSPENDED set, skipping
Jul 14 05:10:41 ipstore21 kernel: [2062]: scst_translate_lun:3891:FLAG SUSPENDED set, skipping
Jul 14 05:10:41 ipstore21 kernel: [2061]: scst_translate_lun:3891:FLAG SUSPENDED set, skipping
Jul 14 05:10:41 ipstore21 kernel: [2062]: scst_translate_lun:3891:FLAG SUSPENDED set, skipping
Jul 14 05:10:41 ipstore21 kernel: [2062]: scst_translate_lun:3891:FLAG SUSPENDED set, skipping
Jul 14 05:10:41 ipstore21 kernel: [2061]: scst_translate_lun:3891:FLAG SUSPENDED set, skipping
Jul 14 05:10:41 ipstore21 kernel: [2062]: scst_translate_lun:3891:FLAG SUSPENDED set, skipping
Jul 14 05:10:41 ipstore21 kernel: [2061]: scst_translate_lun:3891:FLAG SUSPENDED set, skipping
Jul 14 05:10:41 ipstore21 kernel: [2062]: scst_translate_lun:3891:FLAG SUSPENDED set, skipping
Jul 14 05:10:41 ipstore21 kernel: [2061]: scst_translate_lun:3891:FLAG SUSPENDED set, skipping
Jul 14 05:10:41 ipstore21 kernel: [2062]: scst_translate_lun:3891:FLAG SUSPENDED set, skipping
Jul 14 05:10:41 ipstore21 kernel: [2061]: scst_translate_lun:3891:FLAG SUSPENDED set, skipping
Jul 14 05:10:41 ipstore21 kernel: [2062]: scst_translate_lun:3891:FLAG SUSPENDED set, skipping
Jul 14 05:10:41 ipstore21 kernel: [2061]: scst_translate_lun:3891:FLAG SUSPENDED set, skipping
...
-------------------------------------------------8<-------------------------------------------------

For instance, in the above precise case, it occurred endlessly until 05:10:54 (guess how many log entries that represents), then it seems to finish something, but immediately starts its "FLAG SUSPENDED set" stuff again, until another "more active" bunch of log entries, etc., until the timeout. As follows:

-------------------------------------------------8<-------------------------------------------------
Jul 14 05:10:54 ipstore21 kernel: [2061]: scst_translate_lun:3891:FLAG SUSPENDED set, skipping
Jul 14 05:10:54 ipstore21 kernel: [2060]: scst_translate_lun:3891:FLAG SUSPENDED set, skipping
Jul 14 05:10:54 ipstore21 kernel: [2062]: scst_translate_lun:3891:FLAG SUSPENDED set, skipping
Jul 14 05:10:54 ipstore21 kernel: iscsi-scst: iSCSI TM fn 1
Jul 14 05:10:54 ipstore21 kernel: [2061]: scst: scst_rx_mgmt_fn:6045:TM fn 0
Jul 14 05:10:54 ipstore21 kernel: [2061]: scst_rx_mgmt_fn:6057:sess=ffff88030cf70b68, tag_set 1, tag 18217820, lun_set 1, lun=0, cmd_sn_set 1, cmd_sn 86922, priv ffff8802899f2320
Jul 14 05:10:54 ipstore21 kernel: [2061]: scst_post_rx_mgmt_cmd:5976:Adding mgmt cmd ffff88032fc4a470 to active mgmt cmd list
Jul 14 05:10:54 ipstore21 kernel: [1084]: scst_tm_thread:5862:Deleting mgmt cmd ffff88032fc4a470 from active cmd list
Jul 14 05:10:54 ipstore21 kernel: [1084]: scst_mgmt_cmd_init:5164:ABORT TASK: command for tag 18217820 not found
Jul 14 05:10:54 ipstore21 kernel: [1084]: scst_call_task_mgmt_affected_cmds_done:5662:Target's iscsi task_mgmt_affected_cmds_done() returned
Jul 14 05:10:54 ipstore21 kernel: [1084]: scst: scst_mgmt_cmd_send_done:5709:TM fn 0 finished, status ffffffff
Jul 14 05:10:54 ipstore21 kernel: iscsi-scst: iSCSI TM fn 1 finished, status 0
Jul 14 05:10:54 ipstore21 kernel: [1084]: scst_mgmt_cmd_send_done:5724:Target's iscsi task_mgmt_fn_done() returned
Jul 14 05:10:54 ipstore21 kernel: [2061]: scst_translate_lun:3891:FLAG SUSPENDED set, skipping
Jul 14 05:10:54 ipstore21 kernel: [2060]: scst_translate_lun:3891:FLAG SUSPENDED set, skipping
Jul 14 05:10:54 ipstore21 kernel: [2062]: scst_translate_lun:3891:FLAG SUSPENDED set, skipping
Jul 14 05:10:54 ipstore21 kernel: [2061]: scst_translate_lun:3891:FLAG SUSPENDED set, skipping
...
-------------------------------------------------8<-------------------------------------------------

When this occurs, I can also see a sudden increase in dropped received packets on the network, just as if the kernel was damn busy doing something.

Just to be as precise as possible, here is a sample of what happens in the logs from the very beginning of a migration request that will end up in a backend device setup hang. This is not the only possible scenario, but it is the most frequent one, especially at the very first SCST activation failure after a node reboot (i.e. when we start from a "clean" state):

-------------------------------------------------8<-------------------------------------------------
Jul 14 04:44:15 ipstore11 kernel: [2048]: scst: scst_register_target:563:Target iqn.2003-08.fr.eurialys:rdmBackupExec for template iscsi registered successfully
Jul 14 04:44:15 ipstore11 kernel: [1074]: dev_vdisk: vdisk_report_registering:3330:Registering virtual vdisk_fileio device drbd2 (NV_CACHE)
Jul 14 04:44:15 ipstore11 kernel: [1074]: scst_suspend_activity:745:suspend_count 0
Jul 14 04:44:15 ipstore11 kernel: [1074]: scst: scst_suspend_activity:777:Waiting for 21 active commands to complete... This might take few minutes for disks or few hours for tapes, if you use long executed commands, like REWIND or FORMAT. In case, if you have a hung user space device (i.e. made using scst_user module) not responding to any commands, if might take virtually forever until the corresponding user space program recovers and starts responding or gets killed.
Jul 14 04:44:15 ipstore11 kernel: [2040]: scst_translate_lun:3857:FLAG SUSPENDED set, skipping
Jul 14 04:44:15 ipstore11 kernel: [2040]: scst_translate_lun:3857:FLAG SUSPENDED set, skipping
Jul 14 04:44:15 ipstore11 kernel: [2041]: scst_translate_lun:3857:FLAG SUSPENDED set, skipping
Jul 14 04:44:15 ipstore11 kernel: [2040]: scst_translate_lun:3857:FLAG SUSPENDED set, skipping
Jul 14 04:44:15 ipstore11 kernel: [2041]: scst_translate_lun:3857:FLAG SUSPENDED set, skipping
Jul 14 04:44:15 ipstore11 kernel: [2039]: scst_translate_lun:3857:FLAG SUSPENDED set, skipping
Jul 14 04:44:15 ipstore11 kernel: [2040]: scst_translate_lun:3857:FLAG SUSPENDED set, skipping
Jul 14 04:44:15 ipstore11 kernel: [2041]: scst_translate_lun:3857:FLAG SUSPENDED set, skipping
Jul 14 04:44:15 ipstore11 kernel: [2040]: scst_translate_lun:3857:FLAG SUSPENDED set, skipping
Jul 14 04:44:15 ipstore11 kernel: [2041]: scst_translate_lun:3857:FLAG SUSPENDED set, skipping
Jul 14 04:44:15 ipstore11 kernel: [2039]: scst_translate_lun:3857:FLAG SUSPENDED set, skipping
Jul 14 04:44:15 ipstore11 kernel: [2040]: scst_translate_lun:3857:FLAG SUSPENDED set, skipping
Jul 14 04:44:15 ipstore11 kernel: [2041]: scst_translate_lun:3857:FLAG SUSPENDED set, skipping
Jul 14 04:44:15 ipstore11 kernel: [2040]: scst_translate_lun:3857:FLAG SUSPENDED set, skipping
-------------------------------------------------8<-------------------------------------------------

It sounds like the backend device is woken up but refuses to start working before it unwinds some tasks it has in the pipe. But it was "sleeping"; it's the other node that was working, so why would it have anything to undo?
I also see things like this:

-------------------------------------------------8<-------------------------------------------------
Jul 14 05:10:36 ipstore21 kernel: [29483]: scst_free_cmd:4225:Freeing aborted cmd ffff8803071353c8 (scst_cmd_count 241)
Jul 14 05:10:36 ipstore21 kernel: [29483]: scst_finish_cmd:3694:Aborted cmd ffff8803071351d0 finished (cmd_ref 1, scst_cmd_count 240)
Jul 14 05:10:36 ipstore21 kernel: [29483]: scst_finish_cmd_mgmt:4630:cmd ffff8803071351d0 finished (tag 44725616)
Jul 14 05:10:36 ipstore21 kernel: [29483]: scst_finish_cmd_mgmt:4639:mcmd ffff88032fc4a0c8, mcmd->cmd_finish_wait_count 237
Jul 14 05:10:36 ipstore21 kernel: [29483]: __scst_dec_finish_wait_count:4535:cmd_finish_wait_count(236) not 0, skipping
Jul 14 05:10:36 ipstore21 kernel: [29483]: scst_free_cmd:4225:Freeing aborted cmd ffff8803071351d0 (scst_cmd_count 240)
Jul 14 05:10:36 ipstore21 kernel: [29483]: scst_finish_cmd:3694:Aborted cmd ffff880307359c20 finished (cmd_ref 1, scst_cmd_count 239)
Jul 14 05:10:36 ipstore21 kernel: [29483]: scst_finish_cmd_mgmt:4630:cmd ffff880307359c20 finished (tag 44725617)
Jul 14 05:10:36 ipstore21 kernel: [29483]: scst_finish_cmd_mgmt:4639:mcmd ffff88032fc4a0c8, mcmd->cmd_finish_wait_count 236
-------------------------------------------------8<-------------------------------------------------

Or also:

-------------------------------------------------8<-------------------------------------------------
Jul 14 04:29:51 ipstore21 kernel: [30900]: scst: scst_cmd_thread:4313:Processing thread drbd03_6 (PID 30900) started
Jul 14 04:29:51 ipstore21 kernel: [30900]: scst_cmd_thread:4345:Linked IO context ffff8802d90dcf50 (p_cmd_threads ffff88017093e8e8)
Jul 14 04:29:51 ipstore21 kernel: iscsi-scst: Negotiated parameters: InitialR2T No, ImmediateData Yes, MaxConnections 1, MaxRecvDataSegmentLength 1048576, MaxXmitDataSegmentLength 131072,
Jul 14 04:29:51 ipstore21 kernel: iscsi-scst: MaxBurstLength 262144, FirstBurstLength 262144, DefaultTime2Wait 2, DefaultTime2Retain 0,
Jul 14 04:29:51 ipstore21 kernel: iscsi-scst: MaxOutstandingR2T 1, DataPDUInOrder Yes, DataSequenceInOrder Yes, ErrorRecoveryLevel 0,
Jul 14 04:29:51 ipstore21 kernel: iscsi-scst: HeaderDigest None, DataDigest None, OFMarker No, IFMarker No, OFMarkInt 2048, IFMarkInt 2048
Jul 14 04:29:51 ipstore21 kernel: iscsi-scst: Target parameters set for session 1380100003d0200: QueuedCommands 32, Response timeout 30, Nop-In interval 30
Jul 14 04:29:51 ipstore21 kernel: [30901]: scst: scst_cmd_thread:4313:Processing thread drbd03_7 (PID 30901) started
Jul 14 04:29:51 ipstore21 kernel: [30901]: scst_cmd_thread:4345:Linked IO context ffff8802d90dcf50 (p_cmd_threads ffff88017093e8e8)
Jul 14 04:29:52 ipstore21 kernel: [30893]: scst_set_pending_UA:6191:Setting pending UA cmd ffff8801c322c3c8
Jul 14 04:29:52 ipstore21 kernel: [30901]: scst_set_pending_UA:6191:Setting pending UA cmd ffff8801da266638
Jul 14 04:30:22 ipstore21 kernel: iscsi-scst: ***ERROR***: Timeout sending data/waiting for reply to/from initiator iqn.1998-01.com.vmware:vmware12-0176b5c2 (SID 1350100003d0200), closing connection
Jul 14 04:30:22 ipstore21 kernel: [31658]: scst: scst_rx_mgmt_fn:6045:TM fn 6
Jul 14 04:30:22 ipstore21 kernel: [31658]: scst_rx_mgmt_fn:6057:sess=ffff8802e4c017f0, tag_set 0, tag 0, lun_set 1, lun=0, cmd_sn_set 0, cmd_sn 0, priv ffff880289c30c00
Jul 14 04:30:22 ipstore21 kernel: [31658]: scst_post_rx_mgmt_cmd:5976:Adding mgmt cmd ffff88032ddc1338 to active mgmt cmd list
Jul 14 04:30:22 ipstore21 kernel: [1095]: scst_tm_thread:5862:Deleting mgmt cmd ffff88032ddc1338 from active cmd list
Jul 14 04:30:22 ipstore21 kernel: [1095]: scst_abort_all_nexus_loss_sess:5416:Nexus loss for sess ffff8802e4c017f0 (mcmd ffff88032ddc1338)
Jul 14 04:30:22 ipstore21 kernel: [1095]: scst: scst_abort_cmd:4707:Aborting cmd ffff880131d0c970 (tag 1458026, op 12)
Jul 14 04:30:22 ipstore21 kernel: [1095]: scst_abort_cmd:4813:cmd ffff880131d0c970 (tag 1458026, sn 0) being executed/xmitted (state 120, op 12, proc time 31 sec., timeout 3 sec.), deferring ABORT (cmd_done_wait_count 0, cmd_finish_wait_count 1)
Jul 14 04:30:22 ipstore21 kernel: [1095]: scst: scst_abort_cmd:4707:Aborting cmd ffff880131d0c778 (tag 1458027, op 12)
Jul 14 04:30:22 ipstore21 kernel: [1095]: scst_abort_cmd:4813:cmd ffff880131d0c778 (tag 1458027, sn 0) being executed/xmitted (state 120, op 12, proc time 30 sec., timeout 3 sec.), deferring ABORT (cmd_done_wait_count 0, cmd_finish_wait_count 2)
Jul 14 04:30:22 ipstore21 kernel: [1095]: scst: scst_abort_cmd:4707:Aborting cmd ffff880131d0c580 (tag 1458028, op 12)
Jul 14 04:30:22 ipstore21 kernel: [1095]: scst_abort_cmd:4813:cmd ffff880131d0c580 (tag 1458028, sn 0) being executed/xmitted (state 120, op 12, proc time 30 sec., timeout 3 sec.), deferring ABORT (cmd_done_wait_count 0, cmd_finish_wait_count 3)
-------------------------------------------------8<-------------------------------------------------

This makes me think of the section "Work if target's backstorage or link is too slow" in the SCST README, but my RAID card is an HP SmartArray P812 with 1 GB of onboard cache, and the RAID volumes consist of nine 15krpm SAS disks; I mean, this doesn't sound like slow backstorage, does it?

Well, I'm stuck and don't know where to look now. Could someone give me some advice or enlightenment on this? I don't understand why a device that was idle until now tries to abort stacked commands, when it was precisely idle. Could it be the SCST workload of the other running resources filling the SCST stack? Reading the various READMEs and howtos, I feel like each resource is independent of the others, so what? And I'm fairly sure I have seen this fail even when the destination node was empty, with all resources running on the other one.

Any help would be highly appreciated. I have tons of logs and tens of config files; just ask me for whatever you need. :-)

Thanks!

Best regards,
Pascal.