#2402 node stuck in openfirmware when rpower boot

2.6.10
closed
5
2012-09-19
2011-11-10
No

The frame is p7 IH with AIX 71D Gold build, we have 316 diskless CNs and 2 SNs.

After rnetbooting, when I rebooted the CNs with "rpower .. boot", some CNs dropped into openfirmware. I reran rpower and they were still stuck there. Finally I ran rnetboot and they came up. I think it may be related to the bootseq, which should be set up by the postscript setbootfromnet.

I checked the CNs after they came up (since they had already been rpowered up, I am not sure whether the output is still meaningful):
Running postscript: setbootfromnet
0514-202 bootlist: Invalid device name (hf0)

But lsdev showed hf0 as "Defined", so this may have confused setbootfromnet and made it fail. Can you check it, Hua Zhong?

Discussion

  • Guang Cheng Li

    Guang Cheng Li - 2011-11-10

    I think the "Defined" state is not right: it means hf0 is in the ODM but the hardware for hf0 is not physically available. This should be the reason why setbootfromnet cannot set the node to boot from the hf0 interface.

    setbootfromnet is quite a simple postscript; it runs the command "bootlist -m normal $NIC bserver=$SERVER_IP gateway=$GATE_WAY client=$CLIENT_IP".

    In your case, the NIC should be hf0, the $SERVER_IP and $GATE_WAY should be the service node ip address, and the $CLIENT_IP should be the node ip address.
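The bootlist call described above can be sketched as a small shell helper. This is only an illustration, not the actual setbootfromnet source: the function name build_bootlist_cmd and the 192.0.2.x addresses are placeholders, and the helper prints the command instead of running it, so it can be tried on any machine.

```shell
# Hypothetical helper, not the real setbootfromnet code.
# Prints the bootlist command for a network boot device without running it.
build_bootlist_cmd() {
    nic=$1          # boot device name passed to bootlist
    server_ip=$2    # service node IP (boot server)
    gateway=$3      # gateway IP (the service node in this setup)
    client_ip=$4    # IP of the compute node itself
    printf 'bootlist -m normal %s bserver=%s gateway=%s client=%s\n' \
        "$nic" "$server_ip" "$gateway" "$client_ip"
}

# Example with placeholder addresses; hf0 is the device name used in this comment.
build_bootlist_cmd hf0 192.0.2.1 192.0.2.1 192.0.2.10
```

Note that bootlist only accepts device names it recognizes, as the 0514-202 error at the top of this ticket shows, so the first argument matters.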

     
  • wang huazhong

    wang huazhong - 2011-11-10

    hf0 is configured by AIX with ifconfig during the installation boot. The xCAT postscript confighfi doesn't reconfigure it, since reconfiguring hf0 would make the CN's network unavailable for a short time and might impact other applications that are using the network.

    Configuring hf0 with ifconfig makes the hf0 network work, but in the ODM it shows as "Defined". So ifconfig is not the correct way to configure HFI interfaces. AIX has provided a new script, "mkhfi", to configure hfx, and xCAT has used it to configure hf1, hf2, and hf. But this new script is not used by AIX during boot.

    Yan Feng will open a bug to AIX to fix this issue.

    This bug exposes another issue in setbootfromnet, in "bootlist -m normal hf0": hf0 is not a valid HFI physical device, so bootlist cannot recognize it. The correct device name should be "hfi0". I will fix this problem under this bug.

     
  • Kerry Bosworth

    Kerry Bosworth - 2011-11-14

    The node c250f53c09ap29 on frame 53 (c250mgrs32-pvt) can be used for testing and rebooted if needed.

     
  • Bruce

    Bruce - 2011-11-14

    Hua Zhong,

    I'm changing this bug from 2.7 to 2.6.10, because I'd like you to fix the setbootfromnet problem in both 2.6.10 and 2.7. It is a simple fix, right?

     
  • Bruce

    Bruce - 2011-11-14

    There still seems to be a problem with the bootlist sometimes getting changed from what the user set it to. Hua Zhong is still investigating a possible AIX problem. Until this is fixed, we recommend using rbootseq to explicitly set the boot sequence each time before running rpower.
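The workaround above amounts to two commands run in order for the same noderange. A minimal dry-run sketch follows; print_reboot_steps is a hypothetical helper, and the boot device keyword "hfi" in the example is an assumption that should be checked against the rbootseq man page for your xCAT level and hardware. The node name is the test node mentioned earlier in this ticket.

```shell
# Hypothetical dry-run helper: prints the workaround commands, does not run them.
print_reboot_steps() {
    noderange=$1    # xCAT noderange to reboot
    bootdev=$2      # boot device keyword for rbootseq (assumed value, verify locally)
    printf 'rbootseq %s %s\n' "$noderange" "$bootdev"   # set the boot sequence first
    printf 'rpower %s boot\n' "$noderange"              # then reboot the nodes
}

# Example: "hfi" is an assumed keyword, not confirmed by this ticket.
print_reboot_steps c250f53c09ap29 hfi
```

Printing the commands first makes it easy to review them before pasting them on the management node.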

     
  • yan feng han

    yan feng han - 2011-11-15

    For the hf0 "Defined" problem, I opened a defect in ClearQuest, number SW109898; development is looking into it.

     
  • wang huazhong

    wang huazhong - 2011-11-15

    Opened AIX bug 179949: Boot device is setting to iSCSI dump disks after diskless boot

    This issue caused the boot device to change after the diskless node booted.

    As for why some nodes set the boot device to an iSCSI disk and others don't: I checked several nodes and found that if a diskless node has iSCSI disks in the "Available" state, the boot device is set to such a disk.

    On some nodes the iSCSI disks are in the "Defined" state, so the boot device cannot be set to them and stays at the original one.

    I still need to check with the AIX team to see why the iSCSI disks are in the "Defined" state; I am not sure whether it is related to our configuration or is a hardware issue.

     
  • Bruce

    Bruce - 2011-11-17

    Hua Zhong,

    Why is the CMVC defect 179949 opened in the hpssl family and assigned to Fernando? Shouldn't it be opened in the AIX family and assigned to someone from the AIX team?

     
  • wang huazhong

    wang huazhong - 2011-11-18

    My CMVC ID was requested in the hpssl family, and this bug was opened to the screen team.

    I am requesting an ID in the AIX family.

     
  • wang huazhong

    wang huazhong - 2011-11-21

    I have canceled the original CMVC bug and opened a new one in the AIX family.

    New AIX bug 817629: Boot device is setting to iSCSI dump disks after diskless boot

     
  • wang huazhong

    wang huazhong - 2011-12-05

    Fixed with revision 11122 in trunk and 11123 in the 2.6 branch.

    Previously, on AIX, we replaced en0 with ent0 as the boot device, since ent0 is the physical device name. For HFI we need to do the same thing and use hfi0 instead of hf0 as the boot device; we were missing this change.

    I used grep and sed to find "hfx" and change it to "hfix" when setting the boot device.
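The interface-to-device renaming described in this comment can be illustrated with a short sed-based sketch. It is not the committed change from revisions 11122 or 11123; the function name to_boot_device is hypothetical, and only the two mappings named in the comment (enX to entX, hfX to hfiX) are handled.

```shell
# Hypothetical helper, not the committed xCAT fix.
# Maps a network interface name to the physical device name used as boot device:
#   enN -> entN, hfN -> hfiN; any other name is printed unchanged.
to_boot_device() {
    printf '%s\n' "$1" | sed \
        -e 's/^en\([0-9][0-9]*\)$/ent\1/' \
        -e 's/^hf\([0-9][0-9]*\)$/hfi\1/'
}

to_boot_device en0   # prints ent0
to_boot_device hf0   # prints hfi0
```

Anchoring the patterns at both ends keeps names that are already physical device names, such as ent0 or hfi0, from being rewritten a second time.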

     
  • yan feng han

    yan feng han - 2012-02-27

    Verified on a big cluster; it worked well.