#2846 HASN:AIX multi-path changes can't be saved after rebooting

2.7.2
closed
HA-SN (21)
5
2014-08-16
2012-05-16
No

The cluster is a P7 IH with 12 drawers. The EMS is c250mgrs27-pvt (frame 12). It is an AIX 71D SP3 cluster, and HA SN is set up on it.

The xCAT level is the 2.7.2 05/04 build:
[c250mgrs27-pvt][/]> rpm -qa|grep -i xcat
perl-xCAT-2.7.2-snap201205030303
openslp-xcat-1.2.1-1
xCAT-dfm-2.7.0-13
xCAT-IBMhpc-2.7.2-snap201205030304
xCAT-2.7.2-snap201205030304
xCAT-client-2.7.2-snap201205030303
xCAT-rmc-2.7.2-snap201205011649
xCAT-server-2.7.2-snap201205030303
[c250mgrs27-pvt][/]> xdsh service "rpm -qa|grep -i xcat"|xcoll
====================================
service
====================================
openslp-xcat-1.2.1-1
xCATsn-2.7.2-snap201205030304
xCAT-rmc-2.7.2-snap201205011649
perl-xCAT-2.7.2-snap201205030303
xCAT-client-2.7.2-snap201205030303
xCAT-server-2.7.2-snap201205030303

We set up HA SN on the cluster. The SNs are c250f12c10ap01 and c250f12c12ap01. We split the CNs into two groups according to the primary SN they use:
SN10group: primary SN is c250f12c10ap01, backup is c250f12c12ap01
SN12group: primary SN is c250f12c12ap01, backup is c250f12c10ap01

For the two storage nodes, the primary SN is c250f12c12ap01 and the backup is c250f12c10ap01:
[c250mgrs27-pvt][/]> lsdef -l storage
Object name: c250f12c04ap29-hf0
cons=fsp
conserver=10.0.0.137
groups=storage,lpar,all,mc04,SN12storage
hcp=f12cec04
hidden=0
hwtype=lpar
id=29
mac=0200001f0004|0200001f0005|0200001f0006
mgt=fsp
monserver=c250f12c10ap01,c250f12c10ap01-hf0
nodetype=ppc,osi
os=AIX
parent=f12cec04
postbootscripts=otherpkgs
postscripts=syslog,aixremoteshell,syncfiles,configrmcnode,setupntp,confighfi,percs_basic_set_up,add_sec_ids,add_sec_groups,add_sec_ids.CR,paging_on_HASN,setupnfsv4replication
profile=71Ddskls_CSP5_1_IO
provmethod=71Ddskls_CSP5_1_IO
servicenode=c250f12c12ap01,c250f12c10ap01
status=booted
statustime=05-15-2012 13:35:07
xcatmaster=20.12.12.1
Object name: c250f12c06ap29-hf0
cons=fsp
conserver=10.0.0.137
groups=storage,lpar,all,mc04,SN12storage
hcp=f12cec06
hidden=0
hwtype=lpar
id=29
mac=0200002f0004|0200002f0005|0200002f0006
mgt=fsp
monserver=c250f12c10ap01,c250f12c10ap01-hf0
nodetype=ppc,osi
os=AIX
parent=f12cec06
postbootscripts=otherpkgs
postscripts=syslog,aixremoteshell,syncfiles,configrmcnode,setupntp,confighfi,percs_basic_set_up,add_sec_ids,add_sec_groups,add_sec_ids.CR,paging_on_HASN,setupnfsv4replication
profile=71Ddskls_CSP5_1_IO
provmethod=71Ddskls_CSP5_1_IO
servicenode=c250f12c12ap01,c250f12c10ap01
status=booted
statustime=05-15-2012 06:41:02
xcatmaster=20.12.12.1
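
In the lsdef output above, the servicenode value lists the primary SN first and the backup second, which matches the description of the storage nodes. A minimal sketch of splitting such a value follows. The function names primary_sn and backup_sn are hypothetical helpers for illustration, not xCAT commands.

```shell
#!/bin/sh
# Hypothetical helpers (not part of xCAT): split a servicenode value
# of the form "primary,backup" as it appears in the lsdef output.

primary_sn() {
    # Print everything before the first comma.
    printf '%s\n' "${1%%,*}"
}

backup_sn() {
    # Print the second comma-separated field; print nothing if there is none.
    printf '%s\n' "$1" | cut -d, -f2 -s
}

# Value copied from the storage node definitions in this ticket.
sn="c250f12c12ap01,c250f12c10ap01"
echo "primary: $(primary_sn "$sn")"
echo "backup:  $(backup_sn "$sn")"
```

On the EMS the value itself could be obtained with something like "lsdef storage -i servicenode,xcatmaster" and fed to these helpers.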

The litefile and statelite tables:
[c250mgrs27-pvt][/]> tabdump litefile

image,file,options,comments,disable

"ALL","/etc/microcode/","rw",,
"ALL","/gpfslog/","persistent",,
"ALL","/var/adm/ras/gpfslog/","persistent",,
"ALL","/var/adm/ras/errlog","persistent",,
"ALL","/var/mmfs/","persistent",,
"ALL","/var/spool/cron/","persistent",,
"GOLD_71Ddskls_SP3_IO","/etc/basecust","persistent",,
"71Ddskls_SP41_IO","/etc/basecust","persistent",,
"71Ddskls_SP41_IO_1","/etc/basecust","persistent",,
"71Ddskls_CSP5_1_IO","/etc/basecust","persistent",,
[c250mgrs27-pvt][/]> tabdump statelite|grep storage
"storage",,"$noderes.xcatmaster:/install/statelite_data","vers=4",,

The problem: after I booted the storage nodes and ran "updatenode storage disableMP_gpfs", AIX multi-path was disabled (lspv showed only the local hdisks). But after I rebooted the storage nodes, AIX multi-path was enabled again (lspv showed all the hdisks).

[c250mgrs27-pvt][/]> cat /install/postscripts/disableMP_gpfs
/usr/sbin/lsdev -t 001072001410ea0 -F name | xargs -n1 rmdev -Rdl
/usr/bin/manage_disk_drivers -d SAS_SCSD -o AIX_non_MPIO
/usr/sbin/cfgmgr
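
The symptom is judged by how many hdisks lspv reports before and after the reboot. A rough sketch for making that comparison repeatable follows. The helper name count_hdisks is hypothetical, and the sample input is made up for illustration only; it is not output from this cluster.

```shell
#!/bin/sh
# Hypothetical helper: count hdisk entries in lspv output read from stdin.
# Also accepts lines prefixed with "node: " as produced by xdsh.
count_hdisks() {
    awk '/(^|[ \t])hdisk[0-9]+([ \t]|$)/ { n++ } END { print n + 0 }'
}

# Illustrative input only (invented lines, not taken from the cluster).
printf '%s\n' \
    'hdisk0 none None' \
    'hdisk1 none None' | count_hdisks

# Intended use from the EMS, assuming xdsh can reach the node:
#   xdsh c250f12c04ap29-hf0 lspv | count_hdisks    # after disableMP_gpfs
#   ... reboot the node ...
#   xdsh c250f12c04ap29-hf0 lspv | count_hdisks    # after the reboot
```

Running the count from the EMS rather than on the node avoids depending on files that a diskless node may not keep across a reboot.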

On 03/23 Norm worked on an issue where AIX multi-path could not be disabled. At that time the rc.dd_boot in the osimage was not correct, so I compared the rc.dd_boot of 71Ddskls_CSP5_1_IO with that of GOLD_71Ddskls_SP41_IO, which had kept the ODM correctly after multi-path was disabled. The two files are the same:
[c250mgrs27-pvt][/]> ls -l /install/nim/spot/GOLD_71Ddskls_SP41_IO/usr/lib/boot/network/rc.dd_boot
-r-xr-xr-x 1 root system 21277 Apr 23 13:17 /install/nim/spot/GOLD_71Ddskls_SP41_IO/usr/lib/boot/network/rc.dd_boot
[c250mgrs27-pvt][/]> ls -l /install/nim/spot/71Ddskls_CSP5_1_IO/usr/lib/boot/network/rc.dd_boot
-r-xr-xr-x 1 root system 21277 May 15 01:53 /install/nim/spot/71Ddskls_CSP5_1_IO/usr/lib/boot/network/rc.dd_boot
[c250mgrs27-pvt][/]> sum install/nim/spot/GOLD_71Ddskls_SP41_IO/usr/lib/boot/network/rc.dd_boot
50629 21 install/nim/spot/GOLD_71Ddskls_SP41_IO/usr/lib/boot/network/rc.dd_boot
[c250mgrs27-pvt][/]> sum /install/nim/spot/71Ddskls_CSP5_1_IO/usr/lib/boot/network/rc.dd_boot
50629 21 /install/nim/spot/71Ddskls_CSP5_1_IO/usr/lib/boot/network/rc.dd_boot
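
Matching size and sum output strongly suggest the two files are the same, and a byte-for-byte comparison with cmp would confirm it. A minimal sketch follows. The helper name same_file is hypothetical, and the paths are the ones from the listing above.

```shell
#!/bin/sh
# Hypothetical helper: report whether two files are byte-for-byte identical.
same_file() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Paths taken from the ls output in this ticket; meant to be run on the EMS.
gold=/install/nim/spot/GOLD_71Ddskls_SP41_IO/usr/lib/boot/network/rc.dd_boot
new=/install/nim/spot/71Ddskls_CSP5_1_IO/usr/lib/boot/network/rc.dd_boot

# Only compare when both files are readable on this machine.
if [ -r "$gold" ] && [ -r "$new" ]; then
    same_file "$gold" "$new"
fi
```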

So might this be related to HA SN? Please check.

Discussion

  • Norm Nott - 2012-05-18

    This seems to be covered by bug #3527303

     
  • yan feng han - 2012-05-25

    With the 05/23 build plus Norm's 05/24 efix, I did not hit this problem when making a new image from lpp_source.