ospkgs and otherpkgs do not support "@" (for group) entries in the configuration file. (for 2.5)
Current xCAT code fails to resolve the path of "pkglist" (for ospkgs), "otherpkgs.pkglist" (for otherpkgs) and "synclist" (for syncfile) for diskless nodes. The symptom is that, for a diskless node, these configuration files cannot be found. The cause is that xCAT does not get the correct state of the node: for diskless it should be "netboot", but in the cases below it is "install". The workaround is to copy the configuration files from /install/custom/netboot/ to /install/custom/install/. It will be fixed in 2.5 and trunk.
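To illustrate the symptom, here is a minimal Python sketch. It is not xCAT code (xCAT is written in Perl). It assumes the lookup is keyed on a state directory under the install root, following the layout of the synclist path quoted elsewhere in this report. The function name find_synclist, its parameters and the file naming scheme are hypothetical.

```python
import os
import tempfile


def find_synclist(root, state, platform, profile, osver, arch):
    """Return <root>/custom/<state>/<platform>/<profile>.<osver>.<arch>.synclist
    if that file exists, else None. The layout is an assumption for illustration."""
    name = f"{profile}.{osver}.{arch}.synclist"
    path = os.path.join(root, "custom", state, platform, name)
    return path if os.path.isfile(path) else None


def demo():
    """Create a synclist under the netboot tree only, then look it up with
    the correct state ("netboot") and the wrongly detected state ("install")."""
    with tempfile.TemporaryDirectory() as root:
        netboot_dir = os.path.join(root, "custom", "netboot", "fedora")
        os.makedirs(netboot_dir)
        open(os.path.join(netboot_dir, "service.fedora9.x86_64.synclist"), "w").close()

        args = ("fedora", "service", "fedora9", "x86_64")
        found_with_netboot = find_synclist(root, "netboot", *args) is not None
        missing_with_install = find_synclist(root, "install", *args) is None
        return found_with_netboot, missing_with_install
```

With the state misdetected as "install", the lookup searches a directory that does not contain the file, which matches the reported symptom.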
The impacted cases:
If using netboot=xnba : netboot, statelite
If using netboot=pxe|yaboot : statelite
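The impacted cases listed above can be written as a small lookup table. This is only a Python sketch for documenting the scenarios; the names AFFECTED and is_affected are hypothetical and are not part of xCAT.

```python
# Impacted combinations as listed in this report:
# netboot method -> node modes in which the config files are not found.
AFFECTED = {
    "xnba": {"netboot", "statelite"},
    "pxe": {"statelite"},
    "yaboot": {"statelite"},
}


def is_affected(netboot_method, mode):
    """Return True if the (netboot method, mode) pair is one of the listed cases."""
    return mode in AFFECTED.get(netboot_method, set())
```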
Make sure this is fixed in 2.5.3, 2.6.1 and 2.6.6. This is very high priority.
Does this also affect p-series diskless installs, and which release broke the code for 2) above?
I am running the latest 2.6.6 on my Fedora x-series machine and do not hit the problem of the synclist files not being found. Is the trunk really broken?
So I thought the problem would be in the SvrUtils->getsynclistfile routine, but I ran the one from 2.5 and the one from 2.6 on my Fedora machine and had no problems. It returned the correct path. Where is the problem?
'rra000-m' => '/install/custom/netboot/fedora/service.fedora9.x86_64.synclist'
The synclist problem does not occur in every configuration; it only occurs in some specific scenarios. As I remember, pxe boot is one scenario in which it occurs. So I am not sure this is a priority 7 bug; generally speaking, a priority 7 bug prevents the build from being released.
We had a lengthy talk about this problem yesterday. We agree that it should be fixed in the xCAT 2.5 and 2.6 streams. However, xCAT 2.5.3 was released yesterday, the xCAT 2.6.1 final build has also been made, and this problem has been there for a long while. So we plan to check in the code changes after the 2.5.3 GA and 2.6.1 GA and then make snapshot builds for 2.5.4 and 2.6.2. If a customer complains, we can point them to the snapshot build.
2.5.3 GA was Friday, but I am not sure we should GA 2.6.1 without the fix. LRZ is running x-series diskless installs and is installing the machine now. If they are affected, we need a fix for them immediately. Can you describe the exact circumstances in which we hit the problem? Is the fix so significant that we cannot get it into 2.6.1? I made it a priority 7 because I do not know how many customers would be affected.
By the way, my Fedora machine running 2.6.6 is a pxe x-series diskless install of the service node and I do not hit the problem. I even put SvrUtils from 2.5.2 on it and could still run updatenode -F successfully. It would be good if we could recreate the problem there so we could document the exact scenario and setup, and maybe come up with a better workaround. I think moving the netboot files to the install directory will not be acceptable to some, especially if they have both stateless and stateful installs.
If you have noderes.netboot=xnba, then the search algorithm for the location of the pkglist, otherpkgs.pkglist and synclist files does not work.
Two options: put those files in the /install/..../install directory instead of the /install/.../netboot directory, or change noderes.netboot to pxe.
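The first option can be scripted. The following is a rough Python sketch, not an xCAT tool. It assumes the /install/custom/netboot/ and /install/custom/install/ directories named earlier in this report, and that the affected files end in .pkglist (which also covers .otherpkgs.pkglist) or .synclist. The function name copy_netboot_configs and its parameters are hypothetical. It copies rather than moves, and never overwrites a file that already exists on the install side.

```python
import os
import shutil


def copy_netboot_configs(src="/install/custom/netboot",
                         dst="/install/custom/install",
                         suffixes=(".pkglist", ".synclist")):
    """Copy matching config files from src to dst, keeping the subdirectory
    layout. Existing files under dst are left untouched. Returns the sorted
    list of files that were created."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        for name in filenames:
            if not name.endswith(suffixes):
                continue  # not a pkglist or synclist file
            target = os.path.join(target_dir, name)
            if os.path.exists(target):
                continue  # do not clobber an install-side file
            os.makedirs(target_dir, exist_ok=True)
            shutil.copy2(os.path.join(dirpath, name), target)
            copied.append(target)
    return sorted(copied)
```

Because the originals stay in the netboot tree, the copies can simply be deleted again once a fixed build is installed.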
Broken in 2.5.3, 2.6.1 and 2.6.6.
Also add xnba to the description of the noderes table netboot attribute in Schema.pm, along with any other methods we support.
Has been fixed in 2.5, 2.6 and trunk.