--- DSHCLI.pm 2014-04-16 12:58:51.000000000 +0100
+++ DSHCLI.pm.new 2014-04-23 11:12:22.061047613 +0100
@@ -5208,6 +5208,7 @@
push @::appendlines,$line;
}
my $src_file = $1; # append file left of arror
+ my $orig_src_file = $1; # append file left of arror
# it will be sync'd to $nodesyncfiledir/$append_file
my $dest_file = $nodesyncfiledir;
$dest_file .= $src_file;
@@ -5236,7 +5237,7 @@
# to pick up files from /var/xcat/syncfiles...
if ($onServiceNode == 1) {
my $newsrcfile = $syncdir; # add SN syndir on front
- $newsrcfile .= $src_file;
+ $newsrcfile .= $orig_src_file;
$src_file=$newsrcfile;
}
# destination file name
The following fixes the problem for me, but there may be a better way of doing it
http://gitlab.ocf.co.uk/aali/xcat-core/commit/bf0f8dc26e37d8a898d9d7907be125f84dcdc149
Similar to issue here in Apr 2010
git diff 70050055635291b7469df14385479e36ef2ab39c..51f6b80eabb41a537bdd2d2f7548acfd7e33d893
So what symptom were you seeing that led you to find this? I wrote this code a long time ago. It has probably not been used much hierarchically.
Last time it fixed normal files being synced; this is the same bug, but for merge files, i.e. /var/xcat/syncfiles (SNsyncfiledir) keeps being prepended to the src file, so by the time the Nth node has been processed, /var/xcat/syncfiles has been prepended N times.
The link below is the original issue that I reported back then:
https://sourceforge.net/p/xcat/mailman/xcat-user/thread/OF59FF2B9C.05862C8F-ON8525770C.0067E620-8525770C.006808F6@us.ibm.com/
I hope that makes sense
I am a little confused about what we are fixing.
I ran it twice against a compute node, and on the service node /var/xcat/syncfiles looks OK:
/var/xcat/syncfiles]> ls
opt root
My built rsync file
#!/bin/sh
/usr/bin/ssh compute-01 '/bin/mkdir -p /var/xcat/node/syncfiles/merge/opt/xcat/share/xcat/scripts /var/xcat/node/syncfiles/merge/mergefiles/root/lissa/merge'
/usr/bin/rsync --rsync-path /usr/bin/rsync -Liprogtz --out-format=%f%L /var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh /var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh /var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh compute-01:/var/xcat/node/syncfiles/merge/opt/xcat/share/xcat/scripts <- this looks wrong. should only be one, but your patch did not fix this.
/usr/bin/rsync --rsync-path /usr/bin/rsync -Liprogtz --out-format=%f%L /var/xcat/syncfiles/root/lissa/merge/mergepasswd /var/xcat/syncfiles/root/lissa/merge/mergegroup /var/xcat/syncfiles/root/lissa/merge/mergeshadow compute-01:/var/xcat/node/syncfiles/merge/mergefiles/root/lissa/merge
~
What were you fixing??
OK, makes sense
The first node in the list always works. So I had 5 x SN, and the first nodes in the list for each of them were successful, but the remaining nodes had issues.
i.e. Servicenodes sn01,sn02,sn03,sn04,sn05
node001,node002,node003 are controlled by sn01
node004,node005,node006 are controlled by sn02
node007,node008,node009 are controlled by sn03
node010,node011,node012 are controlled by sn04
node013,node014,node015 are controlled by sn05
In the above example node001, node004, node007, node010 and node013 will have the file successfully merged.
node002, node005, node008, node011 and node014 will have one extra /var/xcat/syncfiles in the src_file, and will therefore fail.
node003, node006, node009, node012 and node015 will have two extra /var/xcat/syncfiles in the src_file,
and so on...
It is this extra concatenation of /var/xcat/syncfiles that I am trying to avoid.
I hope that makes sense
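For anyone reading along, here is a minimal Python sketch of the behaviour described above. It is not the Perl code from DSHCLI.pm; the function names `buggy_sources` and `fixed_sources` are hypothetical and only illustrate how reusing one variable across nodes makes the prefix accumulate, and how building from an untouched original avoids it.

```python
SYNCDIR = "/var/xcat/syncfiles"  # service-node sync directory named in this thread


def buggy_sources(nodes, src_file):
    """Reuse one variable for every node, so the prefix accumulates."""
    result = {}
    for node in nodes:
        # The prefixed value is written back, so the next node starts from it.
        src_file = SYNCDIR + src_file
        result[node] = src_file
    return result


def fixed_sources(nodes, src_file):
    """Always build the path from the untouched original source path."""
    orig_src_file = src_file
    result = {}
    for node in nodes:
        result[node] = SYNCDIR + orig_src_file
    return result


nodes = ["node001", "node002", "node003"]
merge_file = "/root/lissa/merge/mergepasswd"

buggy = buggy_sources(nodes, merge_file)
fixed = fixed_sources(nodes, merge_file)

for node in nodes:
    # Columns: node, prefix count with the bug, prefix count with the fix.
    print(node, buggy[node].count(SYNCDIR), fixed[node].count(SYNCDIR))
```

With the three nodes of one service node, the first node gets a single prefix either way, which matches the first node in each list always working.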
Got it. I am building this on the service node, with extra /var/xcat/syncfiles paths:
vi /tmp/rsync_compute-03
#!/bin/sh
/usr/bin/ssh compute-03 '/bin/mkdir -p /var/xcat/node/syncfiles/merge/opt/xcat/share/xcat/scripts /var/xcat/node/syncfiles/merge/mergefiles/root/lissa/merge'
/usr/bin/rsync --rsync-path /usr/bin/rsync -Liprogtz --out-format=%f%L /var/xcat/syncfiles/var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh /var/xcat/syncfiles/var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh /var/xcat/syncfiles/var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh compute-03:/var/xcat/node/syncfiles/merge/opt/xcat/share/xcat/scripts
/usr/bin/rsync --rsync-path /usr/bin/rsync -Liprogtz --out-format=%f%L /var/xcat/syncfiles/var/xcat/syncfiles/root/lissa/merge/mergepasswd /var/xcat/syncfiles/var/xcat/syncfiles/root/lissa/merge/mergegroup /var/xcat/syncfiles/var/xcat/syncfiles/root/lissa/merge/mergeshadow compute-03:/var/xcat/node/syncfiles/merge/mergefiles/root/lissa/merge
Now with your fix it looks like
vi /tmp/rsync_compute-03
#!/bin/sh
/usr/bin/ssh compute-03 '/bin/mkdir -p /var/xcat/node/syncfiles/merge/opt/xcat/share/xcat/scripts /var/xcat/node/syncfiles/merge/mergefiles/root/lissa/merge'
/usr/bin/rsync --rsync-path /usr/bin/rsync -Liprogtz --out-format=%f%L /var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh /var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh /var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh compute-03:/var/xcat/node/syncfiles/merge/opt/xcat/share/xcat/scripts
/usr/bin/rsync --rsync-path /usr/bin/rsync -Liprogtz --out-format=%f%L /var/xcat/syncfiles/root/lissa/merge/mergepasswd /var/xcat/syncfiles/root/lissa/merge/mergegroup /var/xcat/syncfiles/root/lissa/merge/mergeshadow compute-03:/var/xcat/node/syncfiles/merge/mergefiles/root/lissa/merge
But I still have the problem in the second rsync of this line being built with multiple entries for xdcpmerge.sh. I need to fix that:
/var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh /var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh /var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh
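As an illustration only, one way to avoid repeated source entries on a generated rsync line is to drop duplicates while keeping the original order. The Python sketch below is an assumption about a possible approach, not the change that was committed to xCAT; `unique_sources` is a hypothetical helper.

```python
def unique_sources(sources):
    """Return the source paths with duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for path in sources:
        if path not in seen:
            seen.add(path)
            unique.append(path)
    return unique


# The same script listed three times, as in the generated file above.
sources = ["/var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh"] * 3
destination = "compute-03:/var/xcat/node/syncfiles/merge/opt/xcat/share/xcat/scripts"

# Only the argument list is built here; nothing is executed.
rsync_args = unique_sources(sources) + [destination]
print(" ".join(rsync_args))
```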
Ah, OK. I noticed an issue when I got loads of messages regarding xdcpmerge.sh, but I didn't think anything of it.
I guess that is two for the price of one. Let me know the commit, and then I can incorporate it across my customer base.
Thanks again for your assistance.
Committed:
2.8.4: commit 300bc61da361adf55177c4bedd2553dc2207ae95
2.9: commit 553aa59bb1ae545234c2b66f24d969e7a5d9f996
APPEND: now has the same issue, i.e. I have the following in one of my /tmp/rsync_<nodename> files
as per the merge issue before
Go ahead and commit it in 2.8.4 and master. I put my change in the append function but forgot your original change. The code was copied, so I am not surprised.
Committed
2.8.4: [ecd697]
2.9: [065eb7]
Hi Lissa,
There are further issues in this plugin with hierarchy: EXECUTEALWAYS now doesn't work. The relevant script is not copied over to the SN, and therefore cannot be run on the compute node.
I will try to debug, and understand where exactly in the code the problem is.
regards,
Arif
Actually, further to that, EXECUTE doesn't work either. It cannot find the files that should have been synchronised to the SN.
None of the execute scripts are being transferred to the SN into /var/xcat/syncfiles, and therefore it errors out.
I tried updatenode <nodename> -f, to see if I could get the postscripts sync'd, but that didn't work either.
As a harsh fix for one of my other customer sites, I have applied the following patch, as /install is being synchronised using rsync, so it doesn't make a difference.
https://gitlab.arif-ali.co.uk/arif/xcat-core/commit/06cad710500fb3ce96a809e836da624bc374cdf2
I have also tested this in the current customer scenario as well, and this resolves the problem for the time being.
But from what I can see we need to first synchronise the postscripts using xdcp and then run the scripts.
This I think is similar to how build_append_rsync and build_merge_rsync are done.
What do you think?
Let me look at this also. Are you sure the synclist file is created correctly? This used to work fine and I don't think the changes we made would have affected it, but I have not tried it in a while.
Note: for EXECUTE you must have the file to execute in the synclist. It must be named filename.post and be executable. For EXECUTEALWAYS, you must also have the script in the synclist; see /tmp/myscript1 below. It will always execute. Make sure it is executable.
I guess it would be nice to see the synclist file that is not working. But I will test this morning.
This is the example
/tmp/share/file2 -> /tmp/file2
/tmp/share/file2.post -> /tmp/file2.post (required for hierarchical clusters)
/tmp/share/file3 -> /tmp/file3
/tmp/share/file3.post -> /tmp/file3.post (required for hierarchical clusters)
/tmp/myscript1 -> /tmp/myscript1
/tmp/myscript2 -> /tmp/myscript2
EXECUTE:
/tmp/share/file2.post
/tmp/share/file3.post
EXECUTEALWAYS: ( only in 2.8 and later)
/tmp/myscript1
/tmp/myscript2
Last edit: Lissa Valletta 2014-04-29
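The rule above (every EXECUTE and EXECUTEALWAYS script must also appear as a source in the file list) can be checked before running xdcp. The Python sketch below is a hypothetical pre-check, not part of xCAT; it assumes section headers are lines ending in ":" and that file entries use "source -> destination", as in the example.

```python
def parse_synclist(text):
    """Split a synclist into sections; entries before any header go under 'SYNC'."""
    sections = {"SYNC": []}
    current = "SYNC"
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # Header line such as "EXECUTE:" starts a new section.
            current = line[:-1].strip()
            sections.setdefault(current, [])
        elif "->" in line:
            # Keep only the source side of "source -> destination".
            sections[current].append(line.split("->", 1)[0].strip())
        else:
            sections[current].append(line)
    return sections


def missing_scripts(sections):
    """List EXECUTE/EXECUTEALWAYS scripts that are not also synced as files."""
    synced = set(sections.get("SYNC", []))
    missing = []
    for name in ("EXECUTE", "EXECUTEALWAYS"):
        for script in sections.get(name, []):
            if script not in synced:
                missing.append(script)
    return missing


sample = """\
/tmp/share/file2 -> /tmp/file2
/tmp/share/file2.post -> /tmp/file2.post
/tmp/myscript1 -> /tmp/myscript1
EXECUTE:
/tmp/share/file2.post
EXECUTEALWAYS:
/tmp/myscript1
/tmp/myscript2
"""

missing = missing_scripts(parse_synclist(sample))
print(missing)
```

In this sample /tmp/myscript2 is listed under EXECUTEALWAYS but never synced, so it is the only script reported.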
So I am getting a failure with my testcase with the latest 2.8.4 build. Let me debug. Mine did sync all the data to the servicenode though.
xdcp compute-01 -F /root/lissa/sync/synclist
Error: xdsh plugin bug, pid 11462, process description: 'xCATd SSL: xdcp for manage-02@manage-02: xdsh instance: locally executing' with error 'Can't use an undefined value as an ARRAY reference at /opt/xcat/lib/perl/xCAT/DSHCLI.pm line 6047.
' while trying to fulfill request for the following nodes: compute-01
So my file below is therefore wrong,
i.e. the files
also need to be synchronised as well. At the previous customer, with xCAT 2.8.0 and below, this process was working (they have a flat network), and that commented-out code didn't exist. And as /install was synchronised on the SN, we had no problem with hierarchy.
So the assumption is that any EXECUTE and EXECUTEALWAYS scripts need to be synchronised as part of the synclist to the CN as well?
(For me) it doesn't make sense to have to include the file twice; but "hey" that's my opinion.
Yes, you are correct: EXECUTE and EXECUTEALWAYS scripts need to be part of the synclist. I think it might work in a non-hierarchical setup without them, but it is best to just put them in.
/install/syncfiles/gmond.conf.nextscale -> /etc/ganglia/gmond.conf
/install/syncfiles/cpuspeed.compute -> /etc/sysconfig/cpuspeed
/etc/profile.d/modules. -> /etc/profile.d/
/etc/custom/.modules -> /etc/custom/
/etc/{hosts.equiv,hosts} -> /etc/
/install/syncfiles/gmond.conf.nextscale.post -> /install/syncfiles/gmond.conf.nextscale.post
/install/syncfiles/cpuspeed.compute.post -> /install/syncfiles/cpuspeed.compute.post
/install/syncfiles/sysctl.conf.append.post -> /install/syncfiles/sysctl.conf.append.post
/install/syncfiles/test.sh -> /install/syncfiles/test.sh
MERGE:
/install/syncfiles/group.merge -> /etc/group
/install/syncfiles/passwd.merge -> /etc/passwd
/install/syncfiles/shadow.merge -> /etc/shadow
APPEND:
/install/syncfiles/sysctl.conf.append -> /etc/sysctl.conf
EXECUTE:
/install/syncfiles/gmond.conf.nextscale.post
/install/syncfiles/cpuspeed.compute.post
/install/syncfiles/sysctl.conf.append.post
EXECUTEALWAYS:
/install/syncfiles/test.sh
Do these work? I did not think they were supported syntax. You can have /etc/profile.d/*, but I have never tried these:
/etc/profile.d/modules. -> /etc/profile.d/
/etc/custom/.modules -> /etc/custom/
/etc/{hosts.equiv,hosts} -> /etc/
Here are the supported syntaxes
https://sourceforge.net/apps/mediawiki/xcat/index.php?title=Sync-ing_Config_Files_to_Nodes