#4061 DSHCLI.pm bug, merge,execute,executealways, append issue in hierarchy

2.8.4
closed
parallel-cmds
5
2014-05-23
2014-04-11
Arif Ali
No

In DSHCLI.pm, build_merge_rsync keeps concatenating onto src_file.

Patch to follow

Discussion

1 2 > >> (Page 1 of 2)
  • Lissa Valletta

    Lissa Valletta - 2014-04-11
    • assigned_to: Lissa Valletta
     
  • Arif Ali

    Arif Ali - 2014-04-11

    Similar to the issue fixed here in April 2010:

    git diff 70050055635291b7469df14385479e36ef2ab39c..51f6b80eabb41a537bdd2d2f7548acfd7e33d893

     
  • Lissa Valletta

    Lissa Valletta - 2014-04-11

    So what symptom were you seeing that led you to this? I wrote this code a long time ago, and it has probably not been used much hierarchically.

     
  • Arif Ali

    Arif Ali - 2014-04-11

    Last time the fix covered normal files being synced. This is the same bug, but for merge files:

    /var/xcat/syncfiles

    i.e. SNsyncfiledir keeps being prepended to the src file, so by the time the Nth node has been processed, /var/xcat/syncfiles has been prepended N times.

    The link below is the original issue that I reported back then:

    https://sourceforge.net/p/xcat/mailman/xcat-user/thread/OF59FF2B9C.05862C8F-ON8525770C.0067E620-8525770C.006808F6@us.ibm.com/

    I hope that makes sense

     
  • Lissa Valletta

    Lissa Valletta - 2014-04-11

    I am a little confused about what we are fixing.

    I ran it twice to a compute node, and on the service node /var/xcat/syncfiles looks OK:
    /var/xcat/syncfiles]> ls
    opt root

    My built rsync file:

    #!/bin/sh

    /usr/bin/ssh compute-01 '/bin/mkdir -p /var/xcat/node/syncfiles/merge/opt/xcat/share/xcat/scripts /var/xcat/node/syncfiles/merge/mergefiles/root/lissa/merge'

    /usr/bin/rsync --rsync-path /usr/bin/rsync -Liprogtz --out-format=%f%L /var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh /var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh /var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh compute-01:/var/xcat/node/syncfiles/merge/opt/xcat/share/xcat/scripts <- this looks wrong. should only be one, but your patch did not fix this.

    /usr/bin/rsync --rsync-path /usr/bin/rsync -Liprogtz --out-format=%f%L /var/xcat/syncfiles/root/lissa/merge/mergepasswd /var/xcat/syncfiles/root/lissa/merge/mergegroup /var/xcat/syncfiles/root/lissa/merge/mergeshadow compute-01:/var/xcat/node/syncfiles/merge/mergefiles/root/lissa/merge
    What were you fixing?

     
  • Arif Ali

    Arif Ali - 2014-04-11

    OK, makes sense

    The first node in the list always works. So I had 5 x SN, and the first nodes in the list for each of them were successful, but the remaining nodes had issues.

    i.e. Servicenodes sn01,sn02,sn03,sn04,sn05

    node001,node002,node003 are controlled by sn01
    node004,node005,node006 are controlled by sn02
    node007,node008,node009 are controlled by sn03
    node010,node011,node012 are controlled by sn04
    node013,node014,node015 are controlled by sn05

    In the above example node001, node004, node007, node010 and node013 will have the file successfully merged.

    node002, node005, node008, node011 and node014 will have one extra /var/xcat/syncfiles in src_file, and will therefore fail.
    node003, node006, node009, node012 and node015 will have two extra /var/xcat/syncfiles in src_file.

    and so on...

    The extra concatenation of /var/xcat/syncfiles is what I am trying to avoid.

    I hope that makes sense
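
As a simplified illustration of the behaviour described above (not the actual DSHCLI.pm code), the following shell sketch re-applies the service node sync directory to an already-prefixed path once per node. The function name `buggy_sources` and the sample merge file are made up for the example; only /var/xcat/syncfiles and the node names come from this thread.

```shell
#!/bin/sh
# Hypothetical stand-ins for the values handled in build_merge_rsync.
syncdir=/var/xcat/syncfiles
src_file=/install/syncfiles/group.merge

# buggy_sources NODE...: print the source path that would be used for each node.
buggy_sources() {
    src=$src_file
    for node in "$@"; do
        # Bug being modelled: the prefix is added to the value left over
        # from the previous node instead of to the original path.
        src="$syncdir$src"
        echo "$node: $src"
    done
}

# Nodes served by one service node, as in the sn01 example.
buggy_sources node001 node002 node003
```

Only the first node gets a path with a single /var/xcat/syncfiles prefix; each later node picks up one more copy, which matches the failures on the second and third node of every service node.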

     
  • Lissa Valletta

    Lissa Valletta - 2014-04-14

    Got it. This is what I am building on the service node, with the extra /var/xcat/syncfiles paths:
    vi /tmp/rsync_compute-03

    #!/bin/sh

    /usr/bin/ssh compute-03 '/bin/mkdir -p /var/xcat/node/syncfiles/merge/opt/xcat/share/xcat/scripts /var/xcat/node/syncfiles/merge/mergefiles/root/lissa/merge'

    /usr/bin/rsync --rsync-path /usr/bin/rsync -Liprogtz --out-format=%f%L /var/xcat/syncfiles/var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh /var/xcat/syncfiles/var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh /var/xcat/syncfiles/var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh compute-03:/var/xcat/node/syncfiles/merge/opt/xcat/share/xcat/scripts

    /usr/bin/rsync --rsync-path /usr/bin/rsync -Liprogtz --out-format=%f%L /var/xcat/syncfiles/var/xcat/syncfiles/root/lissa/merge/mergepasswd /var/xcat/syncfiles/var/xcat/syncfiles/root/lissa/merge/mergegroup /var/xcat/syncfiles/var/xcat/syncfiles/root/lissa/merge/mergeshadow compute-03:/var/xcat/node/syncfiles/merge/mergefiles/root/lissa/merge

     
  • Lissa Valletta

    Lissa Valletta - 2014-04-14

    Now with your fix it looks like this:

    vi /tmp/rsync_compute-03
    #!/bin/sh
    /usr/bin/ssh compute-03 '/bin/mkdir -p /var/xcat/node/syncfiles/merge/opt/xcat/share/xcat/scripts /var/xcat/node/syncfiles/merge/mergefiles/root/lissa/merge'

    /usr/bin/rsync --rsync-path /usr/bin/rsync -Liprogtz --out-format=%f%L /var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh /var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh /var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh compute-03:/var/xcat/node/syncfiles/merge/opt/xcat/share/xcat/scripts

    /usr/bin/rsync --rsync-path /usr/bin/rsync -Liprogtz --out-format=%f%L /var/xcat/syncfiles/root/lissa/merge/mergepasswd /var/xcat/syncfiles/root/lissa/merge/mergegroup /var/xcat/syncfiles/root/lissa/merge/mergeshadow compute-03:/var/xcat/node/syncfiles/merge/mergefiles/root/lissa/merge

    But I still have the problem that the xdcpmerge.sh rsync line is built with multiple entries for xdcpmerge.sh. I need to fix that: /var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh /var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh /var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh
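
The thread does not show how the repeated xdcpmerge.sh entries were eventually removed. One possible approach, sketched here in shell rather than in the Perl of DSHCLI.pm, is to drop repeated source paths before the rsync line is assembled. `unique_args` is a hypothetical helper, and the sketch assumes paths that contain no whitespace.

```shell
#!/bin/sh
# unique_args PATH...: print each distinct path once, in first-seen order.
unique_args() {
    seen=" "
    for path in "$@"; do
        case "$seen" in
            *" $path "*) ;;                 # already listed, skip it
            *) seen="$seen$path "
               printf '%s\n' "$path" ;;
        esac
    done
}

script=/var/xcat/syncfiles/opt/xcat/share/xcat/scripts/xdcpmerge.sh
# Three merge entries each contributed the same helper script.
unique_args "$script" "$script" "$script"
```

With the three identical entries from the generated file above, the helper prints the script path a single time.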

     
  • Arif Ali

    Arif Ali - 2014-04-15

    Ah, OK. I noticed an issue when I got loads of messages about xdcpmerge.sh, but I didn't think anything of it.

    I guess that's two for the price of one. Let me know the commit, and then I can roll it out to my customer base.

    Thanks again for the assistance.

     
  • Lissa Valletta

    Lissa Valletta - 2014-04-15

    2.8.4
    commit 300bc61da361adf55177c4bedd2553dc2207ae95

    2.9
    commit 553aa59bb1ae545234c2b66f24d969e7a5d9f996

     
  • Lissa Valletta

    Lissa Valletta - 2014-04-15
    • status: open --> pending
     
  • Arif Ali

    Arif Ali - 2014-04-23

    APPEND: now has the same issue, i.e. I have the following in one of my /tmp/rsync_<nodename> files:

    /usr/bin/rsync --rsync-path /usr/bin/rsync -Liprogtz --out-format=%f%L  /var/xcat/syncfiles/var/xcat/syncfiles/var/xcat/syncfiles/install/syncfiles/sysctl.conf.append idb1a05:/var/xcat/node/syncfiles/append/install/syncfiles
    
     
  • Arif Ali

    Arif Ali - 2014-04-23

    As with the merge issue before:

    --- DSHCLI.pm   2014-04-16 12:58:51.000000000 +0100
    +++ DSHCLI.pm.new       2014-04-23 11:12:22.061047613 +0100
    @@ -5208,6 +5208,7 @@
                   push @::appendlines,$line;
                 }
                 my $src_file  = $1; # append file left of arror
    +            my $orig_src_file  = $1; # append file left of arror
                 # it will be sync'd to $nodesyncfiledir/$append_file
                 my $dest_file = $nodesyncfiledir;
                 $dest_file .= $src_file;  
    @@ -5236,7 +5237,7 @@
                         # to pick up files from /var/xcat/syncfiles...
                         if ($onServiceNode == 1) {
                           my $newsrcfile = $syncdir;    # add SN syndir on front
    -                      $newsrcfile .= $src_file;
    +                      $newsrcfile .= $orig_src_file;
                           $src_file=$newsrcfile;
                         }
                         # destination file name
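
The idea behind the patch above can be sketched in shell as follows. This is only an illustration under the assumption that the prefix is applied once per target node; `fixed_sources` is a made-up name, and the real change is the Perl diff shown above.

```shell
#!/bin/sh
syncdir=/var/xcat/syncfiles   # SN sync directory named in this thread

# fixed_sources ORIG_SRC NODE...: print the source path used for each node.
fixed_sources() {
    orig_src_file=$1
    shift
    for node in "$@"; do
        # Always build from the untouched original path, never from the
        # result computed for the previous node.
        src_file="$syncdir$orig_src_file"
        echo "$node: $src_file"
    done
}

fixed_sources /install/syncfiles/sysctl.conf.append node001 node002 node003
```

Every node now gets the same source path with exactly one /var/xcat/syncfiles prefix.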
    
     
  • Lissa Valletta

    Lissa Valletta - 2014-04-23

    Go ahead and commit it in 2.8.4 and master. I put my change in the append function but forgot your original change. The code was copied, so I am not surprised.

     
  • Arif Ali

    Arif Ali - 2014-04-23

    Committed
    2.8.4: [ecd697]
    2.9: [065eb7]

     

    Related

    Commit: [065eb7]
    Commit: [ecd697]

  • Arif Ali

    Arif Ali - 2014-04-28

    Hi Lissa,

    There are further issues in this plugin with hierarchy: EXECUTEALWAYS now doesn't work. The relevant script is not copied over to the SN, and therefore cannot be run on the compute node.

    I will try to debug, and understand where exactly in the code the problem is.

    regards,
    Arif

     
  • Arif Ali

    Arif Ali - 2014-04-28

    Further to that, EXECUTE doesn't work either. It doesn't find the files, because they are not synchronised to the SN.

    None of the execute scripts are being transferred to /var/xcat/syncfiles on the SN, so it errors out.

    I tried updatenode <nodename> -f to see if I could get the postscripts sync'd, but that didn't work either.

     
  • Arif Ali

    Arif Ali - 2014-04-29

    As a crude fix for one of my other customer sites, I have applied the following patch. Since /install is already being synchronised there using rsync, it doesn't make a difference.

    https://gitlab.arif-ali.co.uk/arif/xcat-core/commit/06cad710500fb3ce96a809e836da624bc374cdf2

    I have also tested this in the current customer scenario as well, and this resolves the problem for the time being.

    But from what I can see we need to first synchronise the postscripts using xdcp and then run the scripts.

    This I think is similar to how the build_append_rsync and build_merge_rsync are done.

    What do you think?

     
  • Lissa Valletta

    Lissa Valletta - 2014-04-29

    Let me look at this also. Are you sure the synclist file is created correctly? This used to work fine and I don't think the changes we made would have affected it, but I have not tried it in a while.
    Note: for EXECUTE you must have the file to execute in the synclist. It must be named filename.post and be executable. For EXECUTEALWAYS, you also must have the script in the synclist, see /tmp/myscript1 below. It will always execute. Make sure it is executable.
    I guess it would be nice to see the synclist file that is not working. But I will test this morning.

    This is the example:
    /tmp/share/file2 -> /tmp/file2
    /tmp/share/file2.post -> /tmp/file2.post (required for hierarchical clusters)
    /tmp/share/file3 -> /tmp/file3
    /tmp/share/file3.post -> /tmp/file3.post (required for hierarchical clusters)
    /tmp/myscript1 -> /tmp/myscript1
    /tmp/myscript2 -> /tmp/myscript2

    EXECUTE:
    /tmp/share/file2.post
    /tmp/share/file3.post
    EXECUTEALWAYS: ( only in 2.8 and later)
    /tmp/myscript1
    /tmp/myscript2
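
The rule above (every EXECUTE and EXECUTEALWAYS script must also appear as a sync source) can be checked mechanically. The following is a rough sketch, not part of xCAT: `check_synclist` is a hypothetical helper that compares paths by exact match only, so it does not understand wildcard sources.

```shell
#!/bin/sh
# check_synclist FILE: report EXECUTE/EXECUTEALWAYS entries that are not
# also listed as the left-hand side of a "source -> destination" line.
check_synclist() {
    awk '
        /^[ \t]*$/ { next }                          # skip blank lines
        /^[ \t]*(MERGE|APPEND|EXECUTE|EXECUTEALWAYS):/ {
            section = $1; sub(/:.*/, "", section); next
        }
        section == "" && / -> / { synced[$1] = 1; next }
        section == "EXECUTE" || section == "EXECUTEALWAYS" { wanted[$1] = section }
        END {
            for (f in wanted)
                if (!(f in synced)) {
                    print wanted[f] ": " f " is not listed as a sync source"
                    missing = 1
                }
            exit missing + 0
        }
    ' "$1"
}

# Small demonstration with a synclist that forgets file2.post.
list=$(mktemp)
cat > "$list" <<'EOF'
/tmp/share/file2 -> /tmp/file2
/tmp/myscript1 -> /tmp/myscript1
EXECUTE:
/tmp/share/file2.post
EXECUTEALWAYS:
/tmp/myscript1
EOF
check_synclist "$list" || echo "synclist needs fixing"
rm -f "$list"
```

The function returns non-zero when at least one script is missing from the sync section, so it could be run before xdcp -F as a quick sanity check.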

     

    Last edit: Lissa Valletta 2014-04-29
  • Lissa Valletta

    Lissa Valletta - 2014-04-29

    So I am getting a failure with my testcase with the latest 2.8.4 build. Let me debug. Mine did sync all the data to the servicenode though.

    xdcp compute-01 -F /root/lissa/sync/synclist
    Error: xdsh plugin bug, pid 11462, process description: 'xCATd SSL: xdcp for manage-02@manage-02: xdsh instance: locally executing' with error 'Can't use an undefined value as an ARRAY reference at /opt/xcat/lib/perl/xCAT/DSHCLI.pm line 6047.
    ' while trying to fulfill request for the following nodes: compute-01

     
  • Lissa Valletta

    Lissa Valletta - 2014-04-29
    • status: pending --> open
     
  • Lissa Valletta

    Lissa Valletta - 2014-04-29
    • labels: sync files --> sync files, xdcp
    • summary: DSHCLI.pm bug, merge issue in hierarchy --> DSHCLI.pm bug, merge,execute,executealways, append issue in hierarchy
    • component: updatenode --> parallel-cmds
     
  • Arif Ali

    Arif Ali - 2014-04-29

    So my file below is therefore wrong

    /install/syncfiles/gmond.conf.nextscale -> /etc/ganglia/gmond.conf
    /install/syncfiles/cpuspeed.compute -> /etc/sysconfig/cpuspeed
    /etc/profile.d/modules.* -> /etc/profile.d/
    /etc/custom/*.modules -> /etc/custom/
    /etc/{hosts.equiv,hosts} -> /etc/
    MERGE:
    /install/syncfiles/group.merge -> /etc/group
    /install/syncfiles/passwd.merge -> /etc/passwd
    /install/syncfiles/shadow.merge -> /etc/shadow
    APPEND:
    /install/syncfiles/sysctl.conf.append -> /etc/sysctl.conf
    EXECUTE:
    /install/syncfiles/gmond.conf.nextscale.post
    /install/syncfiles/cpuspeed.compute.post
    /install/syncfiles/sysctl.conf.append.post
    EXECUTEALWAYS:
    /install/syncfiles/test.sh
    

    i.e. the files

    /install/syncfiles/gmond.conf.nextscale.post
    /install/syncfiles/cpuspeed.compute.post
    /install/syncfiles/sysctl.conf.append.post
    

    also need to be synchronised. At the previous customer (they have a flat network), this process was working up to and including xCAT 2.8.0, and that commented-out code didn't exist. And as /install was synchronised on the SN, we had no problem with hierarchy.

    So the assumption is that any EXECUTE and EXECUTEALWAYS scripts also need to be synchronised to the CN as part of the synclist?

    (For me) it doesn't make sense to have to include the file twice; but "hey" that's my opinion.

     
  • Lissa Valletta

    Lissa Valletta - 2014-04-29

    Yes, you are correct: EXECUTE and EXECUTEALWAYS scripts need to be part of the synclist. I think it might work without them in a non-hierarchical cluster, but it is best to just put them in.

    /install/syncfiles/gmond.conf.nextscale -> /etc/ganglia/gmond.conf
    /install/syncfiles/cpuspeed.compute -> /etc/sysconfig/cpuspeed
    /etc/profile.d/modules.* -> /etc/profile.d/
    /etc/custom/*.modules -> /etc/custom/
    /etc/{hosts.equiv,hosts} -> /etc/
    /install/syncfiles/gmond.conf.nextscale.post -> /install/syncfiles/gmond.conf.nextscale.post
    /install/syncfiles/cpuspeed.compute.post -> /install/syncfiles/cpuspeed.compute.post
    /install/syncfiles/sysctl.conf.append.post -> /install/syncfiles/sysctl.conf.append.post
    /install/syncfiles/test.sh -> /install/syncfiles/test.sh
    MERGE:
    /install/syncfiles/group.merge -> /etc/group
    /install/syncfiles/passwd.merge -> /etc/passwd
    /install/syncfiles/shadow.merge -> /etc/shadow
    APPEND:
    /install/syncfiles/sysctl.conf.append -> /etc/sysctl.conf
    EXECUTE:
    /install/syncfiles/gmond.conf.nextscale.post
    /install/syncfiles/cpuspeed.compute.post
    /install/syncfiles/sysctl.conf.append.post
    EXECUTEALWAYS:
    /install/syncfiles/test.sh

    Do these work? I did not think they were a supported syntax. You can have /etc/profile.d/*, but I have never tried these:

    /etc/profile.d/modules.* -> /etc/profile.d/
    /etc/custom/*.modules -> /etc/custom/
    /etc/{hosts.equiv,hosts} -> /etc/

    Here are the supported syntaxes
    https://sourceforge.net/apps/mediawiki/xcat/index.php?title=Sync-ing_Config_Files_to_Nodes

     