
Bug in ncremap/ncks in 4.6.1

Developers
Ben
2016-10-27
2016-10-28
  • Ben

    Ben - 2016-10-27

    We have been trying to use ncremap in NCO 4.6.1. We had been using it just fine in 4.6.0, but now it appears to be broken.
    We are trying to run this command:
    ncremap -s SCRIP_INPUT -i INPUT_FILE -E "--line_type greatcircle" -d DESTINATION_FILE -o OUTPUT_FILE

    Here is what happens. Notice all the stray "message is" output:

    tmp/test_esmf_regrid> ncremap -s ~/noback/Cubed_Sphere_Grids/PE2880x17280-CF.nc4 -i TOTPRES.c2880_1d.nc4 -E "--line_type greatcircle" -d example_nr_file.nc4 -o output.nc4 -D 3
    dbg: alg_opt = bilinear
    dbg: cln_flg = Yes
    dbg: dbg_lvl = 3
    dbg: drc_in = /home/bmauer/noback/tmp/test_esmf_regrid
    dbg: drc_out = .
    dbg: drc_tmp = /gpfsm/dnb02/tdirs/login/discover18.16366.bmauer
    dbg: dst_fl = example_nr_file.nc4
    dbg: gaa_sng = --gaa remap_script=ncremap --gaa remap_hostname=discover18.prv.cube --gaa remap_version="4.6.1"
    message
    is
    dbg: grd_dst = /gpfsm/dnb02/tdirs/login/discover18.16366.bmauer/ncremap_tmp_grd_dst.nc.pid18035
    dbg: grd_sng = --rgr grd_ttl='Default internally-generated grid' --rgr grid=/gpfsm/dnb02/tdirs/login/discover18.16366.bmauer/ncremap_tmp_grd_dst.nc.pid18035 --rgr latlon=100,100 --rgr snwe=30.0,70.0,-130.0,-90.0
    dbg: grd_src = /home/bmauer/noback/Cubed_Sphere_Grids/PE2880x17280-CF.nc4
    dbg: hdr_pad = 1000
    dbg: job_nbr = 2
    dbg: in_fl = TOTPRES.c2880_1d.nc4
    dbg: map_fl = /gpfsm/dnb02/tdirs/login/discover18.16366.bmauer/ncremap_tmp_map_esmf_bilinear.nc.pid18035
    dbg: map_mk = Yes
    dbg: mlt_map = Yes
    dbg: mpi_flg = No
    dbg: nco_opt = -D 3 -O --no_tmp_fl --gaa remap_script=ncremap --gaa remap_hostname=discover18.prv.cube --gaa remap_version="4.6.1"
    message
    is --hdr_pad=1000
    dbg: nd_nbr = 1
    dbg: out_fl = output.nc4
    dbg: par_typ = nil
    dbg: spt_pid = 18035
    dbg: thr_nbr = 2
    dbg: unq_sfx = .pid18035
    dbg: var_lst =
    dbg: var_rgr =
    dbg: wgt_usr =
    Asked to regrid 1 files:
    TOTPRES.c2880_1d.nc4
    NCO regridder invoked with command:
    ncremap -s /home/bmauer/noback/Cubed_Sphere_Grids/PE2880x17280-CF.nc4 -i TOTPRES.c2880_1d.nc4 -E --line_type greatcircle -d example_nr_file.nc4 -o output.nc4 -D 3
    Started processing at Thu Oct 27 16:12:43 EDT 2016.
    Running remap script ncremap from directory /gpfsm/dswdev/mathomp4/Baselibs/GMAO-Baselibs-5_0_2/x86_64-unknown-linux-gnu/ifort_16.0.3.210-intelmpi_5.1.3.210/Linux/bin
    NCO version "4.6.1"
    message
    is from directory /gpfsm/dswdev/mathomp4/Baselibs/GMAO-Baselibs-5_0_2/x86_64-unknown-linux-gnu/ifort_16.0.3.210-intelmpi_5.1.3.210/Linux/bin
    Input files in or relative to directory /home/bmauer/noback/tmp/test_esmf_regrid
    Intermediate/temporary files written to directory /gpfsm/dnb02/tdirs/login/discover18.16366.bmauer
    Output files to directory .
    Destination grid will be inferred from data-file
    ncks -D 3 -O --no_tmp_fl --gaa remap_script=ncremap --gaa remap_hostname=discover18.prv.cube --gaa remap_version="4.6.1" message is --hdr_pad=1000 --rgr nfr=y --rgr grid=/gpfsm/dnb02/tdirs/login/discover18.16366.bmauer/ncremap_tmp_grd_dst.nc.pid18035 example_nr_file.nc4 /gpfsm/dnb02/tdirs/login/discover18.16366.bmauer/ncremap_grd_tmp.nc.pid18035
    ncks: ERROR received 4 filenames; need no more than two

    Whereas with NCO 4.6.0:

    /ford1/share/gmao_SIteam/Baselibs/TmpBaselibs/GMAO-Baselibs-5_0_1_with_NCO460/x86_64-unknown-linux-gnu/gfortran_6.1.0-openmpi_1.10.2/Linux/bin/ncremap -i moist_72.nc4 -d pchem_144.nc4 -o yaya.nc4 -D 3
    dbg: alg_opt = bilinear
    dbg: cln_flg = Yes
    dbg: dbg_lvl = 3
    dbg: drc_in = /home/mathomp4/ncremap
    dbg: drc_out = .
    dbg: drc_tmp = /tmp
    dbg: dst_fl = pchem_144.nc4
    dbg: gaa_sng = --gaa remap_script=ncremap --gaa remap_hostname=anvil.gsfc.nasa.gov --gaa remap_version="4.6.0"
    dbg: grd_dst = /tmp/ncremap_tmp_grd_dst.nc.pid22640
    dbg: grd_sng = --rgr grd_ttl='Default internally-generated grid' --rgr grid=/tmp/ncremap_tmp_grd_dst.nc.pid22640 --rgr latlon=100,100 --rgr snwe=30.0,70.0,-130.0,-90.0
    dbg: grd_src = /tmp/ncremap_tmp_grd_src.nc.pid22640
    dbg: hdr_pad = 1000
    dbg: job_nbr = 2
    dbg: in_fl = moist_72.nc4
    dbg: map_fl = /tmp/ncremap_tmp_map_esmf_bilinear.nc.pid22640
    dbg: map_mk = Yes
    dbg: mlt_map = Yes
    dbg: mpi_flg = No
    dbg: nco_opt = -D 3 -O --no_tmp_fl --gaa remap_script=ncremap --gaa remap_hostname=anvil.gsfc.nasa.gov --gaa remap_version="4.6.0" --hdr_pad=1000
    dbg: nd_nbr = 1
    dbg: out_fl = yaya.nc4
    dbg: par_typ = nil
    dbg: spt_pid = 22640
    dbg: thr_nbr = 2
    dbg: unq_sfx = .pid22640
    dbg: var_lst =
    dbg: var_rgr =
    dbg: wgt_usr =
    Asked to regrid 1 files:
    moist_72.nc4
    NCO regridder invoked with command:
    ncremap -i moist_72.nc4 -d pchem_144.nc4 -o yaya.nc4 -D 3
    ncremap: Removing PET0.RegridWeightGen.Log file from current directory before running
    Started processing at Thu Oct 27 16:01:41 EDT 2016.
    NCO ncremap version is "4.6.0"
    Destination grid will be inferred from data-file
    ncks -D 3 -O --no_tmp_fl --gaa remap_script=ncremap --gaa remap_hostname=anvil.gsfc.nasa.gov --gaa remap_version="4.6.0" --hdr_pad=1000 --rgr nfr=y --rgr grid=/tmp/ncremap_tmp_grd_dst.nc.pid22640 pchem_144.nc4 /tmp/ncremap_grd_tmp.nc.pid22640

     
  • Charlie Zender

    Charlie Zender - 2016-10-27

    As a first test, please download and try the current ncremap
    http://dust.ess.uci.edu/tmp/ncremap
    and let us know how well it works...
    cz

     
  • Matthew Thompson

    Charlie,

    Nope. That didn't solve it. Same issue.

    I ran some tests on my end to see whether I'm building NCO oddly. First, I built a Baselibs whose name lacks the dashes that seemed to trigger this issue: https://sourceforge.net/p/nco/discussion/9830/thread/8409f58d/?limit=25#372b.

    That did not help. I then built a Baselibs with NCO reverted to 4.6.0. That worked! That's a data point!

    Next, is it ncremap? No! I tried the ncremap scripts from 4.6.0 and 4.6.1, as well as your latest one. If the ncks found in the PATH is from 4.6.0, the run succeeds; if it's from 4.6.1, it fails. Which ncremap script I use makes no difference.

    So it looks like ncks is the culprit. Honestly, this Baselibs version isn't in production yet, so probably no one has run its ncks, or if they have, not with the options ncremap passes that trigger this.

    Charlie, let me know what you'd like me to try now.

    For more info, here is part of the -D 3 output from a good run:

    dbg: gaa_sng  = --gaa remap_script=ncremap --gaa remap_hostname=anvil.gsfc.nasa.gov --gaa remap_version="4.6.0"
    dbg: grd_dst  = /tmp/ncremap_tmp_grd_dst.nc.pid32512
    dbg: grd_sng  = --rgr grd_ttl='Default internally-generated grid' --rgr grid=/tmp/ncremap_tmp_grd_dst.nc.pid32512 --rgr latlon=100,100 --rgr snwe=30.0,70.0,-130.0,-90.0
    dbg: grd_src  = /tmp/ncremap_tmp_grd_src.nc.pid32512
    dbg: hdr_pad  = 1000
    

    From a bad run:

    dbg: gaa_sng  = --gaa remap_script=ncremap --gaa remap_hostname=anvil.gsfc.nasa.gov --gaa remap_version="4.6.1"
    message
    is
    dbg: grd_dst  = /tmp/ncremap_tmp_grd_dst.nc.pid341
    dbg: grd_sng  = --rgr grd_ttl='Default internally-generated grid' --rgr grid=/tmp/ncremap_tmp_grd_dst.nc.pid341 --rgr latlon=100,100 --rgr snwe=30.0,70.0,-130.0,-90.0
    dbg: grd_src  = /tmp/ncremap_tmp_grd_src.nc.pid341
    dbg: hdr_pad  = 1000
    
     
  • Charlie Zender

    Charlie Zender - 2016-10-28

    Gentlemen,
    It appears to me that, on your system, 4.6.1 builds a "corrupt" version string into the ncks executable.
    Please execute

    nco_vrs=$(ncks --version 2>&1 >/dev/null | grep NCO | awk '{print $5}')

    with each version of ncks. I suspect the "message is" string comes from the version string that NCO reports. That string ends up on the ncks command line that ncremap builds, where the extra words are taken as file names (hence "ncks: ERROR received 4 filenames"), and that breaks ncremap. The string is baked into ncks at compile time. If you verify this, we can devise a fix. It is possible that 4.6.2-alpha02 (the latest) does not have this problem.
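    A minimal sketch of the mechanism described above, assuming ncremap splices the probed version into the ncks command line without quoting it (the 4.6.1 log, where message and is appear as separate words before --hdr_pad, suggests this). The value of nco_vrs and the count_args helper are illustrative stand-ins, not part of ncremap.

```shell
#!/bin/sh
# Hypothetical stand-in for what the version probe returned on the broken
# build: the quoted version followed by two stray lines.
nco_vrs=$(printf '%s\n' '"4.6.1"' message is)

# Print how many arguments the function receives.
count_args() { echo "$#"; }

# Unquoted: field splitting on the embedded newlines yields 4 arguments,
# so "message" and "is" arrive as extra positional arguments.
count_args --gaa remap_version=$nco_vrs     # prints 4

# Quoted: the value stays one (multi-line) argument.
count_args --gaa "remap_version=$nco_vrs"   # prints 2
```

    With the unquoted form, ncks receives message and is as two extra positional arguments alongside the input and output files, which matches "received 4 filenames". Quoting would keep the argument count right, but the attribute value would still be wrong; the real fix is a clean version string.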

     
  • Matthew Thompson

    Charlie,

    You are correct:

    [mathomp4@anvil src]$ ncks --version 2>&1 >/dev/null | grep NCO | awk '{print $5}'
    "4.6.0"
    
    [mathomp4@anvil ncremap]$ ncks --version 2>&1 >/dev/null | grep NCO | awk '{print $5}'
    "4.6.1"
    message
    is
    

    I will also note that 4.6.0 seems to avoid the weird -Baselibs- error:

    [mathomp4@anvil src]$ ncks --version
    NCO netCDF Operators version "4.6.0" built by mathomp4 on anvil.gsfc.nasa.gov at Oct 28 2016 08:58:18
    ncks version "4.6.0"
    
    [mathomp4@anvil ncremap]$ ncks --version
    NCO netCDF Operators version "4.6.1" last modified 2016/08/08 built Oct 28 2016 on anvil.gsfc.nasa.gov by mathomp4
    ncks: WARNING cvs_vrs_prs() reports nco_sng_ptr == NULL
    nco_sng_cnv_err(): ERROR an NCO function or main program attempted to convert the user-defined string "-Baselibs-" to an integer-type using the standard C-library function "strtol()". This function stopped converting the input string when it encountered the illegal (i.e., non-numeric or non-integer) character '-'. This probably indicates a syntax error by the user. Please check the argument syntax and re-try the command. Exiting...
    nco_err_exit(): ERROR Short NCO-generated message (usually name of function that triggered error): nco_sng_cnv_err()
    nco_err_exit(): ERROR Error code is 0. This indicates an error occurred in NCO code or in a system call, not in the netCDF layer.
    nco_err_exit(): ERROR NCO will now exit with system call exit(EXIT_FAILURE)
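    The stray words can be reproduced from the 4.6.1 error text above. Feeding three of its lines through the same grep/awk filter as Charlie's probe prints "4.6.1", then message, then is, because each error line also contains "NCO" and its fifth field happens to be one of those words. Which lines actually reach the pipe on stderr is an assumption here, and the stricter variant at the end is a hypothetical hardening, not something NCO ships.

```shell
#!/bin/sh
# Three lines copied from the broken 4.6.1 `ncks --version` output above.
# Assumption: these are among the lines that reach the pipe via stderr.
print_sample() {
  printf '%s\n' \
    'NCO netCDF Operators version "4.6.1" last modified 2016/08/08 built Oct 28 2016 on anvil.gsfc.nasa.gov by mathomp4' \
    'nco_err_exit(): ERROR Short NCO-generated message (usually name of function that triggered error): nco_sng_cnv_err()' \
    'nco_err_exit(): ERROR Error code is 0. This indicates an error occurred in NCO code or in a system call, not in the netCDF layer.'
}

# Same grep/awk filter as the probe: every line containing "NCO" matches,
# and awk prints field 5 of each one.
print_sample | grep NCO | awk '{print $5}'
# prints three lines: "4.6.1", message, is

# Hypothetical stricter variant: read only the first version banner line.
print_sample | awk '/^NCO netCDF Operators version/ { print $5; exit }'
# prints: "4.6.1"
```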
    

    Note that the only differences between these builds are:
    - The NCO version: one is 4.6.1, the other 4.6.0.
    - The directories they were built in (4.6.0 first, then 4.6.1):

    [mathomp4@anvil src]$ which ncks
    /ford1/share/gmao_SIteam/Baselibs/TmpBaselibs/GMAO_Baselibs_5_0_2_with_NCO460/x86_64-unknown-linux-gnu/ifort_16.0.2.181-openmpi_1.10.2/Linux/bin/ncks
    
    [mathomp4@anvil ncremap]$ which ncks
    /ford1/share/gmao_SIteam/Baselibs/TmpBaselibs/GMAO_Baselibs_5_0_2/x86_64-unknown-linux-gnu/ifort_16.0.2.181-openmpi_1.10.2/Linux/bin/ncks
    

    That is it. Same compilers, computer, everything.

    So I took a look around, and I have one possibility:

    (1108) $ diff GMAO_Baselibs_5_0_2/src/nco/src/nco/nco_scm.c GMAO_Baselibs_5_0_2_with_NCO460/src/nco/src/nco/nco_scm.c 
    1c1
    < /* $Header: /cvsroot/baselibs/Baselibs/src/nco/src/nco/nco_scm.c,v 1.1.1.14 2016/08/08 13:41:36 mathomp4 Exp $ */
    ---
    > /* $Header$ */
    34c34
    <   char cvs_Name[]="$Name: GMAO-Baselibs-5_0_2 $";
    ---
    >   char cvs_Name[]="$Name$";
    

    Could cvs_Name be causing all this? My 4.6.1 build came from a CVS checkout of our Baselibs, while the 4.6.0 build came from your tarball on git. Because it came from a tarball, its CVS keywords were not expanded, unlike the ones in my Baselibs checkout.

    I'm going to try a CVS checkout of our Baselibs, but with -kk during the checkout. If that solves it, well, good enough for me, I suppose. It's not like I really care about CVS keyword substitution when I build a model!

     
  • Matthew Thompson

    Yuuuuuuuuuup. After checking out our Baselibs with -kk, ncremap works again. All for a keyword substitution!

    Once again, CVS keywords get me. I'm beginning to think I should just add checkout -kk to my .cvsrc.

    Thanks for working with us on this, Charlie. As we move to git here, I wonder whether those CVS keywords even do anything anymore. Maybe I can joyfully strip them from our code, knowing they'll never bother us again...
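    The fix in this thread is to check out with -kk (or to put checkout -kk in ~/.cvsrc) so keywords stay unexpanded. For files that have already left CVS with expanded keywords, for example when moving them to git, a hypothetical cleanup is to contract the keywords back to their bare form with sed. A minimal sketch, assuming GNU or BSD sed (both accept -E); the function name and keyword list are illustrative, and $Log$ is deliberately left out because it expands into extra lines.

```shell
#!/bin/sh
# Hypothetical helper (not part of CVS or NCO): contract expanded CVS keywords
# such as "$Name: GMAO-Baselibs-5_0_2 $" back to the bare "$Name$" form that a
# checkout with -kk would have produced. Reads stdin, writes stdout.
contract_cvs_keywords() {
  sed -E 's/\$(Author|Date|Header|Id|Name|Revision|Source|State)(: [^$]*)?\$/$\1$/g'
}

# Example: the expanded line from the nco_scm.c diff above.
printf '%s\n' '  char cvs_Name[]="$Name: GMAO-Baselibs-5_0_2 $";' | contract_cvs_keywords
# prints:   char cvs_Name[]="$Name$";
```

    Contracting rather than deleting leaves strings such as cvs_Name[]="$Name$" in the same form as in an unexpanded tarball, which is the state the working 4.6.0 build had.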

     
  • Charlie Zender

    Charlie Zender - 2016-10-28

    Glad you found the fix. You may be the only site building NCO with CVS keyword expansion. I should remove the tokens. It's on the list.
    cz

     
  • Matthew Thompson

    Some of the libraries we build are old enough to date from when CVS was still a thing. Surprising this never happened anywhere else! Oh, CVS...

     
