
ncks BUG: coords are zero after 2 var extract

Developers
2007-05-23
2013-10-17
  • Remik Ziemlinski

    These variable names cause the coords (lon,lat) data to be output as zero:

    ncks -v t_ref_min,t_ref_max -Oh in.nc out.nc
    ncdump -v lon out.nc

    But if you rename the variables, then the coords data is fine:

    ncrename -v t_ref_min,var1 in.nc in2.nc
    ncrename -v t_ref_max,var2 in2.nc
    ncks -v var1,var2 -Oh in2.nc out.nc
    ncdump -v lon out.nc

    This applies to ncks 3.1.4 and 3.1.9 on an ia64 platform.
    The input file can be downloaded from
    ftp://ftp.gfdl.noaa.gov/pub/rsz/ncksbug/in.nc
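
A small check can make the symptom above easier to spot than reading the ncdump output by eye. The sketch below assumes the usual CDL text layout that ncdump prints (a `data:` section containing `name = v1, v2, ... ;`). The helper name `all_zero` is hypothetical and is not part of NCO or netCDF; it only parses text on standard input.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) only if every value printed for one variable
# in `ncdump -v` output is zero. all_zero is a hypothetical helper name.
#
# Intended use (assumes ncdump is installed):
#   ncdump -v lon out.nc | all_zero lon && echo "lon is all zero"
all_zero() {
    # $1 = variable name; CDL text is read from stdin.
    awk -v var="$1" '
        /^data:/ { in_data = 1; next }          # skip header sections
        in_data && $1 == var && $2 == "=" {     # start of "var = ..." values
            in_var = 1
            sub(/^[^=]*=/, "")
        }
        in_var {
            last = ($0 ~ /;/)                   # ";" ends the value list
            gsub(/[;,]/, " ")
            n = split($0, vals, " ")
            for (i = 1; i <= n; i++) {
                seen++
                if (vals[i] + 0 != 0) nonzero++
            }
            if (last) in_var = 0
        }
        END { if (seen > 0 && nonzero == 0) exit 0; exit 1 }
    '
}
```

The function exits non-zero when the variable is missing from the data section, so an absent `lon` is not mistaken for a zeroed one.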

     
    • Charlie Zender

      Charlie Zender - 2007-05-23

      Hi (Roland?),

      Thanks for the report.
      You gave a test case so it was easy to try to reproduce.
      Oddly, I cannot reproduce the problem with either version 3.1.8
      or the current 3.2.0 code.
      Here is what may be happening (please verify and let us know):

      The -Oh switch in your command is incorrect.
      You either want -O or -h or -O -h.
      For some reason, your ia64 platform handles the getopt() system
      call differently than my ia32, opteron, and power4 test systems.
      This leads to the weirdness you are seeing.
      I admit that the getopt() parsing should bomb if my theory is
      correct and I'm not sure why it doesn't.

      Anyway, please separate -Oh into -O -h and tell us what happens.

      Thanks,
      Charlie

       
    • Remik Ziemlinski

      Hi Charlie,

      Yes, the error only occurs on IA64. Separating -Oh into -O -h didn't remove the error.  Do you have access to an IA64 machine?

      The odd thing is that if the variables are renamed, such as t_ref_min to var1 and t_ref_max to var2, the error doesn't occur. Weird dependence, no?

      Thanks for looking into it.

      Remik

       
      • Charlie Zender

        Charlie Zender - 2007-05-24

        Hi Remik,

        > Yes, the error only occurs on IA64. Separating -O -h didn't remove the error.

        Interesting. My getopt() theory is wrong.

        > Do you have access to an IA64 machine?

        No, so we won't be able to track down where things go awry.

        > The odd thing is that if the variable names are renamed, such as t_ref_min
        > to var1 and t_ref_max to var2, the error doesn't occur. Weird dependence, no?

        No. I mean Yes.

        > Thanks for looking into it.

        Sure. We appreciate hearing about the weird behavior.
        It might turn out to be a symptom of a cross-platform problem,
        or it might be a platform-specific software bug.

        Charlie

         
    • Remik Ziemlinski

      >> Do you have access to an IA64 machine?
      >No, so we won't be able to track down where things go awry.

      Maybe if I get some spare cycles, I'll trace through to see what's going on.  Anyway, one workaround is to extract and append each variable incrementally, in case someone else encounters this:

      ncks -v t_ref_min in.nc out.nc
      ncks -A -v t_ref_max in.nc out.nc

      Thanks.
      Remik
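
The two-command workaround above extends naturally to any number of variables. The following is a minimal sketch, not something posted in the thread: `extract_incrementally` is a hypothetical helper name, and the `NCKS` variable is an assumption added so the command can be swapped out for a dry run. It uses `-O` on the first call to overwrite any existing output file (the original workaround omits it), then `-A` to append each remaining variable.

```shell
#!/bin/sh
# Sketch of the extract-then-append workaround for any number of variables.
# extract_incrementally is a hypothetical helper; set NCKS to override the command.
NCKS=${NCKS:-ncks}

extract_incrementally() {
    # $1 = input file, $2 = output file, remaining args = variable names
    in_file=$1
    out_file=$2
    shift 2
    mode=-O                 # first call creates (or overwrites) the output file
    for var in "$@"; do
        $NCKS "$mode" -v "$var" "$in_file" "$out_file" || return 1
        mode=-A             # later calls append to the existing output file
    done
}

# Example (assumes ncks is installed):
#   extract_incrementally in.nc out.nc t_ref_min t_ref_max
```

Extracting one variable per call is slower than a single `ncks -v a,b` invocation, but it sidesteps the multi-variable copy path where the zeroed coordinates were seen.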

       
      • Charlie Zender

        Charlie Zender - 2007-05-25

        Thanks for the workaround.
        Keep us posted if you find anything more.

        Charlie

         
    • Remik Ziemlinski

      FYI,
      I noticed that this occurs when NCO is compiled with the Intel compiler (debug or not).  GCC binaries on ia64 work fine.  I traced the program through the function "nco_cpy_var_val", and the correct data is loaded from the input file, so something goes wrong with output file synchronization.

      As an experiment, when I insert the following call at ncks.c:559, at the end of the "Copy variable data" for-loop,
          nc_sync(out_id);
      the output is correct, although this isn't a solution we'd want. Oh well.

       
      • Charlie Zender

        Charlie Zender - 2007-06-03

        Hi Remik,

        Thanks for tracking this down further.
        It's still a mystery why this happens,
        but at least we know where it happens.
        I'm reluctant to use the nc_sync() patch
        because this is either a compiler bug or
        an NCO bug and nc_sync() doesn't fix either,
        it just hides it.

        It's now TODO nco873 and I'll try to look into it more later.

        Charlie

         
