These variable names cause the coordinate (lon, lat) data to be written out as zeros:
ncks -v t_ref_min,t_ref_max -Oh in.nc out.nc
ncdump -v lon out.nc
But if you rename the variables, the coordinate data is fine:
ncrename -v t_ref_min,var1 in.nc in2.nc
ncrename -v t_ref_max,var2 in2.nc
ncks -v var1,var2 -Oh in2.nc out.nc
ncdump -v lon out.nc
This applies to ncks 3.1.4 and 3.1.9 on an ia64 platform.
The input file can be downloaded from
ftp://ftp.gfdl.noaa.gov/pub/rsz/ncksbug/in.nc
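Rather than eyeballing the two ncdump listings, the comparison can be scripted. The following is only a sketch: data_section and same_coord are hypothetical helper names (not part of NCO or netCDF), and it assumes ncdump prints a "data:" line at the start of a line before the values, as current versions do.

```shell
#!/bin/sh
# Sketch only: data_section and same_coord are hypothetical helpers,
# not NCO or netCDF tools.

# Print everything from the "data:" line onward (reads ncdump text on stdin).
data_section() {
    sed -n '/^data:/,$p'
}

# Succeed if variable $1 has identical dumped values in files $2 and $3.
same_coord() {
    a=$(ncdump -v "$1" "$2" | data_section)
    b=$(ncdump -v "$1" "$3" | data_section)
    # An empty section means ncdump failed or printed no data: treat as a mismatch.
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, after running ncks, "same_coord lon in.nc out.nc || echo 'lon differs'" would flag the zeroed coordinates.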
Hi (Roland?),
Thanks for the report.
You gave a test case so it was easy to try to reproduce.
Oddly, I cannot reproduce the problem with either version 3.1.8
or the current 3.2.0 code.
Here is what may be happening (please verify and let us know):
The -Oh switch in your command is incorrect.
You either want -O or -h or -O -h.
For some reason, your ia64 platform handles the getopt() library
call differently than my ia32, Opteron, and POWER4 test systems.
This leads to the weirdness you are seeing.
I admit that the getopt() parsing should bomb if my theory is
correct and I'm not sure why it doesn't.
Anyway, please separate -Oh into -O -h and tell us what happens.
Thanks,
Charlie
Hi Charlie,
Yes, the error only occurs on IA64. Separating -O -h didn't remove the error. Do you have access to an IA64 machine?
The odd thing is that if the variables are renamed, such as t_ref_min to var1 and t_ref_max to var2, the error doesn't occur. Weird dependence, no?
Thanks for looking into it.
Remik
Hi Remik,
> Yes, the error only occurs on IA64. Separating -O -h didn't remove the error.
Interesting. My getopt() theory is wrong.
> Do you have access to an IA64 machine?
No, so we won't be able to track down where things go awry.
> The odd thing is that if the variable names are renamed, such as t_ref_min
> to var1 and t_ref_max to var2, the error doesn't occur. Weird dependence, no?
No. I mean Yes.
> Thanks for looking into it.
Sure. We appreciate hearing about the weird behavior.
It might turn out to be a symptom of a cross-platform problem,
or it might be a platform-specific software bug.
Charlie
>> Do you have access to an IA64 machine?
> No, so we won't be able to track down where things go awry.
Maybe if I get some spare cycles, I'll trace through to see what's going on. Anyway, in case someone else encounters this, one workaround is to extract the first variable and then append the other to the same output file:
ncks -v t_ref_min in.nc out.nc
ncks -A -v t_ref_max in.nc out.nc
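For more than two variables, the same extract-then-append pattern could be wrapped in a loop. This is a rough sketch, not something tested on ia64: extract_incrementally and the RUN variable are hypothetical names introduced here, and the ncks invocations are exactly the two forms shown above (-v for the first variable, -A -v for the rest).

```shell
#!/bin/sh
# Sketch only: extract_incrementally is a hypothetical helper.
# Usage: extract_incrementally SRC DST VAR [VAR...]
# Setting RUN=echo beforehand prints the ncks commands instead of running them.

extract_incrementally() {
    src=$1; dst=$2; shift 2
    first=yes
    for var in "$@"; do
        if [ "$first" = yes ]; then
            # First variable: create the output file.
            ${RUN:-} ncks -v "$var" "$src" "$dst" || return 1
            first=no
        else
            # Remaining variables: append (-A) to the existing output file.
            ${RUN:-} ncks -A -v "$var" "$src" "$dst" || return 1
        fi
    done
}
```

With RUN=echo set, "extract_incrementally in.nc out.nc t_ref_min t_ref_max" just prints the two commands above, which is a cheap way to check the loop before touching any files.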
Thanks.
Remik
Thanks for the workaround.
Keep us posted if you find anything more.
Charlie
FYI,
I noticed that this occurs when ncks is compiled with the Intel compiler (debug build or not). GCC binaries on ia64 work fine. I traced the program through the function "nco_cpy_var_val", and the correct data is loaded from the input file, so something goes wrong with output file synchronization.
When I insert the following call at ncks.c:559, at the end of the "Copy variable data" for-loop,
nc_sync(out_id);
the output is correct, although this isn't a solution we'd want. Oh well.
Hi Remik,
Thanks for tracking this down further.
It's still a mystery why this happens,
but at least we know where it happens.
I'm reluctant to use the nc_sync() patch
because this is either a compiler bug or
an NCO bug, and nc_sync() doesn't fix either;
it just hides it.
It's now TODO nco873 and I'll try to look into it more later.
Charlie