Hi All,
I've finally isolated a reproducible symptom of a recent NCO bug.
The following four commands (TODO #306)
cd ~/nco/data;ncgen -b -o in.nc in.cdl
ncwa -O -a time -v u,v in.nc foo.nc # Compute time mean of u,v
ncrename -O -v u,uavg -v v,vavg foo.nc # Rename to avoid conflict
ncks -A -C -v u,v in.nc foo.nc # Place originals with time means
work fine when NCO is built with bld/Makefile, but the last command
generates this error when NCO is built with autotools:
ncks: ERROR attempt to write 1 dimensional input variable u to 0 dimensional space in output file.
Can anyone reproduce this problem? Any idea what's going on?
I've checked a few things and haven't found anything wrong with the code yet.
Building with --disable-shared does not fix things with autotools.
I'm kinda running out of ideas. I hate build-specific problems.
Thanks,
Charlie
Charlie,
I did a quick test and could not reproduce the bug. I'll look into it further tomorrow. Is this bug repeatable on different OS's?
rorik
Hi Rorik,
Thanks for looking at this.
> Is this bug repeatable on different OS's?
Good question. Apparently not:
The problem occurs as described on my Debian Sid i686 laptops.
The symptoms do not occur (autotools builds work fine) on AIX and SGI,
and, surprisingly, on my RedHat 9 Linux desktop (with or without DODS).
I know that I am not the only one having this problem.
The thread on "Using NCO to calculate variance"
https://sourceforge.net/forum/forum.php?thread_id=1005783&forum_id=9829
drew my attention to this problem, hence the demo code.
In other words, the variance computations I've suggested seem to work
unless you try them on my laptop (and, I believe, the bug poster's)
with autotools builds of NCO.
Hmm.
Charlie
I still cannot reproduce this bug, even on my Debian/sid machine. Furthermore, it seems to me that the error message
ncks: ERROR attempt to write 1 dimensional input variable u to 0 dimensional space in output file.
has to come from one of two places:
nco_var_utl.c: lines 309-310
or
nco_msa.c: lines 497-498
However, neither of these two places is ever reached when I run the 3-line test you gave me. I think this is because no dimension indices are specified with the -d option. Perhaps the bug is buried somewhere such that ncks thinks there is a dimension specification when there is not.
Anyway, there is not much more I think I can do until I can reproduce this error. Given the autotools-specific nature of this, is there any chance that different netCDF libraries are getting linked in the two cases? I see that there are two netCDF library calls right before the error message gets triggered.
rorik
Hi Rorik,
> I still cannot reproduce this bug, even on my Debian/sid machine. Furthermore,
> it seems to me that the error message
Hmm.
> ncks: ERROR attempt to write 1 dimensional input variable u to 0 dimensional
> space in output file.
>
> has to come from one of two places:
> nco_var_utl.c: lines 309-310
> or
> nco_msa.c: lines 497-498
I disagree. I believe it comes from nco_cpy_var_val() in nco_var_utl.c,
line 208. I did not include the "HINT" part of the error message in the
original posting. Sorry.
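To make the HINT concrete: ncks -A can append a variable only if its rank (number of dimensions) in the input file matches the rank of the same variable in the output file. The sketch below is a hypothetical stand-in, not NCO's actual nco_cpy_var_val(); the function name check_append_rank() and its arguments are assumptions for illustration. Note that in this thread u was renamed to uavg in foo.nc, so ncks should be defining a fresh u there with the same rank as in in.nc; reaching this message with an output rank of 0 therefore suggests a wrong rank was read back from a library call rather than a genuine user error.

```c
#include <stdio.h>

/* Hypothetical rank check modeled on the error text quoted in this thread;
   NOT NCO's actual nco_cpy_var_val(). Returns 0 if the input and output
   ranks agree; otherwise prints an ncks-style error and returns -1. */
int check_append_rank(const char *var_nm, int rnk_in, int rnk_out)
{
  if (rnk_in != rnk_out) {
    fprintf(stderr,
            "ncks: ERROR attempt to write %d dimensional input variable %s "
            "to %d dimensional space in output file.\n",
            rnk_in, var_nm, rnk_out);
    return -1;
  }
  return 0;
}
```

For example, check_append_rank("u", 1, 1) succeeds, while check_append_rank("u", 1, 0) prints the same message seen above.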
> Anyway, there is not much more I think I can do until I can reproduce this error.
Agreed. My gut tells me that this is a real problem and that it has to
do with NCO's behavior possibly differing when it is dynamically
linked, not with autotools per se. I've never thoroughly audited the
code to guarantee it is fully "re-entrant". For instance, there may be
static variables in the library that get set to weird states when
multiple applications use NCO at the same
time.
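For readers unfamiliar with the re-entrancy concern, here is a minimal, hypothetical illustration of function-level static state; dmn_sng() and dmn_sng_r() are invented names, not NCO functions. One caveat: on Linux each process gets its own private copy of a shared library's writable static data, so static variables are not shared between separate programs. The risk they pose is within a single process, for example when a routine is called twice and the second call silently overwrites a result the first caller still relies on.

```c
#include <stdio.h>

/* Hypothetical non-re-entrant helper: it formats into a static buffer and
   returns a pointer to it, so every call overwrites the previous result. */
const char *dmn_sng(int rnk)
{
  static char buf[32];
  snprintf(buf, sizeof buf, "%d-dimensional", rnk);
  return buf;
}

/* Re-entrant alternative: the caller owns the storage, so results from
   separate calls cannot clobber each other. */
const char *dmn_sng_r(int rnk, char *buf, size_t len)
{
  snprintf(buf, len, "%d-dimensional", rnk);
  return buf;
}
```

Calling dmn_sng(1) and then dmn_sng(0) leaves both returned pointers aimed at the same buffer, which now reads "0-dimensional"; the _r variant keeps the two results separate.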
> Thinking about the autotools specific nature of this, is there any chance that
> different netCDF libraries are getting linked in the two cases? I see that
> there are two netCDF library calls right before the error message gets
> triggered.
Here is the full error. It occurs when building with either --disable-shared
or --enable-shared. One difference between bld/Makefile and
--disable-shared is that bld/Makefile is completely statically linked,
whereas --disable-shared still dynamically links to libm and libc:
zender@ashes:~/nco$ ldd `which ncks`
libm.so.6 => /lib/libm.so.6 (0x40025000)
libc.so.6 => /lib/libc.so.6 (0x40047000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
zender@ashes:~/nco/data$ ncks -r
NCO netCDF Operators version "2.8.7" built Jan 22 2004 on ashes by zender
Copyright (C) 1995--2004 Charlie Zender
ncks version "2.8.7"
NCO is free software and comes with ABSOLUTELY NO WARRANTY
NCO is distributed under the terms of the GNU General Public License
"RIP Ed McMullin (1941--2003): Musician, Singer, Songwriter, Teacher, Father, Husband. Keep on Gig'n. http://dust.ess.uci.edu/ed"
Linked to netCDF library version 3.5.1-beta10, compiled Nov 7 2003 23:09:02
Homepage URL: http://nco.sf.net
User's Guide: http://nco.sf.net/nco.html
Configuration Option:     Active?  Reference:
Debugging: Custom         No       Pedantic, bounds checking (slowest execution)
Debugging: Symbols        No       Produce symbols for debuggers (e.g., dbx, gdb)
DODS/OpenDAP clients      No       http://nco.sf.net/nco.html#DODS
Internationalization      No       http://nco.sf.net/nco.html#i18n (not ready)
OpenMP Multi-threading    No       http://nco.sf.net/nco.html#omp (alpha testing)
Optimization: run-time    Yes      Fastest execution possible (slowest compilation)
UDUnits conversions       Yes      http://nco.sf.net/nco.html#UDUnits
Wildcarding (regex)       Yes      http://nco.sf.net/nco.html#rx
zender@ashes:~/nco$ cd ~/nco/data;ncgen -b -o in.nc in.cdl
zender@ashes:~/nco/data$ ncwa -O -a time -v u,v in.nc foo.nc # Compute time mean of u,v
zender@ashes:~/nco/data$ ncrename -O -v u,uavg -v v,vavg foo.nc # Rename to avoid conflict
zender@ashes:~/nco/data$ ncks -A -C -v u,v in.nc foo.nc # Place originals with time means
ncks: WARNING Overwriting global attribute Conventions
ncks: WARNING Overwriting global attribute history
ncks: WARNING Overwriting global attribute julian_day
ncks: WARNING Overwriting global attribute RCS_Header
ncks: ERROR attempt to write 1 dimensional input variable u to 0 dimensional space in output file.
HINT: When using -A (append) option, all appended variables must be the same rank in the input file as in the output file. ncwa operator is useful at ridding variables of extraneous (size = 1) dimensions. Read the manual to see how.
Thanks,
Charlie
You get libm and libc statically linked using bld/Makefile? I don't, and I'm not sure how that would happen when linking with gcc's -lm flag rather than manually adding /usr/lib/libm.a to the object files. I also don't know how to statically link libc under any circumstances (although I do have /usr/lib/libc.a).
I don't follow how that could be happening with your GNU/Linux system.
rorik@chabuku:~/nco/bld$ make > make.log 2>&1
rorik@chabuku:~/nco/bld$ ldd ../bin/ncks
libnetcdf.so.3 => /usr/lib/libnetcdf.so.3 (0x40023000)
libm.so.6 => /lib/libm.so.6 (0x40046000)
libc.so.6 => /lib/libc.so.6 (0x40068000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
rorik
I'm not going to have time to get back to this
for a few weeks. I'm glad it's not affecting everyone,
though it's still affecting me.
Charlie
Hi Rorik,
First, you are right about the static/dynamic linking. It may be a red herring anyway.
I've found what may be a difference between our builds that
causes the "reproducible" problem to occur.
The problem occurs on my fully updated Debian Sid box with
./configure --enable-optimize-custom --prefix=${HOME} --bindir=${MY_BIN_DIR} --datadir=${HOME}/nco/data --libdir=${MY_LIB_DIR} --mandir=${HOME}/nco/man > configure.${GNU_TRP}.foo 2>&1
but not with
./configure --prefix=${HOME} --bindir=${MY_BIN_DIR} --datadir=${HOME}/nco/data --libdir=${MY_LIB_DIR} --mandir=${HOME}/nco/man > configure.${GNU_TRP}.foo 2>&1
In other words, one or more of the many Linux gcc flags triggered by
--enable-optimize-custom may cause the problem.
Since I've already verified that the problem does not occur with AIX
or SGI compilers, it makes sense to me that it's one of the GCC flags.
Will you try building both ways and verify whether this is the case?
If you can reproduce the problem, the next step is to figure out
which gcc flag causes it so that we can pinpoint the offending code.
Thanks,
Charlie
Charlie,
I think the problem is -fshort-enums. According to the GCC manual
"Warning: the -fshort-enums switch causes GCC to generate code that is not binary compatible with code generated without that switch. Use it to conform to a non-default application binary interface."
The error occurs when libnetcdf is built without -fshort-enums (while NCO is built with it). I built netcdf-3.5.1-beta10 with and without the flag, and the error seems to go away when both NCO and netCDF are built with it and, obviously, when neither is.
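Why a mismatched -fshort-enums can break things: on common targets GCC gives an enum the size of int by default, while -fshort-enums allocates only as many bytes as the enum's declared range of values needs, often one. If a header declares a type as an enum (as far as I know netCDF 3.x declares nc_type this way, but treat that as an assumption here) and the library and its caller are compiled with different settings, they disagree about how many bytes that type occupies. The sketch below simulates this in one program without relying on undefined behavior. A byte array stands in for the caller's memory; the hypothetical "caller" believes the type code occupies 1 byte and keeps a rank in the next byte, while the hypothetical "library" stores a full int. The layout is invented for illustration, since real stack layout is compiler-dependent, and it assumes a 4-byte int.

```c
#include <string.h>

/* Hypothetical layout of the caller's memory, assuming a 4-byte int.
   The caller was built with -fshort-enums, so it believes the type code
   occupies 1 byte at TYP_OFF and keeps the variable's rank in the next byte. */
#define TYP_OFF  0
#define RNK_OFF  1
#define FRAME_SZ 16

/* Simulated library built WITHOUT -fshort-enums: it stores the type code
   as a full int, i.e. sizeof(int) bytes, starting at the pointer it gets. */
static void lib_store_type(unsigned char *typ_p, int typ)
{
  memcpy(typ_p, &typ, sizeof typ);
}

/* Returns the rank the caller reads back after the mismatched call. */
int rank_after_mismatched_call(int rnk, int typ)
{
  unsigned char frame[FRAME_SZ] = {0};
  frame[RNK_OFF] = (unsigned char)rnk;  /* caller records the rank first */
  lib_store_type(&frame[TYP_OFF], typ); /* library writes 4 bytes, not 1 */
  return frame[RNK_OFF];                /* the neighboring byte is clobbered */
}
```

For a small type code such as 5, rank_after_mismatched_call(1, 5) returns 0: with a 4-byte int, the byte at offset 1 is one of the value's zero high-order bytes on both little- and big-endian machines. A rank of 1 silently becoming 0 is consistent with, though not proof of, the "1 dimensional ... to 0 dimensional" error in this thread.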
I think we should get rid of -fshort-enums. The savings are not worth making people rebuild libnetcdf.
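Besides dropping the flag, a project can make this class of mismatch fail at compile time instead of at run time. A minimal sketch, assuming a C11 compiler for _Static_assert: demo_type is a hypothetical stand-in for a small type-code enum, and real code would instead assert on the library's own type (e.g., sizeof(nc_type)) after including its header. Under GCC's default settings on common platforms the assertion holds; compiling this file with -fshort-enums makes it fail, because an enum with values 0 through 6 then shrinks to a single byte.

```c
/* Hypothetical mirror of a small type-code enum (values 0..6); real code
   would check the library's own enum type instead. */
typedef enum {
  DEMO_NAT = 0, DEMO_BYTE, DEMO_CHAR, DEMO_SHORT,
  DEMO_INT, DEMO_FLOAT, DEMO_DOUBLE
} demo_type;

/* Refuses to compile if enums are narrower than int, e.g. under -fshort-enums. */
_Static_assert(sizeof(demo_type) == sizeof(int),
               "enum narrower than int: was this built with -fshort-enums?");

/* Reports the enum's size at run time, for logging or diagnostics. */
int demo_type_size(void)
{
  return (int)sizeof(demo_type);
}
```

The guard only sees the flags of the code that includes it, so it cannot detect a library that was built differently; it would, however, catch the case in this thread, where NCO's own build added the flag.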
rorik
Excellent work!
I will remove that switch and release 2.8.8 ASAP.
Thanks,
Charlie