Concatenation very slow

  • R. Checa-Garcia

    R. Checa-Garcia - 2018-09-06

    Hi,

    First, thanks for developing the NCO library!

    I am currently trying to use it to concatenate several files (about 2000). Each file is a tile/slice on a lat, lon grid with several variables (about 20), so I am concatenating first over longitudes and then over latitudes. I am using ncrcat because the size of each tile is not always the same. So basically I proceed as follows:

    for number in {0..40}; do
        npad=$(printf %02d $number) 
        echo $npad '...processing loop--------------------------------'
        for number2 in {0..48}; do
            n2pad=$(printf %02d $number2) 
            file='slice_'$n2pad'_'$npad'.nc'
            new="${file/.nc/_rec.nc}"
            echo '   Adjusting' $file ' to  '$new 
    
            # Delete unwanted variables and dim3
            ncks  -O -h -x -C -v vars90,vars70,dim3 $file $new  
    
            # Ensure that we reorder to have aggregating dimension first
            ncpdq -O -h -a lon,lat           $new $new
    
            # Aggregating dimension is defined as unlimited. 
            ncks  -O -h --mk_rec_dmn lon     $new $new
    
        done
        # Now we should have a list of *_rec.nc files to aggregate on lon dim.
        # and aggregate all these files on longitude
    
        ncrcat -O -h *_rec.nc lon_added.nc 
    
        echo 'ok aggregation lon'  #this works perfectly!!
    
        # now we revert the lon to typical non-record dimension
        ncks  --fix_rec_dmn lon lon_added.nc lon_added_fix.nc
    
        # now we reorder the variables to be lat,lon
        ncpdq -O -h -a lat,lon lon_added_fix.nc lon_added_$npad.nc
    
        # and finally we define lat as record/unlimited dimension
        ncks --mk_rec_dmn lat 'lon_added_'$npad'.nc' 'lon_added_'$npad'_rec_lat.nc'
    
        rm lon_added.nc lon_added_fix.nc
        rm slice_*_rec.nc  # we clean all the files before next loop step.
    done
    # all the files named 'lon_added_'$npad'_rec_lat.nc' can be aggregated on lat,
    # and no other files match the glob '*_rec_lat.nc', so
    
    ncrcat *_rec_lat.nc  latlon_added_temp.nc 
    
    # now we revert lat to a typical non-record dimension
    ncks --fix_rec_dmn lat latlon_added_temp.nc latlon_added_fix.nc  
    
    # finally we ensure that file is lat,lon order
    ncpdq -O -h -a lat,lon latlon_added_fix.nc latlon_added.nc 
    

    The initial ncinfo output for each slice file looks like:

    ncinfo HWSD_VARIABLES_slice_32_02.nc
    <type 'netCDF4._netCDF4.Dataset'>
    root group (NETCDF4 data model, file format HDF5):
        dimensions(sizes): lat(440), lon(880), dim3(12)
        variables(dimensions): float64 lat(lat), float64 lon(lon), int64 dim3(dim3), uint16 index_WSD(lat,lon), var1(lat,lon)......
        groups:
    

    After aggregating on lon and reordering dimensions:

    <type 'netCDF4._netCDF4.Dataset'>
    root group (NETCDF4 data model, file format HDF5):
        history: Wed Sep  5 23:03:29 2018: ncks --mk_rec_dmn lat lon_added_31.nc lon_added_31_rec_lat.nc
    Wed Sep  5 23:00:35 2018: ncks --fix_rec_dmn lon lon_added.nc lon_added_fix.nc
        NCO: "4.6.3"
        dimensions(sizes): lat(440), lon(43151)
        variables(dimensions): uint16 index_WSD(lat,lon), var1(lat,lon)......
        groups: 
    

    Each file is large (about 4 GB), but I simply tried to aggregate two of them, with dimensions
    dimensions(sizes): lat(440), lon(43151)
    dimensions(sizes): lat(441), lon(43151)
    by changing the first loop to:

    for number in {0..1}; do
    

    I waited more than 4 hours and it never finished. I also tried outside of any loop, with the same results. I tried compressing two files that I know are mostly zeros (shrinking them from about 4 GB to about 400 MB), and they also seem impossible to concatenate.

    Maybe I am doing something wrong, or is it simply a very slow process?

    Thanks in advance,
    Ramiro.

     

    • Charlie Zender

      Charlie Zender - 2018-09-06

      Your question is too intricate for me to follow all the information. However, a few general points about large files are in order:

      1. Read the manual about the --no_tmp_fl option and use it if warranted.
      2. netCDF4 chunking is a two-edged sword. You might try converting to netCDF3 first and then concatenating.
      3. It looks from your script like you have an advanced understanding of NCO. Feel free to post a narrower question, realizing there may be no better answer than the two suggestions I just made.
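
      For example, a minimal sketch of those two suggestions (the file names here are placeholders, not taken from the script above):

      # Convert each netCDF4/HDF5 file to netCDF3 classic before concatenating
      ncks -O -3 slice.nc slice_nc3.nc

      # Concatenate without writing an intermediate temporary output file
      ncrcat -O --no_tmp_fl slice_*_nc3.nc out.nc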
      cz

       
  • R. Checa-Garcia

    R. Checa-Garcia - 2018-09-07

    Thanks, Charlie, for the advice.

    Sorry for the long question I submitted. The narrower question would be whether there is any option that could potentially accelerate a concatenation of large netCDF files over two record dimensions (first one, then the other). I understand from your reply that netCDF3 might be faster, which is very useful information for me. I will also read the manual regarding --no_tmp_fl.

    Thanks,

    P.S. I don't know if I can safely concatenate netCDF files that are also compressed.

     
    • Charlie Zender

      Charlie Zender - 2018-09-07

      It's safe to proceed. But the compression may be slowing things down greatly. Converting to netCDF3 (and/or decompressing with ncks -L 0) might speed things up considerably.
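
      For instance, a minimal sketch (in.nc and out.nc are placeholder names):

      # Rewrite with deflate level 0, i.e., uncompressed netCDF4
      ncks -O -L 0 in.nc out.nc

      # Or convert to netCDF3 classic, which is always uncompressed
      ncks -O -3 in.nc out.nc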

       
