I have two NetCDF files from a regional climate model. When I post-process one variable w/ Python (xarray + dask) it takes 3 minutes. When I post-process another it takes 3 days (!). The only difference appears to be the metadata. I'd like to clone the "fast" metadata onto the "slow" file to see if that fixes things. Below is the fast metadata, then the slow, then the diff. Can you advise how to use NCO to do this?
Thanks,
-k.
FAST:
dimensions:
	time = 90 ;
	x = 1496 ;
	y = 2700 ;
variables:
	float time(time) ;
		time:units = "DAYS since 2017-01-01 00:00:00" ;
		time:long_name = "time" ;
		time:standard_name = "time" ;
	float x(x) ;
		x:units = "km" ;
		x:long_name = "x" ;
		x:standard_name = "x" ;
	float y(y) ;
		y:units = "km" ;
		y:long_name = "y" ;
		y:standard_name = "y" ;
	float LON(y, x) ;
		LON:units = "km" ;
		LON:long_name = "Easting" ;
		LON:standard_name = "Easting" ;
		LON:actual_range = 0.f, 0.f ;
		LON:missing_value = 0.f, -1.225254e+28f ;
	float LAT(y, x) ;
		LAT:units = "km" ;
		LAT:long_name = "Northing" ;
		LAT:standard_name = "Northing" ;
		LAT:actual_range = 0.f, 0.f ;
		LAT:missing_value = 0.f, -1.225254e+28f ;
	float runoffcorr(time, y, x) ;
		runoffcorr:units = "mm w.e. per day" ;
		runoffcorr:long_name = "Downscaled corrected snowmelt" ;
		runoffcorr:standard_name = "Downscaled_corrected_snowmelt" ;
		runoffcorr:actual_range = 0.f, 28.61563f ;
		runoffcorr:missing_value = -1.e+30f ;

// global attributes:
		:grid = "Map Projection:Polar Stereographic Ellipsoid - Map Reference Latitude: 90.0 - Map Reference Longitude: -39.0 - Map Second Reference Latitude: 71.0 - Map Eccentricity: 0.081819190843 ;wgs84 - Map Equatorial Radius: 6378137.0 ;wgs84 meters - Grid Map Origin Column: 160 - Grid Map Origin Row: -120 - Grid Map Units per Cell: 5000 - Grid Width: 301 - Grid Height: 561" ;
		:netcdf = "4.4.1.1 of Nov 25 2017 10:57:26 $" ;
		:_Format = "classic" ;
}
SLOW:
dimensions:
	time = 90 ;
	x = 1496 ;
	y = 2700 ;
variables:
	float time(time) ;
		time:units = "DAYS since 2017-01-01 00:00:00" ;
		time:long_name = "time" ;
		time:standard_name = "time" ;
	float x(x) ;
		x:units = "km" ;
		x:long_name = "x" ;
		x:standard_name = "x" ;
	float y(y) ;
		y:units = "km" ;
		y:long_name = "y" ;
		y:standard_name = "y" ;
	float LON(y, x) ;
		LON:units = "Degree" ;
		LON:long_name = "Longitude" ;
		LON:standard_name = "Longitude" ;
		LON:actual_range = -639.4561f, 855.5441f ;
		LON:missing_value = -1.e+30f ;
	float LAT(y, x) ;
		LAT:units = "Degree" ;
		LAT:long_name = "Latitude" ;
		LAT:standard_name = "Latitude" ;
		LAT:actual_range = -3355.096f, -656.096f ;
		LAT:missing_value = -1.e+30f ;
	float precipcorr(time, y, x) ;
		precipcorr:units = "mm w.e. per day" ;
		precipcorr:long_name = "1km Topography precip" ;
		precipcorr:standard_name = "1km_Topography_precip" ;
		precipcorr:actual_range = -0.0154459f, 575.9426f ;
		precipcorr:missing_value = -1.e+30f ;

// global attributes:
		:grid = "Map Projection:Polar Stereographic Ellipsoid - Map Reference Latitude: 90.0 - Map Reference Longitude: -39.0 - Map Second Reference Latitude: 71.0 - Map Eccentricity: 0.081819190843 ;wgs84 - Map Equatorial Radius: 6378137.0 ;wgs84 meters - Grid Map Origin Column: 160 - Grid Map Origin Row: -120 - Grid Map Units per Cell: 5000 - Grid Width: 301 - Grid Height: 561" ;
		:netcdf = "4.4.1 of Jun 27 2017 09:19:19 $" ;
		:_Format = "classic" ;
}
DIFF from
diff <(ncdump -s -h precip/precip_WJB_int.2017_JFM.BN_RACMO2.3p2_FGRN055_1km.DD.nc) <(ncdump -s -h runoff/runoff_WJB_int.2017_JFM.BN_RACMO2.3p2_FGRN055_1km.DD.nc)
produces:
Last edit: Ken Mankoff 2019-03-12
I note that having multiple missing_value values is non-standard, and may contribute to slowness. That said, NCO has the ncatted operator, which can change multiple attributes with one command. That is the safest bet. An alternative, unsupported, would be to try appending the variables' data from one file to the other while restricting the propagation of metadata (with -m -M) or of data (with -H).
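The ncatted route suggested above might look like the following sketch. The file names (slow.nc, fast.nc, slow_fixed.nc) and the particular attribute edits are placeholders; which attributes to clone from the fast file's dump is up to you:

```shell
# Sketch only: overwrite several of slow.nc's attributes in one ncatted call,
# here making the LON/LAT metadata match the fast file's dump above.
# Syntax: -a att_name,var_name,mode,type,value  (o = overwrite, c = character)
ncatted -O \
  -a units,LON,o,c,"km" \
  -a long_name,LON,o,c,"Easting" \
  -a standard_name,LON,o,c,"Easting" \
  -a units,LAT,o,c,"km" \
  -a long_name,LAT,o,c,"Northing" \
  -a standard_name,LAT,o,c,"Northing" \
  slow.nc slow_fixed.nc

# The unsupported alternative from the reply above: append variables from the
# fast file to the slow one. Per the reply, -m/-M restrict the propagation of
# metadata and -H restricts data; check the NCO manual for exact semantics.
ncks -A -v LON,LAT fast.nc slow_fixed.nc
```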
Hi Charlie,
The file with two missing_value values is actually the one that processes orders of magnitude faster; the other is the slow one.
Digging further, I find that the fast file has empty LON and LAT arrays, while the slow one has populated LON and LAT arrays (and, for some reason, with odd ranges/values).
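One quick way to test whether the populated LON/LAT arrays are the culprit (a sketch; slow.nc and the output name are placeholders) is to write a copy of the slow file without them and re-run the xarray benchmark on the copy:

```shell
# Copy slow.nc while excluding (-x) the listed variables (-v);
# -C keeps ncks from pulling LON/LAT back in as associated coordinates.
ncks -O -C -x -v LON,LAT slow.nc slow_noLONLAT.nc
```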
Can you advise what nco command can empty a variable so that ncdump shows:
LON =
, , , , , , , , , , , , , , , , , , , , , , , , ,
, , , , , , , , , , , , , , , , , , , , , , , ,
, , , , , , , , , , , , , , , , , , , , , , , _,
Thanks,
-k.
ncap2 can set any variable to any value; see http://nco.sf.net/nco.html#ncap2
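For the record, a sketch of one way to do that with ncap2 (untested here; slow.nc/slow_empty.nc are placeholder names, and the missing_value comes from the slow file's dump above). A hyperslab on the left-hand side keeps each variable's shape and type while overwriting every element:

```shell
# Set every element of LON and LAT to the variable's own missing_value, so
# downstream tools treat the arrays as all-missing (ncdump prints such cells
# as "_" when they match the fill/missing value it recognizes).
ncap2 -O -s 'LON(:,:)=LON@missing_value; LAT(:,:)=LAT@missing_value;' \
  slow.nc slow_empty.nc
```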