From: Andreas H. <li...@hi...> - 2013-05-25 15:07:17
On 25.05.2013 14:27, Andreas Hilboll wrote:
> Hi,
>
> the netcdf4-python project
> (http://netcdf4-python.googlecode.com/svn/trunk/docs/netCDF4.Dataset-class.html#createVariable)
> supports a "least_significant_digit" attribute when creating a
> variable/array. This leads to a truncation of the array data before
> storing it to disk
> (https://code.google.com/p/netcdf4-python/source/browse/trunk/netCDF4_utils.py#26),
> which makes zlib compression more effective.
>
> My question: Is the same true when I compress the array data with blosc?
> Will I get significant compression improvements when truncating my data
> before storing it in pytables?

Actually, I can now answer my own question: Yes, it does save some space.

As a test, I created a file with two 5760x2880x12 arrays of dtype float32. The data values all lie between -1E17 and +1E17. When I truncate the input values to a precision of 1E11 (least_significant_digit=-11), I get about 28% space reduction (578M down to 418M):

-rw-r--r-- 1 andreas andreas 418M Mai 25 16:47 satdb_blosc9-11.h5
-rw-r--r-- 1 andreas andreas 578M Mai 25 16:34 satdb_blosc9.h5

Would you guys be interested in having this as an optional filter? If so, I'd be happy to submit a PR for this.

--
Andreas.
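
For readers who want to try the idea, here is a minimal sketch of the truncation step using only the Python standard library. It is modeled on my understanding of the netcdf4-python approach (round each value to a power-of-two step no coarser than the requested decimal precision) and is not the code used for the files above. The names `quantize` and `compressed_size` are hypothetical, the data are random rather than real measurements, and zlib with a simple byte shuffle stands in for Blosc, so the sizes it reports will not match the numbers in this mail.

```python
import array
import math
import random
import zlib


def quantize(values, least_significant_digit):
    """Round each value to a power-of-two step <= 10**-least_significant_digit."""
    # A power-of-two step clears the low mantissa bits, which is what
    # makes the stored bytes more repetitive and easier to compress.
    bits = math.ceil(math.log2(10.0 ** least_significant_digit))
    scale = 2.0 ** bits
    return [round(v * scale) / scale for v in values]


def compressed_size(values):
    """Size in bytes of the float32 data after a byte shuffle and zlib level 9."""
    data = array.array("f", values).tobytes()
    # Group byte 0 of every value, then byte 1, and so on (assumes 4-byte floats).
    shuffled = b"".join(data[i::4] for i in range(4))
    return len(zlib.compress(shuffled, 9))


# Synthetic stand-in data: float32 values spread over +-1E17.
random.seed(0)
raw = array.array("f", (random.uniform(-1e17, 1e17) for _ in range(100000)))
truncated = quantize(raw, -11)

raw_size = compressed_size(raw)
truncated_size = compressed_size(truncated)
print("raw:", raw_size, "bytes, truncated:", truncated_size, "bytes")
```

With least_significant_digit=-11 the step works out to 2**36 (about 6.9E10), so each value moves by at most 2**35. How much space this saves depends on the data and on the compressor, so it is worth measuring on a real array before relying on it.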