NCO with OPeNDAP/Deflate and bandwidth usage

ocehugo
2014-05-13
2014-05-15
  • ocehugo
    ocehugo
    2014-05-13

    Hello,

    I would like to report some strange behaviour of the NCO/OPeNDAP/THREDDS/netCDF stack. Since I don't know where the problem is, I will post here instead of talking directly to Unidata, since I'm using NCO to download some datasets.

    Problem: requesting compressed data does not transfer compressed data; instead, the stream is transferred uncompressed.

    Example: an OPeNDAP request with ncks -4 -L1 -v var1 url file.nc returns file.nc in netCDF-4 format, compressed at the lowest deflate level.

    So:

    file.nc has a size in MB that I will call SizeC.

    and:

    ncks -3 file.nc file_n3.nc returns a file of size SizeU.

    SizeU is almost 4x SizeC. All right, compression is really good. The problem lies with the bandwidth: monitoring the transfer, the total transferred between the local machine and the OPeNDAP server is SizeU, not SizeC.
    Actually, with different servers the difference can be 2 times SizeU (another problem, see below).

    So an n4 file.nc of 10 MB compressed has an equivalent file in n3 format, file_n3.nc, of ~40 MB uncompressed, but the transfer stream was 40 MB.

    So it looks like NCO cannot request a compressed stream from the OPeNDAP server to save bandwidth. I've already changed the .dodsrc file to apply a DEFLATE flag, but turning it on/off doesn't seem to change anything at all (I tried it both in $HOME and in the current folder of the ncks command call).
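
    For reference, the .dodsrc line I toggled looks like this (option name as given in the libdap client documentation; toggling it did not change the transfer size for me):

```
# ~/.dodsrc (also tried in the working directory of the ncks call)
DEFLATE=1
```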

    Problem 2: requesting compressed streams from some OPeNDAP servers can lead to transfers 2 times larger than the uncompressed data.

    Example: on another server I requested a variable with ncks -4 -L1 -v var2 url2 file2.nc. Everything works out of the box, but this time the total transfer is 2*SizeU.

    So an n4 file.nc of 10 MB compressed results in a 40 MB n3 file, but the transfer was 80 MB!

    This looks like behaviour on the OPeNDAP side and was actually my first problem. In tests between different servers I realized that on some servers the transfer is done with the uncompressed data and on others with double the uncompressed data. In any case there is no way to request transfers with compressed data.

    The same behaviour (double SizeU) was found using "ncks -3" OPeNDAP requests (no compression). Since I'm only a user, I can't say what difference between the OPeNDAP/THREDDS servers causes this. The only difference I see is that one is accessed without credentials and the second one (with the double SizeU) requires user/pass credentials. Both extended filetypes are NC_FORMAT_DAP2, mode = 0 (whatever this means).

    I don't need to say that compressing the data beforehand can be a huge improvement in speed (some fields compress really well, to 5x-6x smaller than the expected value if there are repeated missing points). I found this out when the network managers started to complain about data usage and sent me a report saying that our usage was 4x the reported amount, just on the particular server that creates "double" the data...
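
    To illustrate the kind of ratio I mean, here is a small stand-alone sketch (plain Python zlib, the same DEFLATE family that netCDF-4 uses; the array size and fill value are made up):

```python
# A field dominated by a repeated missing value compresses extremely well.
import array
import zlib

data = array.array('f', [-9999.0] * 100_000)    # 32-bit floats, mostly "missing"
for i in range(0, len(data), 50):
    data[i] = 1.0                               # sprinkle in some real values

raw = data.tobytes()
compressed = zlib.compress(raw, 1)              # deflate level 1, as in ncks -L1

print(len(raw) / len(compressed))               # a large ratio, well above 5x
```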

    Any clues to save some bandwidth?

     
    • When you use ncks to acquire data remotely using OPeNDAP, the data are transferred using DAP format and converted to netCDF by ncks on the client machine. This is why the transfer size is uncompressed. If data type conversions are done as part of the transfer, this may increase the size. For instance, some servers may convert packed short integers to unpacked floats by applying scale_factor and add_offset attributes.
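
      A stand-alone sketch of that size effect (illustrative Python, not anything NCO or the server actually runs; the scale/offset values are invented):

```python
# Unpacking 16-bit packed integers to 32-bit floats doubles the byte count,
# which can explain a 2x on-the-wire size relative to the packed file.
import array

packed = array.array('h', [0] * 1000)           # signed shorts: 2 bytes each
scale_factor, add_offset = 0.01, 273.15         # typical packing attributes
unpacked = array.array('f', (v * scale_factor + add_offset for v in packed))

print(unpacked.itemsize // packed.itemsize)     # -> 2: twice the bytes per value
```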

      Some OPeNDAP server implementations will allow you to download a netCDF file from the OPeNDAP server using a regular web browser. This is done by converting to netCDF on the server side and then transferring the data as a simple netCDF file transfer. Newer versions of the Hyrax OPeNDAP server support netCDF-4 conversion, but this has not been deployed in very many places yet. (I'm not 100% sure it supports deflation in the conversion to netCDF-4.)

      Some other OPeNDAP servers run in an HTTP server that is configured for on-the-fly gzip compression at the file level, which will produce compressed data transfers.

      The bottom line is that you are at the mercy of the organization running a particular OPeNDAP server if you want the data transferred in compressed form. All of the possibilities above require some work on their part, reprioritization, and possibly compromises, so the best thing to do may be to communicate with them about the benefits their user community would get from the ability to extract netCDF data from the OPeNDAP servers compressed before transfer.


      --
      Dr. Christopher Lynnes NASA/GSFC, Code 610.2 phone: 301-614-5185
      "The future is already here--it's just not very evenly distributed." Wm. Gibson

       
  • Charlie Zender
    Charlie Zender
    2014-05-13

    Thanks for explaining the role of the server in this, Chris.
    If anyone learns of client-side ways in which NCO could help optimize DAP transfers, please let me know. This includes adding recommendations for .dodsrc files to the NCO manual. Right now, NCO does nothing special to optimize DAP transfers. FWIW, if someone has a good use case (i.e., where it might speed up real-world work), then we might try to support client-side caching as described here:
    https://www.unidata.ucar.edu/software/netcdf/docs/netcdf/DAP-Support.html#DAP-Support
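
    For anyone experimenting, the caching knobs described in that document live in .dodsrc; a minimal fragment might look like this (option names per the linked Unidata page; units and defaults should be checked against your libnetcdf/libdap version):

```
# .dodsrc client-side cache settings (DAP support in libnetcdf)
USE_CACHE=1
MAX_CACHE_SIZE=100     # total cache size
MAX_CACHED_OBJ=20      # largest single object to cache
```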

     
  • ocehugo
    ocehugo
    2014-05-15

    Hello Christopher, thanks for all the info. Indeed, when thinking about the offset and scale factor, the 2*SizeU pattern in the stream makes sense. I was expecting that this would be an OPeNDAP server-side issue, but at least now I have something to share with the guys that run some of the servers that I use.

    Zender, maybe for coordinates the caching feature would be worth it. I've been using -C to avoid downloading coordinates in every request (which saves some bytes if you are downloading a 5000-record variable one record at a time)... with caching, the -C could be dropped at the cost of increasing the size of individual local files.