From: Jeff W. <js...@fa...> - 2004-11-02 22:56:17
Francesc: There was no good reason for using Array instead of EArray for rank-1 variables, other than I wasn't sure EArrays were appropriate for variables that were not going to be appended to. Thanks for extending the script! I noticed one typo (StringAtom was used instead of tables.StringAtom).

Also, it doesn't yet work on files with netCDF character variables. I get errors like this:

Traceback (most recent call last):
  File "nctoh5_new.py", line 175, in ?
    overwritefile)
  File "nctoh5_new.py", line 61, in nctoh5
    vardata.append(var[n:n+1])
  File "/sw/lib/python2.3/site-packages/tables/EArray.py", line 196, in append
    naarr = convertIntoNA(object, self.atom)
  File "/sw/lib/python2.3/site-packages/tables/utils.py", line 208, in convertIntoNA
    shape=arr.shape)
  File "/sw/src/root-numarray-py23-1.0-1/sw/lib/python2.3/site-packages/numarray/numarraycore.py", line 318, in array
  File "/sw/src/root-numarray-py23-1.0-1/sw/lib/python2.3/site-packages/numarray/numarraycore.py", line 256, in getTypeObject
  File "/sw/src/root-numarray-py23-1.0-1/sw/lib/python2.3/site-packages/numarray/numarraycore.py", line 245, in _typeFromTypeAndTypecode
  File "/sw/src/root-numarray-py23-1.0-1/sw/lib/python2.3/site-packages/numarray/numerictypes.py", line 457, in getType
TypeError: Not a numeric type

If I replace

    vardata.append(var[n:n+1])

with

    if dtype == 'c':
        chararr = numarray.strings.array(var[n].tolist())
        newshape = list(chararr.shape)
        newshape.insert(0, 1)
        chararr.setshape(tuple(newshape))
        vardata.append(chararr)
    else:
        vardata.append(var[n:n+1])

it seems to work. Note that I have to reshape the chararray to have an extra singleton dimension or pytables complains that the data being appended has the wrong shape. This is also the reason I had to use var[n:n+1] instead of var[n] in the append. Is there a better way to do this?

-Jeff

-- 
Jeffrey S. Whitaker         Phone : (303)497-6313
Meteorologist               FAX   : (303)497-6449
NOAA/OAR/CDC R/CDC1         Email : Jef...@no...
325 Broadway                Web   : www.cdc.noaa.gov/~jsw
Boulder, CO, USA 80303-3328 Office: Skaggs Research Cntr 1D-124
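[Editor's note: the singleton-dimension issue above can be shown without PyTables at all. The sketch below uses NumPy purely as a modern stand-in for 2004-era numarray; `var`, `row`, `chunk`, and `reshaped` are illustrative names, not anything from the script.]

```python
import numpy as np  # stand-in here for the numarray of 2004

# A rank-2 character variable: 3 records of 4 one-byte strings each.
var = np.array([list("abcd"), list("efgh"), list("ijkl")])

row = var[1]      # shape (4,): indexing drops the record axis
chunk = var[1:2]  # shape (1, 4): slicing keeps a singleton record axis

# An append that expects chunks shaped (n, 4) accepts `chunk` but would
# reject `row`; the workaround's reshape adds the same singleton axis by hand.
reshaped = row.reshape((1,) + row.shape)
```

This is exactly why `var[n:n+1]` works where `var[n]` does not: the slice preserves the leading (extensible) dimension that append() checks against.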
From: Francesc A. <fa...@py...> - 2004-11-03 18:08:15
On Tuesday 02 November 2004 23:56, Jeff Whitaker wrote:

> Francesc: There was no good reason for using Array instead of EArray
> for rank-1 variables, other than I wasn't sure EArrays were appropriate
> for variables that were not going to be appended to.

Yes, they are. The only reason for using an Array instead of an EArray is simplicity: for me, creating Array objects from the interactive console is far easier than creating EArrays, but after creation both objects work very similarly. However, EArrays do support filters (apart from being extensible), so for programs, and when compression is desirable, I recommend using EArray objects for everything, even when you don't want to enlarge them.

> If I replace
>
>     vardata.append(var[n:n+1])
>
> with
>
>     if dtype == 'c':
>         chararr = numarray.strings.array(var[n].tolist())
>         newshape = list(chararr.shape)
>         newshape.insert(0, 1)
>         chararr.setshape(tuple(newshape))
>         vardata.append(chararr)
>     else:
>         vardata.append(var[n:n+1])
>
> it seems to work. Note that I have to reshape the chararray to have an
> extra singleton dimension or pytables complains that the data being
> appended has the wrong shape. This is also the reason I had to use
> var[n:n+1] instead of var[n] in the append. Is there a better way to do
> this?

Well, this is a subtle problem, as append(array) expects array to have the same shape as the atomic shape. When the array has one dimension less, I guess it would be safe to assume that what the user wants is to add a single row along the extensible dimension. Frankly, I don't know whether implementing this kind of behaviour would help the user understand how append() works.

By the way, I've solved in PyTables the problem where the object to append is a Numeric object with Char type ('c' typecode). I'm attaching my new version for your inspection (beware, this will run only with PyTables 0.9!).
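[Editor's note: the auto-promotion Francesc contemplates above, treating an array that is one dimension short as a single row along the extensible dimension, is at heart a shape computation. The sketch below is hypothetical (the function name and arguments are not PyTables API), just a distillation of that idea:]

```python
def promote_to_row(arr_shape, atom_shape, extdim=0):
    """If arr_shape is one dimension short of atom_shape, insert a
    singleton axis at the extensible dimension so the array counts as
    a single appended row; otherwise leave the shape unchanged."""
    if len(arr_shape) == len(atom_shape) - 1:
        return arr_shape[:extdim] + (1,) + arr_shape[extdim:]
    return arr_shape
```

For example, appending data of shape (4,) to an EArray whose atom shape is (0, 4) would be promoted to a single row of shape (1, 4), while data already shaped (n, 4) would pass through untouched.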
You surely have noted that my code may convert NetCDF files with enlargeable dimensions other than the first one. I don't know whether this is supported in NetCDF or not. I only know that Scientific Python does not seem to support that.

Finally, the new (attached) version does the copy in buckets of records instead of just one single record at a time. That improves the conversion speed considerably for large variables.

Original code:

$ ./nctoh5.bck -vo /tmp/test.nc /tmp/test-3-old.h5
+=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=+
Starting conversion from /tmp/test.nc to /tmp/test-3-old.h5
Applying filters: None
+=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=+
Number of variables copied: 2
KBytes copied: 7812.969
Time copying: 38.142 s (real) 36.88 s (cpu) 97%
Copied variable/sec: 0.1
Copied KB/s : 204

Bucketed conversion code:

$ ./nctoh5 -vo /tmp/test.nc /tmp/test-3.h5
+=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=+
Starting conversion from /tmp/test.nc to /tmp/test-3.h5
Applying filters: None
+=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=+
Number of variables copied: 2
KBytes copied: 7812.969
Time copying: 5.55 s (real) 5.42 s (cpu) 98%
Copied variable/sec: 0.4
Copied KB/s : 1407

Maybe this is not important in many situations but, well, I don't think it will hurt either <wink>.

Cheers,

-- 
Francesc Alted

----------------------------------------------------------

#!/usr/bin/env python
"""
convert netCDF file to HDF5 using Scientific.IO.NetCDF and PyTables.

Jeff Whitaker <jef...@no...>

Added some flags to select filters, as well as some small improvements.
Francesc Altet <fa...@ca...>

This requires Scientific from
http://starship.python.net/~hinsen/ScientificPython
"""

import Scientific.IO.NetCDF as NetCDF
import tables, sys, os.path, getopt, time

def nctoh5(ncfilename, h5filename, filters, overwritefile):
    # open netCDF file
    ncfile = NetCDF.NetCDFFile(ncfilename, mode = "r")
    # open h5 file
    if overwritefile:
        h5file = tables.openFile(h5filename, mode = "w")
    else:
        h5file = tables.openFile(h5filename, mode = "a")
    # loop over variables in netCDF file.
    nobjects = 0; nbytes = 0  # Initialize counters
    for varname in ncfile.variables.keys():
        var = ncfile.variables[varname]
        vardims = list(var.dimensions)
        vardimsizes = [ncfile.dimensions[vardim] for vardim in vardims]
        # Check if any dimension is enlargeable
        extdim = -1; ndim = 0
        for vardim in vardimsizes:
            if vardim == None:
                extdim = ndim
                break
            ndim += 1
        # use long_name for title.
        if hasattr(var, 'long_name'):
            title = var.long_name
        else:
            # or, just use some bogus title.
            title = varname + ' array'
        # Create an EArray to keep the NetCDF variable
        if extdim < 0:
            # Make 0 the enlargeable dimension
            extdim = 0
        vardimsizes[extdim] = 0
        dtype = var.typecode()
        if dtype == 'c':
            # Special case for Numeric character objects
            # (on which Scientific Python is based)
            atom = tables.StringAtom(shape=tuple(vardimsizes), length=1)
        else:
            atom = tables.Atom(dtype=var.typecode(), shape=tuple(vardimsizes))
        vardata = h5file.createEArray(h5file.root, varname, atom, title,
                                      filters=filters,
                                      expectedrows=vardimsizes[extdim])
        # write data to enlargeable array one chunk of records at a time
        # (so the whole array doesn't have to be kept in memory).
        nrowsinbuf = vardata._v_maxTuples
        # The slices parameter for var.__getitem__()
        slices = [slice(0, dim, 1) for dim in var.shape]
        # range to copy
        start = 0; stop = var.shape[extdim]; step = 1
        # Start the copy itself
        for start2 in range(start, stop, step*nrowsinbuf):
            # Save the records on disk
            stop2 = start2 + step*nrowsinbuf
            if stop2 > stop:
                stop2 = stop
            # Set the proper slice in the extensible dimension
            slices[extdim] = slice(start2, stop2, step)
            vardata.append(var[tuple(slices)])
        # Increment the counters
        nobjects += 1
        nbytes += reduce(lambda x, y: x*y, vardata.shape) * vardata.itemsize
        # set variable attributes.
        for key, val in var.__dict__.iteritems():
            setattr(vardata.attrs, key, val)
        setattr(vardata.attrs, 'dimensions', tuple(vardims))
    # set global (file) attributes.
    for key, val in ncfile.__dict__.iteritems():
        setattr(h5file.root._v_attrs, key, val)
    # Close the file
    h5file.close()
    return (nobjects, nbytes)

usage = """usage: %s [-h] [-v] [-o] [--complevel=(0-9)] [--complib=lib]
           [--shuffle=(0|1)] [--fletcher32=(0|1)]
           netcdffilename hdf5filename
  -h -- Print usage message.
  -v -- Show more information.
  -o -- Overwrite destination file.
  --complevel=(0-9) -- Set a compression level (0 for no compression,
    which is the default).
  --complib=lib -- Set the compression library to be used during the
    copy. lib can be set to "zlib", "lzo" or "ucl". Defaults to "zlib".
  --shuffle=(0|1) -- Activate or not the shuffling filter (default is
    active if complevel > 0).
  --fletcher32=(0|1) -- Whether to activate or not the fletcher32 filter
    (not active by default).
\n""" % os.path.basename(sys.argv[0])

try:
    opts, pargs = getopt.getopt(sys.argv[1:], 'hvo',
                                ['complevel=',
                                 'complib=',
                                 'shuffle=',
                                 'fletcher32=',
                                 ])
except:
    (type, value, traceback) = sys.exc_info()
    print "Error parsing the options. The error was:", value
    sys.stderr.write(usage)
    sys.exit(0)

# default options
verbose = 0
overwritefile = 0
complevel = None
complib = None
shuffle = None
fletcher32 = None

# Get the options
for option in opts:
    if option[0] == '-h':
        sys.stderr.write(usage)
        sys.exit(0)
    elif option[0] == '-v':
        verbose = 1
    elif option[0] == '-o':
        overwritefile = 1
    elif option[0] == '--complevel':
        complevel = int(option[1])
    elif option[0] == '--complib':
        complib = option[1]
    elif option[0] == '--shuffle':
        shuffle = int(option[1])
    elif option[0] == '--fletcher32':
        fletcher32 = int(option[1])
    else:
        print option[0], ": Unrecognized option"
        sys.stderr.write(usage)
        sys.exit(0)

# if we pass a number of files different from 2, abort
if len(pargs) <> 2:
    print "You need to pass both source and destination!"
    sys.stderr.write(usage)
    sys.exit(0)

# Catch the files passed as the last arguments
ncfilename = pargs[0]
h5filename = pargs[1]

# Build the Filters instance
if (complevel, complib, shuffle, fletcher32) == (None,)*4:
    filters = None
else:
    if complevel is None:
        complevel = 0
    if shuffle is None:
        if complevel > 0:
            shuffle = 1
        else:
            shuffle = 0
    if complib is None:
        complib = "zlib"
    if fletcher32 is None:
        fletcher32 = 0
    filters = tables.Filters(complevel=complevel, complib=complib,
                             shuffle=shuffle, fletcher32=fletcher32)

# Some timing
t1 = time.time()
cpu1 = time.clock()
# Copy the file
if verbose:
    print "+=+"*20
    print "Starting conversion from %s to %s" % (ncfilename, h5filename)
    print "Applying filters:", filters
    print "+=+"*20
# Do the conversion
(nobjects, nbytes) = nctoh5(ncfilename, h5filename, filters, overwritefile)
# Gather some statistics
t2 = time.time()
cpu2 = time.clock()
tcopy = round(t2-t1, 3)
cpucopy = round(cpu2-cpu1, 3)
tpercent = int(round(cpucopy/tcopy, 2)*100)
if verbose:
    print "Number of variables copied:", nobjects
    print "KBytes copied:", round(nbytes/1024., 3)
    print "Time copying: %s s (real) %s s (cpu) %s%%" % \
          (tcopy, cpucopy, tpercent)
    print "Copied variable/sec: ", round(nobjects / float(tcopy), 1)
    print "Copied KB/s :", int(nbytes / (tcopy * 1024))
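[Editor's note: the bucketed copy in the script above boils down to walking the extensible dimension in slabs of at most nrowsinbuf rows. The helper below is a distillation of that loop for illustration, not part of the script; the name `buckets` is hypothetical.]

```python
def buckets(stop, nrowsinbuf, start=0, step=1):
    """Yield (start2, stop2) pairs covering [start, stop) in slabs of
    at most nrowsinbuf rows, clipping the last slab to the end of the
    range, just as the copy loop in nctoh5() does."""
    for start2 in range(start, stop, step * nrowsinbuf):
        stop2 = min(start2 + step * nrowsinbuf, stop)
        yield start2, stop2

# 10 rows copied in slabs of 4:
slabs = list(buckets(10, 4))  # [(0, 4), (4, 8), (8, 10)]
```

Each pair becomes the slice set in the extensible dimension before the append, which is why only nrowsinbuf rows ever sit in memory at once.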
From: Jeff W. <js...@fa...> - 2004-11-03 18:14:07
Francesc Alted wrote:

> On Tuesday 02 November 2004 23:56, Jeff Whitaker wrote:
>> Francesc: There was no good reason for using Array instead of EArray
>> for rank-1 variables, other than I wasn't sure EArrays were appropriate
>> for variables that were not going to be appended to.
>
> Yes, they are. The only reason for using an Array instead of an EArray
> is simplicity: for me, creating Array objects from the interactive
> console is far easier than creating EArrays, but after creation both
> objects work very similarly. However, EArrays do support filters (apart
> from being extensible), so for programs, and when compression is
> desirable, I recommend using EArray objects for everything, even when
> you don't want to enlarge them.

Francesc: OK, thanks for the explanation.

>> If I replace
>>
>>     vardata.append(var[n:n+1])
>>
>> with
>>
>>     if dtype == 'c':
>>         chararr = numarray.strings.array(var[n].tolist())
>>         newshape = list(chararr.shape)
>>         newshape.insert(0, 1)
>>         chararr.setshape(tuple(newshape))
>>         vardata.append(chararr)
>>     else:
>>         vardata.append(var[n:n+1])
>>
>> it seems to work. Note that I have to reshape the chararray to have an
>> extra singleton dimension or pytables complains that the data being
>> appended has the wrong shape. This is also the reason I had to use
>> var[n:n+1] instead of var[n] in the append. Is there a better way to
>> do this?
>
> Well, this is a subtle problem, as append(array) expects array to have
> the same shape as the atomic shape. When the array has one dimension
> less, I guess it would be safe to assume that what the user wants is to
> add a single row along the extensible dimension. Frankly, I don't know
> whether implementing this kind of behaviour would help the user
> understand how append() works.

If updating the EArray via slice assignment were supported in the future this would be a non-issue, since I guess one could just do

    vardata[n] = var[n]

> By the way, I've solved in PyTables the problem where the object to
> append is a Numeric object with Char type ('c' typecode). I'm attaching
> my new version for your inspection (beware, this will run only with
> PyTables 0.9!).
>
> You surely have noted that my code may convert NetCDF files with
> enlargeable dimensions other than the first one. I don't know whether
> this is supported in NetCDF or not. I only know that Scientific Python
> does not seem to support that.

Excellent! Thanks for taking an interest in my little script. You've taught me a lot about PyTables along the way. BTW: netCDF only allows the first dimension to be unlimited. Interestingly, the upcoming netCDF 4 will be a wrapper on top of the HDF5 API and will support compression and multiple unlimited dimensions.

-Jeff

-- 
Jeffrey S. Whitaker         Phone : (303)497-6313
Meteorologist               FAX   : (303)497-6449
NOAA/OAR/CDC R/CDC1         Email : Jef...@no...
325 Broadway                Web   : www.cdc.noaa.gov/~jsw
Boulder, CO, USA 80303-3328 Office: Skaggs Research Cntr 1D-124
From: Francesc A. <fa...@py...> - 2004-11-03 18:40:53
On Wednesday 03 November 2004 19:13, Jeff Whitaker wrote:

> If updating the EArray via slice assignment were supported in the
> future this would be a non-issue, since I guess one could just do
>
>     vardata[n] = var[n]

Yeah, I think so, because I've just implemented slice assignment for Array, EArray and VLArray objects in the upcoming PyTables 0.9. However, bear in mind that the new release will only let you assign to already *existing* elements in the object. So I'm afraid you will need to continue using append() for this particular case.

> BTW: netCDF only allows the first dimension to be unlimited.
> Interestingly, the upcoming netCDF 4 will be a wrapper on top of the
> HDF5 API, and will support compression and multiple unlimited
> dimensions.

OK then. I'll leave that support in, ready for the upcoming netCDF 4 (but for just one unlimited dimension, as that is all PyTables supports).

Cheers,

-- 
Francesc Alted
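[Editor's note: the "existing elements only" restriction Francesc describes parallels ordinary array semantics. The sketch below uses NumPy as a modern stand-in for the numarray of the thread; it is an analogy, not PyTables code.]

```python
import numpy as np  # stand-in for 2004-era numarray

a = np.zeros(3)
a[1] = 7.0          # allowed: element 1 already exists

try:
    a[5] = 1.0      # not allowed: assignment cannot create new elements
    out_of_range = False
except IndexError:
    out_of_range = True

# Growing the array still takes an explicit append-style call,
# mirroring why EArray.append() remains necessary for new rows.
b = np.append(a, [1.0])
```

In the same way, PyTables 0.9 slice assignment can overwrite rows already written, while rows beyond the current extent must still arrive through append().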