From: Francesc A. <fa...@py...> - 2004-11-10 19:24:10
|
Hi Kevin,

On Wednesday 10 November 2004 18:33, kevin lester wrote:
> The INSTALL_Windows mentions that VS C++ 6.0 or Intel
> C Compiler be used for this process. I am trying to
> avoid any more MS products, so I would like to use the
> open source 'Dev-C++' (or similar) software to build
> HDF5 (due to the patch problems with the pre-built
> binaries).
>
> My questions are: is this possible, and can you give me
> a few pointers as to how I do this? I very much have a
> need for PyTables and at this point, I am kind of
> stuck.

The HDF5 1.6.3 patched Windows binary is available at:

ftp://ftp.ncsa.uiuc.edu/HDF/HDF5/current/patches/bin/WinXP-patch.tar.gz

Cheers,

--
Francesc Altet
From: kevin l. <lke...@ya...> - 2004-11-10 17:33:49
|
Being more of a financial analyst and only a student programmer, I am having difficulty, having never compiled source code before. The INSTALL_Windows file mentions that VS C++ 6.0 or the Intel C Compiler be used for this process. I am trying to avoid any more MS products, so I would like to use the open source 'Dev-C++' (or similar) software to build HDF5 (due to the patch problems with the pre-built binaries).

My questions are: is this possible, and can you give me a few pointers as to how I do this? I very much have a need for PyTables and at this point, I am kind of stuck. Thank you for any help you might provide me. I currently have Windows XP and an Intel processor.

Kevin
From: Norbert N. <Nor...@gm...> - 2004-11-10 14:55:16
|
Hi there,

Find enclosed two tiny patches:

* the first throws an AttributeError from __getattr__ for nonexistent attributes in AttributeSet. Formerly, the routine returned None, which is pretty much against convention in Python and breaks the built-in hasattr routine.
* the second corrects the behavior for shape=() arrays. These worked fine before, except for copying of trees.

Ciao,
Nobbi

PS: In case this list is not the appropriate place for tiny patch submissions, please point me to the correct address.

--
_________________________________________
Norbert Nemec
Bernhardstr. 2 ... D-93053 Regensburg
Tel: 0941 - 2009638 ... Mobil: 0179 - 7475199
eMail: <No...@Ne...>
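For readers unfamiliar with the convention Norbert's first patch enforces, here is a minimal, self-contained sketch (a toy class, not the actual AttributeSet code): __getattr__ must raise AttributeError for missing names, because returning None silently defeats hasattr().

    class AttrContainer:
        """Toy attribute container illustrating the __getattr__ convention."""

        def __init__(self):
            self._attrs = {}

        def __setattr__(self, name, value):
            if name.startswith('_'):
                object.__setattr__(self, name, value)
            else:
                self._attrs[name] = value

        def __getattr__(self, name):
            # Only called for names not found through normal lookup.
            try:
                return self._attrs[name]
            except KeyError:
                # Raising (not returning None) is what makes hasattr() reliable.
                raise AttributeError("no attribute %r" % name)

    c = AttrContainer()
    c.title = 'my array'
    print(hasattr(c, 'title'))    # True
    print(hasattr(c, 'missing'))  # False, because AttributeError was raised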
From: Jeff W. <jef...@no...> - 2004-11-10 14:50:14
|
Francesc Altet wrote:

>> BTW: I've added a couple more command line options to nctoh5 - the new
>> version is at http://whitaker.homeunix.org/~jeff/nctoh5. The new
>> switches are:
>>
>>   --unpackshort=(0|1) -- unpack short integer variables to float variables
>>     using scale_factor and add_offset netCDF variable attributes
>>     (not active by default).
>>   --quantize=(0|1) -- quantize data to improve compression using
>>     least_significant_digit netCDF variable attribute (not active by
>>     default).
>>     See http://www.cdc.noaa.gov/cdc/conventions/cdc_netcdf_standard.shtml
>>     for further explanation of what this attribute means.
>
> Great!, however I can't get the file. Is the above URL correct?
>
> Cheers,

Sorry - it's there now.

-Jeff

--
Jeffrey S. Whitaker          Phone : (303)497-6313
NOAA/OAR/CDC R/CDC1          FAX   : (303)497-6449
325 Broadway                 Web   : http://www.cdc.noaa.gov/~jsw
Boulder, CO, USA 80305-3328  Office: Skaggs Research Cntr 1D-124
From: Francesc A. <fa...@py...> - 2004-11-10 14:05:34
|
On Wednesday 10 November 2004 14:23, Jeff Whitaker wrote:
> Francesc: That new version of _calcbuffersize fixed the problem -
> compression ratios with the 'medium' value of bufmultfactor are back to
> what they were in 0.8.1. Changing to any of the other values has very
> little effect on file size. Thanks for your prompt attention to this
> problem!

Good. The patch has been uploaded to the Release-0.9_patches branch in CVS. It will hopefully appear in the next 0.9.1 release.

> BTW: I've added a couple more command line options to nctoh5 - the new
> version is at http://whitaker.homeunix.org/~jeff/nctoh5. The new
> switches are:
>
>   --unpackshort=(0|1) -- unpack short integer variables to float variables
>     using scale_factor and add_offset netCDF variable attributes
>     (not active by default).
>   --quantize=(0|1) -- quantize data to improve compression using
>     least_significant_digit netCDF variable attribute (not active by
>     default).
>     See http://www.cdc.noaa.gov/cdc/conventions/cdc_netcdf_standard.shtml
>     for further explanation of what this attribute means.

Great!, however I can't get the file. Is the above URL correct?

Cheers,

--
Francesc Altet
From: Jeff W. <jef...@no...> - 2004-11-10 13:23:55
|
Francesc Altet wrote:

> I've ended up with a new rewrite of the EArray._calcBufferSize method, which I'm
> including at the end of this message. Please, play with the different values
> in these lines:
>
>     #bufmultfactor = int(1000 * 10)   # Conservative value
>     bufmultfactor = int(1000 * 20)    # Medium value
>     #bufmultfactor = int(1000 * 50)   # Aggressive value
>     #bufmultfactor = int(1000 * 100)  # Very aggressive value
>
> and tell me your feedback.

Francesc: That new version of _calcbuffersize fixed the problem - compression ratios with the 'medium' value of bufmultfactor are back to what they were in 0.8.1. Changing to any of the other values has very little effect on file size. Thanks for your prompt attention to this problem!

BTW: I've added a couple more command line options to nctoh5 - the new version is at http://whitaker.homeunix.org/~jeff/nctoh5. The new switches are:

  --unpackshort=(0|1) -- unpack short integer variables to float variables
    using scale_factor and add_offset netCDF variable attributes
    (not active by default).
  --quantize=(0|1) -- quantize data to improve compression using
    least_significant_digit netCDF variable attribute (not active by
    default).
    See http://www.cdc.noaa.gov/cdc/conventions/cdc_netcdf_standard.shtml
    for further explanation of what this attribute means.

-Jeff

--
Jeffrey S. Whitaker          Phone : (303)497-6313
NOAA/OAR/CDC R/CDC1          FAX   : (303)497-6449
325 Broadway                 Web   : http://www.cdc.noaa.gov/~jsw
Boulder, CO, USA 80305-3328  Office: Skaggs Research Cntr 1D-124
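As an illustration of what the --unpackshort switch refers to, here is a small sketch of the standard netCDF packing convention (unpacked = packed * scale_factor + add_offset). NumPy is used here as a stand-in for the Numeric/numarray arrays of the time, and the function name and sample values are made up:

    import numpy as np  # stand-in for the Numeric/numarray arrays used on the list

    def unpack_short(packed, scale_factor, add_offset):
        """Expand packed short integers to floats using the usual netCDF
        convention: unpacked = packed * scale_factor + add_offset."""
        return packed.astype(np.float32) * scale_factor + add_offset

    packed = np.array([120, -3200, 455], dtype=np.int16)   # made-up sample values
    print(unpack_short(packed, scale_factor=0.01, add_offset=273.15))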
From: Francesc A. <fa...@py...> - 2004-11-10 10:50:10
|
I've ended up with a new rewrite of the EArray._calcBufferSize method, which I'm including at the end of this message. Please, play with the different values in these lines:

    #bufmultfactor = int(1000 * 10)   # Conservative value
    bufmultfactor = int(1000 * 20)    # Medium value
    #bufmultfactor = int(1000 * 50)   # Aggressive value
    #bufmultfactor = int(1000 * 100)  # Very aggressive value

and tell me your feedback.

--
Francesc Altet

def _calcBufferSize(self, atom, extdim, expectedrows, compress):
    """Calculate the buffer size and the HDF5 chunk size.

    The logic to do that is based purely in experiments playing with
    different buffer sizes, chunksize and compression flag. It is obvious
    that using big buffers optimize the I/O speed. This might (should) be
    further optimized doing more experiments.
    """

    rowsize = atom.atomsize()
    #bufmultfactor = int(1000 * 10)   # Conservative value
    bufmultfactor = int(1000 * 20)    # Medium value
    #bufmultfactor = int(1000 * 50)   # Aggressive value
    #bufmultfactor = int(1000 * 100)  # Very aggressive value
    rowsizeinfile = rowsize
    expectedfsizeinKb = (expectedrows * rowsizeinfile) / 1024

    if expectedfsizeinKb <= 100:
        # Values for files less than 100 KB of size
        buffersize = 5 * bufmultfactor
    elif (expectedfsizeinKb > 100 and expectedfsizeinKb <= 1000):
        # Values for files less than 1 MB of size
        buffersize = 10 * bufmultfactor
    elif (expectedfsizeinKb > 1000 and expectedfsizeinKb <= 20 * 1000):
        # Values for sizes between 1 MB and 20 MB
        buffersize = 20 * bufmultfactor
    elif (expectedfsizeinKb > 20 * 1000 and expectedfsizeinKb <= 200 * 1000):
        # Values for sizes between 20 MB and 200 MB
        buffersize = 40 * bufmultfactor
    elif (expectedfsizeinKb > 200 * 1000 and expectedfsizeinKb <= 2000 * 1000):
        # Values for sizes between 200 MB and 2 GB
        buffersize = 50 * bufmultfactor
    else:  # Greater than 2 GB
        buffersize = 60 * bufmultfactor

    # Max Tuples to fill the buffer
    maxTuples = buffersize // rowsize
    chunksizes = list(atom.shape)
    # Check if at least 1 tuple fits in buffer
    if maxTuples > 1:
        # Yes. So the chunk sizes for the non-extendeable dims will be
        # unchanged
        chunksizes[extdim] = maxTuples
    else:
        # No. Reduce other dimensions until we get a proper chunksizes
        # shape
        chunksizes[extdim] = 1  # Only one row in extendeable dimension
        for j in range(len(chunksizes)):
            newrowsize = atom.itemsize
            for i in chunksizes[j+1:]:
                newrowsize *= i
            maxTuples = buffersize // newrowsize
            if maxTuples > 1:
                break
            chunksizes[j] = 1
        # Compute the chunksizes correctly for this j index
        chunksize = maxTuples
        if j < len(chunksizes):
            # Only modify chunksizes[j] if needed
            if chunksize < chunksizes[j]:
                chunksizes[j] = chunksize
        else:
            chunksizes[-1] = 1  # very large itemsizes!
        # Compute the correct maxTuples number
        newrowsize = atom.itemsize
        for i in chunksizes:
            newrowsize *= i
        maxTuples = buffersize // newrowsize
    return (buffersize, maxTuples, chunksizes)
From: Francesc A. <fa...@py...> - 2004-11-10 10:16:21
|
Hi again,

I've been looking deeper into the problem, and it seems like I have a solution. The problem was that I made a mistake when I was implementing indexation, and the parameters for EArray chunk size computation remained set for my early tests on optimizing chunksizes just for indexes. Afterwards, I moved the computation of optimum index chunksizes out of the EArray module, but I forgot to re-establish the correct values for general EArrays :-/

Try the next patch (against original 0.9 sources):

--- /home/falted/PyTables/exports/pytables-0.9/tables/EArray.py  2004-10-05 14:30:31.000000000 +0200
+++ EArray.py  2004-11-10 11:08:22.000000000 +0100
@@ -224,7 +224,7 @@
         #bufmultfactor = int(1000 * 2) # Is a good choice too,
         # specially for very large tables and large available memory
         #bufmultfactor = int(1000 * 1) # Optimum for sorted object
-        bufmultfactor = int(1000 * 1) # Optimum for sorted object
+        bufmultfactor = int(1000 * 100) # Optimum for sorted object
         rowsizeinfile = rowsize
         expectedfsizeinKb = (expectedrows * rowsizeinfile) / 1024

That should get the 0.8.1 compression ratios back. You can play with increasing the bufmultfactor still more, and you will get better ratios, but I'm afraid that this will make access to small portions of the EArray slower (much more data has to be read compared with the desired range).

Please, tell me about your findings and I'll fix that in CVS afterwards.

Cheers,

--
Francesc Altet
From: Francesc A. <fa...@py...> - 2004-11-09 21:20:54
|
On Tuesday 09 November 2004 22:10, Jeffrey S Whitaker wrote:
> Francesc: That helped a little bit. Now I get
>
> [mac28:~/python] jsw% ls -l test.nc test.h5
> -rw-r--r--  1 jsw  jsw   9281104  9 Nov 14:04 test.h5
> -rw-r--r--  1 jsw  jsw  26355656  4 Nov 17:10 test.nc
>
> Still a long way from the 0.8.1 result of 5344279 though.

OK, but we are on the right track. Let me study the problem a bit more carefully, and I'll get back with an answer.

Cheers,

--
Francesc Altet
From: Jeffrey S W. <Jef...@no...> - 2004-11-09 21:10:21
|
Francesc Altet wrote:

> Hi Jeff,
>
> Yep, it seems that some rework on buffer size calculation in 0.9 has made
> the chunk sizes for compression much smaller, and hence the compression
> ratio. Please, try to apply the next patch and tell me if that works better:
>
> --- pytables-0.9/tables/EArray.py  2004-10-05 14:30:31.000000000 +0200
> +++ EArray.py  2004-11-09 21:51:11.000000000 +0100
> @@ -254,7 +254,7 @@
>          if maxTuples > 10:
>              # Yes. So the chunk sizes for the non-extendeable dims will be
>              # unchanged
> -            chunksizes[extdim] = maxTuples // 10
> +            chunksizes[extdim] = maxTuples
>          else:
>              # No. reduce other dimensions until we get a proper chunksizes
>              # shape
> @@ -268,7 +268,7 @@
>                      break
>                  chunksizes[j] = 1
>              # Compute the chunksizes correctly for this j index
> -            chunksize = maxTuples // 10
> +            chunksize = maxTuples
>              if j < len(chunksizes):
>                  # Only modify chunksizes[j] if needed
>                  if chunksize < chunksizes[j]:
>
> If it works better, I'll have to double check that indexation performance won't
> suffer because of this change. To tell the truth, I don't quite remember why
> I've reduced the chunksizes by a factor of 10, although I want to believe
> that there was a good reason :-/
>
> Cheers,
>
> On Tuesday 09 November 2004 17:02, Jeffrey S Whitaker wrote:
>> Hi:
>>
>> I just noticed that compression doesn't seem to be working right (for me
>> at least) in 0.9. Here's an example:
>>
>> with pytables 0.9
>>
>> [mac28:~/python] jsw% nctoh5 --complevel=6 -o test.nc test.h5
>>
>> [mac28:~/python] jsw% ls -l test.nc test.h5
>> -rw-r--r--  1 jsw  jsw  12089048  9 Nov 08:59 test.h5
>> -rw-r--r--  1 jsw  jsw  26355656  4 Nov 17:10 test.nc
>>
>> with pytables 0.8.1
>>
>> [mac28:~/python] jsw% ls -l test.nc test.h5
>> -rw-r--r--  1 jsw  jsw   5344279  9 Nov 09:00 test.h5
>> -rw-r--r--  1 jsw  jsw  26355656  4 Nov 17:10 test.nc
>>
>> No matter what netcdf file I use as input, the resulting h5 file is
>> about twice as large using 0.9 as it is in 0.8.1.
>>
>> BTW: the test.nc file I used here can be found at
>> ftp://ftp.cdc.noaa.gov/Public/jsw.
>>
>> -Jeff

Francesc: That helped a little bit. Now I get

[mac28:~/python] jsw% ls -l test.nc test.h5
-rw-r--r--  1 jsw  jsw   9281104  9 Nov 14:04 test.h5
-rw-r--r--  1 jsw  jsw  26355656  4 Nov 17:10 test.nc

Still a long way from the 0.8.1 result of 5344279 though.

-Jeff

--
Jeffrey S. Whitaker          Phone : (303)497-6313
Meteorologist                FAX   : (303)497-6449
NOAA/OAR/CDC R/CDC1          Email : Jef...@no...
325 Broadway                 Web   : www.cdc.noaa.gov/~jsw
Boulder, CO, USA 80303-3328  Office: Skaggs Research Cntr 1D-124
From: Francesc A. <fa...@py...> - 2004-11-09 20:57:58
|
Hi Jeff,

Yep, it seems that some rework on buffer size calculation in 0.9 has made the chunk sizes for compression much smaller, and hence the compression ratio. Please, try to apply the next patch and tell me if that works better:

--- pytables-0.9/tables/EArray.py  2004-10-05 14:30:31.000000000 +0200
+++ EArray.py  2004-11-09 21:51:11.000000000 +0100
@@ -254,7 +254,7 @@
         if maxTuples > 10:
             # Yes. So the chunk sizes for the non-extendeable dims will be
             # unchanged
-            chunksizes[extdim] = maxTuples // 10
+            chunksizes[extdim] = maxTuples
         else:
             # No. reduce other dimensions until we get a proper chunksizes
             # shape
@@ -268,7 +268,7 @@
                     break
                 chunksizes[j] = 1
             # Compute the chunksizes correctly for this j index
-            chunksize = maxTuples // 10
+            chunksize = maxTuples
             if j < len(chunksizes):
                 # Only modify chunksizes[j] if needed
                 if chunksize < chunksizes[j]:

If it works better, I'll have to double check that indexation performance won't suffer because of this change. To tell the truth, I don't quite remember why I've reduced the chunksizes by a factor of 10, although I want to believe that there was a good reason :-/

Cheers,

On Tuesday 09 November 2004 17:02, Jeffrey S Whitaker wrote:
> Hi:
>
> I just noticed that compression doesn't seem to be working right (for me
> at least) in 0.9. Here's an example:
>
> with pytables 0.9
>
> [mac28:~/python] jsw% nctoh5 --complevel=6 -o test.nc test.h5
>
> [mac28:~/python] jsw% ls -l test.nc test.h5
> -rw-r--r--  1 jsw  jsw  12089048  9 Nov 08:59 test.h5
> -rw-r--r--  1 jsw  jsw  26355656  4 Nov 17:10 test.nc
>
> with pytables 0.8.1
>
> [mac28:~/python] jsw% ls -l test.nc test.h5
> -rw-r--r--  1 jsw  jsw   5344279  9 Nov 09:00 test.h5
> -rw-r--r--  1 jsw  jsw  26355656  4 Nov 17:10 test.nc
>
> No matter what netcdf file I use as input, the resulting h5 file is
> about twice as large using 0.9 as it is in 0.8.1.
>
> BTW: the test.nc file I used here can be found at
> ftp://ftp.cdc.noaa.gov/Public/jsw.
>
> -Jeff

--
Francesc Altet
From: Jeffrey S W. <Jef...@no...> - 2004-11-09 16:02:37
|
Hi:

I just noticed that compression doesn't seem to be working right (for me at least) in 0.9. Here's an example:

with pytables 0.9

[mac28:~/python] jsw% nctoh5 --complevel=6 -o test.nc test.h5

[mac28:~/python] jsw% ls -l test.nc test.h5
-rw-r--r--  1 jsw  jsw  12089048  9 Nov 08:59 test.h5
-rw-r--r--  1 jsw  jsw  26355656  4 Nov 17:10 test.nc

with pytables 0.8.1

[mac28:~/python] jsw% ls -l test.nc test.h5
-rw-r--r--  1 jsw  jsw   5344279  9 Nov 09:00 test.h5
-rw-r--r--  1 jsw  jsw  26355656  4 Nov 17:10 test.nc

No matter what netcdf file I use as input, the resulting h5 file is about twice as large using 0.9 as it is in 0.8.1.

BTW: the test.nc file I used here can be found at ftp://ftp.cdc.noaa.gov/Public/jsw.

-Jeff

--
Jeffrey S. Whitaker          Phone : (303)497-6313
Meteorologist                FAX   : (303)497-6449
NOAA/OAR/CDC R/CDC1          Email : Jef...@no...
325 Broadway                 Web   : www.cdc.noaa.gov/~jsw
Boulder, CO, USA 80303-3328  Office: Skaggs Research Cntr 1D-124
From: Francesc A. <fa...@py...> - 2004-11-08 17:22:23
|
Hi,

We are in the process of making our web sites (pytables.org and carabos.com) more attractive to newcomers, and I think it would be both useful and attractive for potential new PyTables users to read some quotes from real users, that is, you <wink>.

Please help improve our web sites and contribute a quote. One or two sentences would be enough. Be kind to us, otherwise we will be forced to politely decline to publish yours :)

Thanks in advance!

--
Francesc Altet
From: Francesc A. <fa...@py...> - 2004-11-08 12:11:44
|
Dear PyTables users,

Unfortunately, PyTables 0.9 received its first bug report shortly after the release. The good news is that it has been fixed now. The problem was that the new setup.py wouldn't install PyTables if either lzo or ucl was not installed on the system, even though they are optional libraries. This problem only happens on Unix platforms, though.

A new version of pytables-0.9.tar.gz with a cure for this has been uploaded to:

http://sourceforge.net/project/showfiles.php?group_id=63486

For those who have already downloaded the tar package and don't want to download it again, it will be enough to apply the next patch:

--- ../exports/pytables-0.9/setup.py  2004-11-05 16:33:58.000000000 +0100
+++ setup.py  2004-11-08 11:23:21.000000000 +0100
@@ -94,6 +94,7 @@
     else:
         if not incdir or not libdir:
             print "Optional %s libraries or include files not found. Disabling support for them." % (libname,)
+            return
         else:
             # Necessary to include code for optional libs
             def_macros.append(("HAVE_"+libname.upper()+"_LIB", 1))

Sorry for the inconvenience,

--
Francesc Altet
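A hedged sketch of the pattern the one-line fix restores: when an optional compression library is missing, the setup code should simply disable it and return, instead of falling through and registering its macro. None of the helper names below come from the real setup.py; only the hunk above does.

    import os

    def configure_optional_lib(libname, def_macros, search_dirs=('/usr', '/usr/local')):
        """Look for an optional library; if it is missing, just disable it
        instead of aborting the build (the behavior the patch above restores
        by adding the early 'return')."""
        incdir = libdir = None
        for prefix in search_dirs:
            if os.path.exists(os.path.join(prefix, 'include', libname + '.h')):
                incdir = os.path.join(prefix, 'include')
            if os.path.exists(os.path.join(prefix, 'lib', 'lib%s.so' % libname)):
                libdir = os.path.join(prefix, 'lib')
        if not incdir or not libdir:
            print("Optional %s not found. Disabling support for it." % libname)
            return  # do NOT fall through and register the macro
        def_macros.append(("HAVE_" + libname.upper() + "_LIB", 1))

    macros = []
    configure_optional_lib('lzo', macros)
    configure_optional_lib('ucl', macros)
    print(macros)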
From: Jeffrey S W. <Jef...@no...> - 2004-11-05 17:19:53
|
Friedemann: It is possible to improve compression by quantizing the data in your netCDF file, then using shuffle + zlib compression when converting to h5. For example, if your data in array 'dat' only has useful information in the tenths digit, you can truncate the data using Numeric.around(scale*dat)/scale, where scale = 2**bits and bits is determined so that a precision of 0.1 is retained (in this case bits=4). I've found that this can improve compression by up to a factor of 2, if you have data with not many significant digits. This is often the case with the meteorological observations I deal with, but your mileage may vary.

BTW: In my experience, shuffle + zlib (level 6 is sufficient) always produces the smallest files.

-Jeff

--
Jeffrey S. Whitaker          Phone : (303)497-6313
Meteorologist                FAX   : (303)497-6449
NOAA/OAR/CDC R/CDC1          Email : Jef...@no...
325 Broadway                 Web   : www.cdc.noaa.gov/~jsw
Boulder, CO, USA 80303-3328  Office: Skaggs Research Cntr 1D-124
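A small sketch of the quantization recipe Jeff describes, with NumPy standing in for the Numeric calls in his message (the function name and sample data are invented): rounding onto a power-of-two grid zeroes the noisy low-order bits, which is what lets shuffle + zlib compress so much better.

    import numpy as np  # stand-in for the Numeric/numarray arrays used on the list

    def quantize(dat, least_significant_digit):
        """Round data so that only `least_significant_digit` decimal digits
        are kept, following the recipe above:
        around(scale * dat) / scale, with scale = 2**bits."""
        precision = 10.0 ** -least_significant_digit
        bits = int(np.ceil(np.log2(1.0 / precision)))
        scale = 2.0 ** bits
        return np.around(scale * dat) / scale

    dat = 273.15 + 10.0 * np.random.random_sample(1000)   # made-up "temperature" field
    qdat = quantize(dat, least_significant_digit=1)       # keep ~0.1 precision
    print(abs(dat - qdat).max())                          # stays below 0.1 / 2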
From: Francesc A. <fa...@py...> - 2004-11-05 16:47:57
|
Announcing PyTables 0.9
-----------------------

I'm very proud to announce the latest and most powerful flavor of PyTables ever. In this release you will find a series of quite exciting new features, the most important being the indexing capabilities, in-kernel selections, support for complex datatypes and the possibility to modify values in both tables *and* arrays (yeah, finally :).

What is
-------

PyTables is a hierarchical database package designed to efficiently manage extremely large amounts of data (it supports full 64-bit file addressing). It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code, makes it a very easy to use tool for high performance data saving and retrieving. It is built on top of the HDF5 library and the numarray package, and provides containers for both heterogeneous data (Tables) and homogeneous data (Array, EArray). It also sports a container for keeping lists of objects of variable length in a very efficient way (VLArray). Flexible support of filters allows you to compress your data on the fly by using different compressors and compression enablers. Moreover, its powerful browsing and searching capabilities allow you to do data selections over tables exceeding gigabytes of data in just tenths of a second.

Changes more in depth
---------------------

New features:

- Indexing of columns in tables. This allows making data selections on tables up to 500 times faster than standard selections (for example, doing a selection along an indexed column of 100 million rows takes less than 1 second on a modern CPU). Perhaps the most interesting thing about the indexing algorithm implemented by PyTables is that the time taken to index grows *linearly* with the length of the data, making the indexation process *scalable* (quite differently from many relational databases). This means that it can index, in a relatively quick way, arbitrarily large table columns (for example, indexing a column of 100 million rows takes just 100 seconds, i.e. at a rate of 1 Mrow/sec). See more detailed info about that in http://pytables.sourceforge.net/doc/SciPy04.pdf.

- In-kernel selections. This feature allows making data selections on tables up to 5 times faster than standard selections (i.e. pre-0.9 selections), without a need to create an index. As a hint of how fast these selections can be, they are up to 10 times faster than a traditional relational database. Again, see http://pytables.sourceforge.net/doc/SciPy04.pdf for some experiments on that matter.

- Support of complex datatypes for all the data objects (i.e. Table, Array, EArray and VLArray). With that, the complete set of datatypes of the Numeric and numarray packages is supported. Thanks to Tom Hedley for providing the patches for Array, EArray and VLArray objects, as well as updating the User's Manual and adding unit tests for the new functionality.

- Modification of values. You can modify Table, Array, EArray and VLArray values. See Table.modifyRows, Table.modifyColumns() and the newly introduced __setitem__() method for Table, Array, EArray and VLArray entities in the Library Reference of the User's Manual.

- A new sub-package called "nodes". In it, different modules will be included to make it easier to work with different entities (like images, files, ...). The first module that has been added to this sub-package is "FileNode", whose mission is to enable the creation of a database of nodes which can be used like regular opened files in Python. In other words, you can store a set of files in a PyTables database, and read and write them as you would do with any other file in Python. Thanks to Ivan Vilata i Balaguer for contributing this.

Improvements:

- New __len__(self) methods added in Arrays, Tables and Columns. This, in combination with __getitem__(self, key), allows sequences to be emulated better.

- Better capabilities to import generic HDF5 files. In particular, Table objects (in the HDF5_HL naming schema) with "holes" in their compound type definition are supported. That allows reading certain files produced by NASA (thanks to Stephen Walton for reporting this).

- Much improved test units. More than 2000 different tests have been implemented, accounting for more than 13000 loc (this represents twice the PyTables library code itself (!)).

Backward-incompatible API changes:

- The __call__ special method has been removed from the objects File, Group, Table, Array, EArray and VLArray. Now, you should use walkNodes() in File and Group and iterrows in Table, Array, EArray and VLArray to get the same functionality. This would provide better compatibility with IPython as well.

'nctoh5', a new importing utility:

- Jeff Whitaker has contributed a script to easily convert NetCDF files into HDF5 files using Scientific Python and PyTables. It has been included and documented as a new utility.

Bug fixes:

- A call to File.flush() now invokes a call to H5Fflush() so as to effectively flush all the file contents to disk. Thanks to Shack Toms for reporting this and providing a patch.
- SF #1054683: Security hole in utils.checkNameValidity(). Reported in 2004-10-26 by ivilata
- SF #1049297: Suggestion: new method File.delAttrNode(). Reported in 2004-10-18 by ivilata
- SF #1049285: Leak in AttributeSet.__delattr__(). Reported in 2004-10-18 by ivilata
- SF #1014298: Wrong method call in examples/tutorial1-2.py. Reported in 2004-08-23 by ivilata
- SF #1013202: Cryptic error appending to EArray on RO file. Reported in 2004-08-21 by ivilata
- SF #991715: Table.read(field="var1", flavor="List") fails. Reported in 2004-07-15 by falted
- SF #988547: Wrong file type assumption in File.__new__. Reported in 2004-07-10 by ivilata

Where PyTables can be applied?
------------------------------

PyTables is not designed to work as a relational database competitor, but rather as a teammate. If you want to work with large datasets of multidimensional data (for example, for multidimensional analysis), or just provide a categorized structure for some portions of your cluttered RDBS, then give PyTables a try. It works well for storing data from data acquisition systems (DAS), simulation software, network data monitoring systems (for example, traffic measurements of IP packets on routers), very large XML files, or for creating a centralized repository for system logs, to name only a few possible uses.

What is a table?
----------------

A table is defined as a collection of records whose values are stored in fixed-length fields. All records have the same structure and all values in each field have the same data type. The terms "fixed-length" and "strict data types" seem to be quite a strange requirement for a language like Python that supports dynamic data types, but they serve a useful function if the goal is to save very large quantities of data (such as is generated by many scientific applications, for example) in an efficient manner that reduces demand on CPU time and I/O resources.

What is HDF5?
-------------

For those people who know nothing about HDF5, it is a general purpose library and file format for storing scientific data made at NCSA. HDF5 can store two primary objects: datasets and groups. A dataset is essentially a multidimensional array of data elements, and a group is a structure for organizing objects in an HDF5 file. Using these two basic constructs, one can create and store almost any kind of scientific data structure, such as images, arrays of vectors, and structured and unstructured grids. You can also mix and match them in HDF5 files according to your needs.

Platforms
---------

I'm using Linux (Intel 32-bit) as the main development platform, but PyTables should be easy to compile/install on many other UNIX machines. This package has also passed all the tests on an UltraSparc platform with Solaris 7 and Solaris 8. It also compiles and passes all the tests on an SGI Origin2000 with MIPS R12000 processors, with the MIPSPro compiler and running IRIX 6.5. It also runs fine on Linux 64-bit platforms, like AMD Opteron running GNU/Linux 2.4.21 Server, Intel Itanium (IA64) running GNU/Linux 2.4.21, or PowerPC G5 with Linux 2.6.x in 64-bit mode. It has also been tested on MacOSX platforms (10.2, but it should also work on newer versions). Regarding Windows platforms, PyTables has been tested with Windows 2000 and Windows XP (using the Microsoft Visual C compiler), but it should also work with other flavors as well.

Web site
--------

Go to the PyTables web site for more details:

http://pytables.sourceforge.net/

To know more about the company behind the PyTables development, see:

http://www.carabos.com/

Share your experience
---------------------

Let me know of any bugs, suggestions, gripes, kudos, etc. you may have.

Bon profit!

--
Francesc Altet
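To make the two headline features a bit more concrete, here is a rough sketch of how an indexed, in-kernel query looked in the camelCase API of that era. The exact 0.9 call signatures are an assumption reconstructed from memory rather than checked against the 0.9 manual, and the table layout and file name are invented.

    import tables  # camelCase names match the pre-3.0 API; 0.9 details may differ

    class Particle(tables.IsDescription):
        name = tables.StringCol(16)
        energy = tables.FloatCol()

    h5file = tables.openFile('demo.h5', mode='w')
    table = h5file.createTable(h5file.root, 'particles', Particle, "Demo table")

    row = table.row
    for i in range(10000):
        row['name'] = 'p%d' % i
        row['energy'] = float(i % 500)
        row.append()
    table.flush()

    # In-kernel selection: the condition is evaluated inside the library,
    # instead of pulling every row into Python first.
    hot = [r['name'] for r in table.where(table.cols.energy > 490)]

    # Indexing the column should speed the same kind of query up further.
    table.cols.energy.createIndex()
    hot_again = [r['name'] for r in table.where(table.cols.energy > 490)]

    h5file.close()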
From: Francesc A. <fa...@py...> - 2004-11-05 15:15:33
|
On Friday 05 November 2004 15:15, fri...@am... wrote:
> I have two questions concerning compression:
>
> 1) The effect of zlib compression
> ---------------------------------
> Tests were made with a 30 MB nc file, resulting in h5 files of
>
>   156 MB without any compression
>    16 MB with zlib compression level 9
>
> ...but using an external zip or lzo program it results in approx.
> 2 MB

That's weird... Which compression flags are you using? In particular, are you using the shuffle filter? My experience is that a compressor + shuffle *always* gives better results than using the compressor alone. Perhaps your data is an exception to this rule. Can you try:

$ nctoh5 --complib=zlib --shuffle=1 ncfile h5file

and

$ nctoh5 --complib=zlib --shuffle=0 ncfile h5file

so as to see if shuffle is adding entropy to the data instead of reducing it?

> 2) internal lzo compression
> ---------------------------
> I placed lzo.dll resp. lzo1.dll in the ~system32 directory where zlib.dll and msvcrt.dll live too.
> But I get a UserWarning from Leaf.py line 90: lzo compression library is not available...
>
> What do I have to do?

Did you install the LZO & UCL aware PyTables auto-installer? I mean, something like tables-0.8.1-LU.win32-py2.3.exe instead of tables-0.8.1.win32-py2.3.exe.

Cheers,

--
Francesc Altet
From: <fri...@am...> - 2004-11-05 14:16:37
|
Thank you for optimizing the nctoh5 script! I have two questions concerning compression:

1) The effect of zlib compression
---------------------------------
Tests were made with a 30 MB nc file, resulting in h5 files of

  156 MB without any compression
   16 MB with zlib compression level 9

...but using an external zip or lzo program it results in approx. 2 MB (I get nearly the same result when I compress the original nc file). Is there a chance to get smaller h5 files out of nctoh5.py?

2) internal lzo compression
---------------------------
I placed lzo.dll resp. lzo1.dll in the ~system32 directory where zlib.dll and msvcrt.dll live too. But I get a UserWarning from Leaf.py line 90: the lzo compression library is not available...

What do I have to do?

Thanks in advance,
Friedemann Gehrt
From: Francesc A. <fa...@py...> - 2004-11-03 18:40:53
|
On Wednesday 03 November 2004 19:13, Jeff Whitaker wrote:
>
> If updating the EArray via slice assignment were supported in the future
> this would be a non-issue, since I guess one could just do
>
>     vardata[n] = var[n]

Yeah, I think so, because I've just implemented slice assignment for Array, EArray and VLArray objects in the upcoming PyTables 0.9. However, bear in mind that the new release will only let you do an assignment to already *existing* elements in the object. So, I'm afraid that you will need to continue using append() for this particular case.

> BTW: netCDF only allows the first dimension to be unlimited.
> Interestingly, the upcoming netCDF 4 will be a wrapper on top of the
> HDF5 API, and will support compression and multiple unlimited dimensions.

OK then. I'll leave that support for the upcoming netCDF 4 (but for just one unlimited dimension, as that is all that PyTables supports).

Cheers,

--
Francesc Alted
From: Jeff W. <js...@fa...> - 2004-11-03 18:14:07
|
Francesc Alted wrote:

> On Tuesday 02 November 2004 23:56, Jeff Whitaker wrote:
>
>> Francesc: There was no good reason for using Array instead of EArray
>> for rank-1 variables, other than I wasn't sure EArrays were appropriate
>> for variables that were not going to be appended to.
>
> Yes, they are. The only reason for using an Array instead of an EArray is a
> matter of simplicity. For me, creating Array objects from the interactive
> console is far easier than EArray, but after the creation, both objects
> work very similarly. However, EArray does support filters (apart from being
> extensible), so, for programs, and if compression is desirable, I do
> recommend using EArray objects for everything, even when you don't want to
> enlarge them.

Francesc: OK, thanks for the explanation.

>> If I replace
>>
>>     vardata.append(var[n:n+1])
>>
>> with
>>
>>     if dtype == 'c':
>>         chararr = numarray.strings.array(var[n].tolist())
>>         newshape = list(chararr.shape)
>>         newshape.insert(0,1)
>>         chararr.setshape(tuple(newshape))
>>         vardata.append(chararr)
>>     else:
>>         vardata.append(var[n:n+1])
>>
>> it seems to work. Note that I have to reshape the chararray to have an
>> extra singleton dimension or pytables complains that the data being
>> appended has the wrong shape. This is also the reason I had to use
>> var[n:n+1] instead of var[n] in the append. Is there a better way to do
>> this?
>
> Well, this is a subtle problem, as append(array) expects the array to be of the
> same shape as the atomic shape. When the array has one dimension less, I guess
> it would be safe to suppose that what the user wants is to add a single row
> in the extensible dimension. Frankly, I don't know if implementing this kind
> of behaviour would help the user to understand how append() works.

If updating the EArray via slice assignment were supported in the future this would be a non-issue, since I guess one could just do

    vardata[n] = var[n]

> By the way, I've solved in PyTables the problem when the object to append is
> a Numeric object with Char type ('c' typecode). I'm attaching my new version
> for your inspection (beware, this will run only with PyTables 0.9!).
>
> You surely have noted that my code may convert NetCDF files with enlargeable
> dimensions other than the first one. I don't know whether this is supported
> in NetCDF or not. I only know that Scientific Python does not seem to
> support that.

Excellent! Thanks for taking an interest in my little script. You've taught me a lot about PyTables along the way.

BTW: netCDF only allows the first dimension to be unlimited. Interestingly, the upcoming netCDF 4 will be a wrapper on top of the HDF5 API, and will support compression and multiple unlimited dimensions.

-Jeff

--
Jeffrey S. Whitaker          Phone : (303)497-6313
Meteorologist                FAX   : (303)497-6449
NOAA/OAR/CDC R/CDC1          Email : Jef...@no...
325 Broadway                 Web   : www.cdc.noaa.gov/~jsw
Boulder, CO, USA 80303-3328  Office: Skaggs Research Cntr 1D-124
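The singleton-dimension point above can be shown in a few lines. This sketch uses the present-day PyTables names (open_file, create_earray) rather than the 0.9 calls discussed in the thread, and the file and array names are made up:

    import numpy as np
    import tables

    # A 0-sized first dimension marks the enlargeable axis.
    with tables.open_file('demo_earray.h5', 'w') as h5:
        ea = h5.create_earray(h5.root, 'var', tables.Float32Atom(), shape=(0, 5))

        chunk = np.arange(5, dtype='float32')

        # This fails: a rank-1 array of 5 values is ambiguous to append().
        # ea.append(chunk)

        # This works: the leading singleton dimension says "one new row".
        ea.append(chunk.reshape(1, 5))

        # Once rows exist, they can also be updated in place via __setitem__:
        ea[0] = chunk * 2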
From: Francesc A. <fa...@py...> - 2004-11-03 18:08:15
|
On Tuesday 02 November 2004 23:56, Jeff Whitaker wrote:
> Francesc: There was no good reason for using Array instead of EArray
> for rank-1 variables, other than I wasn't sure EArrays were appropriate
> for variables that were not going to be appended to.

Yes, they are. The only reason for using an Array instead of an EArray is a matter of simplicity. For me, creating Array objects from the interactive console is far easier than EArray, but after the creation, both objects work very similarly. However, EArray does support filters (apart from being extensible), so, for programs, and if compression is desirable, I do recommend using EArray objects for everything, even when you don't want to enlarge them.

> If I replace
>
>     vardata.append(var[n:n+1])
>
> with
>
>     if dtype == 'c':
>         chararr = numarray.strings.array(var[n].tolist())
>         newshape = list(chararr.shape)
>         newshape.insert(0,1)
>         chararr.setshape(tuple(newshape))
>         vardata.append(chararr)
>     else:
>         vardata.append(var[n:n+1])
>
> it seems to work. Note that I have to reshape the chararray to have an
> extra singleton dimension or pytables complains that the data being
> appended has the wrong shape. This is also the reason I had to use
> var[n:n+1] instead of var[n] in the append. Is there a better way to do
> this?

Well, this is a subtle problem, as append(array) expects the array to be of the same shape as the atomic shape. When the array has one dimension less, I guess it would be safe to suppose that what the user wants is to add a single row in the extensible dimension. Frankly, I don't know if implementing this kind of behaviour would help the user to understand how append() works.

By the way, I've solved in PyTables the problem when the object to append is a Numeric object with Char type ('c' typecode). I'm attaching my new version for your inspection (beware, this will run only with PyTables 0.9!).

You surely have noted that my code may convert NetCDF files with enlargeable dimensions other than the first one. I don't know whether this is supported in NetCDF or not. I only know that Scientific Python does not seem to support that.

Finally, the new (attached) version does the copy in buckets of records instead of just one single record at a time. That improves the conversion speed considerably for large variables.

Original code:

$ ./nctoh5.bck -vo /tmp/test.nc /tmp/test-3-old.h5
+=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=+
Starting conversion from /tmp/test.nc to /tmp/test-3-old.h5
Applying filters: None
+=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=+
Number of variables copied: 2
KBytes copied: 7812.969
Time copying: 38.142 s (real) 36.88 s (cpu) 97%
Copied variable/sec:  0.1
Copied KB/s : 204

Bucket conversion code:

$ ./nctoh5 -vo /tmp/test.nc /tmp/test-3.h5
+=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=+
Starting conversion from /tmp/test.nc to /tmp/test-3.h5
Applying filters: None
+=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=++=+
Number of variables copied: 2
KBytes copied: 7812.969
Time copying: 5.55 s (real) 5.42 s (cpu) 98%
Copied variable/sec:  0.4
Copied KB/s : 1407

Maybe this is not important in many situations, but well, I think it is not going to hurt either <wink>

Cheers,

--
Francesc Alted

----------------------------------------------------------

#!/usr/bin/env python
"""
convert netCDF file to HDF5 using Scientific.IO.NetCDF and PyTables.

Jeff Whitaker <jef...@no...>

Added some flags to select filters, as well as some small improvements.
Francesc Altet <fa...@ca...>

This requires Scientific from
http://starship.python.net/~hinsen/ScientificPython
"""
import Scientific.IO.NetCDF as NetCDF
import tables, sys, os.path, getopt, time

def nctoh5(ncfilename, h5filename, filters, overwritefile):
    # open netCDF file
    ncfile = NetCDF.NetCDFFile(ncfilename, mode = "r")
    # open h5 file
    if overwritefile:
        h5file = tables.openFile(h5filename, mode = "w")
    else:
        h5file = tables.openFile(h5filename, mode = "a")
    # loop over variables in netCDF file.
    nobjects = 0; nbytes = 0  # Initialize counters
    for varname in ncfile.variables.keys():
        var = ncfile.variables[varname]
        vardims = list(var.dimensions)
        vardimsizes = [ncfile.dimensions[vardim] for vardim in vardims]
        # Check if any dimension is enlargeable
        extdim = -1; ndim = 0
        for vardim in vardimsizes:
            if vardim == None:
                extdim = ndim
                break
            ndim += 1
        # use long_name for title.
        if hasattr(var,'long_name'):
            title = var.long_name
        else:
            # or, just use some bogus title.
            title = varname + ' array'
        # Create an EArray to keep the NetCDF variable
        if extdim < 0:
            # Make 0 the enlargeable dimension
            extdim = 0
        vardimsizes[extdim] = 0
        dtype = var.typecode()
        if dtype == 'c':
            # Special case for Numeric character objects
            # (on which base Scientific Python works)
            atom = tables.StringAtom(shape=tuple(vardimsizes), length=1)
        else:
            atom = tables.Atom(dtype=var.typecode(), shape=tuple(vardimsizes))
        vardata = h5file.createEArray(h5file.root, varname, atom, title,
                                      filters=filters,
                                      expectedrows=vardimsizes[extdim])
        # write data to enlargeable array one chunk of records at a time.
        # (so the whole array doesn't have to be kept in memory).
        nrowsinbuf = vardata._v_maxTuples
        # The slices parameter for var.__getitem__()
        slices = [slice(0, dim, 1) for dim in var.shape]
        # range to copy
        start = 0; stop = var.shape[extdim]; step = 1
        # Start the copy itself
        for start2 in range(start, stop, step*nrowsinbuf):
            # Save the records on disk
            stop2 = start2+step*nrowsinbuf
            if stop2 > stop:
                stop2 = stop
            # Set the proper slice in the extensible dimension
            slices[extdim] = slice(start2, stop2, step)
            vardata.append(var[tuple(slices)])
        # Increment the counters
        nobjects += 1
        nbytes += reduce(lambda x,y:x*y, vardata.shape) * vardata.itemsize
        # set variable attributes.
        for key,val in var.__dict__.iteritems():
            setattr(vardata.attrs,key,val)
        setattr(vardata.attrs,'dimensions',tuple(vardims))
    # set global (file) attributes.
    for key,val in ncfile.__dict__.iteritems():
        setattr(h5file.root._v_attrs,key,val)
    # Close the file
    h5file.close()
    return (nobjects, nbytes)

usage = """usage: %s [-h] [-v] [-o] [--complevel=(0-9)] [--complib=lib]
           [--shuffle=(0|1)] [--fletcher32=(0|1)] netcdffilename hdf5filename
  -h -- Print usage message.
  -v -- Show more information.
  -o -- Overwrite destination file.
  --complevel=(0-9) -- Set a compression level (0 for no compression, which
      is the default).
  --complib=lib -- Set the compression library to be used during the copy.
      lib can be set to "zlib", "lzo" or "ucl". Defaults to "zlib".
  --shuffle=(0|1) -- Activate or not the shuffling filter (default is active
      if complevel>0).
  --fletcher32=(0|1) -- Whether to activate or not the fletcher32 filter (not
      active by default).
\n""" % os.path.basename(sys.argv[0])

try:
    opts, pargs = getopt.getopt(sys.argv[1:], 'hvo',
                                ['complevel=', 'complib=', 'shuffle=',
                                 'fletcher32=', ])
except:
    (type, value, traceback) = sys.exc_info()
    print "Error parsing the options. The error was:", value
    sys.stderr.write(usage)
    sys.exit(0)

# default options
verbose = 0
overwritefile = 0
complevel = None
complib = None
shuffle = None
fletcher32 = None

# Get the options
for option in opts:
    if option[0] == '-h':
        sys.stderr.write(usage)
        sys.exit(0)
    elif option[0] == '-v':
        verbose = 1
    elif option[0] == '-o':
        overwritefile = 1
    elif option[0] == '--complevel':
        complevel = int(option[1])
    elif option[0] == '--complib':
        complib = option[1]
    elif option[0] == '--shuffle':
        shuffle = int(option[1])
    elif option[0] == '--fletcher32':
        fletcher32 = int(option[1])
    else:
        print option[0], ": Unrecognized option"
        sys.stderr.write(usage)
        sys.exit(0)

# if we pass a number of files different from 2, abort
if len(pargs) <> 2:
    print "You need to pass both source and destination!."
    sys.stderr.write(usage)
    sys.exit(0)

# Catch the files passed as the last arguments
ncfilename = pargs[0]
h5filename = pargs[1]

# Build the Filters instance
if (complevel, complib, shuffle, fletcher32) == (None,)*4:
    filters = None
else:
    if complevel is None: complevel = 0
    if complevel > 0 and shuffle is None:
        shuffle = 1
    else:
        shuffle = 0
    if complib is None: complib = "zlib"
    if fletcher32 is None: fletcher32 = 0
    filters = tables.Filters(complevel=complevel, complib=complib,
                             shuffle=shuffle, fletcher32=fletcher32)

# Some timing
t1 = time.time()
cpu1 = time.clock()

# Copy the file
if verbose:
    print "+=+"*20
    print "Starting conversion from %s to %s" % (ncfilename, h5filename)
    print "Applying filters:", filters
    print "+=+"*20

# Do the conversion
(nobjects, nbytes) = nctoh5(ncfilename, h5filename, filters, overwritefile)

# Gather some statistics
t2 = time.time()
cpu2 = time.clock()
tcopy = round(t2-t1, 3)
cpucopy = round(cpu2-cpu1, 3)
tpercent = int(round(cpucopy/tcopy, 2)*100)

if verbose:
    print "Number of variables copied:", nobjects
    print "KBytes copied:", round(nbytes/1024.,3)
    print "Time copying: %s s (real) %s s (cpu) %s%%" % \
          (tcopy, cpucopy, tpercent)
    print "Copied variable/sec: ", round(nobjects / float(tcopy),1)
    print "Copied KB/s :", int(nbytes / (tcopy * 1024))
From: Jeff W. <js...@fa...> - 2004-11-02 22:56:17
|
Francesc: There was no good reason for using Array instead of EArray for rank-1 variables, other than I wasn't sure EArrays were appropriate for variables that were not going to be appended to.

Thanks for extending the script! I noticed one typo (StringAtom was used instead of tables.StringAtom). Also, it doesn't yet work on files with netcdf character variables. I get errors like this:

Traceback (most recent call last):
  File "nctoh5_new.py", line 175, in ?
    overwritefile)
  File "nctoh5_new.py", line 61, in nctoh5
    vardata.append(var[n:n+1])
  File "/sw/lib/python2.3/site-packages/tables/EArray.py", line 196, in append
    naarr = convertIntoNA(object, self.atom)
  File "/sw/lib/python2.3/site-packages/tables/utils.py", line 208, in convertIntoNA
    shape=arr.shape)
  File "/sw/src/root-numarray-py23-1.0-1/sw/lib/python2.3/site-packages/numarray/numarraycore.py", line 318, in array
  File "/sw/src/root-numarray-py23-1.0-1/sw/lib/python2.3/site-packages/numarray/numarraycore.py", line 256, in getTypeObject
  File "/sw/src/root-numarray-py23-1.0-1/sw/lib/python2.3/site-packages/numarray/numarraycore.py", line 245, in _typeFromTypeAndTypecode
  File "/sw/src/root-numarray-py23-1.0-1/sw/lib/python2.3/site-packages/numarray/numerictypes.py", line 457, in getType
TypeError: Not a numeric type

If I replace

    vardata.append(var[n:n+1])

with

    if dtype == 'c':
        chararr = numarray.strings.array(var[n].tolist())
        newshape = list(chararr.shape)
        newshape.insert(0,1)
        chararr.setshape(tuple(newshape))
        vardata.append(chararr)
    else:
        vardata.append(var[n:n+1])

it seems to work. Note that I have to reshape the chararray to have an extra singleton dimension or pytables complains that the data being appended has the wrong shape. This is also the reason I had to use var[n:n+1] instead of var[n] in the append. Is there a better way to do this?

-Jeff

--
Jeffrey S. Whitaker          Phone : (303)497-6313
Meteorologist                FAX   : (303)497-6449
NOAA/OAR/CDC R/CDC1          Email : Jef...@no...
325 Broadway                 Web   : www.cdc.noaa.gov/~jsw
Boulder, CO, USA 80303-3328  Office: Skaggs Research Cntr 1D-124
From: Francesc A. <fa...@py...> - 2004-11-02 13:54:06
|
Hi,

I've been working on a 'beautified' version of nctoh5, to include it as a utility of PyTables. I am attaching it at the bottom of this message. The new version has support for specifying filter parameters on the command line and optimizes the I/O speed (through the use of the expectedrows parameter of createEArray). Also, I've added a small patch that should deal with Numeric single-character string (typecode 'c') arrays, but I have not tested it, though.

Jeffrey: I don't understand why you create an Array (and not an EArray) for unidimensional NetCDF variables. I've changed the new code to always create an EArray (just to take advantage of filters). If you have some reason against doing this, please tell me.

On Monday 01 November 2004 17:09, Jeffrey S Whitaker wrote:
> fri...@am... wrote:
>
>> Hi Jeffrey,
>>
>> thanks for your nctoh5 script in the pytables list. Should it work with CharType variables? IsDescription.py complains about »Illegal type: 'c'« if even one CharType variable is used in the netCDF file arrays.
>>
>> I use Enthought Python 2.3 along with pytables-0.81, ScientificIO-2.4.6, numarray-1.1, numeric-23.6.
>>
>> Thanks in advance,
>> Friedemann
>
> Friedemann: I can't think of any reason why it wouldn't work. I'm
> cc'ing the pytables list in case anyone there has an idea. It would
> help if you could post your netcdf file somewhere so I could try it out.
>
> -Jeff

--
Francesc Alted

----------------------------------------------------------------------

#!/usr/bin/env python
"""
convert netCDF file to HDF5 using Scientific.IO.NetCDF and PyTables.

Jeff Whitaker <jef...@no...>

Added some flags to select filters, as well as some small improvements.
Francesc Altet <fa...@ca...>

This requires Scientific from
http://starship.python.net/~hinsen/ScientificPython
"""
import Scientific.IO.NetCDF as NetCDF
import tables, sys, os.path, getopt, time

def nctoh5(ncfilename, h5filename, filters, overwritefile):
    # open netCDF file
    ncfile = NetCDF.NetCDFFile(ncfilename, mode = "r")
    # open h5 file
    if overwritefile:
        h5file = tables.openFile(h5filename, mode = "w")
    else:
        h5file = tables.openFile(h5filename, mode = "a")
    # loop over variables in netCDF file.
    nobjects = 0; nbytes = 0  # Initialize counters
    for varname in ncfile.variables.keys():
        var = ncfile.variables[varname]
        vardims = list(var.dimensions)
        vardimsizes = [ncfile.dimensions[vardim] for vardim in vardims]
        # Check if any dimension is enlargeable
        edim = -1; ndim = 0
        for vardim in vardimsizes:
            if vardim == None:
                edim = ndim
                break
            ndim += 1
        # use long_name for title.
        if hasattr(var,'long_name'):
            title = var.long_name
        else:
            # or, just use some bogus title.
            title = varname + ' array'
        # Create an EArray to keep the NetCDF variable
        if edim < 0:
            # Make 0 the enlargeable dimension
            edim = 0
        vardimsizes[edim] = 0
        dtype = var.typecode()
        if dtype == 'c':
            # Special case for Numeric character objects
            # (on which base Scientific Python works)
            atom = StringAtom(shape=tuple(vardimsizes), length=1)
        else:
            atom = tables.Atom(dtype=var.typecode(), shape=tuple(vardimsizes))
        vardata = h5file.createEArray(h5file.root, varname, atom, title,
                                      filters=filters,
                                      expectedrows=vardimsizes[edim])
        # write data to enlargeable array one record at a time.
        # (so the whole array doesn't have to be kept in memory).
        for n in range(var.shape[0]):
            vardata.append(var[n:n+1])
        # Increment the counters
        nobjects += 1
        nbytes += reduce(lambda x,y:x*y, vardata.shape) * vardata.itemsize
        # set variable attributes.
        for key,val in var.__dict__.iteritems():
            setattr(vardata.attrs,key,val)
        setattr(vardata.attrs,'dimensions',tuple(vardims))
    # set global (file) attributes.
    for key,val in ncfile.__dict__.iteritems():
        setattr(h5file.root._v_attrs,key,val)
    # Close the file
    h5file.close()
    return (nobjects, nbytes)

usage = """usage: %s [-h] [-v] [-o] [--complevel=(0-9)] [--complib=lib]
           [--shuffle=(0|1)] [--fletcher32=(0|1)] netcdffilename hdf5filename
  -h -- Print usage message.
  -v -- Show more information.
  -o -- Overwrite destination file.
  --complevel=(0-9) -- Set a compression level (0 for no compression, which
      is the default).
  --complib=lib -- Set the compression library to be used during the copy.
      lib can be set to "zlib", "lzo" or "ucl". Defaults to "zlib".
  --shuffle=(0|1) -- Activate or not the shuffling filter (default is active
      if complevel>0).
  --fletcher32=(0|1) -- Whether to activate or not the fletcher32 filter (not
      active by default).
\n""" % os.path.basename(sys.argv[0])

try:
    opts, pargs = getopt.getopt(sys.argv[1:], 'hvo',
                                ['complevel=', 'complib=', 'shuffle=',
                                 'fletcher32=', ])
except:
    (type, value, traceback) = sys.exc_info()
    print "Error parsing the options. The error was:", value
    sys.stderr.write(usage)
    sys.exit(0)

# default options
verbose = 0
overwritefile = 0
complevel = None
complib = None
shuffle = None
fletcher32 = None

# Get the options
for option in opts:
    if option[0] == '-h':
        sys.stderr.write(usage)
        sys.exit(0)
    elif option[0] == '-v':
        verbose = 1
    elif option[0] == '-o':
        overwritefile = 1
    elif option[0] == '--complevel':
        complevel = int(option[1])
    elif option[0] == '--complib':
        complib = option[1]
    elif option[0] == '--shuffle':
        shuffle = int(option[1])
    elif option[0] == '--fletcher32':
        fletcher32 = int(option[1])
    else:
        print option[0], ": Unrecognized option"
        sys.stderr.write(usage)
        sys.exit(0)

# if we pass a number of files different from 2, abort
if len(pargs) <> 2:
    print "You need to pass both source and destination!."
    sys.stderr.write(usage)
    sys.exit(0)

# Catch the files passed as the last arguments
ncfilename = pargs[0]
h5filename = pargs[1]

# Build the Filters instance
if (complevel, complib, shuffle, fletcher32) == (None,)*4:
    filters = None
else:
    if complevel is None: complevel = 0
    if complevel > 0 and shuffle is None:
        shuffle = 1
    else:
        shuffle = 0
    if complib is None: complib = "zlib"
    if fletcher32 is None: fletcher32 = 0
    filters = tables.Filters(complevel=complevel, complib=complib,
                             shuffle=shuffle, fletcher32=fletcher32)

# Some timing
t1 = time.time()
cpu1 = time.clock()

# Copy the file
if verbose:
    print "+=+"*20
    print "Starting conversion from %s to %s" % (ncfilename, h5filename)
    print "Applying filters:", filters
    print "+=+"*20

# Do the conversion
(nobjects, nbytes) = nctoh5(ncfilename, h5filename, filters, overwritefile)

# Gather some statistics
t2 = time.time()
cpu2 = time.clock()
tcopy = round(t2-t1, 3)
cpucopy = round(cpu2-cpu1, 3)
tpercent = int(round(cpucopy/tcopy, 2)*100)

if verbose:
    print "Number of variables copied:", nobjects
    print "KBytes copied:", round(nbytes/1024.,3)
    print "Time copying: %s s (real) %s s (cpu) %s%%" % \
          (tcopy, cpucopy, tpercent)
    print "Copied variable/sec: ", round(nobjects / float(tcopy),1)
    print "Copied KB/s :", int(nbytes / (tcopy * 1024))
From: Jeffrey S W. <Jef...@no...> - 2004-11-01 16:09:40
|
fri...@am... wrote:

> Hi Jeffrey,
>
> thanks for your nctoh5 script in the pytables list. Should it work with CharType variables? IsDescription.py complains about »Illegal type: 'c'« if even one CharType variable is used in the netCDF file arrays.
>
> I use Enthought Python 2.3 along with pytables-0.81, ScientificIO-2.4.6, numarray-1.1, numeric-23.6.
>
> Thanks in advance,
> Friedemann

Friedemann: I can't think of any reason why it wouldn't work. I'm cc'ing the pytables list in case anyone there has an idea. It would help if you could post your netcdf file somewhere so I could try it out.

-Jeff

--
Jeffrey S. Whitaker          Phone : (303)497-6313
Meteorologist                FAX   : (303)497-6449
NOAA/OAR/CDC R/CDC1          Email : Jef...@no...
325 Broadway                 Web   : www.cdc.noaa.gov/~jsw
Boulder, CO, USA 80303-3328  Office: Skaggs Research Cntr 1D-124
From: Francesc A. <fa...@py...> - 2004-10-24 16:20:45
|
Hi Jeff,

On Sunday 24 October 2004 14:16, Jeff Whitaker wrote:
[...]
> I have a question
> regarding EArrays - I see how to append values and to write the entire
> array at once, but is there a way to update an array record without
> re-writing the whole thing?

Nope. Although this is on my TODO list, there is no such support yet. Meanwhile, you can do that using Table objects (assignment to table cells is supported in 0.9).

> Here's the nctoh5 script (requires Scientific from
> http://starship.python.net/~hinsen/ScientificPython).

Thanks for contributing that :)

--
Francesc Alted
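A short sketch of the workaround Francesc suggests: updating individual table cells instead of rewriting an array. It is written against the modern PyTables names (open_file, create_table, modify_column) rather than the 0.9 API being discussed, and the column layout and file name are invented:

    import numpy as np
    import tables

    with tables.open_file('cells.h5', 'w') as h5:
        tbl = h5.create_table('/', 'obs', {'t': tables.Float32Col(),
                                           'p': tables.Float32Col()})
        tbl.append([(0.0, 0.0)] * 10)   # ten empty records

        # Update a single cell in place - no need to rewrite the table.
        tbl.cols.t[3] = 21.5

        # Or rewrite a slice of one column.
        tbl.modify_column(start=0, stop=5, column=np.arange(5, dtype='f4'),
                          colname='p')
        tbl.flush()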