From: Francesc A. <fa...@py...> - 2004-05-21 12:04:30
On Thursday 20 May 2004 at 22:48, John Bradbury wrote:
> Thank you for your prompt response.
>
> As I may be using very large tables, Table.removeRows() may be too slow.
> Is it feasible to have a deleted flag in the record so that all records
> with the flag set would be ignored? Then you would only have to append
> the modified record.

This is an excellent idea. Although I must first think about the best way to implement such a flag, I think it would be feasible (and much more efficient than the present approach). Then, if the user does *lots* of deletions, he can always use the "ptrepack" utility to physically get rid of the rows marked as deleted.

Thanks,

-- Francesc Alted
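Until such a built-in flag exists, the same idea can be applied at the application level. Below is a minimal sketch; the Record description, its columns and the iterrows() iterator are illustrative assumptions (BoolCol arrived with 0.8's boolean support), and a periodic ptrepack pass would still be needed to reclaim the space of flagged rows.

    from tables import *

    class Record(IsDescription):
        key     = Int64Col()
        value   = Float64Col()
        deleted = BoolCol()   # application-level soft-delete flag

    # scans simply skip the flagged rows
    live = [r['value'] for r in table.iterrows() if not r['deleted']]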
From: Francesc A. <fa...@py...> - 2004-05-20 18:20:06
On Thursday 20 May 2004 at 18:32, John Bradbury wrote:
> I guess this may be a stupid question as I have not been able to get it
> answered before.
>
> Can you update existing records in pytables or do you have to copy the
> table to a new location? I don't see any functions which do this.

Yes, you can. However, currently, you have to manually read the record into a private variable, delete it in the table (see the Table.removeRows() method), modify the record as you want, and append() it again. Be cautious because deleting rows is a *potentially* slow operation in pytables, as it implies recopying part of the table if the table is large (although this has been optimized in C). I was somewhat reluctant to offer a kind of Table.modifyRow() to do that because I wanted to find a better solution for modifications (instead of the read -> delete -> append procedure). The problem is that HDF5 does not support modifications at this time, and, at my request, they said that this is not high priority for them. Meanwhile, I'll add a modifyRow() method to my TODO list (even though the first version could represent a sub-optimal approach).

Regards,

-- Francesc Alted
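A minimal sketch of the read -> delete -> append workaround follows; `n`, the 'pressure' column and the exact read()/removeRows() signatures are assumptions in the style of the 0.8-era API. Note that the re-appended row lands at the end of the table, not at its old position.

    old = table.read(n, n + 1)          # the row as a one-row RecArray
    table.removeRows(n, n + 1)          # potentially slow on big tables
    row = table.row
    for name in table.colnames:
        row[name] = old.field(name)[0]  # copy the original fields...
    row['pressure'] = 25.5              # ...overriding the one to change
    row.append()
    table.flush()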
From: Francesc A. <fa...@py...> - 2004-05-11 08:15:20
Hi David,

On Friday 7 May 2004 at 06:02, David Sumner wrote:
> Is it possible to use pytables in a parallel or threaded environment? I
> am looking at developing an application that will support live additions
> of data and at the same time allow data to be extracted for reporting
> purposes. I would not be concerned with locking because I see only one
> thread/process that would add data. There would be one or more
> threads/processes that would allow queries to be run against the dataset
> whose results would be returned for live reporting.

A couple of weeks ago or so, another person wrote to me with the same question. PyTables 0.8 has a bug that prevents the buffers from being flushed to disk when a call to Leaf.flush() is made. This is solved in CVS now, and the patch will appear in the next public release of pytables (0.8.1). So, in principle, you should be able to have one process writing and another reading the same file. However, a problem that I have currently tracked down to the HDF5 library makes the reading process unaware of the updates to the file, unless you close and re-open the file in the reader, which could be quite inconvenient. I wrote about that problem to the NCSA people, and they asked me to provide a C example of it, but this would require a fair amount of work that I haven't had time for yet, and I don't really know if I will have any in the near future. If you want to contribute to this effort, you may provide this C example (or buy some of my time in order to do it). Regarding the threaded environment, I know that HDF5 supports it (flag --enable-threadsafe during configuration), but I have not tried that with pytables. If you try this, I would be glad to hear about your experiences.

Regards,

-- Francesc Alted
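Until the HDF5-level issue is resolved, the close-and-re-open workaround on the reader side might look like the sketch below; the file name, table name and polling interval are all illustrative assumptions.

    import tables, time

    # reader process: re-open each cycle so the writer's flushed rows become visible
    while True:
        f = tables.openFile("live.h5", "r")
        print f.root.data.nrows, "rows visible so far"
        f.close()
        time.sleep(60)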
From: David S. <da...@dr...> - 2004-05-07 04:02:57
Hello,

Is it possible to use pytables in a parallel or threaded environment? I am looking at developing an application that will support live additions of data and at the same time allow data to be extracted for reporting purposes. I would not be concerned with locking because I see only one thread/process that would add data. There would be one or more threads/processes that would allow queries to be run against the dataset whose results would be returned for live reporting.

Thanks,
David Sumner
From: Francesc A. <fa...@py...> - 2004-05-06 17:50:45
Hi Chuck,

On Thursday 6 May 2004 at 18:53, chuck clark wrote:
> Now, it is the end of the month and I'd like to do a summary for the
> month. I've been able to generate that monthly summary for all statistics
> except one from the daily summaries. However, for this last stat I need
> to look at all datapoints. Is there a way that I can "link" tables so
> that when I traverse all the rows in _20040301 my row iterator
> automatically continues on to _20040302?
>
> I can do it with a for loop or even write a facade that makes a list of
> tables look like one table, but I wanted to make sure this wasn't a bit of
> functionality that already exists which I am missing.

I understand what you want to do, but this is not currently implemented in PyTables, so you will need to write the code yourself (it shouldn't be complicated, I guess). I am pondering adding reference support to pytables soon, and eventually supporting this kind of table-chain iterator. No promises, though.

> Do I read this incorrectly: even if you had the time, you couldn't truly
> implement it on top of HDF5, since HDF5 doesn't support it yet?

No, you read it well, but I think that documentation is outdated (HDF5 1.4). Go to this one: http://hdf.ncsa.uiuc.edu/HDF5/doc/UG/index.html and follow the "Datatypes" hyperlink. This documentation is updated to HDF5 1.6.2, and it seems that date and time support *is* there.

Cheers,

-- Francesc Alted
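The facade Chuck describes takes only a few lines of Python. Here is a hedged sketch, assuming the iterrows() row iterator of that era (substitute whatever iterator your version provides) and a hypothetical 'requests' column.

    def chained_rows(tables):
        # present the rows of several tables as one continuous iterator
        for t in tables:
            for row in t.iterrows():
                yield row

    days = [f.root.service._20040301, f.root.service._20040302]  # one per day
    total = 0
    for row in chained_rows(days):
        total += row['requests']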
From: chuck c. <cc...@zi...> - 2004-05-06 16:53:32
Quick question... I have a datafile with the structure:

root.service._20040301
            ._20040302
            ....
            ._20040331

where _YYYYMMDD is a table full of stats detailing the performance of the particular service over a day at one minute intervals. Each day I do some analysis on the table for that day for all the services I'm monitoring, and it has worked out really well. Now, it is the end of the month and I'd like to do a summary for the month. I've been able to generate that monthly summary for all statistics except one from the daily summaries. However, for this last stat I need to look at all datapoints. Is there a way that I can "link" tables so that when I traverse all the rows in _20040301 my row iterator automatically continues on to _20040302?

I can do it with a for loop or even write a facade that makes a list of tables look like one table, but I wanted to make sure this wasn't a bit of functionality that already exists which I am missing.

Also, Francesc, in the last email you replied to me that you hadn't had time to incorporate timestamps into PyTables yet. I went back to the NCSA site and found this link: http://hdf.ncsa.uiuc.edu/HDF5/doc/H5.intro.html#Intro-ODatasets where it says: "Atomic classes include integer, float, date and time, string, bit field, and opaque. (Note: Only integer, float and string classes are available in the current implementation.)" Do I read this incorrectly: even if you had the time, you couldn't truly implement it on top of HDF5, since HDF5 doesn't support it yet?

cheers,
chuck
From: Francesc A. <fa...@py...> - 2004-04-28 21:31:17
Hi Chuck,

On Wednesday 28 April 2004 at 19:15, chuck clark wrote:
> So far everything is working out well but I can't say that I've used
> PyTables enough yet to make any suggestions. As I get deeper into it I'll
> be sure to post. Also I'd like to express my thanks for the great
> documentation that goes along with this project. It is responsible for me
> getting this far without having to post to the list.

Documentation is an area that I very much want to take care of. I'm happy that you like it :).

> However, I do have a few questions:
> Is there a way to get a Column of a table returned as a numarray so I can
> compute means and standard deviations with NumPy? Or do I just read the
> column as a Python list and then create a numarray out of it?

There are several ways to do that. I consider the one introduced in the latest release (0.8) to be the most convenient in this case. Say that your column is named "energy" and you want to compute the mean and standard deviation. Look at that:

>>> fileh = tables.openFile("test.h5", "r")
>>> fileh.root.tuple0.cols.energy
/tuple0.cols.energy (Column(1,), Float64)  # elements of the column are scalars
>>> fileh.root.tuple0.cols.energy[:].nelements()
20000  # 20,000 elements in the column
>>> fileh.root.tuple0.cols.energy[:].mean()
19999.0  # self-explanatory
>>> fileh.root.tuple0.cols.energy[:].stddev()
11547.294055318762  # self-explanatory

This is explained in the documentation too (with some examples) in: http://pytables.sourceforge.net/html-doc/usersguide4.html#subsection4.5.5 and http://pytables.sourceforge.net/html-doc/usersguide4.html#subsection4.5.6

> One of my fields is a timestamp. I don't see such a datatype in Appendix
> A. However, upon digging further it appears that timestamps are supported
> by the HDF5 spec but have not yet been implemented. Is this correct?

Yeah, it is. I've not had the (free) time to implement that yet.

> If so, how are other people getting around this? I use the excellent
> mx.DateTime library and am heading down the path of calling .ticks() on
> any timestamp fields and storing that as a Float32Col.

That could be a nice workaround, of course.

Regards,

-- Francesc Alted
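For reference, the .ticks() workaround might look like the sketch below; the LogEntry description and its columns are illustrative assumptions. One caveat worth noting: ticks() values are around 1.1e9 seconds, beyond the roughly seven significant digits of a Float32, so a Float64Col preserves the timestamp much better.

    from tables import *
    from mx import DateTime

    class LogEntry(IsDescription):
        timestamp = Float64Col()   # Float64 keeps sub-second precision
        latency   = Float32Col()   # hypothetical measurement column

    row = table.row
    row['timestamp'] = DateTime.now().ticks()  # seconds since the epoch
    row['latency'] = 0.42
    row.append()
    table.flush()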
From: chuck c. <cc...@zi...> - 2004-04-28 17:15:43
I just started using PyTables last week and I'd not come across HDF5 before, so I'm still getting up to speed. However, I think using PyTables to complement a system I already have is going to work out well.

I am responsible for the backend of a large website. This backend uses Jini, is distributed across 20 machines and could be classified as a Services Oriented Architecture. Each machine has at least one VM (but possibly up to four VMs) running, and each VM has anywhere from 15-30 services running. Each VM is printing out performance statistics (total number of requests; number of successful, failed or active requests; processing time for each request; time spent waiting for other backend systems to respond; etc.) for every running service at one minute intervals. As you can imagine this is a lot of data, but we have a strict SLA which, in addition to specifying uptime requirements, has response time requirements. Each month I generate a report based on these files. Initially this was done entirely in Python. Then I moved to loading these files into MySQL and then into PostgreSQL. Now I've decided to store the actual log files in an HDF5 format and use PyTables to compute hourly, daily, weekly and monthly "roll-ups" of averages and standard deviations of response times. These roll-ups will be stored in PostgreSQL since many people query them. Relatively few people query out a log file entry for a specific minute. Currently I'm converting the log files from a pipe delimited file to HDF5 using PyTables. Eventually I'd like to have the application generate the file in HDF5 format to avoid the transform step. Then I plan to use PyTables to compute the "roll-ups".

So far everything is working out well, but I can't say that I've used PyTables enough yet to make any suggestions. As I get deeper into it I'll be sure to post. Also I'd like to express my thanks for the great documentation that goes along with this project. It is responsible for me getting this far without having to post to the list.

However, I do have a few questions:

Is there a way to get a Column of a table returned as a numarray so I can compute means and standard deviations with NumPy? Or do I just read the column as a Python list and then create a numarray out of it?

One of my fields is a timestamp. I don't see such a datatype in Appendix A. However, upon digging further it appears that timestamps are supported by the HDF5 spec but have not yet been implemented. Is this correct? If so, how are other people getting around this? I use the excellent mx.DateTime library and am heading down the path of calling .ticks() on any timestamp fields and storing that as a Float32Col.

Thanks,
chuck
From: Tom H. <T.D...@ba...> - 2004-04-20 23:06:37
Dear Francesc and PyTables users,

Just a quick message to say there is a patch uploaded to provide support for complex datatypes in arrays (see patch 928606 for more details). In a pytables file the complex type (single or double precision) is represented as an HDF5 compound class with 2 float members, "r" and "i", that contain the real and imaginary parts respectively. This is analogous to the numarray representation, e.g. struct {double r; double i;} for double precision.

I've successfully been using pytables with this patch for a couple of weeks with complex arrays of the Array type; however, it also works with VLArrays and EArrays. For me, it's a very convenient way of storing arrays that works across machines with different endianness. In particular, I'm storing arrays of electromagnetic field data generated by a mode solver for hollow-core photonic crystal fibres that I'm working on for my Ph.D., e.g. see http://www.opticsexpress.org/abstract.cfm?URI=OPEX-11-22-2854

I've tried to make this a useful contribution by documenting the changes in the user guide and extending the tests - let me know if anything is missing. Also, the patch is still valid against yesterday's CVS version.

Hope you find it useful,
Tom.
From: Hanneke S. <H.S...@ge...> - 2004-04-07 12:58:02
Dear all,

I wanted to let you know that the solution proposed by Francesc as stated below works for my problem concerning reading image data in H5 format! Thanks a lot for your quick help Francesc!

Regards,
Hanneke Schuurmans

>On the other hand, if all you want is to access the data as a
>2-dimensional array, without the advantages that the IMAGE
>object represents, you can apply the next patch:
>
>--- tables/Group.py.orig 2004-04-06 19:33:57.000000000 +0200
>+++ tables/Group.py 2004-04-06 19:34:00.000000000 +0200
>@@ -219,6 +219,8 @@
>             return Table()
>         elif class_ == "ARRAY":
>             return Array()
>+        elif class_ == "IMAGE":
>+            return Array()
>         elif class_ == "EARRAY":
>             return EArray()
>         elif class_ == "VLARRAY":
From: Francesc A. <fa...@py...> - 2004-04-06 17:41:05
Hello Hanneke,

On Tuesday 6 April 2004 at 12:25, Hanneke Schuurmans wrote:
> Dear Francesc (and others),
>
> I need help on the following:
> For my research I have to edit radar images. These images are stored in
> HDF5 format. However when I want to open the file I get the error code as
> printed below. The radar image is stored in /image1/image_data and
> consists of 8-bit data, stored in a 2-dimensional array of 256 rows x 256
> columns. I want to save only the image data in a matrix.
>
> I assume that the image dataset is not yet implemented in pytables. Could
> you tell me if this can be done

Yes, you are right. The IMAGE dataset is a special dataset that is implemented in the hdf5_hl (HDF5 High Level) library. See http://hdf.ncsa.uiuc.edu/HDF5/hdf5_hl/ and http://hdf.ncsa.uiuc.edu/HDF5/doc/ADGuide/ImageSpec.html for more info on that. Unluckily, PyTables implements all the datasets in hdf5_hl (ARRAYS and TABLES) except the IMAGE ones.

Conceptually, it's not difficult to add this support, but it would represent a fair amount of work. If you want to have a try, look at the "Table" class in pytables/src/hdf5Extension.pyx and see how it links with the functions of pytables/src/H5TB.c (a modified version of the file that comes with the hdf5_hl library). I would suggest adding a new extension class (doing it in Pyrex is far easier than writing it in pure C) called "Image", and providing methods for interacting with the features of the desired functions in hdf5_hl/src/H5IM.c (where the IMAGE code is). Then add a class in Python to wrap the Image extension class (you can follow pytables/tables/Table.py as an example). You can complete your work by adding some unit tests for the new class (look at pytables/test/test_tables.py for a guide) and you are done (well, almost, because you should document your work and send it to me so as to include this support in the main PyTables distribution, so that other users may benefit from it ;-).

Another thing is that the IMAGE dataset seems to need support for multidimensional attributes. Although this is not yet supported officially, it should not be difficult at all to implement (see the pytables/src/hdf5Extension.pyx:AttributeSet._g_getNodeAttr method). Of course, this kind of support is available under a commercial agreement.

On the other hand, if all you want is to access the data as a 2-dimensional array, without the advantages that the IMAGE object represents, you can apply the next patch:

--- tables/Group.py.orig 2004-04-06 19:33:57.000000000 +0200
+++ tables/Group.py 2004-04-06 19:34:00.000000000 +0200
@@ -219,6 +219,8 @@
             return Table()
         elif class_ == "ARRAY":
             return Array()
+        elif class_ == "IMAGE":
+            return Array()
         elif class_ == "EARRAY":
             return EArray()
         elif class_ == "VLARRAY":

and perhaps it would work (beware, I haven't tested it myself!)

Regards,

-- Francesc Alted
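With the patch applied, the IMAGE dataset should read back as a plain 2-dimensional array. A minimal sketch, assuming the file layout shown in Hanneke's message below:

    import tables

    f = tables.openFile("radar.h5", "r")
    img = f.root.image1.image_data.read()  # 256x256 numarray of 8-bit values
    print img.shape                        # (256, 256)
    f.close()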
From: Hanneke S. <H.S...@ge...> - 2004-04-06 10:25:39
Dear Francesc (and others),

I need help on the following: For my research I have to edit radar images. These images are stored in HDF5 format. However, when I want to open the file I get the error code as printed below. The radar image is stored in /image1/image_data and consists of 8-bit data, stored in a 2-dimensional array of 256 rows x 256 columns. I want to save only the image data in a matrix.

I assume that the image dataset is not yet implemented in pytables. Could you tell me if this can be done easily, or could you maybe tell me how I could do this "myself"? (I have some colleagues at university who are skilled in Python.) If you want I can send you an example of a radar h5-file.

Kind regards,
Hanneke Schuurmans

>>> from numarray import *
>>> from tables import *
>>> f = openFile('I:/HDF5/radar.h5')
C:\Python23\lib\site-packages\tables\File.py:188: UserWarning: 'I:/HDF5/radar.h5' does exist, is an HDF5 file, but has not a PyTables format. Trying to guess what's here from HDF5 metadata. I can't promise you getting the correct objects, but I will do my best!.
  warnings.warn( \
Info: Can't deal with multidimensional attribute 'geo_product_corners' in node 'geographic'.
Info: Can't deal with multidimensional attribute 'radar_location' in node 'radar1'.
Info: Can't deal with multidimensional attribute 'radar_location' in node 'radar2'.
C:\Python23\lib\site-packages\tables\Group.py:229: UserWarning: Class ID 'IMAGE' for Leaf /image1/image_data is unknown and will become <UnImplemented> type.
  (class_, self._g_join(name)), UserWarning)
>>> f
File(filename='I:/HDF5/radar.h5', title=None, mode='r', trMap={}, rootUEP='/', filters=Filters())
/ (Group) None
/geographic (Group) None
/image1 (Group) None
/image1/image_data (UnImplemented(256, 256), zlib(9)) None
  NOTE: <The UnImplemented object represents a PyTables unimplemented dataset present in the 'I:/HDF5/radar.h5' HDF5 file. If you wanna see this kind of HDF5 dataset implemented in PyTables, please, contact the developers.>
/overview (Group) None
/radar1 (Group) None
/radar2 (Group) None
/image1/calibration (Group) None
/image1/statistics (Group) None
/geographic/map_projection (Group) None

ir. J.M. (Hanneke) Schuurmans (PhD student)
Department of Physical Geography
Faculty of Geosciences, Utrecht University, The Netherlands
E-mail: h.s...@ge...
Telephone: +31(0)30-2532988 (direct), +31(0)30-2532749 (secretary)
Fax: +31(0)30-2531145
Visiting address: Heidelberglaan 2, Utrecht
Postal address: P.O.Box 80.115, 3508 TC Utrecht
From: Francesc A. <fa...@py...> - 2004-04-05 17:45:54
Hi David,

On Friday 2 April 2004 at 21:43, David Sumner wrote:
> I am using PyTables 0.8 and am setting a dictionary as an attribute on a
> table.
>
> When the table is created, I add a dictionary as an attribute of the
> table like so:
> site.attrs.totalTime = {entry.day: 0}
>
> Later I modify the value like so:
> Site.attrs.totalTime[entry.day] += timespan
>
> After I have done all my processing, I flush the hdf5 file and then
> close it. When I open it up and get the table, the site.attrs.totalTime
> variable exists, the key I added also exists, but the value is set to 0,
> instead of the last value it contained before the last closing. While
> modifying the value of the totalTime dictionary I am printing the new
> value to the screen for debugging purposes and I am seeing that the
> value is being modified.

No, you must be getting lost somewhere. Right now, you can't update an attribute by modifying the Python object it returns. I.e., the next does not work (procedure 1):

>>> t1.attrs.test2 = {"day": 0}
>>> t1.attrs.test2
{'day': 0}
>>> t1.attrs.test2["day"] += 1
>>> t1.attrs.test2
{'day': 0}  # Same value for the "day" key!

If you want to do that, you should do (procedure 2):

>>> test2 = t1.attrs.test2
>>> test2
{'day': 0}
>>> test2["day"] += 1
>>> t1.attrs.test2 = test2
>>> t1.attrs.test2
{'day': 1}  # Value correctly updated

This is a consequence of the fact that, when t1.attrs.test2 is called, a Python object (in this case, a dictionary) is returned. If you try to update that object in the way t1.attrs.test2["day"] += 1, only the returned object in memory will be updated, not the one on disk. So, if you want to update parts of complex attributes like dictionaries or lists, you will have to use procedure 2. Perhaps this little trick should be documented better.

Regards,

-- Francesc Alted
From: David S. <da...@dr...> - 2004-04-02 19:43:55
I am using PyTables 0.8 and am setting a dictionary as an attribute on a table.

When the table is created, I add a dictionary as an attribute of the table like so:

site.attrs.totalTime = {entry.day: 0}

Later I modify the value like so:

Site.attrs.totalTime[entry.day] += timespan

After I have done all my processing, I flush the hdf5 file and then close it. When I open it up and get the table, the site.attrs.totalTime variable exists, the key I added also exists, but the value is set to 0, instead of the last value it contained before the last closing. While modifying the value of the totalTime dictionary I am printing the new value to the screen for debugging purposes and I am seeing that the value is being modified. It is odd that it is keeping the table attribute and the key that is created, but the value of that key in the dictionary is being lost. I have installed and used pytables 0.7.2 with my same code and have seen the same issue. Is there something I am doing wrong?

Thanks,
David
From: Francesc A. <fa...@py...> - 2004-03-17 08:49:12
Ooops, I forgot to announce this issue here!

---------- Forwarded Message ----------

Subject: PyTables & numarray 0.9 warning
Date: Monday 15 March 2004 14:18
From: Francesc Alted <fa...@py...>
To: pyt...@py...

Hi,

Many of you will know that a new version of numarray (0.9) was released last week. This new version has a number of cool features (especially being faster in certain situations that directly affect PyTables efficiency :-).

Unfortunately, the new version of numarray has deprecated the "buffer" keyword of the array() constructor, and that precise keyword was used in PyTables (just in one line). I've uploaded a new version of PyTables to the SourceForge repository with a cure for that. Please, if you downloaded PyTables *before* March 9th, download it again from the PyTables web site (http://www.pytables.org) and rebuild the software (or install the new auto-installable binaries for Windows).

If you don't feel like having to do that, you can just apply this patch to get rid of the problem:

-8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<-
--- tables/Array.py.orig 2004-03-15 13:56:01.000000000 +0100
+++ tables/Array.py 2004-03-15 13:56:13.000000000 +0100
@@ -483,7 +483,7 @@
         if repr(self.type) == "CharType":
             arr = strings.array(None, itemsize=self.itemsize, shape=shape)
         else:
-            arr = numarray.array(buffer=None, type=self.type, shape=shape)
+            arr = numarray.array(None, type=self.type, shape=shape)
         # Set the same byteorder than on-disk
         arr._byteorder = self.byteorder
         # Protection against reading empty arrays
-8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<-

My apologies for the inconveniences,

-- Francesc Alted
From: Francesc A. <fa...@py...> - 2004-03-11 17:18:19
Hi,

I'm thinking of putting some user quotes about PyTables on its web site. Could you send a sentence or two with your opinion of it and what you are using it for?

Come on, contribute a quote!

Cheers,

-- Francesc Alted
From: Francesc A. <fa...@py...> - 2004-03-08 11:56:33
Hi Bernard,

On Monday 8 March 2004 at 10:01, Bernard Kaplan wrote:
> Dear community,
>
> I have to develop a program that performs numerical analysis on data
> that come from a fab production line. Every month I can count on
> approximately 100 000 new entries. [...]
>
> Here are my questions:
> - can I replace my database with PyTables ?

Well, it depends. Normally PyTables is not designed to work as an RDB replacement, but rather as a helper to it (or alone if you don't need relational or indexing capabilities). Read on for a better explanation.

> - is it possible to sort efficiently (meaning fast) a table in PyTables
> along a specific column ? How ?

It is possible, but you need to do some hacking. You can read the column, then sort it with the numarray.argsort function (http://stsdas.stsci.edu/numarray/numarray-0.8.html/node33.html) to get the sorted indices, then rewrite the table following this new order. However, this will only work for columns that fit in memory. An out-of-core algorithm for doing the same could be written if there is enough interest.

> - does the concept of primary key in a database exist in PyTables ? I
> use a primary key to avoid inserting the same row twice in my table.

No, the only primary key is the row number.

> Is there an equivalent way to do it in PyTables?

What about caching the primary keys in a list and checking whether the element already exists in it before adding it to the table?

> - how does PyTables compare with relational databases such as Firebird,
> SQLite,... in terms of performance ?

See http://pytables.sourceforge.net/html/HowFast.html

> - Are my questions relevant or do you instead advise me to keep to
> relational databases ?

Completely relevant. I would advise you to combine an RDB and PyTables and get the best of the two worlds.

Regards,

-- Francesc Alted
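The in-memory sort Francesc outlines could look like the sketch below; the 'energy' column, the pre-created sorted_table (same description as the source) and the colnames/field() calls are assumptions in the style of the numarray-era API.

    import numarray

    keys = table.cols.energy[:]      # read the key column into memory
    order = numarray.argsort(keys)   # indices that would sort the column
    rows = table.read()              # the whole table as a RecArray
    out = sorted_table.row
    for i in order:                  # rewrite the rows in sorted order
        for name in table.colnames:
            out[name] = rows.field(name)[i]
        out.append()
    sorted_table.flush()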
From: Bernard K. <cha...@be...> - 2004-03-08 09:08:46
Dear community,

I have to develop a program that performs numerical analysis on data that come from a fab production line. Every month I can count on approximately 100 000 new entries. Each entry is composed on the one hand of general information (such as date, machine, ...) and on the other hand of raw data that we measure (a matrix of size 2000x1000 or more). So far I gather the general information in a relational database (firebird - kinterbasdb) and the data are just kept in individual files. I appreciate the database because I can sort my data on the different columns of my table and I can perform fast searches to organize my huge number of entries. But I also realize that the numerical treatment that will follow will become quite cumbersome. This is why I am interested in PyTables (to be honest I am also interested in PyTables because I truly hate SQL and love Python).

Here are my questions:
- can I replace my database with PyTables?
- is it possible to sort efficiently (meaning fast) a table in PyTables along a specific column? How?
- does the concept of primary key in a database exist in PyTables? I use a primary key to avoid inserting the same row twice in my table. Is there an equivalent way to do it in PyTables?
- how does PyTables compare with relational databases such as Firebird, SQLite, ... in terms of performance?
- Are my questions relevant or do you instead advise me to keep to relational databases?

Thanks a lot for your answers.

Bernard Kaplan
From: Francesc A. <fa...@py...> - 2004-03-03 13:20:36
Announcing PyTables 0.8
-----------------------

I'm happy to announce the availability of PyTables 0.8.

PyTables is a hierarchical database package designed to efficiently manage very large amounts of data. PyTables is built on top of the HDF5 library and the numarray package. It features an object-oriented interface that, combined with natural naming and C code generated from Pyrex sources, makes it a fast, yet extremely easy-to-use tool for interactively saving and retrieving very large amounts of data. It also provides flexible indexed access on disk to anywhere in the data.

PyTables is not designed to work as a relational database competitor, but rather as a teammate. If you want to work with large datasets of multidimensional data (for example, for multidimensional analysis), or just provide a categorized structure for some portions of your cluttered RDBS, then give PyTables a try. It works well for storing data from data acquisition systems (DAS), simulation software, network data monitoring systems (for example, traffic measurements of IP packets on routers), working with very large XML files or as a centralized repository for system logs, to name only a few possible uses.

In this release you will find:

- Variable Length Arrays (VLAs) for saving a collection of a variable number of elements in each row of an array.
- Extensible Arrays (EAs) for extending homogeneous datasets on disk.
- Powerful replication capabilities, ranging from single leaves up to complete hierarchies.
- With the introduction of the UnImplemented class, greatly improved HDF5 native file import capabilities.
- Two new useful utilities: ptdump & ptrepack.
- Improved documentation (with the help of Scott Prater).
- New record on data size achieved: 5.6 TB (!) in one single file.
- Enhanced platform support. New platforms: MacOSX, FreeBSD, Linux64, IRIX64 (yes, a clean 64-bit port is there) and probably more.
- More test units (now exceeding 800).
- Many other minor improvements.

More in detail:

What's new
----------

- The new VLArray class enables you to store large lists of rows containing variable numbers of elements. The elements can be scalars or fully multidimensional objects, in the PyTables tradition. This class supports two special objects as rows: Unicode strings (UTF-8 codification is used internally) and generic Python objects (through the use of cPickle).

- The new EArray class allows you to enlarge already existing multidimensional homogeneous data objects. Consider it an extension of the already existing Array class, but with more functionality. Online compression or other filters can be applied to EArray instances, for example. Another nice feature of EAs is their support for fully multidimensional data selection with extended slices. You can write "earray[1,2:3,...,4:200]", for example, to get the desired dataset slice from the disk. This is implemented using the powerful selection capabilities of the HDF5 library, which results in very highly efficient I/O operations. The same functionality has been added to Array objects as well.

- New UnImplemented class. If a dataset contains unsupported datatypes, it will be associated with an UnImplemented instance, then inserted into the object tree as usual. This allows you to continue to work with supported objects while retaining access to attributes of unsupported datasets. This has changed from previous versions, where a RuntimeError occurred when an unsupported object was encountered. The combination of the new UnImplemented class with the support for new datatypes will enable PyTables to greatly increase the number of types of native HDF5 files that can be read and modified.

- Boolean support has been added for all the Leaf objects.

- The Table class now has an append() method that allows you to save large buffers of data in one go (i.e. bypassing the Row accessor). This can greatly improve data gathering speed.

- The standard HDF5 shuffle filter (to further enhance the compression level) is supported.

- The standard HDF5 fletcher32 checksum filter is supported.

- As the number of supported filters is growing (and may be further increased in the future), a Filters() class has been introduced to handle filters more easily. In order to add support for this class, it was necessary to make a change in the createTable() method that is not backwards compatible: the "compress" and "complib" parameters are deprecated now and the "filters" parameter should be used in their place. You will be able to continue using the old parameters (only a Deprecation warning will be issued) for the next few releases, but you should migrate to the new version as soon as possible. In general, you can easily migrate old code by substituting code in its place:

    table = fileh.createTable(group, 'table', Test, '',
                              complevel, complib)

  should be replaced by

    table = fileh.createTable(group, 'table', Test, '',
                              Filters(complevel, complib))

- A copy() method that supports slicing and modification of filtering capabilities has been added for all the Leaf objects. See the User's Manual for more information.

- A couple of new methods, namely copyFile() and copyChilds(), have been added to the File class, to permit easy replication of complete hierarchies or sub-hierarchies, even to other files. You can change filters during the copy process as well.

- Two new utilities have been added: ptdump and ptrepack. The utility ptdump allows the user to examine the contents of PyTables files (both metadata and actual data). The powerful ptrepack utility lets you selectively copy (portions of) hierarchies to specific locations in other files. It can also be used as an importer for generic HDF5 files.

- The meaning of the stop parameter in read() methods has changed. Now a value of 'None' means the last row, and a value of 0 (zero) means the first row. This is more consistent with the range() function in Python and the __getitem__() special method in numarray.

- The method Table.removeRows() is no longer limited by table size. You can now delete rows regardless of the size of the table.

- The "numarray" value has been added to the flavor parameter in the Table.read() method for completeness.

- The attributes (.attrs instance variable) are Python properties now. Access to their values is no longer lazy, i.e. you will be able to see both system and user attributes from the command line using the tab-completion capability of your Python console (if enabled).

- Documentation has been greatly improved to explain all the new functionality. In particular, the internal format of PyTables is now fully described. You can now build "native" PyTables files using any generic HDF5 software by just duplicating their format.

- Many new tests have been added, not only to check new functionality but also to more stringently check existing functionality. There are more than 800 different tests now (and the number is increasing :).

- PyTables has a new record in the data size that fits in one single file: more than 5 TB (yeah, more than 5000 GB), which accounts for 11 GB compressed, created on an AMD Opteron machine running Linux-64 (the 64-bit version of the Linux kernel). See the gory details in: http://pytables.sf.net/html/HowFast.html.

- New platforms supported: PyTables has been compiled and tested under Linux32 (Intel), Linux64 (AMD Opteron and Alpha), Win32 (Intel), MacOSX (PowerPC), FreeBSD (Intel), Solaris (6, 7, 8 and 9 with UltraSparc), IRIX64 (IRIX 6.5 with R12000) and it probably works on many more architectures. In particular, release 0.8 is the first one that provides a relatively clean port to 64-bit platforms.

- As always, some bugs have been solved (especially bugs that occur when deleting and/or overwriting attributes).

- And last, but definitely not least, a new donations section has been added to the PyTables web site (http://sourceforge.net/projects/pytables, then follow the "Donations" tag). If you like PyTables and want this effort to continue, please, donate!

What is a table?
----------------

A table is defined as a collection of records whose values are stored in fixed-length fields. All records have the same structure and all values in each field have the same data type. The terms "fixed-length" and "strict data types" seem to be quite a strange requirement for a language like Python that supports dynamic data types, but they serve a useful function if the goal is to save very large quantities of data (such as is generated by many scientific applications, for example) in an efficient manner that reduces demand on CPU time and I/O resources.

What is HDF5?
-------------

For those people who know nothing about HDF5, it is a general purpose library and file format for storing scientific data made at NCSA. HDF5 can store two primary objects: datasets and groups. A dataset is essentially a multidimensional array of data elements, and a group is a structure for organizing objects in an HDF5 file. Using these two basic constructs, one can create and store almost any kind of scientific data structure, such as images, arrays of vectors, and structured and unstructured grids. You can also mix and match them in HDF5 files according to your needs.

Platforms
---------

I'm using Linux (Intel 32-bit) as the main development platform, but PyTables should be easy to compile/install on many other UNIX machines. This package has also passed all the tests on an UltraSparc platform with Solaris 7 and Solaris 8. It also compiles and passes all the tests on an SGI Origin2000 with MIPS R12000 processors, with the MIPSPro compiler and running IRIX 6.5. It also runs fine on Linux 64-bit platforms, like an AMD Opteron running SuSE Linux Enterprise Server. It has also been tested on MacOSX platforms (10.2, but it should also work on newer versions). Regarding Windows platforms, PyTables has been tested with Windows 2000 and Windows XP (using the Microsoft Visual C compiler), but it should work with other flavors as well.

An example?
-----------

For online code examples, have a look at http://pytables.sourceforge.net/html/tut/tutorial1-1.html and, for the newly introduced Variable Length Arrays: http://pytables.sourceforge.net/html/tut/vlarray2.html

Web site
--------

Go to the PyTables web site for more details: http://pytables.sourceforge.net/

Share your experience
---------------------

Let me know of any bugs, suggestions, gripes, kudos, etc. you may have.

Have fun!

-- Francesc Alted
fa...@py...
From: Francesc A. <fa...@op...> - 2004-02-06 08:47:42
Hi Joanna,

On Thursday 5 February 2004 at 02:31, Joanna Muench wrote:
> I'm running into two problems building the Description classes. The first
> is that I won't know until run-time what the size of the arrays I want to
> store are. Therefore I need to be able to define the shapes of columns
> dynamically before instantiating the Table object. It seems that there
> should be a clever way of doing this, but I'm just not seeing it (another
> metaclass?).

The description parameter of the createTable method (see section 4.4.2 of the user's manual: http://pytables.sourceforge.net/html-doc/usersguide4.html#subsection4.4.2), besides an IsDescription instance, also accepts a dictionary or a RecArray (from the numarray package) object as a metadata descriptor. Use either of these to dynamically specify the metadata. See http://pytables.sourceforge.net/html-doc/usersguide3.html#section3.3 for an example of that (the "Event" dictionary). In that example, the raw Col() class has been used for defining the properties of columns, but you can use StringCol, IntCol, ... as well. Maybe I should make the example clearer.

> Given that I can figure out how to take care of dynamic sizing, there are
> times I will be storing a list of length 1. PyTables doesn't like lists of
> length 1.
>
> An example:
>
> from tables import *
>
> LENGTH = 12
> NUM_NAMES = 1
>
> class ExampleDescription(IsDescription):
>
>     personId = Int64Col()
>     ages = Float32Col(shape=LENGTH)
>     names = StringCol(length=16, shape=NUM_NAMES)
>
> if I then try to assign exampleRow['names'] = ['fred',] I get a bad value
> type. Is this the intended behavior?

This is happening because a shape of 1 is interpreted as a scalar (i.e. you should just write: exampleRow['names'] = 'fred'). If you want to pass a list with only one element, it's better to use shape=(1,).

Regards,

-- Francesc Alted
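Putting both answers together, a run-time description built as a plain dictionary might look like this sketch; the column names follow Joanna's example below, and the createTable call and shape=(1,) trick come from the reply above.

    from tables import *

    length = 12  # discovered at run time in a real application
    desc = {
        "personId": Int64Col(),
        "ages":     Float32Col(shape=length),
        "names":    StringCol(length=16, shape=(1,)),  # (1,) keeps a 1-element array
    }
    table = fileh.createTable(fileh.root, "example", desc, "Built at run time")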
From: Muench, J. <jm...@fh...> - 2004-02-05 01:31:39
I'm running into two problems building the Description classes. The first is that I won't know until run-time what the size of the arrays I want to store are. Therefore I need to be able to define the shapes of columns dynamically before instantiating the Table object. It seems that there should be a clever way of doing this, but I'm just not seeing it (another metaclass?).

Given that I can figure out how to take care of dynamic sizing, there are times I will be storing a list of length 1. PyTables doesn't like lists of length 1.

An example:

from tables import *

LENGTH = 12
NUM_NAMES = 1

class ExampleDescription(IsDescription):

    personId = Int64Col()
    ages = Float32Col(shape=LENGTH)
    names = StringCol(length=16, shape=NUM_NAMES)

if I then try to assign exampleRow['names'] = ['fred',] I get a bad value type. Is this the intended behavior?

Thanks in advance!
Joanna
From: Francesc A. <fa...@op...> - 2004-01-26 06:54:54
On Saturday 24 January 2004 at 22:20, Mike wrote:
> Hi Francesc,
>
> > This is a problem with Pyrex 0.8. Pyrex 0.9 should deal with that
> > just fine.
> > In fact, with Pyrex 0.9, the next line is generated:
>
> What is Pyrex? I only saw a brief mention of it in your install package.

Pyrex is a package for building Python extensions, but with a Python flavor. See http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/ for more info.

> Attached are the results of the test.

I see. You are having problems because pytables 0.7.2 does not have a clean 64-bit implementation. See this: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=226653

Right now, I've solved these problems (at least all the pytables tests pass now on 64-bit platforms) in the CVS repository, but, as I'm still making some API changes to the new functionality, I can't recommend that you test the new bleeding-edge version. However, I've successfully tested previous versions of pytables on SGI Irix 6.5 by using the gcc compiler (2.95, I think), not the native one. You didn't send me info about your system and compiler versions, but, if you are using the native SGI compiler, you can give gcc a try. Or wait until pytables 0.8 is out.

Cheers,

-- Francesc Alted
From: Francesc A. <fa...@op...> - 2004-01-21 09:04:32
Hello Mike,

On Wednesday 21 January 2004 at 05:15, you wrote:
> Hello, I was trying to compile PyTables on an Irix box today and found 3
> "bugs" in hdf5Extension.c. These were ignored by my Linux and HP-UX
> compiles, but Irix was a little sensitive to them. This is PyTables 0.7.2.
>
> The first is line 40:
>
> Was: staticforward char *__pyx_f[];
> Should be: staticforward char *__pyx_f[1]; // same as windows code
>
> Then lines 1977 and 2017 had problems trying to cast an int type to a
> PyObject *. As it turned out "__pyx_1" was redefined inside their methods.
> Globally it is an int, used for return codes. Inside the method it was a
> pointer.

This is a problem with Pyrex 0.8. Pyrex 0.9 should deal with that just fine. In fact, with Pyrex 0.9, the next line is generated:

staticforward char **__pyx_f;

which is more or less the same as your solution.

> Fixing those got it to compile OK (lots of warnings though). But the test
> suite failed. I haven't figured that one out yet.

Could you send me your *complete* test output (i.e. don't forget to pass both the -v and verbose parameters), please? Also, add some machine and compiler info.

-- Francesc Alted
From: Francesc A. <fa...@op...> - 2004-01-21 08:49:09
On Tuesday 20 January 2004 at 21:45, you wrote:
> Hello,
> Thank you for the quick response. Working more with pytables I have
> noticed another behaviour that does not seem to be intended. I can
> assign an attribute to a table like the following: table.attrs.attribute
> = value. But, I am unable to change the attribute's value after the
> initial definition. Even if I assign a new value I still see the initial
> value when I read the value. Is there another way that I should change
> the value of an attribute on a table?

Yeps, that's another bug that I already noted yesterday, and it is already solved in CVS. Today I've added some additional unit tests in order to ensure that this works correctly. Until pytables 0.8 is out, you can delete the attribute and then create it again. That should work in the 0.7 series.

Cheers,

-- Francesc Alted
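The delete-and-recreate workaround is just two statements; "counter" below is a hypothetical attribute name.

    # 0.7.x workaround: drop the attribute, then set it with the new value
    del table.attrs.counter
    table.attrs.counter = 42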
From: Francesc A. <fa...@op...> - 2004-01-20 19:25:53
Hi David,

You have discovered a bug in the de-serializing process of attribute objects. This is fixed in CVS now, and the fix will be included in the soon-to-be-released pytables 0.8. Meanwhile, you can apply cPickle.loads() to the retrieved value. The next code would be useful to maintain forward compatibility of your code with the new pytables version:

try:
    attrval = cPickle.loads(table.attrs.dayIndex)
except:
    attrval = table.attrs.dayIndex

Regarding the other question, there are several approaches that can ease access to the data, but the one you have adopted is perfectly good. One possible improvement could be to use the datetime module to automatically convert a date structure into the timestamp (i.e. the number resulting from a time.time() call), and then do the selections in the normal way. That way you don't have to keep the helper dictionary as an attribute. But, of course, a lookup with the dictionary can be faster. It's up to you.

Cheers,

On Tuesday 20 January 2004 at 18:42, David Sumner wrote:
> Hello,
> I am prototyping an application with pytables acting as the storage
> layer. I want to add a simple index of the data being collected in
> a table. Data collection for this table will occur over weeks and
> months. I wanted to store a timestamp for each day of recorded data
> in a dictionary attached to this table, with the day timestamp as the key
> and the index of the first row added to the table for this day as
> the value. What I was hoping to accomplish was an ability to look up a
> timestamp in the dictionary, returning the row index, and then utilize
> that for selection of records or removal of records for particular days.
>
> What I have done is added an attribute like so:
> table.attrs.dayIndex = {}
>
> When I first create the hdf5 file and create the groups and tables and
> add data, the dayIndex attribute functions as a dictionary should. When
> I flush and save this data to the hdf5 file, close and reopen the file,
> the attribute becomes a string and no longer functions as a dictionary.
> My question is, how may I accomplish this and, after reopening the
> hdf5 file, still utilize Python dictionaries stored as attributes?
>
> Are there other methods that I may use that would be as or more
> efficient? Like storing another table as a leaf of the table that stores
> the items I want to index.
>
> Thanks,
> David Sumner

-- Francesc Alted
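The datetime-based alternative Francesc suggests might look like the sketch below; the 'timestamp' and 'value' columns and the iterrows() iterator are illustrative assumptions.

    import time, datetime

    day = datetime.date(2004, 1, 20)
    stamp = time.mktime(day.timetuple())  # same scale as time.time() values

    # select that day's rows by comparing against the epoch timestamp
    values = [row['value'] for row in table.iterrows()
              if stamp <= row['timestamp'] < stamp + 86400]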