From: <ho...@gl...> - 2003-08-22 14:17:36
|
Hello,

I have some problems with pytables 0.7.1. The file format seems to have changed. When I run an old h5dump on a new pytables-generated file I get lots of

h5dump error: unknown object "order"

messages with different object names; with a 1.6 h5dump I get

...
      DATASET "order" {
         DATATYPE  H5T_STRING {
               STRSIZE 1;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
         DATASPACE  SIMPLE { ( 9, 36 ) / ( 9, 36 ) }
...

I can't see a difference from datasets accepted by the old h5dump, but there's the error message.

Another problem I'm still investigating is that HDF5 1.6.0 does not accept empty arrays as input. I get

HDF5-DIAG: Error detected in HDF5 library version: 1.6.0 thread 0.  Back trace follows.
  #000: H5S.c line 1708 in H5Screate_simple(): zero sized dimension for non-unlimited dimension
    major(01): Function arguments
    minor(05): Bad value

when trying to save

zeros((1, 0, 3), 'd')

We had successfully written these arrays with pytables 0.3 or 0.4 and I haven't found the time to upgrade until now. Unfortunately 0.7.1 does not work with HDF5 1.4.4; there was a missing symbol when importing. Am I stuck with pytables 0.3 or is there another solution?

Kind regards

Berthold Höllmann
--
Germanischer Lloyd AG, CAE Development, Hamburg |
From: Francesc A. <fa...@op...> - 2003-08-14 18:46:43
|
Hello! This is only to tell you that I'll be on vacation until September 2nd. During this time I won't have an internet connection, so don't expect an answer to any questions until then. Cheers, -- Francesc Alted |
From: Francesc A. <fa...@op...> - 2003-08-08 22:07:54
|
Hi everybody, I've recently uploaded the 0.7.1 version of pytables. This is mainly a bug-fixing release in which the following problems have been addressed: - Fixed several memory leaks. As a result, memory consumption when using large object trees has dropped noticeably. Some small leaks remain, but hopefully they are not very important unless you use *huge* object trees. - Fixed a bug that made the __getitem__ special method in Table fail when the stop parameter of an extended slice was not specified. That is, table[10:] now correctly returns table[10:table.nrows+1], and not table[10:11]. - The removeRows() method in Table did not update the NROWS attribute in Table objects, leading to errors in subsequent update operations (removing or adding more rows) on the same table. This has now been fixed. Apart from these fixes, a new lazy reading algorithm for attributes has been activated by default. With that, opening objects with large hierarchies has improved by 60% (you can obtain an additional 10% by using Python 2.3 instead of Python 2.2). Let me know of any glitch I could have introduced in this release. Enjoy! -- Francesc Alted |
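[Editor's sketch] To make the slice fix concrete, here is a minimal sketch of the corrected behavior (the file and node names are hypothetical; the API calls are the ones used elsewhere in this thread):

    from tables import openFile

    # Open an existing file that contains a table at /table
    # (hypothetical file, created as in the later examples).
    fileh = openFile("sp1.hd5", mode="r")
    table = fileh.getNode('/table')

    # With 0.7.1, an extended slice without a stop reads to the end:
    tail = table[10:]
    print len(tail), table.nrows - 10   # the two numbers should match

    # Equivalent explicit call:
    same = table.read(start=10, stop=table.nrows)

    fileh.close()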
From: Vineet J. <vin...@ya...> - 2003-08-01 09:21:31
|
When using psyco on WindowsXP with pytables I get the following warning and the execution time increases by about 20%. I'm using python 2.2.3 and psyco 1.0. Any ideas? C:\Win32app\Python22\Lib\site-packages\numarray\records.py:488: warning: eval()/execfile() cannot see the locals in functions bound by Psyco; consider using eval() in its two- or three-arguments form def _parseFormats(self, formats): |
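[Editor's sketch] One possible workaround (my suggestion, not something proposed in the thread) is to bind Psyco selectively to your own hot functions instead of calling psyco.full(), so that numarray's records module is never compiled and the warning above is avoided:

    import psyco

    def fill_table(table, nrows):
        # Hypothetical hot loop that actually benefits from compilation.
        price = table.row
        for i in xrange(nrows):
            price['volume'] = i
            price.append()
        table.flush()

    # Bind only this function; psyco.full() would also bind
    # numarray.records._parseFormats and trigger the warning above.
    psyco.bind(fill_table)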
From: Vineet J. <vin...@ya...> - 2003-08-01 07:31:46
|
I changed to the hd5 win2k file (based on vc6). That fixed the problem. Thanks. Vineet -----Original Message----- From: pyt...@li... [mailto:pyt...@li...] On Behalf Of Francesc Alted Sent: Thursday, July 31, 2003 4:48 PM To: Vineet Jain Cc: pyt...@li... Subject: Re: [Pytables-users] PyTables 0.7 pre-release [quoted text omitted; it duplicates Francesc's message of 2003-07-31 below] |
From: Francesc A. <fa...@op...> - 2003-07-31 23:48:01
|
On Thursday 31 July 2003 23:12, you wrote:
> My environment is windows XP pro. I installed everything as outlined in
> your email. I just upgraded to pytables 0.7 and it fails the following
> two test cases (test_create and test_tree). My program was also
> crashing. Here is the output from running test_all:

Mmm... Which HDF5 libraries are you using: the winxp-net, the win2k, or have you compiled them yourself? I've also detected problems with the .NET version of the libs (see the "Caveat" note in the "Binary Installation" section: http://pytables.sourceforge.net/html-doc/usersguide2.html#section2.2).

> C:\programming\pytables-0.7\test>test_all.py
> PyTables version: 0.7
> Extension version: $Id: hdf5Extension.pyx,v 1.68 2003/07/25 14:31:57 falted Exp $
> HDF5 version: 1.6.0
> numarray version: 0.6
> LZO version: 1.07 (Oct 18 2000)
> UCL version: 1.01 (Jan 02 2002)
> Python version: 2.2.3 (#42, May 30 2003, 18:12:08) [MSC 32 bit (Intel)]
> Byte-ordering: little
> Numeric (version 23.0) is present. Adding the Numeric test suite.
> ..............................
> C:\programming\pytables-0.7\test>
>
> I also wrote a small test program which crashes on the 7717th iteration. It
> seems to work fine for smaller numbers of iterations:
>
> from tables import *
>
> class Price(IsDescription):
>     year = Col("Float32", 1)    # float (single-precision)
>     month = Col("Int32", 1)     # integer
>     day = Col("Int32", 1)       # integer
>     hour = Col("Int32", 1)      # integer
>     min = Col("Int32", 1)       # integer
>     sec = Col("Int32", 1)       # integer
>     open = Col("Int32", 1)      # integer
>     high = Col("Int32", 1)      # integer
>     low = Col("Int32", 1)       # integer
>     close = Col("Int32", 1)     # integer
>     volume = Col("Int32", 1)    # integer
>
> def main():
>     # Open a file in "w"rite mode
>     fileh = openFile("c:/sp1.hd5", mode = "w")
>     # Create a new table in the root group
>     table = fileh.createTable('/', name='table', description=Price, compress=9)
>     price = table.row
>
>     for row in range(10000):
>         print row
>         # First, assign the values to the price record
>         price['year'] = 0
>         price['month'] = 0
>         price['day'] = 0
>         price['hour'] = 0
>         price['min'] = 0
>         price['sec'] = 0
>         price['open'] = 0
>         price['high'] = 0
>         price['low'] = 0
>         price['close'] = 0
>         price['volume'] = 0
>         # This injects the row values.
>         price.append()
>
>     # We need to flush the buffers in table in order to get an
>     # accurate number of records on it.
>     table.flush()
>
>     # Finally, close the file
>     fileh.close()
>
> if __name__ == '__main__':
>     main()
>
> Let me know if you need any help.
>
> Thanks,
>
> Vineet

-- Francesc Alted |
From: Francesc A. <fa...@op...> - 2003-07-30 21:05:13
|
Hi again, I've uploaded a new version of PyTables 0.7. Some small flaws in the C code have been discovered and fixed for Windows and Solaris (Scott Prater warned about the latter and provided a fix, thanks!). Besides, a bug in the checking of the maximum number of columns in a table has been detected (thanks to Windows for being so delicate ;-). I've also uploaded the binaries for Windows (both Python 2.2 and 2.3). They have been created and tested with WindowsXP and seem to work well. If nobody objects, I'll announce this release by the end of the week. Enjoy!, -- Francesc Alted Announcing PyTables 0.7 ----------------------- PyTables is a hierarchical database package designed to efficiently deal with very large amounts of data. PyTables is built on top of the HDF5 library and the numarray package and features an object-oriented interface that, combined with C code generated from Pyrex sources, makes it a fast, yet extremely easy to use, tool for interactively saving and retrieving large amounts of data. This is the third public beta release. The version 0.6 was internal and will never be released. In this release you will find: - new AttributeSet class - 25% I/O speed improvement - full multidimensional table cell support - new column descriptors - row deletion in tables is finally here - much more! More in detail: What's new ----------- - A new AttributeSet class has been added. This allows the addition and deletion of generic attributes (any scalar type plus any Python object supported by Pickle) as easily as this: table.attrs.date = "2003/07/28 10:32" # Attach a string to table group._v_attrs.tempShift = 1.2 # Attach a float to group array.attrs.detectorList = [1,2,3,4] # Attach a list to array del array.attrs.detectorList # Detach detectorList attr from array - PyTables now has support for fully multidimensional table cells. This has been made possible in part by the implementation of multidimensional cells in the numarray.records.RecArray object. Thanks to the numarray crew, and especially to Jin-chung Hsu, for willingly agreeing to do that, and also for including some cache improvements in RecArray. - New column descriptors added: IntCol, Int8Col, UInt8Col, Int16Col, UInt16Col, Int32Col, UInt32Col, Int64Col, UInt64Col, FloatCol, Float32Col, Float64Col and StringCol. I think they are more explicit and easier to use than the now deprecated (but still supported) Col() descriptor. All the examples and the user's manual have been updated accordingly. - The new Table.removeRows(start, stop) function allows you to remove rows from tables. This feature was requested a long time ago. There are still limitations, however: you cannot delete rows in extremely large Tables (as the remaining rows after the stop parameter are stored in memory). Nor is the performance optimized. These issues will hopefully be addressed in future releases. - Added iterators to File, Group and Table (they now support the special __iter__() method). They make the objects much more user-friendly, especially in interactive mode. See documentation for usage examples. - Added a __getitem__() method to Table that works more or less like read(), but with extended slice support. - As a consequence of rewriting table iterators in C (with the help of Pyrex, of course) the table read performance has been improved by between 20% and 30%. Data selections in PyTables are now starting to beat powerful relational databases like SQLite, even compared to in-core selects (!).
I think there is still room for another 20% or 30% speed improvement, so stay tuned. - A checksum is now added automatically when using LZO (not with UCL, where I'm having some difficulties implementing that capability). The Adler32 algorithm has been chosen because of its speed. With that, the compressing/decompressing speed has dropped 1% or 2%, which is hardly noticeable. I think this addition will allow the cautious user to be a bit more confident about this excellent compressor. Code has been added to be able to read files created without this checksum (so you can be confident that you will be able to read your existing files compressed with LZO and UCL). - Recursion has been removed from PyTables. Before, this limited the maximum tree depth to less than the Python recursion limit (which depends on implementation, but is around 900, at least in Linux). Now, the limit has been set (somewhat arbitrarily) at 2048. Thanks to John Nielsen for implementing the new iterative method! - A new rootUEP parameter to openFile() has been added. You can now define the root from which you want to start building the object tree. Thanks to John Nielsen for the suggestion and a first implementation. - Some (non-serious) bugs were discovered and fixed. - Updated documentation to explain all these new bells and whistles. It is also available on the web: http://pytables.sourceforge.net/html-doc/usersguide-html.html - Added more unit tests (more than 350 now!) - PyTables 0.7 *needs* the newest numarray 0.6 and HDF-1.6.0 to compile and work. It has been tested with Python 2.2.3 and Python 2.3 and should work fine on both versions. What is a table? ---------------- A table is defined as a collection of records whose values are stored in fixed-length fields. All records have the same structure and all values in each field have the same data type. The terms "fixed-length" and "strict data types" seem to be quite a strange requirement for a language like Python, which supports dynamic data types, but they serve a useful function if the goal is to save very large quantities of data (such as is generated by many scientific applications, for example) in an efficient manner that reduces demand on CPU time and I/O resources. Quite a bit of effort has been invested to make browsing the hierarchical data structure a pleasant experience. PyTables implements a few easy-to-use methods for browsing. What is HDF5? ------------- For those people who know nothing about HDF5, it is a general-purpose library and file format for storing scientific data made at NCSA. HDF5 can store two primary objects: datasets and groups. A dataset is essentially a multidimensional array of data elements, and a group is a structure for organizing objects in an HDF5 file. Using these two basic constructs, one can create and store almost any kind of scientific data structure, such as images, arrays of vectors, and structured and unstructured grids. You can also mix and match them in HDF5 files according to your needs. Platforms --------- I'm using Linux as the main development platform, but PyTables should be easy to compile/install on other UNIX machines. This package has also passed all the tests on an UltraSparc platform with Solaris 7 and Solaris 8. It also compiles and passes all the tests on an SGI Origin2000 with MIPS R12000 processors running IRIX 6.5. Regarding Windows platforms, PyTables has been tested with Windows 2000 and Windows XP, but it should also work with other flavors. An example? 
----------- For online code examples, have a look at http://pytables.sourceforge.net/tut/tutorial1-1.html and http://pytables.sourceforge.net/tut/tutorial1-2.html Web site -------- Go to the PyTables web site for more details: http://pytables.sourceforge.net/ Share your experience --------------------- Let me know of any bugs, suggestions, gripes, kudos, etc. you may have. Have fun! -- Francesc Alted fa...@op... |
From: Francesc A. <fa...@op...> - 2003-07-29 19:26:41
|
Hi everybody, I'm about to release PyTables 0.7, and I would be grateful if somebody could test this version on their platform and tell me whether the tests were OK or not. Please do not forget to tell me the platform on which you have tested the package! You can download it from the sourceforge site: http://sourceforge.net/project/showfiles.php?group_id=63486 If all goes well, I'll make the official announcement by the end of this week. I'm attaching the announcement. Hope you will enjoy the new features ;-) -- Francesc Alted Announcing PyTables 0.7 ----------------------- This is the third public beta release. The version 0.6 was internal and will never be released. In this release you will find: - new AttributeSet class - 25% I/O speed improvement - full multidimensional table cell support - new column descriptors - row deletion in tables is finally here - much more! More in detail: What's new ----------- - A new AttributeSet class has been added. This allows the addition and deletion of generic attributes (any scalar type plus any Python object supported by Pickle) as easily as this: table.attrs.date = "2003/07/28 10:32" # Attach a string to table group._v_attrs.tempShift = 1.2 # Attach a float to group array.attrs.detectorList = [1,2,3,4] # Attach a list to array del array.attrs.detectorList # Detach detectorList attr from array - PyTables now has full support for multidimensional table cells. This has been possible in part thanks to the implementation of multidimensional cells in the numarray.records.RecArray object. Thanks to the numarray crew for their hard work! - New column descriptors added: IntCol, Int8Col, UInt8Col, Int16Col, UInt16Col, Int32Col, UInt32Col, Int64Col, UInt64Col, FloatCol, Float32Col, Float64Col and StringCol. I think they are more explicit and easier to use than the now deprecated (but still supported!) Col() descriptor. All the examples and the user's manual have been updated accordingly. - The new Table.removeRows(start, stop) allows you to remove rows from tables! This feature was requested long ago. There is still a limitation, though: you cannot delete rows in extremely large Tables (because the remaining rows after the stop parameter need to be put in memory). Also, in this release, the performance is not optimized. These issues will hopefully be addressed in future releases. - Added iterators to File, Group and Table (they now support the __iter__() special method). They allow much better object usability, especially in interactive mode. See documentation for examples of use. - Added a __getitem__() method to Table, which works more or less like read() but with extended slice support. - As a consequence of rewriting table iterators in C (with the help of Pyrex, of course) the table read performance has been improved by between 20% and 30%. With that, data selections in PyTables are starting to beat powerful relational databases like SQLite, even for the in-core case (!). I think there is still room for another 20% to 30% improvement, so stay tuned. - A checksum is now added automatically when using LZO (not with UCL, where I'm having some difficulties implementing that capability). The Adler32 algorithm has been chosen because of its speed. With that, the compressing/decompressing speed has dropped 1% or 2%, which is hardly noticeable. I think this addition will allow the cautious user to be a bit more confident about this excellent compressor. 
Code has been added to still be able to read files created without this checksum (so you can be confident that you will be able to read your existing files compressed with LZO and UCL). - Recursion has been removed from PyTables. Before, this limited the maximum tree depth to less than the Python recursion limit (which depends on implementation, but is around 900, at least in Linux). Now, the limit has been set (somewhat arbitrarily) at 2048. Thanks to John Nielsen for implementing the new iterative method! - A new rootUEP parameter to openFile() has been added. You can now define the root from which you want to start building the object tree. Thanks to John Nielsen for the suggestion and a first implementation of it. - Some (non-serious) bugs were discovered and fixed. - Updated documentation, so that you can learn more about how to use all these new bells and whistles. It is also available on the web: http://pytables.sourceforge.net/html-doc/usersguide-html.html - Added more unit tests (up to 353 now!) - PyTables 0.7 *needs* the newest numarray 0.6 and HDF-1.6.0 to compile and work. It has been tested with Python 2.2.3 and Python 2.3c2 and should work fine on both versions. What it is ---------- In short, PyTables provides a powerful and very Pythonic interface to process and organize your table and array data on disk. Its goal is to enable the end user to easily manipulate scientific data tables and Numerical and numarray Python objects in a persistent hierarchical structure. The foundation of the underlying hierarchical data organization is the excellent HDF5 library (http://hdf.ncsa.uiuc.edu/HDF5). A table is defined as a collection of records whose values are stored in fixed-length fields. All records have the same structure and all values in each field have the same data type. The terms "fixed-length" and "strict data types" seem to be quite a strange requirement for a language like Python, which supports dynamic data types, but they serve a useful function if the goal is to save very large quantities of data (such as is generated by many scientific applications, for example) in an efficient manner that reduces demand on CPU time and I/O resources. Quite a bit of effort has been invested to make browsing the hierarchical data structure a pleasant experience. PyTables implements two (orthogonal) easy-to-use methods for browsing. What is HDF5? ------------- For those people who know nothing about HDF5, it is a general-purpose library and file format for storing scientific data made at NCSA. HDF5 can store two primary objects: datasets and groups. A dataset is essentially a multidimensional array of data elements, and a group is a structure for organizing objects in an HDF5 file. Using these two basic constructs, one can create and store almost any kind of scientific data structure, such as images, arrays of vectors, and structured and unstructured grids. You can also mix and match them in HDF5 files according to your needs. Platforms --------- I'm using Linux as the main development platform, but PyTables should be easy to compile/install on other UNIX machines. This package has also passed all the tests on an UltraSparc platform with Solaris 7 and Solaris 8. It also compiles and passes all the tests on an SGI Origin2000 with MIPS R12000 processors running IRIX 6.5. Regarding Windows platforms, PyTables has been tested with Windows 2000 and Windows XP, but it should also work with other flavors. An example? 
----------- For online code examples, have a look at http://pytables.sourceforge.net/tut/tutorial1-1.html and http://pytables.sourceforge.net/tut/tutorial1-2.html There is also a small one attached at the end of this message. Web site -------- Go to the PyTables web site for more details: http://pytables.sourceforge.net/ Share your experience --------------------- Let me know of any bugs, suggestions, gripes, kudos, etc. you may have. Have fun! -- Francesc Alted fa...@op...

--------------- Example of use --------------------------------------

from numarray import *
from tables import *

class Particle(IsDescription):
    name = StringCol(length=16)            # 16-character String
    lati = Int16Col()                      # short integer
    longi = IntCol()                       # integer
    pressure = Float32Col(shape=(2,3))     # 2-D float array (single-precision)
    temperature = Float64Col(shape=(2,3))  # 2-D float array (double-precision)

# Open a file in "w"rite mode
fileh = openFile("table-simple.h5", mode = "w")
# Create a new table in root
table = fileh.createTable(fileh.root, 'table', Particle, "Title example")
particle = table.row

# Fill the table with 10 particles
for i in xrange(10):
    # First, assign the values to the Particle record
    particle['name'] = 'Particle: %6d' % (i)
    particle['lati'] = i
    particle['longi'] = 10 - i
    particle['pressure'] = array(i*arange(2*3), shape=(2,3))
    particle['temperature'] = float(i**2)
    # This injects the row values.
    particle.append()

# We need to flush the buffers in table in order to get an
# accurate number of records on it.
table.flush()

# Delete the third and fourth row
table.removeRows(3, 5)

print "Table metadata:"
print repr(table)
print "Table contents:"
for row in table:
    print row
print "name column of the 5th row:"
print table[4].field("name")

# Finally, close the file
fileh.close() |
From: Francesc A. <fa...@op...> - 2003-07-16 20:37:04
|
On Wednesday 16 July 2003 21:03, you wrote: > Oh ok, I misunderstood what you were implementing lazily. > > Regarding memory, I wasn't clear. > > If you do an hdf_load(file), you'll get something that acts like a > python dictionary > that will load the data off of the disk when needed. It's not like a > pickle, which loads everything off of the disk into memory. However, be certain that you are not duplicating the functionality provided by the "natural naming" implementation. For example, given the tree: root / a / b / \ a c(Table) you can already access an object in this way: root.a.b.c. Interpreting a Table object as a dictionary would just be a matter of providing __getitem__ and __setitem__ methods on Table. So that: file.root.a.b.c["var1"] can be an alias of file.root.a.b.c.read(field="var1") and the same for Group: file.root.a.b[1] or: file.root.a.b["c"] would be the same as: file.root.a.b.c Maybe this is what you want. In fact, I was planning to implement that in the next release (especially for Table objects, but I'm still not sure about Group). Would that fulfill your goal? -- Francesc Alted |
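[Editor's sketch] A minimal sketch of the aliasing Francesc describes, as a wrapper class (the class and its name are hypothetical; the thread proposes adding this behavior to Table itself):

    class DictLikeTable:
        """Wrap a PyTables Table so that t["var1"] aliases
        t.read(field="var1"), as proposed above."""
        def __init__(self, table):
            self._table = table
        def __getitem__(self, key):
            if isinstance(key, str):
                # String keys select a column.
                return self._table.read(field=key)
            # Other keys (ints, slices) keep the row-oriented meaning.
            return self._table[key]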
From: Francesc A. <fa...@op...> - 2003-07-16 19:21:16
|
On Wednesday 16 July 2003 21:03, you wrote: > Oh ok, I misunderstood what you were implementing lazily. > > Regarding memory, I wasn't clear. > > If you do an hdf_load(file), you'll get something that acts like a > python dictionary > that will load the data off of the disk when needed. It's not like a > pickle, which loads everything off of the disk into memory. Oh I see... Indeed, this would be a nice improvement to make pytables even more pythonic. I remember something similar has been done with the Objectify module by David Mertz. I think this is worth reading: http://www-106.ibm.com/developerworks/xml/library/xml-matters2/ This was in the context of XML, but most of the ideas apply to pytables (both databases are hierarchical). > > john > > > -----Original Message----- > From: Francesc Alted [mailto:fa...@op...] > Sent: Wednesday, July 16, 2003 2:49 PM > To: Nielsen John > Subject: Re: I agree on factorizing things > > On Wednesday 16 July 2003 20:01, you wrote: > > Oh, so I won't worry about implementing the lazy stuff then. > > Why? I'm just avoiding reading the *attributes* (I mean VERSION, TITLE > and > attribute stuff like that) of objects, but my current code *still* > builds > the complete object tree. So your idea remains *completely* valid and I > have > not worked on that. So feel free to continue working on that issue if > you > are inclined to do so. > > > I am layering the data structure conversion code on top of yours. > > It'll > > > just be a library to import if one so chooses. > > > > In theory it would work like: > > import hdf_file > > > > hdf_file.save(file,data) > > data=hdf_file_load(file) > > > > Of course, the data isn't all there in memory, the access to it will > > be > > > managed. > > Uh? This was not for Python (in-memory) objects? -- Francesc Alted |
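[Editor's sketch] A sketch of the lazy hdf_load() idea under discussion (hdf_load and LazyHDFDict are the hypothetical names from the thread, not a real PyTables API; only leaves hanging directly from the root group are handled here):

    from tables import openFile

    class LazyHDFDict:
        """Dictionary-like view over the leaves under the root group:
        keys are listed up front, values are read on first access."""
        def __init__(self, filename):
            self._fileh = openFile(filename, mode="r")
            self._cache = {}
        def keys(self):
            # Only the names are touched here; no data is read yet.
            return [leaf.name
                    for leaf in self._fileh.listNodes("/", classname="Leaf")]
        def __getitem__(self, key):
            if key not in self._cache:
                self._cache[key] = self._fileh.getNode("/" + key).read()
            return self._cache[key]

    def hdf_load(filename):
        # hdf_load is the hypothetical entry point from the thread.
        return LazyHDFDict(filename)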
From: Francesc A. <fa...@op...> - 2003-07-16 16:11:04
|
On Wednesday 16 July 2003 17:42, you wrote: > The loading of the object tree is dependent on data access, not on > metadata access. > > For example, let's say we have a new function named hdf_load which > understands caching. > > a=hdf_load(file) > > At this point, you have not loaded anything except for the cache in the > hdf file, which may just be a simple python dictionary. The object > structure is not there. With the cache in place, you do know all of the > groups. You just do not have access to the data. > > So, when I do a: > > for i in a.keys(): > > I am looking through the cache of group names (groups represent keys). > As soon as I do the a[i], then I load the object tree for a[i] and pull > out the table. > The approach is lazy: I only load the part of the object tree that is > actually needed, and only when the data is accessed. > > for i in a.keys(): > #i equals fred,barney,wilma . . . betty > print a[i] > #print returns [1,2,3,4,5,6] for a['fred'] > #to get to the data, I load the object tree for a['fred'] > > > Clearer? I think so. It's a great idea! The only thing is that you should not worry about the cache right now (especially if it is only a few percent better than the current code). I think it would be better to start with the lazy implementation without additional cache complications, because if the cache can't be made faster, we can include the new code without further work; and if the cache does significantly speed things up, we can always add it later. I think it's always better to factor things out and add them bit by bit after they have been completely tested. BTW, I'm sending a copy of some messages to the pytables users' list. Maybe somebody wants to contribute fresh ideas. Cheers, -- Francesc Alted |
From: Francesc A. <fa...@op...> - 2003-07-15 19:13:33
|
On Tuesday 15 July 2003 20:28, you wrote: > To clarify the metadata idea. > > Because we can now target a specific part of the metadata rather than > have to load in the entire set. One can just put a cache of the metadata > in a group that is looked at. > > This makes "an" hdf_pickle even more accessible, since it is not costly > to go through all of the keys and look for something. And if they decide > to load up a group, you only load the group that you care about. > I think that before spending too much time implementing a metadata cache, it would be worth seeing how much this cache speeds up the creation of the object tree; a simple timing is sketched below. I don't remember whether you have reliable figures on that or not. In case it *significantly* improves the creation process, I think that saving it in the same HDF5 file as the data is an excellent idea. That way, the portability of the pytables files will not suffer at all. And, of course, I also like the idea of creating the cache in the group from which the user wants to partially create the object tree (we can call those, say, "user access points"). > john -- Francesc Alted |
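[Editor's sketch] For the "reliable figures" Francesc asks about, the simplest measurement is to time how long openFile() itself takes, since that is where the object tree gets built (the file name is hypothetical):

    from time import clock
    from tables import openFile

    _t = clock()
    fileh = openFile("deep-hierarchy.h5", mode="r")  # hypothetical deep file
    print "object tree built in %.3f seconds" % (clock() - _t)
    fileh.close()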
From: Francesc A. <fa...@op...> - 2003-07-15 17:30:59
|
On Tuesday 15 July 2003 17:50, you wrote: > > Limitations: > > 1) you cannot put dictionaries inside of lists, just like you do not put > trees inside of a leaf. That can be a first approximation, but in the future you might even be able to do that if you create a group for each dictionary in the list (the name for each group can be "some character + str(list position)") and add a user-defined attribute on the parent group telling what kind of Python object the info contained in its children represents (in that case, a list). > 2) The rows need to have the same type of data. Yeah, and this is where you can have a lot of work doing checks, not only of the type, but also preventing the user from providing irregular lists (e.g. ((1,(2,3)),(1,2))) or many more "irregular" situations that I can't think of right now. One approach can be to first transform the user-supplied objects into numarray objects (NumArray, RecArray or CharArray) before passing them to pytables, as sketched below. numarray will do the job of checking the input for correctness, and if the checks pass (i.e. a numarray object can be created), then the object can be safely passed to pytables. That way, you don't have to reinvent the wheel, and you take advantage of all the checks that the numarray library provides. Just some thoughts, -- Francesc Alted |
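[Editor's sketch] A sketch of the validation approach suggested above, assuming (as the thread does) that numarray's array() constructor rejects ragged nested lists; the helper name is hypothetical:

    import numarray

    def validate_rows(rows):
        """Convert a user-supplied nested list into a numarray object;
        irregular ("ragged") input should fail the conversion."""
        try:
            return numarray.array(rows)
        except Exception:
            # Exact exception type not pinned down here (assumption).
            raise ValueError("irregular nested list; rows must be homogeneous")

    # validate_rows([[1, 2], [3, 4]])        -> 2x2 array, safe for pytables
    # validate_rows([(1, (2, 3)), (1, 2)])   -> ValueError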
From: Francesc A. <fa...@op...> - 2003-07-15 17:08:35
|
Hi, Just to satisfy your curiosity: I started to work with ZODB about a year and a half ago, and it wasn't useful to me, mainly for two reasons: - I needed to save very large amounts of data (from hundreds of MB on), and at that time ZODB had some memory leaks that, in the end, made my application crash. I've heard that they have already fixed that, but I personally haven't checked again. - The most important reason was efficiency. I needed very fast access (basically, limited only by the I/O capabilities of the disk subsystem) to the data that I had saved on disk, and ZODB did not cope with this. I remember that the reason was that ZODB uses cPickle to serialize the data, and cPickle, while reasonably fast, is about 100 times slower (when reading) than pytables (see: http://pytables.sourceforge.net/doc/PyCon.html#section4). But my needs are presumably different from yours, and pytables also has its limitations, the most important being that it can't deal with completely general objects, as ZODB can, but only with tables (or arrays, including strings) of data. Besides, it does not support strings of variable length, and you may need that in your applications. This last inconvenience can be circumvented, though, as long as your objects fit in memory. PyTables also has its advantages, and an important one is that, in my opinion, pytables is very pythonic in that it supports the "natural naming" way of accessing your data from Python (see: http://pytables.sourceforge.net/html-doc/usersguide-html1.html#section1.2), which is very powerful when dealing with hierarchical structures (see also: http://pytables.sourceforge.net/html-doc/usersguide-html3.html#section3.2). My personal suggestion is that, if you are not going to deal with very large amounts of data, or performance is not an issue for you, you should try ZODB first. If it doesn't fulfill your needs, or you just happen to love "natural naming" (as I do), then give pytables a chance. Hope that helps, Francesc On Tuesday 15 July 2003 18:15, Ronald L Chichester wrote: > I read the home page of the project, and saw that the author of pyTables > tried the ZODB but didn't find it satisfactory. [remainder of the quoted message appears in full below] -- Francesc Alted |
From: Ronald L C. <co...@ha...> - 2003-07-15 16:15:10
|
I read the home page of the project, and saw that the author of pyTables tried the ZODB but didn't find it satisfactory. Just out of sheer curiosity, why wasn't the ZODB satisfactory? The reason that I ask is that I'm going through the same problem and am looking for a solution. In this case, I have a set of text clauses in a document that need to be viewed in a hierarchical manner for drafting, but when "published" have to be in a sequential manner. Thus, the "publisher" has to recursively go through the hierarchy, number the text elements (and reference the number of the parent element), and insert that into the text that is "published". Would pyTables help me do that? Thanks in advance, Ron |
From: Francesc A. <fa...@op...> - 2003-07-11 08:13:04
|
On Thursday 10 July 2003 19:45, you wrote: > I'm working on converting python data structures to pytables commands. > > So if you have: > > data={'fred':[1,2,3,4,5,6,7,8,9,10], > 'barney':[11,12,13,14,15], > 'wilma':[99,99,3,22,66] > } > > Just do something like: > save(data,'file1.hdf') > > and it will automatically save the dictionary in pytables format for > you. > I think it will make pytables more accessible. That sounds nice > > Anyways, if I define a class like: > > class Row(tables.IsDescription): pass > > I cannot use setattr to do this: > > setattr(Row, 'data0', tables.Col('Int32')) > > The definition of data0 has to be present at class definition time. I > think this is because of the metaclass that helps create > tables.IsDescription. (the metaclass stuff they've added to python is > cool!) > > I've tried some tactics like defining my own metaclass for Row, but > ended up resorting to using compile/exec on strings to build the class. It seems > awkward, can you think of a better way to define the data > that forms a table at runtime? I don't know if you are aware that pytables can also build a Table from a dictionary like:

RecordDescriptionDict = {
    'var1': Col("CharType", 4),    # 4-character String
    'var2': Col("Int32", 1),       # integer
    'var3': Col("Int16", 1),       # short integer
    'var4': Col("Float64", 2),     # double (double-precision)
    'var5': Col("Float32", 4),     # float (single-precision)
    'var6': Col("Int16", 1),       # short integer
    'var7': Col("CharType", 1),    # 1-character String
}

and then call the Table constructor as always: h5file.createTable(group, 'table', RecordDescriptionDict, title = self.title) So, you can pass either an IsDescription descendant or a dictionary to Table, and both will be understood (for more information see http://pytables.sourceforge.net/html-doc/usersguide-html4.html#subsection4.4.2 and http://pytables.sourceforge.net/html-doc/usersguide-html3.html#secondExample). I think that this should be the way to go for you. However, it is indeed possible to build up a metaclass dynamically, and you can see examples of that use in the Table._newRecArray() and Table._open() methods in the sources. But this is a lower-level approach and not recommended unless you absolutely need it. Good luck!, -- Francesc Alted |
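[Editor's sketch] Since the dictionary form sidesteps the setattr problem entirely, John's runtime-defined table could be built like this (the file name, column names and loop are hypothetical):

    from tables import openFile, Col

    def make_description(ncols):
        # Build the table description at run time -- no metaclass needed.
        desc = {}
        for i in range(ncols):
            desc['data%d' % i] = Col("Int32", 1)
        return desc

    fileh = openFile("dynamic.h5", mode="w")
    table = fileh.createTable('/', 'table', make_description(5),
                              title="runtime-defined table")
    fileh.close()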
From: Francesc A. <fa...@op...> - 2003-07-11 07:49:09
|
On Thursday 10 July 2003 20:25, Vineet Jain wrote: > I'll keep the offer of commercial support in mind. It's great to know > that you offer this for people who need it. So far I'm really impressed > with pytables. I'm using a combination of sqlite and pytables right now: > sqlite when I need to do some complex queries, and pytables for > almost everything else. Mixing SQLite and pytables is a very good approach for many situations. I think I should further investigate the different ways the two packages can work together and provide examples that make it easy for people to see how powerful this combination can be (one such sketch follows below). Cheers, -- Francesc Alted |
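[Editor's sketch] In the spirit of the examples Francesc says he should provide, here is one hedged sketch of the division of labor (file names and schema are hypothetical): bulk time-series rows live in PyTables, while a small SQLite catalog answers the complex queries.

    import sqlite
    from tables import openFile, IsDescription, Col

    class Bar(IsDescription):
        hhmm = Col("Int32", 1)      # minute of day
        close = Col("Float32", 1)   # closing price

    # Bulk storage: one PyTables table per symbol.
    fileh = openFile("bars.h5", mode="w")
    table = fileh.createTable('/', name='IBM', description=Bar)
    # ... append rows here as in the benchmark below ...
    table.flush()

    # Query layer: a small SQLite catalog mapping symbols to HDF5 nodes.
    conn = sqlite.connect(db="catalog.db", mode=077)
    cur = conn.cursor()
    cur.execute('create table symbols (name, h5node, nrows)')
    cur.execute('insert into symbols values (%s, %s, %s)',
                ['IBM', '/IBM', table.nrows])
    conn.commit()
    conn.close()
    fileh.close()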
From: Vineet J. <vin...@ya...> - 2003-07-10 21:51:15
|
I'll keep the offer of commercial support in mind. It's great to know that you offer this for people who need it. So far I'm really impressed with pytables. I'm using a combination of sqlite and pytables right now: sqlite when I need to do some complex queries, and pytables for almost everything else. Will keep you posted on how it works out. -----Original Message----- From: pyt...@li... [mailto:pyt...@li...] On Behalf Of Francesc Alted Sent: Wednesday, July 02, 2003 8:05 AM To: Vineet Jain; pyt...@li... Subject: Re: [Pytables-users] Question about pytables [quoted text omitted; it duplicates Francesc's message of 2003-07-02 below] |
From: Francesc A. <fa...@op...> - 2003-07-02 18:08:25
|
On Wednesday 02 July 2003 00:15, Vineet Jain wrote: > I have decided to go forward with pytables for the time being. Great to > hear that hd1.6 is planning to implement the updating feature. Well, it's not exactly a feature of HDF5 1.6, but rather a combination of the HDF5_HL library (which ships with pytables) and HDF5 1.6. But it seems HDF5 1.6 is the missing factor to achieve that. Hope you will be happy using pytables. If in the future you have some specific need that is not implemented in pytables, or just want professional support, remember that I'm offering commercial support for that kind of thing ;-). > I'm > looking to store stock minute bars for around 5000 stocks for several > years. This will be a lot of data and pytables is very fast so I don't > have to worry about the IO part of things. That sounds nice. In addition, if your data is compressible, you will find that you need a fraction of the space of a relational database. > > Yes you assumed correctly that I wanted to create a new recarray with > the total number of rows which would be numrows1+numrows2. Once I read > the objects from memory are they mutable or immutable. Can I change some > of the values in place? The RecArray object is mutable, so you can change these values in memory. Besides, if you save this object to the file later on and delete the original table, you have a rather primitive, yet effective, way of updating rows until a more efficient way is implemented. > > So if read gets all the rows of a table in memory does iterrow only load > the rows that you requested? To be exact, read() only reads the rows specified by its start, stop, step and field arguments, in the same way as iterrows(). The difference between them is that read() returns a monolithic object (i.e. a RecArray) with all the info you have requested, while iterrows() is a row iterator, so you get only one row each time it is invoked. > Is there any way to get a recarray back and > not load all the data into memory with out having to go through the > iterator? That's possible, as I said before, by using the start, stop and step parameters of read(). But if you want to read over the whole table without loading all the data into memory, you will need iterrows(), of course. Cheers, -- Francesc Alted |
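[Editor's sketch] The distinction Francesc draws, as a runnable sketch (the file and table come from the benchmark later in this thread; I'm assuming row fields read back with the same row['field'] syntax used for writing):

    from tables import openFile

    fileh = openFile("sp1.hd5", mode="r")
    table = fileh.getNode('/table')

    # read() materializes the requested range as one RecArray in memory:
    chunk = table.read(start=0, stop=1000, step=10)

    # iterrows() yields a single row at a time, so the whole table is
    # never resident in memory at once:
    total = 0
    for row in table.iterrows():
        total += row['volume']
    print total

    fileh.close()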
From: Francesc A. <fa...@op...> - 2003-07-02 18:08:11
|
Vineet, I've looked at your examples, and I think you are getting such good results with pytables because your data is completely repetitive and you are using compression. In real life, however, data is not so compressible, and you can be sure that this speed-up will decrease substantially. Having said that, if your data is compressible (even a small amount), pytables will have a clear advantage over SQLite when reading large amounts of data (typically larger than your available system memory). Besides, for creating tables, you can always expect much better performance using pytables than a relational database. As always, your best bet is to run the benchmarks with *real* data. Cheers, -- Francesc Alted |
From: Vineet J. <vin...@ya...> - 2003-07-01 22:20:30
|
Is there any disadvantage to using the recarray object over the array object? I've attached my code for benchmarking sqlite vs pytables.

Create tables -- sqlite: 67 seconds, pytables: 9 seconds
Select -- sqlite: 15 seconds, pytables: 0.22 seconds

I'm running on a Pentium 600. I have also tried this example by repeating 8000 unique rows 20 times, and the times from that run were comparable.

Vineet

TO CREATE THE TABLES:
---------------------

import csv
from tables import *
#import psyco
#psyco.full()
from time import clock

_time = clock()

class Price(IsDescription):
    date = Col("CharType", 8)    # 8-character String
    hhmm = Col("Int32", 1)       # integer
    open = Col("Float32", 1)     # float (single-precision)
    high = Col("Float32", 1)     # float (single-precision)
    low = Col("Float32", 1)      # float (single-precision)
    close = Col("Float32", 1)    # float (single-precision)
    volume = Col("Int32", 1)     # integer

# Open a file in "w"rite mode
fileh = openFile("c:/Trading/stockdata/test/sp1.hd5", mode = "w")
# Create a new table in the root group
table = fileh.createTable('/', name='table', description=Price,
                          complib='lzo', compress=5)
price = table.row

for i in xrange(150000):
    # First, assign the values to the Price record
    price['date'] = '01012003'
    price['hhmm'] = 0101
    price['open'] = 935.00
    price['high'] = 935.00
    price['low'] = 935.00
    price['close'] = 935.00
    price['volume'] = 0
    # This injects the row values.
    price.append()

# We need to flush the buffers in table in order to get an
# accurate number of records on it.
table.flush()

executionTime = clock() - _time
print 'execution time: ' + str(executionTime)

# Finally, close the file
fileh.close()

import csv
import sqlite
import psyco
psyco.full()
from time import clock

_time = clock()

conn = sqlite.connect(db="c:/Trading/stockdata/test/sp1", mode=077)
cursor = conn.cursor()
#cursor.execute('drop table minbars')
cursor.execute('create table minbars (date,hourmin,Open,low,high,close,volume)')
for i in xrange(150000):
    cursor.execute('insert into minbars values(%s, %s, %s, %s, %s, %s, %s)',
                   ['01012003', '0101', '935.00', '935.00', '935.00', '935.00', '935.00'])
cursor.execute('select date, hourmin, open, low, high, close, volume from minbars')
conn.commit()
conn.close()

executionTime = clock() - _time
print 'execution time: ' + str(executionTime)

TO SELECT THE DATA:
-------------------

import csv
from tables import *
from time import clock

_time = clock()

# Open the file in "r"ead mode
fileh = openFile("c:/Trading/stockdata/test/sp1.hd5", mode = "r")
table = fileh.getNode('/table')
row = table.read()

executionTime = clock() - _time
print 'Total row count: ' + str(len(row))
print 'execution time: ' + str(executionTime)

import csv
import sqlite

def main():
    from time import clock
    _time = clock()
    conn = sqlite.connect(db="c:/Trading/stockdata/test/sp1", mode=044)
    cursor = conn.cursor()
    cursor.execute('select date, hourmin, open, low, high, close, volume from minbars')
    cursor.fetchall()
    conn.close()
    executionTime = clock() - _time
    print 'execution time: ' + str(executionTime)

if __name__ == "__main__":
    main()

-----Original Message-----
From: pyt...@li... [mailto:pyt...@li...] On Behalf Of Vineet Jain
Sent: Tuesday, July 01, 2003 3:16 PM
To: 'Francesc Alted'; pyt...@li...
Subject: RE: [Pytables-users] Question about pytables

Thanks for your replies. I'm not sure what I did wrong with pysqlite because my example was very simple. 
But assigning values to rows via fetchall took significantly more time than pytables. [The rest of Vineet's text repeats the questions already quoted, with answers, in the messages above.] -----Original Message----- From: Francesc Alted [mailto:fa...@op...] Sent: Tuesday, July 01, 2003 1:26 PM To: Vineet Jain; pyt...@li... Subject: Re: [Pytables-users] Question about pytables Hi Vineet, On Tuesday 01 July 2003 01:22, Vineet Jain wrote: > Couple of questions about pytables: > > I built two samples. One with pysqlite and one with pytables and I found > pytables to be about 20 times faster than the pysqlite version and used > a lot less space. Let me commend you on a great application. 20 times faster than pysqlite seems too much and, besides, this should depend on what kind of benchmark you are doing. If it is for writing, that seems reasonable, while for reading the difference should be a lot less (see my EuroPython presentation at http://pytables.sourceforge.net/doc/EuroPython.pdf for more details). Can you explain a bit what kind of benchmark you have run? Anyway, I'm happy to know that pytables works great for your specific application. > 1. Update certain rows in a table and append to a table. The latter > you handle but am not sure how to do the former. Will updating rows ever > be supported? Appending rows is not a problem, even between different python sessions. Updating is not yet supported, and I'm waiting for HDF5 1.6 to appear to see if I can implement that feature. I'll try to release a new version of pytables supporting deleting and updating rows as soon as the NCSA folks release the 1.6 version (which should happen sooner rather than later). > 2. For arrays or rows returned from a table. How can you do the > following: > > Row1 = table1.read() > > Row2 = table2.read() > > FinalRow = row1+row2 > > Without having to loop through them. First of all, let me point out that the read() method of a Table object reads the whole table into memory and returns a recarray object, which is the way the numarray package represents arrays of inhomogeneous data (i.e. tables). Then, you didn't specify whether by row1+row2 you meant appending the rows of both tables to get a larger table with nrows1+nrows2 rows, or, in case nrows1 == nrows2, a table with the same number of rows but with ncolumns1+ncolumns2 columns. For simplicity, I'll assume you meant the former case, as the latter seems more complicated. After these clarifications, it seems that you are trying to add two recarray objects, not two tables, and this is not currently supported in numarray. But it would be a nice thing to support an __add__ special method, of course. I'll talk with the numarray crew to see if that can be implemented. 
What is the main difference between a recarray and array object especially since both of them can be passed to numarray? I've attached my code for benchmarking sqlite vs pytables Createtables sqlite: 67seconds Pytables: 9 seconds Select sqlite: 15 seconds Pytables: 0.22 seconds I'm running on a pentium 600. I have also tried this example by repeating 8000 unique rows 20 times and the times from that run were comparable. Vineet TO CREATE THE TABLES: --------------------- import csv from tables import * #import psyco #psyco.full() from time import clock _time = clock() class Price(IsDescription): date = Col("CharType", 8) # 16-character String hhmm = Col("Int32", 1) # integer open = Col("Float32", 1) # integer high = Col("Float32", 1) # float (single-precision) low = Col("Float32", 1) # float (single-precision) close = Col("Float32", 1) # double (double-precision) volume = Col("Int32", 1) # double (double-precision) # Open a file in "w"rite mode fileh = openFile("c:/Trading/stockdata/test/sp1.hd5", mode = "w") # Create a new table in newgroup group table = fileh.createTable('/', name='table', description=Price, complib='lzo', compress=5) price = table.row for i in xrange(150000): # First, assign the values to the Particle record price['date'] = '01012003' price['hhmm'] = 0101 price['open'] = 935.00 price['high'] = 935.00 price['low'] = 935.00 price['close'] = 935.00 price['volume'] = 0 # This injects the row values. price.append() # We need to flush the buffers in table in order to get an # accurate number of records on it. table.flush() executionTime = clock() - _time print 'execution time: '+str(executionTime) # Finally, close the file fileh.close() import csv import sqlite import psyco psyco.full() from time import clock _time = clock() conn = sqlite.connect(db="c:/Trading/stockdata/test/sp1", mode=077) cursor = conn.cursor() #cursor.execute('drop table minbars') cursor.execute('create table minbars (date,hourmin,Open,low,high,close,volume)') for i in xrange(150000): cursor.execute('insert into minbars values(%s, %s, %s, %s, %s, %s, %s)', ['01012003', '0101', '935.00', '935.00', '935.00', '935.00', '935.00']) cursor.execute('select date, hourmin, open, low, high, close, volume from minbars') conn.commit() conn.close() executionTime = clock() - _time print 'execution time: '+str(executionTime) TO SELECT THE DATA: -------------------- import csv from tables import * #import psyco #psyco.full() from time import clock _time = clock() # Open a file in "w"rite mode fileh = openFile("c:/Trading/stockdata/test/sp1.hd5", mode = "r", ) table = fileh.getNode('/table') row = table.read() executionTime = clock() - _time print 'Total row count: '+str(len(row)) print 'execution time: '+str(executionTime) import csv import sqlite #import psyco #psyco.full() def main(): from time import clock _time = clock() conn = sqlite.connect(db="c:/Trading/stockdata/test/sp1", mode=044) cursor = conn.cursor() cursor.execute('select date, hourmin, open, low, high, close, volume from minbars') cursor.fetchall() conn.close() executionTime = clock() - _time #print 'Total row count: '+str(len(row)) print 'execution time: '+str(executionTime) if __name__ == "__main__": main() > > > 3 Something useful found in pysqlite, and the postgress db driver > is the ability to access field names directly: > > > > row = table.read() > > high = row[10000].high (where high is a field of the table) > Yeah, you can do that using some parameters of the read() method. 
For example, let's suppose that we have the next Table object: >>> file.root.detector.smalltable /detector/smalltable (Table(10,)) 'Small table with 3 fields' description := { 'var1': Col('CharType', (6,)), 'var2': Col('Int32', (1,)), 'var3': Col('Float64', (1,)) } byteorder = little if you ask for help on its read() method: >>> help(file.root.detector.smalltable.read) Help on method read in module tables.Table: read(self, start=None, stop=None, step=None, field=None, flavor=None) method of ta bles.Table.Table instance Read a range of rows and return an in-memory object. If "start", "stop", or "step" parameters are supplied, a row range is selected. If "field" is specified, only this "field" is returned as a NumArray object. If "field" is not supplied all the fields are selected and a RecArray is returned. If both "field" and "flavor" are provided, an additional conversion to an object of this flavor is made. "flavor" must have any of the next values: "Numeric", "Tuple" or "List". (END) then, you can for example do: >>> file.root.detector.smalltable.read(start=1,stop=5, field="var2") array([1, 2, 3, 4]) and it returns the "var2" column from the rows from 1 up to (and excluding it) 5. It would be handy providing some more pythonic manner to access this data, and that might come in the future. > > > 4 Is there any way the rows returned from table can be treated as > numarray objects? As you have seen in the example before, pytables will always tries to return numarray objects. It will be an Array object if the data is homogeneous (all resulting elements has the same data type). If the resulting elements are of different datatypes, a RecArray object will be returned, as in: >>> print file.root.detector.smalltable.read(start=1,stop=5) RecArray[ ('d: 1', 1, 1024.0), ('d: 2', 2, 2048.0), ('d: 3', 3, 3072.0), ('d: 4', 4, 4096.0) ] Hope that helps to dissipate some of your questions, -- Francesc Alted ------------------------------------------------------- This SF.Net email sponsored by: Free pre-built ASP.NET sites including Data Reports, E-commerce, Portals, and Forums are available now. Download today and enter to win an XBOX or Visual Studio .NET. http://aspnet.click-url.com/go/psa00100006ave/direct;at.asp_061203_01/01 _______________________________________________ Pytables-users mailing list Pyt...@li... https://lists.sourceforge.net/lists/listinfo/pytables-users |
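[Editor's note] A side note on the write benchmark: the pysqlite half issues one execute() call per row from Python. If the pysqlite build in use implements the DB-API 2.0 executemany() method -- an assumption, since the thread never states the pysqlite version -- batching the inserts may narrow the 67 s vs 9 s gap somewhat. A minimal sketch under that assumption:

import sqlite
from time import clock

_time = clock()
conn = sqlite.connect(db="c:/Trading/stockdata/test/sp1", mode=077)
cursor = conn.cursor()
cursor.execute('create table minbars (date,hourmin,open,low,high,close,volume)')

# One parameter tuple per row; executemany() lets the driver iterate the
# sequence itself instead of re-entering execute() 150000 times from Python.
params = [('01012003', '0101', '935.00', '935.00', '935.00',
           '935.00', '0')] * 150000
cursor.executemany('insert into minbars values (%s, %s, %s, %s, %s, %s, %s)',
                   params)

conn.commit()
conn.close()
print 'execution time: ' + str(clock() - _time)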
From: Vineet J. <vin...@ya...> - 2003-07-01 22:16:09
|
Thanks for your replies. I'm not sure what I did wrong with pysqlite, because my example was very simple, but assigning values to rows via fetchall() took significantly more time than PyTables did. I have decided to go forward with PyTables for the time being. Great to hear that HDF5 1.6 should make the updating feature possible.

I'm looking to store stock minute bars for around 5000 stocks for several years. This will be a lot of data, and PyTables is very fast, so I don't have to worry about the I/O part of things.

Yes, you assumed correctly: I wanted to create a new recarray with the total number of rows, i.e. nrows1 + nrows2.

Once I read the objects into memory, are they mutable or immutable? Can I change some of the values in place?

If read() gets all the rows of a table into memory, does iterrows() only load the rows that you requested? Is there any way to get a recarray back, without loading all the data into memory, and without having to go through the iterator?

What is the main difference between a recarray and an array object, especially since both of them can be passed to numarray?

Vineet
|
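[Editor's note] The read() signature quoted in Francesc's reply just below (start, stop, step, field, flavor) already answers Vineet's last question in part: a bounded read returns an in-memory recarray covering only the requested row range. A minimal sketch, reusing the file and table names from the benchmark above:

from tables import openFile

fileh = openFile("c:/Trading/stockdata/test/sp1.hd5", mode="r")
table = fileh.getNode('/table')

# Read only rows 1000..1999 into memory; for a mixed-type table the
# result is a RecArray, not the whole 150000-row data set.
chunk = table.read(start=1000, stop=2000)
print len(chunk)

# A single column over the same range comes back as a NumArray.
closes = table.read(start=1000, stop=2000, field='close')

fileh.close()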
From: Francesc A. <fa...@op...> - 2003-07-01 21:35:33
|
Hi Vineet,

On Tuesday, 01 July 2003 01:22, Vineet Jain wrote:
> Couple of questions about pytables:
>
> I built two samples, one with pysqlite and one with pytables, and I found
> pytables to be about 20 times faster than the pysqlite version; it also
> used a lot less space. Let me commend you on a great application.

Twenty times faster than pysqlite seems like a lot, and it depends on what kind of benchmark you are running. For writing that seems reasonable, while for reading the difference should be a lot smaller (see my EuroPython presentation at http://pytables.sourceforge.net/doc/EuroPython.pdf for more details). Can you explain a bit what kind of benchmark you ran? Anyway, I'm happy to know that pytables works well for your specific application.

> 1. Update certain rows in a table and append to a table. The latter
> you handle, but I am not sure how to do the former. Will updating rows
> ever be supported?

Appending rows is not a problem, even between different Python sessions. Updating is not yet supported; I'm waiting for HDF5 1.6 to appear to see if I can implement that feature. I'll try to release a new version of pytables supporting deleting and updating rows as soon as the NCSA folks release the 1.6 version (which should happen sooner rather than later).

> 2. For arrays or rows returned from a table, how can you do the
> following without having to loop through them?
>
> row1 = table1.read()
> row2 = table2.read()
> finalRow = row1 + row2

First of all, let me point out that the read() method of a Table object reads the whole table into memory and returns a recarray object, which is the way the numarray package represents arrays of inhomogeneous data (i.e. tables). You didn't specify whether by row1 + row2 you meant stacking the rows of both tables to get a larger table with nrows1 + nrows2 rows, or, in case nrows1 == nrows2, a table with the same number of rows but ncolumns1 + ncolumns2 columns. For simplicity, I'll assume you meant the former, as the latter seems more complicated.

After these clarifications: what you are trying to add is two recarray objects, not two tables, and that is not currently supported in numarray. It would be a nice thing to support an __add__ special method, of course. I'll talk with the numarray crew to see if that can be implemented.

> 3. Something useful found in pysqlite and the PostgreSQL db driver
> is the ability to access field names directly:
>
> row = table.read()
> high = row[10000].high   (where high is a field of the table)

Yes, you can do that using some parameters of the read() method. For example, let's suppose that we have the following Table object:

>>> file.root.detector.smalltable
/detector/smalltable (Table(10,)) 'Small table with 3 fields'
  description := {
    'var1': Col('CharType', (6,)),
    'var2': Col('Int32', (1,)),
    'var3': Col('Float64', (1,)) }
  byteorder = little

If you ask for help on its read() method:

>>> help(file.root.detector.smalltable.read)
Help on method read in module tables.Table:

read(self, start=None, stop=None, step=None, field=None, flavor=None)
method of tables.Table.Table instance
    Read a range of rows and return an in-memory object.

    If "start", "stop", or "step" parameters are supplied, a row range is
    selected. If "field" is specified, only this "field" is returned as a
    NumArray object. If "field" is not supplied, all the fields are
    selected and a RecArray is returned. If both "field" and "flavor" are
    provided, an additional conversion to an object of this flavor is
    made. "flavor" must have any of the next values: "Numeric", "Tuple"
    or "List".

then you can, for example, do:

>>> file.root.detector.smalltable.read(start=1, stop=5, field="var2")
array([1, 2, 3, 4])

and it returns the "var2" column for rows 1 up to (and excluding) 5. It would be handy to provide some more Pythonic manner of accessing this data, and that may come in the future.

> 4. Is there any way the rows returned from a table can be treated as
> numarray objects?

As you can see in the example above, pytables always tries to return numarray objects. It will be an Array object if the data is homogeneous (all resulting elements have the same data type). If the resulting elements are of different data types, a RecArray object is returned, as in:

>>> print file.root.detector.smalltable.read(start=1, stop=5)
RecArray[
('d: 1', 1, 1024.0),
('d: 2', 2, 2048.0),
('d: 3', 3, 3072.0),
('d: 4', 4, 4096.0)
]

Hope that helps to dissipate some of your questions,

--
Francesc Alted
|
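[Editor's note] On question 2: until recarray addition exists, one workaround that stays within the API already shown in this thread (table.row, assignment by field name, append(), flush()) is to copy the rows of one table onto the end of the other and read the combined table back. A minimal sketch, assuming both tables use the Price description from the benchmark and that iterrows() is the row iterator Vineet mentions; the file layout here is hypothetical:

from tables import openFile

# Hypothetical file holding two tables with identical descriptions.
fileh = openFile("c:/Trading/stockdata/test/sp1.hd5", mode="a")
table1 = fileh.getNode('/table1')
table2 = fileh.getNode('/table2')

# The Price columns from the benchmark scripts.
fields = ['date', 'hhmm', 'open', 'high', 'low', 'close', 'volume']

row = table1.row
for src in table2.iterrows():
    for name in fields:
        row[name] = src[name]   # copy each field of the source row
    row.append()
table1.flush()                  # make the appended rows visible

combined = table1.read()        # one recarray with nrows1 + nrows2 rows
fileh.close()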
From: Vineet J. <vin...@ya...> - 2003-06-30 23:22:30
|
Couple of questions about pytables:

I built two samples, one with pysqlite and one with pytables, and I found pytables to be about 20 times faster than the pysqlite version; it also used a lot less space. Let me commend you on a great application.

I have the following requests/questions:

1. Update certain rows in a table and append to a table. The latter you handle, but I am not sure how to do the former. Will updating rows ever be supported?

2. For arrays or rows returned from a table, how can you do the following without having to loop through them?

row1 = table1.read()
row2 = table2.read()
finalRow = row1 + row2

3. Something useful found in pysqlite and the PostgreSQL db driver is the ability to access field names directly:

row = table.read()
high = row[10000].high   (where high is a field of the table)

4. Is there any way the rows returned from a table can be treated as numarray objects?

Thanks for your replies,

vinj
|
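[Editor's note] An aside on question 3: besides the read(field=...) route in Francesc's answer above, the RecArray that read() returns can itself be indexed by field name -- in numarray.records this is spelled field('name') rather than attribute access. Treat the exact method name as an assumption about that era's numarray:

rows = table.read()            # whole table as a RecArray
highs = rows.field('high')    # the "high" column as a NumArray (assumed API)
print highs[10000]             # the value Vineet wanted as row[10000].high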
From: Francesc A. <fa...@op...> - 2003-06-22 10:40:59
|
Ciao Ciro,

On Saturday, 21 June 2003 18:12, you wrote:
> Dear Francesc,
>
> first of all: THANKS for creating pytables! I went through the NetCDF,
> ZODB, bsddb3+cPickle ordeal myself, and eventually landed on HDF5 plus
> a home-grown Python interface. Now I plan to start using your pytables,
> and I'm tinkering with it a little.
>
> The present CVS version has a small problem in IsDescription.py,
> which is fixed by the patch below:

Oops, the CVS version requires the numarray CVS version plus a modified version of recarray.py that will be part of the next numarray release. So, basically, the operability of the CVS version of PyTables is technically broken right now. Do you need some special feature present in CVS and not in 0.5.1?

> Plus, on running the tests against the present CVS version of numarray
> ("0.6a3"), I get the error below for several test cases:
>
> [...]
> File "/home/ciro/python-modules/tables/Table.py", line 304, in _open
>     self.row = hdf5Extension.Row(self._v_buffer, self)
> File "/home/ciro/cvs/pytables/src/hdf5Extension.pyx", line 1403, in
> hdf5Extension.Row.__new__
>     self._fields = input._fields
> AttributeError: 'RecArray' object has no attribute '_fields'
>
> ...since indeed the records.RecArray class shipped with numarray
> has no _fields attribute, contrary to the older version in
> tables/recarray2.py.

Yes, this is because I'm implementing support for the upcoming recarray in numarray. I'm working together with the numarray people to implement the _fields attribute (which is already present in recarray2.py). It's a kind of cache dictionary that makes recarray field access faster, which is very important for good performance when accessing Table objects in PyTables. I don't know when the numarray people will decide to upload the _fields version, but I hope that happens soon (maybe next week).

Meanwhile, you can download the previous version of PyTables from CVS (Wed Jun 11 10:48:46 2003; see the cvs checkout "-D" date option), which should work just fine against the current numarray CVS version. In the future, I'll try to create a branch in CVS to avoid confusing situations like this one.

Thanks for your interest in pytables,

--
Francesc Alted
|
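[Editor's note] The thread only says that _fields is a cache dictionary that speeds up field access, so the following is an illustrative sketch of that idea, not the actual recarray2.py code; every name in it apart from _fields itself is hypothetical:

# Illustrative only: a name -> column cache of the kind described above.
# The real recarray2.py implementation may differ substantially.
class FieldCacheMixin:
    def _build_fields(self):
        # Resolve each field name to its column object once, so repeated
        # row['name'] lookups become a single dict access instead of a
        # full name lookup on every call.
        self._fields = {}
        for name in self._names:              # hypothetical field-name list
            self._fields[name] = self.field(name)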