Monthly message counts for the list archive:

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2002 | | | | | | | | | | | 5 | |
| 2003 | | 2 | | 5 | 11 | 7 | 18 | 5 | 15 | 4 | 1 | 4 |
| 2004 | 5 | 2 | 5 | 8 | 8 | 10 | 4 | 4 | 20 | 11 | 31 | 41 |
| 2005 | 79 | 22 | 14 | 17 | 35 | 24 | 26 | 9 | 57 | 64 | 25 | 37 |
| 2006 | 76 | 24 | 79 | 44 | 33 | 12 | 15 | 40 | 17 | 21 | 46 | 23 |
| 2007 | 18 | 25 | 41 | 66 | 18 | 29 | 40 | 32 | 34 | 17 | 46 | 17 |
| 2008 | 17 | 42 | 23 | 11 | 65 | 28 | 28 | 16 | 24 | 33 | 16 | 5 |
| 2009 | 19 | 25 | 11 | 32 | 62 | 28 | 61 | 20 | 61 | 11 | 14 | 53 |
| 2010 | 17 | 31 | 39 | 43 | 49 | 47 | 35 | 58 | 55 | 91 | 77 | 63 |
| 2011 | 50 | 30 | 67 | 31 | 17 | 83 | 17 | 33 | 35 | 19 | 29 | 26 |
| 2012 | 53 | 22 | 118 | 45 | 28 | 71 | 87 | 55 | 30 | 73 | 41 | 28 |
| 2013 | 19 | 30 | 14 | 63 | 20 | 59 | 40 | 33 | 1 | | | |
From: Francesc A. <fa...@ca...> - 2005-03-01 07:32:14
|
Hi Damon,

On Friday 25 February 2005 20:44, damon fasching wrote:

> I wonder if someone can shed a little light on Figures 6.1 and 6.2 in the User's Guide. The horizontal axis is labeled "Number of rows". I assume from the scale on that axis that this is "Number of rows in table" and not "Number of rows accessed". That isn't stated directly anywhere in the text, though it is suggested by the wording of the 4th paragraph in section 6.2.2. (I assume a 900MHz machine could not access 6e+08 rows in a second...)

Yes, it's "Number of rows in table".

> If the axis is actually "Number of rows in table", does anyone know roughly how many rows were accessed for each data point? Is the number of rows accessed the same for all data points or does the number of rows satisfying "table.where(table.cols.var1 <= 20)" grow with table size? If it grows with table size, is it linear?

Well, I chose a normal distribution (see bench/search-bench.py and bench/search-bench-rnd.sh), with the aim that the number of selected rows would remain more or less constant independently of the table size. I must say, however, that I've had only moderate success doing that. The only thing that I can assure is that the number of selected rows is very small compared with the total number of rows, especially for very large tables. In the PyTables 1.0 branch, I've reworked the benchmark so that I can better control the number of selected rows.

> Finally, for the table being accessed, are the rows ordered by "var1", in which case there is no disk seeking going on, or are rows with various values of "var1" scattered throughout the table, in which case the index is accessed sequentially, but the data would be more or less randomly accessed, i.e. would require disk seeks.

The values should be scattered throughout the tables, as I chose a *random* normal distribution to fill the table.

> I can only make sense of the figures if I first assume that the horizontal axis should read "Number of rows in table". If this is correct (or even if it is not correct) perhaps it could be clarified in the text and/or axis label.

I'll try to clarify this point in forthcoming benchmarks.

> Given that, I have to make one further assumption to understand the figures, but in this case I'm not sure what is correct. I can either assume that a large number of rows is accessed, that this number grows linearly with table size and that rows in the table appear in order of variable "var1". Or, I can assume that a small number of rows is accessed, that this number may grow (probably sublinearly) with table size, and that rows with a given value of var1 are scattered throughout the table. Which of these assumptions, if either, is correct?

The second one is the correct one.

Cheers,

-- >qo< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "" |
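For readers who want to see how many rows such a query actually selects on their own data, here is a minimal sketch using the 0.9-era API quoted in this thread. The file name, table name and column description are invented for illustration; the real benchmark code lives in bench/search-bench.py.

import random
import time
import tables

class Record(tables.IsDescription):
    var1 = tables.Float64Col()

fileh = tables.openFile("bench.h5", "w")
table = fileh.createTable(fileh.root, "tuples", Record)
row = table.row
for i in xrange(1000 * 1000):                        # table size under test
    row['var1'] = random.normalvariate(100.0, 30.0)  # random normal fill
    row.append()
table.flush()

t0 = time.time()
selected = [r['var1'] for r in table.where(table.cols.var1 <= 20)]
print "%d rows selected in %.3f s" % (len(selected), time.time() - t0)
fileh.close()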
From: travlr <vel...@gm...> - 2005-02-27 08:24:39
|
Hi Francesc,

The reference for walkGroups() says: "Iterator that returns the list of Groups (not Leaves) hanging from where." From this description, the list returned should not include the "/" group if that is what is passed as where: walkGroups(where="/"). The tutorial provides great & diverse examples of how to obtain the public attributes of the object tree, but does not explain how to instantiate the leaves, via iteration, so they can be used for numeric processing. For instance, I'm trying to loop through the groups hanging from "root" to work with the arrays contained within. One version I've tried:

import tables as ta
f = ta.openFile('/H5/ESdat.h5', 'r')
for group in f.walkGroups(where="/"):
    g = eval("f.root." + group._v_name)
    ap = g.AP.read()
    aS = g.AS.read()
    p1 = g.P1.read()
    v1 = g.V1.read()
    a1 = g.A1.read()
    ud = g.UD.read()
    # ... do some things ...
f.close()

I've tried many possible ways to iterate through the file and auto-magically perform tasks.

Thank you, Kevin |
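One way to do the same loop without eval(): walkGroups() already yields Group objects, so getattr() (natural naming) is enough to reach their children. This is only a sketch under the assumptions of the question — every group is expected to hold arrays named AP, AS, P1, V1, A1 and UD, and walkGroups() may also yield the starting group itself, hence the guard.

import tables

f = tables.openFile('/H5/ESdat.h5', 'r')   # path taken from the question
for group in f.walkGroups(where="/"):
    if group is f.root:
        continue                           # skip the root group itself
    arrays = {}
    for name in ('AP', 'AS', 'P1', 'V1', 'A1', 'UD'):
        arrays[name] = getattr(group, name).read()
    # ... do some things with arrays['AP'], arrays['AS'], etc. ...
f.close()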
From: damon f. <dam...@ya...> - 2005-02-25 19:44:59
|
Hi,

I wonder if someone can shed a little light on Figures 6.1 and 6.2 in the User's Guide. The horizontal axis is labeled "Number of rows". I assume from the scale on that axis that this is "Number of rows in table" and not "Number of rows accessed". That isn't stated directly anywhere in the text, though it is suggested by the wording of the 4th paragraph in section 6.2.2. (I assume a 900MHz machine could not access 6e+08 rows in a second...)

If the axis is actually "Number of rows in table", does anyone know roughly how many rows were accessed for each data point? Is the number of rows accessed the same for all data points or does the number of rows satisfying "table.where(table.cols.var1 <= 20)" grow with table size? If it grows with table size, is it linear?

Finally, for the table being accessed, are the rows ordered by "var1", in which case there is no disk seeking going on, or are rows with various values of "var1" scattered throughout the table, in which case the index is accessed sequentially, but the data would be more or less randomly accessed, i.e. would require disk seeks.

I can only make sense of the figures if I first assume that the horizontal axis should read "Number of rows in table". If this is correct (or even if it is not correct) perhaps it could be clarified in the text and/or axis label.

Given that, I have to make one further assumption to understand the figures, but in this case I'm not sure what is correct. I can either assume that a large number of rows is accessed, that this number grows linearly with table size and that rows in the table appear in order of variable "var1". Or, I can assume that a small number of rows is accessed, that this number may grow (probably sublinearly) with table size, and that rows with a given value of var1 are scattered throughout the table. Which of these assumptions, if either, is correct?

Thanks! Damon

__________________________________ Do you Yahoo!? Yahoo! Mail - Find what you need with new enhanced search. http://info.mail.yahoo.com/mail_250 |
From: Vicent M. (V+) <vm...@ca...> - 2005-02-25 17:08:23
|
[This is to correct a couple of funny mistakes in my previous announcement. Now, you are told where to get the software. And, who knows, perhaps in the near future a ViTables release will be available *on Mars* too ;)]

Announcing ViTables 1.0 beta
----------------------------

I'm happy to announce the availability of ViTables-1.0b, the new member of the PyTables family. It's a graphical tool for browsing and editing files in both PyTables and HDF5 format. As with the entire PyTables family, the main strength of ViTables is its ability to manage really large datasets in a fast and comfortable manner. For example, with ViTables you can open a table with one billion rows in a few tenths of a second, with very low memory requirements.

In this release you will find, among others, the following features:

- Display data hierarchy as a fully browsable object tree.
- Open several files simultaneously.
- Reorganize your existing files in a graphical way.
- Display files and nodes (group or leaf) properties, including metadata and attributes.
- Display heterogeneous entities, i.e. tables.
- Display homogeneous (numeric or textual) entities, i.e. arrays.
- Zoom into multidimensional table cells.
- Editing capabilities for nodes and attributes: creation/deletion, copy/paste, rename...
- Fully integrated documentation browser

Moreover, once CSTables (the client-server version of PyTables) is out, ViTables will be able to manage remote PyTables/HDF5 files as if they were local ones.

Downloads
---------

Go to the ViTables web site for more details and downloads: http://www.carabos.com/products/vitables

Platforms
---------

At the moment, ViTables has been fully tested only on Linux platforms, but as it is built on top of Python, Qt, PyQt and PyTables, its portability should be really good and it should work just fine on other Unices (like MacOSX) and Windows.

Note for Windows users: Due to license issues, commercial versions of Qt and PyQt are needed to run ViTables on Windows platforms. Furthermore, those libraries must be packaged in a special manner to fulfill some special license requirements. An installer that properly handles these issues is being developed. A Windows version of ViTables will be published as soon as the installer development finishes.

Current development state
-------------------------

This is a beta version. The first stable, commercial, version will be available late in March.

What is in the package
----------------------

In the package you will find the program sources, some info files such as README, INSTALL and LICENSE, and the documentation directory. Documentation includes the User's Guide in HTML4 and also the xml source file, so you can format it as you want. Finally, those of you interested in the internals of ViTables can find the documentation of all its modules in HTML4 format.

Legal notice
------------

Please remember that this is commercial software. The beta version is made publicly available so that beta testers can work on it, but the terms of the license must be respected. Basically it means that the software or its modifications cannot be distributed to anybody in any way without Cárabos' explicit permission. See the LICENSE file for detailed information.

Share your experience
---------------------

Let me know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy Data with ViTables, the troll of the PyTables family!

-- Share what you know, learn what you don't |
From: Vicent M. (V+) <vm...@ca...> - 2005-02-25 12:51:44
|
Announcing ViTables-1.0b
------------------------

I'm happy to announce the availability of ViTables-1.0b, the new member of the PyTables family. It's a graphical tool for browsing and editing files in both PyTables and HDF5 format. As with the entire PyTables family, the main strength of ViTables is its ability to manage really large datasets in a fast and comfortable manner. For example, with ViTables you can open a table with one billion rows in a few tenths of a second, with very low memory requirements.

In this release you will find, among others, the following features:

- Display data hierarchy as a fully browsable object tree.
- Open several files simultaneously.
- Reorganize your existing files in a graphical way.
- Display files and nodes (group or leaf) properties, including metadata and attributes.
- Display heterogeneous entities, i.e. tables.
- Display homogeneous (numeric or textual) entities, i.e. arrays.
- Zoom into multidimensional table cells.
- Editing capabilities for nodes and attributes: creation/deletion, copy/paste, rename...
- Fully integrated documentation browser

Moreover, once CSTables (the client-server version of PyTables) is out, ViTables will be able to manage remote PyTables/HDF5 files as if they were local ones.

Platforms
---------

At the moment, ViTables has been fully tested only on Linux platforms, but as it is built on top of Python, Qt, PyQt and PyTables, its portability should be really good and it should work just fine on other Unices (like MacOSX) and Windows.

Note for Windows users: Due to license issues, commercial versions of Qt and PyQt are needed to run ViTables on Windows platforms. Furthermore, those libraries must be packaged in a special manner to fulfill some special license requirements. An installer that properly handles these issues is being developed. A Windows version of ViTables will be published as soon as the installer development finishes.

Current development state
-------------------------

This is a beta version. The first stable, commercial, version will be available late on Mars.

What is in the package
----------------------

In the package you will find the program sources, some info files such as README, INSTALL and LICENSE, and the documentation directory. Documentation includes the User's Guide in HTML4 and also the xml source file, so you can format it as you want. Finally, those of you interested in the internals of ViTables can find the documentation of all its modules in HTML4 format.

Legal notice
------------

Please remember that this is commercial software. The beta version is made publicly available so that beta testers can work on it, but the terms of the license must be respected. Basically it means that the software or its modifications cannot be distributed to anybody in any way without Cárabos' explicit permission. See the LICENSE file for detailed information.

Share your experience
---------------------

Let me know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy Data with ViTables, the troll of the PyTables family!

Vicent Mas

-- Share what you know, learn what you don't |
From: Francesc A. <fa...@ca...> - 2005-02-25 08:58:55
|
On Thursday 24 February 2005 23:13, travlr wrote:

> Hi Francesc,
>
> After returning to my project, and still encountering repetitive crashes. I took your advice to revert back to Python v.2.3... and alas: No crashes.

Good to know that.

> Do happen to know if the Py developers are aware of this; If it is to be patched (or new release)... or do you think I would need to submit this characteristic to them as a bug?

As I don't really know the source of the problems, I did not report the problem to anybody. I know that PyTables + Python 2.4 is perfectly stable on Linux (and probably most UNIXes, including MacOSX). I'm guessing here, but I'm afraid that the change of compiler required for building Python extensions on Windows for Python 2.4 (MSVC 7.1 instead of MSVC 6) may be responsible for the instability of PyTables + Python 2.4 on this platform. Maybe the C code generated by Pyrex does not work well with MSVC 7.1, or maybe there are other sources of problems as well, who knows. I do hope, though, that after a short period of time these problems will be alleviated and eventually disappear. Meanwhile, I'd strongly recommend using Python 2.3 with PyTables on Windows platforms.

> Thanks to you and the team. BTW, I look forward to Vitables :-)

Hopefully, you will have news from us on that subject very shortly.

Cheers,

-- >qo< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "" |
From: travlr <vel...@gm...> - 2005-02-24 22:13:38
|
Hi Francesc, After returning to my project, and still encountering repetitive crashes. I took your advice to revert back to Python v.2.3... and alas: No crashes. Do happen to know if the Py developers are aware of this; If it is to be patched (or new release)... or do you think I would need to submit this characteristic to them as a bug? Thanks to you and the team. BTW, I look forward to Vitables :-) Kevin |
From: damon f. <dam...@ya...> - 2005-02-23 00:08:09
|
Hi Francesc, Hmmm, strange. I will look into the install error. I did get the source tarball from the pytables site. I don't think I already had pytables installed because I could not import it previously. If there was a fatal error when I did install....I don't understand why I can import it now. I'm actually installing on a different machine right now and was about to post a message regarding that install. I'll try a fresh install and see what happens. Thanks for the sort snippet! On another note, can someone tell me how I can attach messages to this mailing list to a particular thread? I don't see anything like a 'reply' or post to thread on the mailing list page. Thanks, Damon __________________________________ Do you Yahoo!? All your favorites on one personal page Try My Yahoo! http://my.yahoo.com |
From: Francesc A. <fa...@ca...> - 2005-02-22 09:37:47
|
Hi Damon,

On Friday 18 February 2005 20:31, damon fasching wrote:

> I built PyTables from sources. I've attached the stdout and stderr logs from the build script. These error messages aren't causing me any problems, so I

In your build logs I've seen a fatal error that should have prevented a successful build, namely:

src/hdf5Extension.c:8959: error: too few arguments to function `H5ARRAYget_info'
error: command 'gcc' failed with exit status 1

I can't understand how you have generated the hdf5Extension with such an error(??). Besides, I've double checked that this error should not happen with PyTables 0.9.1. Are you sure that you have taken the original tarball from the pytables web site and that it was unmodified before the build process?

> Regarding read rates, I was thinking that 4 seconds to read 23 items was slower than expected. But maybe not. There were probably 5 accesses to the index and then 23 accesses to the tables, 140 ms per access.

Well, depending on how far apart the entries are from each other, 140 ms may or may not be large. You've already said that you have a slow hard disk, so this may be a source of slowness. The new indexing algorithm in the forthcoming PyTables Pro may achieve significantly better performance. If you are curious, I can send you an alpha version of PyTables Pro so that you can check it with your own data pattern.

> What will be the function of the "sorted by" and "group by" qualifiers. Since they are qualifiers on search methods it sounds as if they don't alter the structure of the underlying table, rather they present the results in a particular order. So search performance would not be impacted.

Correct.

> What I would like to be able to do is actually reorganize my tables so the entries are ordered by a particular field. I will normally be accessing data in order of this field, so if the table were ordered by this field I could just use a cursor to access the data...fast fast fast :) My tables will be small enough to fit in memory, so in principle I could load the entire array into memory and then sort it, but I don't see a sort method on the table in either PyTables or numarray.

I see. You can reorder your RecArrays (in-memory tables) very easily this way:

r2 = records.array(r[r.field('c2').argsort()])  # sort by column 'c2'

Cheers,

-- >qo< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "" |
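Expanded into a self-contained sketch of that argsort() idiom. The column names c1/c2 and the sample values are invented, and whether records.array() accepts a list of tuples exactly this way may depend on the numarray version; the last two lines follow the snippet from the message above.

from numarray import records

# a small in-memory RecArray with two columns
r = records.array([(3, 30.0), (1, 10.0), (2, 20.0)],
                  names='c1,c2', formats='Int32,Float64')

order = r.field('c2').argsort()   # permutation that sorts column 'c2'
r2 = records.array(r[order])      # new RecArray with rows ordered by 'c2'

for rec in r2:
    print rec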
From: Francesc A. <fa...@ca...> - 2005-02-22 09:24:34
|
Please, ignore this.

-- >qo< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "" |
From: travlr <vel...@gm...> - 2005-02-18 23:53:06
|
PyTables Pro: http://www.carabos.com/products/pytables-pro.html.en On Fri, 18 Feb 2005 14:41:01 -0500, Daehyok Shin <sd...@gm...> wrote: > I met several messages mentioning PyTables Pro? > What is this? Is it essentially PyTables 1.0? > Or, some commercialized version? > -- > Daehyok Shin (Peter) > Geography Department > University of North Carolina-Chapel Hill > USA > > ------------------------------------------------------- > SF email is sponsored by - The IT Product Guide > Read honest & candid reviews on hundreds of IT Products from real users. > Discover which products truly live up to the hype. Start reading now. > http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > |
From: Daehyok S. <sd...@gm...> - 2005-02-18 19:41:10
|
I met several messages mentioning PyTables Pro? What is this? Is it essentially PyTables 1.0? Or, some commercialized version? -- Daehyok Shin (Peter) Geography Department University of North Carolina-Chapel Hill USA |
From: damon f. <dam...@ya...> - 2005-02-18 19:32:08
|
Hi Francesc, Thanks for your answers. Regarding the 'unknown compression type' messages, here is the information you requested. ~test python test_all.py --show-versions -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= PyTables version: 0.9.1 Extension version: $Id: hdf5Extension.c,v 1.170.2.2 2004/11/22 17:50:39 falted Exp $ HDF5 version: 1.6.3-patch numarray version: 1.1.1 Zlib version: 1.2.2 LZO version: 1.08 (Jul 12 2002) UCL version: 1.03 (Jul 20 2004) Python version: 2.4 (#1, Jan 31 2005, 12:54:29) [GCC 3.3.5 (Debian 1:3.3.5-6)] Platform: linux2-x86_64 Byte-ordering: little -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= I built PyTables from sources. I've attached the stdout and stderr logs from the build script. These error messages aren't causing me any problems, so I can live with it. (When I request a filter with ucl or zlib PyTables generates the error message that the compression type is unknown, but then goes ahead and compresses with them anyway.) Let me know if there is any more data that would be helpful for you on this. Regarding read rates, I was thinking that 4 seconds to read 23 items was slower than expected. But maybe not. There were probably 5 accesses to the index and then 23 accesses to the tables, 140 ms per access. What will be the function of the "sorted by" and "group by" qualifiers. Since they are qualifiers on search methods it sounds as if they don't alter the structure of the underlying table, rather they present the results in a particular order. So search performance would not be impacted. What I would like to be able to do is actually reorganize my tables so the entries are ordered by a particular field. I will normally be accessing data in order of this field, so if the table were ordered by this field I could just use a cursor to access the data...fast fast fast :) My tables will be small enough to fit in memory, so in principle I could load the entire array into memory and then sort it, but I don't see a sort method on the table in either PyTables or numarray. Thanks! Damon __________________________________ Do you Yahoo!? Yahoo! Mail - Helps protect you from nasty viruses. http://promotions.yahoo.com/new_mail |
From: Francesc A. <fa...@ca...> - 2005-02-18 13:15:39
|
On Friday 18 February 2005 06:30, damon fasching wrote:

> Hi,
>
> I posted a couple of days ago regarding indexes which go missing (but still occupy file space). The suggestion to hang tables directly from root rather than from somewhere higher in the tree worked. If tables can only be indexed if they are hanging off of root, perhaps that should be documented. (On the other hand, there was also a hint that this might be fixed in later snapshots, so perhaps it isn't a big deal.)

Yes, that should be fixed in snapshots.

> I have a couple of other questions.
>
> 1) Does the "pos" attribute of an IsDescription field have an performance implications?

Well, that depends on your exact column description, but I don't really expect big performance implications.

> 2) If I open a file x with x=tables.openFile(...) then print x prints some info for that file, whether from within a script or interactively. Just entering 'x' interactively displays even more useful information. But including the 'x' in a script seems to have no effect. Why does entering the file's symbol only display the file information from an interactive session?

Entering 'x' in the Python console effectively prints the output of repr(x). So do a print repr(x) if you want to do the same programmatically.

> 3) When I use a filter with lzo compression I get the following message: "ERROR: unknown compression type: lzo" If I use zlib, I get the message "ERROR: unknown compression type: zlib" If I use ucl, I do not get an error message.

Ummm, I've never had such a report on Unix (nor Linux, of course). Can you go to the test directory, issue:

$ python test_all.py --show-versions

and get back to me with the results, please? Are you compiling pytables yourself or is it a package? If it is a package, from which distribution?

> 4) ...And this one is the real mystery. I have a script which creates some tables, writes some data to them, flushes the tables and then reads some data back. I have attached a simplified version of the script. You can see in the script that each table has two keys defined, outerKey and innerKey. The data in the file is presented sorted by outerKey. When I read back though, I want to get all of the data which has a particular innerKey. (See the read back lines at the end of the script.) After writing all of the data, the file is around 260 MB (that's with compression). When I read back all of the data with the given keys, only about 23 entries are returned in all. The read time is around 4 seconds, though, or 170 ms per item. Is this normal? (If I remove the indexed=1 flag from the table declarations, then the readback takes about 50 seconds.)

I don't quite understand the question. Do you find this time large or short? How many entries does your table have? A factor of 10 acceleration is not so bad when indexing. Also, note that the first time you do the lookup takes significantly more time than subsequent lookups.

[Also, it's worth saying that for PyTables 1.0 Pro we are implementing a completely revamped indexing engine that will accelerate searches far beyond the 0.9.x implementation, especially for very large tables. With the new code, we are getting typical speed-ups of 100x compared with 0.9.x, and that for tables with a billion rows. That means lookups under one tenth of a second for such large beasts.]

> I am running on a pretty fast machine (AMD64, 1.8 GHz) but the disk is only 4200 (it's a laptop) with 500 MB of RAM. The index adds about 128 MB to the file size, i.e. the file is about 130 MB w/o the index.

With that configuration, the file may perfectly fit in the OS filesystem cache. However, pytables is designed to efficiently handle files that exceed available memory as well, with just a little overhead over the in-core case.

> An additional question related to read rates in the attached script, I will normally want to access the data in order of innerKey, so it would be nice if I could sort the data by innerKey before starting accesses. I have looked around in the numarray and pytables documentation for a way to sort these records, but don't see anything obvious. Do you have any suggestions?

We are working on implementing "sorted by" and "group by" qualifiers for the search method. They will likely be included in the forthcoming PyTables Pro.

Cheers,

-- >qo< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "" |
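The repr() remark above, as a tiny sketch ('data.h5' is just a placeholder file name):

import tables

x = tables.openFile("data.h5", "r")
print x          # brief summary: what `print x` shows from a script
print repr(x)    # detailed listing: what typing `x` shows interactively
x.close()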
From: damon f. <dam...@ya...> - 2005-02-18 05:30:46
|
Hi, I posted a couple of days ago regarding indexes which go missing (but still occupy file space). The suggestion to hang tables directly from root rather than from somewhere higher in the tree worked. If tables can only be indexed if they are hanging off of root, perhaps that should be documented. (On the other hand, there was also a hint that this might be fixed in later snapshots, so perhaps it isn't a big deal.) I have a couple of other questions. 1) Does the "pos" attribute of an IsDescription field have an performance implications? 2) If I open a file x with x=tables.openFile(...) then print x prints some info for that file, whether from within a script or interactively. Just entering 'x' interactively displays even more useful information. But including the 'x' in a script seems to have no effect. Why does entering the file's symbol only display the file information from an interactive session? 3) When I use a filter with lzo compression I get the following message: "ERROR: unknown compression type: lzo" If I use zlib, I get the message "ERROR: unknown compression type: zlib" If I use ucl, I do not get an error message. Yet, pytables files generated from the same data and with everything else the same except the compression library, lzo, zlib and ucl produce different output file sizes, all of which are significantly smaller than a file generated without compression. So, lzo and zlib are in fact used. So why the error message? For the record, the following interactive session demonstrates that the libraries are available. ~ python Python 2.4 (#1, Jan 31 2005, 12:54:29) [GCC 3.3.5 (Debian 1:3.3.5-6)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import tables >>> print "PyTables version: %s" % tables.__version__ PyTables version: 0.9.1 >>> tinfo = tables.whichLibVersion("zlib") >>> print "Zlib version: %s" % (tinfo[1]) Zlib version: 1.2.2 >>> tinfo = tables.whichLibVersion("lzo") >>> print "LZO version: %s (%s)" % (tinfo[1],tinfo[2]) LZO version: 1.08 (Jul 12 2002) >>> tinfo = tables.whichLibVersion("ucl") >>> print "UCL version: %s (%s)" % (tinfo[1],tinfo[2]) UCL version: 1.03 (Jul 20 2004) 4) ...And this one is the real mystery. I have a script which creates some tables, writes some data to them, flushes the tables and then reads some data back. I have attached a simplified version of the script. You can see in the script that each table has two keys defined, outerKey and innerKey. The data in the file is presented sorted by outerKey. When I read back though, I want to get all of the data which has a particular innerKey. (See the read back lines at the end of the script.) After writing all of the data, the file is around 260 MB (that's with compression). When I read back all of the data with the given keys, only about 23 entries are returned in all. The read time is around 4 seconds, though, or 170 ms per item. Is this normal? (If I remove the indexed=1 flag from the table declarations, then the readback takes about 50 seconds.) I am running on a pretty fast machine (AMD64, 1.8 GHz) but the disk is only 4200 (it's a laptop) with 500 MB of RAM. The index adds about 128 MB to the file size, i.e. the file is about 130 MB w/o the index. An additional question related to read rates in the attached script, I will normally want to access the data in order of innerKey, so it would be nice if I could sort the data by innerKey before starting accesses. 
I have looked around in the numarray and pytables documentation for a way to sort these records, but don't see anything obvious. Do you have any suggestions? Thanks! Damon __________________________________ Do you Yahoo!? Yahoo! Mail - now with 250MB free storage. Learn more. http://info.mail.yahoo.com/mail_250 |
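A stripped-down sketch of the layout described in this message — two key columns, with innerKey indexed, written in outerKey order and read back by innerKey. The names, sizes and single-table structure are illustrative only (the actual attached script is not reproduced here), and the exact 0.9-era column signatures may differ slightly.

import tables

class Entry(tables.IsDescription):
    outerKey = tables.Int32Col()
    innerKey = tables.Int32Col(indexed=1)   # the column used for readback
    value    = tables.Float64Col()

fileh = tables.openFile("entries.h5", "w")
table = fileh.createTable(fileh.root, "entries", Entry)

row = table.row
for outer in xrange(1000):          # data arrives ordered by outerKey
    for inner in xrange(100):
        row['outerKey'] = outer
        row['innerKey'] = inner
        row['value'] = outer + inner / 100.0
        row.append()
table.flush()

# read back every entry with a given innerKey, hopefully via the index
hits = [r['value'] for r in table.where(table.cols.innerKey == 42)]
print "%d entries found" % len(hits)
fileh.close()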
From: Francesc A. <fa...@ca...> - 2005-02-14 19:41:17
|
On Monday 14 February 2005 19:50, damon fasching wrote:

> Hi Francis,
>
> I've tried the latest snapshot. When I try to build, I get the following error:
>
> gcc: src/hdf5Extension.c: No such file or directory
> gcc: no input files
> error: command 'gcc' failed with exit status 1

Uh, that's normal. You need to have Pyrex (http://nz.cosc.canterbury.ac.nz/~greg/python/Pyrex/) installed before compiling. In fact, this is an error on our part because the snapshots should be generated with a src/hdf5Extension.c included. We will mark this as a bug and try to address it later on. Sorry for the inconvenience.

> If I have hit a known bug, is there a way around it? All I need to do is index a column and have that index available the next time I open the file...or is that the bug?

The bug is that indexes of tables residing at a depth greater than 1 in the object tree are badly referenced, so they become unreachable. If you do the same, but with your tables hanging directly from root instead of from a group, then you will hopefully get the correct behavior.

Cheers,

-- >qo< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "" |
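A sketch of that workaround — create the indexed table directly under the root group (depth 1) instead of inside a sub-group, so the index stays reachable after re-opening the file. Names here are illustrative.

import tables

class Record(tables.IsDescription):
    key = tables.Int32Col(indexed=1)

fileh = tables.openFile("indexed.h5", "w")

# affected by the bug: a table at depth > 1, e.g.
#   group = fileh.createGroup(fileh.root, "agroup")
#   table = fileh.createTable(group, "data", Record)

# workaround: hang the table directly from root
table = fileh.createTable(fileh.root, "data", Record)
fileh.close()

# on re-opening, the index on 'key' should still be visible
fileh = tables.openFile("indexed.h5", "r")
print repr(fileh.root.data)
fileh.close()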
From: damon f. <dam...@ya...> - 2005-02-14 18:50:43
|
Hi Francis,

I've tried the latest snapshot. When I try to build, I get the following error:

gcc: src/hdf5Extension.c: No such file or directory
gcc: no input files
error: command 'gcc' failed with exit status 1

I copied hdf5Extension.c from the latest release and get the following error:

src/hdf5Extension.c:8959: warning: passing arg 4 of `H5ARRAYget_info' from incompatible pointer type
src/hdf5Extension.c:8959: warning: passing arg 6 of `H5ARRAYget_info' from incompatible pointer type
src/hdf5Extension.c:8959: error: too few arguments to function `H5ARRAYget_info'

I tried a binary search through the snapshots to find the latest one which builds. It seems that none of the snapshots build out of the box because hdf5Extension.c is missing. Can you point me to a snapshot that builds? If so, I'll be happy to test it with my application.

-----------------

If I have hit a known bug, is there a way around it? All I need to do is index a column and have that index available the next time I open the file...or is that the bug?

Thanks! Damon

__________________________________ Do you Yahoo!? Yahoo! Mail - 250MB free storage. Do more. Manage less. http://info.mail.yahoo.com/mail_250 |
From: Francesc A. <fa...@ca...> - 2005-02-14 15:21:43
|
Hi,

I'm afraid you have hit a known bug. Please try with a recent snapshot available at:

http://www.carabos.com/downloads/pytables/snapshots/

and tell me if this works for you.

Regards,

On Monday 14 February 2005 09:43, damon fasching wrote:

> Hi,
>
> I'm having some trouble generating an index on a Table column. Actually, I think the index is created, but I can't access it the next time I open the file.
>
> I have a job which creates a very simple pytables file. Just one group with 4 pretty simple tables. I dump a bunch of data into the tables. There is one column on which I will usually select, so I would like to create an index on this column. In order to get experience with pytables, I've done two runs over the same input data. In the first run I did not index any columns. In the second run I indexed the column of interest. I'm certain the index was generated because a) I created the IsDescription class with indexed=1 for that column (very straightforward), b) that was the only difference between the scripts for the two runs and the output file for the indexed run was about 20% larger than the output file with no index.
>
> When I access data (in a later Python session), using the where operator to select on the indexed column, the access is pretty slow, and in fact, it is SLOWER when I run using the file with the index. I'm pretty sure that the index is in the file (because the file is bigger) but it is not seen when I open the file. If I print the file (by typing the symbol returned by openFile()) none of the columns are indicated as having been indexed. Yet, if I print the file during the session in which it was originally created, the column which I asked to be indexed is indeed indicated as so.
>
> I reproduced the error in an interactive session with a simpler file structure. I have attached the screen dump of that session as 'pytables.indextest.session'. I've added a few annotations to this file, but it should be pretty easy to see the problem. I've attached another file, 'pytables.install.session' which is a screendump of the session where I built, installed and tested pytables. From there you can see the details of my configuration. I think everything is in order.
>
> So, a lot of words, but a single question. What happened to my index...and how can I make it not happend?
>
> Thanks!
> Damon
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam? Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com

-- >qo< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "" |
From: damon f. <dam...@ya...> - 2005-02-14 08:43:31
|
Hi, I'm having some trouble generating an index on a Table column. Actually, I think the index is created, but I can't access it the next time I open the file. I have a job which creates a very simple pytables file. Just one group with 4 pretty simple tables. I dump a bunch of data into the tables. There is one column on which I will usually select, so I would like to create an index on this column. In order to get experience with pytables, I've done two runs over the same input data. In the first run I did not index any columns. In the second run I indexed the column of interest. I'm certain the index was generated because a) I created the IsDescription class with indexed=1 for that column (very straightforward), b) that was the only difference between the scripts for the two runs and the output file for the indexed run was about 20% larger than the output file with no index. When I access data (in a later Python session), using the where operator to select on the indexed column, the access is pretty slow, and in fact, it is SLOWER when I run using the file with the index. I'm pretty sure that the index is in the file (because the file is bigger) but it is not seen when I open the file. If I print the file (by typing the symbol returned by openFile()) none of the columns are indicated as having been indexed. Yet, if I print the file during the session in which it was originally created, the column which I asked to be indexed is indeed indicated as so. I reproduced the error in an interactive session with a simpler file structure. I have attached the screen dump of that session as 'pytables.indextest.session'. I've added a few annotations to this file, but it should be pretty easy to see the problem. I've attached another file, 'pytables.install.session' which is a screendump of the session where I built installed and tested pytables. From there you can see the details of my configuration. I think everything is in order. So, a lot of words, but a single question. What happened to my index...and how can I make it not happend? Thanks! Damon __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com |
From: Francesc A. <fa...@ca...> - 2005-02-11 20:05:55
|
Hi Heather,

On Friday 11 February 2005 12:37, Heather Alexander wrote:

> I was just hoping to find out how ViTables was progressing, and when it might be released. I was also wondering if it is intended to exceed HDFview's functionality and if so how. Thank you very much for your excellence.

Nice to see you are interested in ViTables. Although there have been some delays (sorry Norbert), the project is progressing well, and a beta version will hopefully be available next week.

As far as HDFView is concerned, what we are trying to do is make a tool that covers its deficiencies in managing large datasets. As with the entire PyTables family, ViTables is intended to manage really large datasets in a fast and comfortable way. This is its main strength. For example, with ViTables you can open a table with 100 million rows in a few tenths of a second, with very low memory requirements. HDFView simply cannot do that, as it consumes tons of memory.

Features included in the beta version will be the ability to create object trees from scratch (which can later be fed with data from PyTables scripts), object tree editing capabilities (which includes moving nodes from one file tree to another), and attribute editing capabilities. Another distinctive feature is that, once CSTables (the client-server version of PyTables) is out, ViTables will be able to manage remote PyTables files as if they were local ones. Other features include ease of configuration and an integrated documentation browser.

At the moment ViTables is focused on browsing data more than on editing data, but editing of leaf data can be added in future releases if demand for such a feature exists. And last, but not least, ViTables can be used as our starting point to develop customised graphical tools to handle HDF5 data.

Regards,

Vicent Mas & Francesc Altet

-- >qo< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "" |
From: Heather A. <vel...@gm...> - 2005-02-11 11:37:29
|
Hi Francesc, I was just hoping to find out how ViTables was progressing, and when it might be released. I was also wondering if it is intended to exceed HDFview's functionality and if so how. Thank you very much for your excellence. regards |
From: Francesc A. <fa...@ca...> - 2005-02-07 08:37:39
|
The new SpamAssassin recently put to work to filter spam on all SourceForge discussion lists has been giving me problems lately by insisting that my messages are spam. Well, sorry to bother you with this "spam". I don't think it really is, but my opinion is probably biased compared with SpamAssassin ;) Hope this time it works.

---------- Forwarded message ----------
Subject: Re: [Pytables-users] Unable to remove all rows in a table
Date: Friday 04 February 2005 13:13
From: Francesc Altet <fa...@ca...>
To: pyt...@li...
Cc: Ashley Walsh <ash...@sy...>

Hi,

You're hitting a known bug in HDF5. I reported it, together with a C program that exposes the bug (I'm attaching it), to the HDF5 people last December, and they responded (12-01-04):

"""
I talked to the developer about this... Truncating a dataset is not allowed currently. I will enter a bug report for this so that we discuss this issue, and for now we will give an error when a user tries to shrink the size of a dataset (rather than crashing).
"""

I don't know anything new about that. You may want to complain yourself to the HDF5 crew, if this problem is important enough to you.

Cheers,

On Friday 04 February 2005 06:16, Ashley Walsh wrote:

> Hi,
>
> I'm having some problems with Table.removeRows(), which doesn't want to remove all the rows from a table. Python crashes if I try to remove all the rows from a table:
>
> table.removeRows(0, table.nrows)
>
> MacOSX crashes with a bus error, while Windows 2000 has a memory error and prints a message about "HDF5: inifinite loop closing library".
>
> I did try the latest snapshot (pytables-20050204.tar.gz) on MacOSX, and the problem didn't go away.
>
> Burrowing down makes it look like the problem is in H5TBdelete_record in H5TB.c, but then things started getting too hard for me!
>
> Cheers,
> Ashley

-- >qo< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data ""

-------------------------------------------------------

-- >qo< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "" |
From: Ashley W. <ash...@sy...> - 2005-02-04 05:17:10
|
Hi, I'm having some problems with Table.removeRows(), which doesn't want to remove all the rows from a table. Python crashes if I try to remove all the rows from a table: table.removeRows(0, table.nrows) MacOSX crashes with a bus error, while Windows 2000 has a memory error and prints a message about "HDF5: inifinite loop closing library". I did try the latest snapshot (pytables-20050204.tar.gz) on MacOSX, and the problem didn't go away. Burrowing down makes it look like the problem is in H5TBdelete_record in H5TB.c, but then things started getting too hard for me! Cheers, Ashley |
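A self-contained script along the lines of this report (it is expected to crash on the last call on the affected versions, so run it on throw-away data only; the names are illustrative):

import tables

class Particle(tables.IsDescription):
    x = tables.Float64Col()

fileh = tables.openFile("removerows.h5", "w")
table = fileh.createTable(fileh.root, "particles", Particle)
row = table.row
for i in xrange(100):
    row['x'] = float(i)
    row.append()
table.flush()

table.removeRows(0, table.nrows)   # removing *all* rows triggers the crash
fileh.close()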
From: Francesc A. <fa...@ca...> - 2005-01-31 19:38:45
|
On Monday 31 January 2005 19:21, Norbert Nemec wrote:

> > That sounds reasonable. What about making rows --> rowlist and
> > columns --> columnlist?
>
> How about rowselect and colselect? Probably a matter of taste. 'list' is fine with me as well.

I still prefer rowlist and columnlist, although selectrows and selectcolumns might be nice as well.

> OK, I agree that retrieving out of order is a killer. Still - if I give a list of rows to read, I would expect the result to be ordered in the same way. Maybe, the code could check whether the list is ordered and throw an error otherwise? Checking whether a list is sorted is relatively inexpensive. (OK, sorting an already sorted list is inexpensive as well, but still, I don't like the idea of silently changing the order of rows in a request.)

I see your point. Mmm, perhaps raising a warning in case the list is unordered would be better (in the end, this operation would be perfectly valid).

> Alternatively, one could sort the list first and use the permutation of the
>
> rowselect,permutation = zip(sorted(zip(rowselect,range(len(rowselect)))))
> result = read_rows(rowselect)
> dummy,result = zip(sorted(zip(permutation,result)))
> return result
>
> but that's probably overkill...

No, numarray can do this very efficiently for NumArray objects. The problem could be re-ordering the resulting RecArray, which can be a bit more inefficient, but in the long run that may be worth the effort in most cases. Good idea!

Cheers and good luck!,

-- >qo< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "" |
From: Norbert N. <Nor...@gm...> - 2005-01-31 18:21:30
|
Am Montag 31 Januar 2005 17:37 schrieb Francesc Altet: > > The most complete set of parameters would be something like > > start=None, stop=None, step=1, rows=None, columns=None > > where 'rows' does what 'sequence' and 'coords' to up to now. 'columns' > > might not exist for all routines (e.g. remove) that can - by principle > > only address whole rows. > > That sounds reasonable. What about making rows --> rowlist and > columns --> columnlist? How about rowselect and colselect? Probably a matter of taste. 'list' is fine with me as well. > > I would suggest dropping the automatic sorting of sequences. Documenting > > that unsorted lists kill the performance should be enough. I think it is > > better if a user who is unaware of the issue gets bad performance than > > wrong results. > > I disagree in this point. Sorting an object in-memory is a relatively > fast operation, while retrieving an un-sorted sequence from disk can > be *killer*. The default should be the solution that less impact on > performance, and this is sorting my default. On optimization-consciuos > user can read the manual and try to disable sorting, if appropriate. OK, I agree that retrieving out of order is a killer. Still - if I give a list of rows to read, I would expect the result to be ordered in the same way. Maybe, the code could check whether the list is ordered and throw an error otherwise? Checking, whether a list is sorted is relatively inexpensive. (OK, sorting an already sorted list is inexpensive as well, but still, I don't like the idea of silently changing the order of rows in a request.) Alternatively, one could sort the list first and use the permutation of the rowselect,permutation = zip(sorted(zip(rowselect,range(len(rowselect))))) result = read_rows(rowselect) dummy,result = zip(sorted(zip(permutation,result))) return result but that's probably overkill... > However, perhaps it could be useful to add 'step' for remove and > implement this in as a sequence of remove(start,stop) that fakes the > intended behaviour. It would not be very efficient, but... The interface would be a lot cleaner, and if somebody really suffers from the bad performance, he might be more willing to work on it than now, where it is a mostly hypothetical issue... :-) > Well, not me nor Ivan are going to address any of these issues for a > while (at least in a couple of weeks or so). So feel free to download > a recent snapshot (preferibly after this night, as I've fixed a couple > of things today in Table.py): OK, I'll probably do so in the near future. Ciao, Nobbi -- _________________________________________Norbert Nemec Bernhardstr. 2 ... D-93053 Regensburg Tel: 0941 - 2009638 ... Mobil: 0179 - 7475199 eMail: <No...@Ne...> |
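For reference, a runnable version of the re-ordering idea in the sketch above. Note that unzipping the sorted pairs needs zip(*...), which the sketch omits; read_rows below is a stand-in for whatever actually fetches the rows in ascending order.

def read_in_order(rowselect, read_rows):
    # sort the requested row numbers, remembering where each came from
    order = sorted(range(len(rowselect)), key=lambda i: rowselect[i])
    sorted_rows = [rowselect[i] for i in order]

    data = read_rows(sorted_rows)      # fetch rows in ascending order

    # scatter the results back into the caller's original order
    result = [None] * len(rowselect)
    for pos, item in zip(order, data):
        result[pos] = item
    return result

# toy example: "reading" row i just returns the string "row-i"
print read_in_order([7, 2, 5], lambda rows: ["row-%d" % r for r in rows])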