From: damon f. <dam...@ya...> - 2005-02-18 05:30:46
Hi,

I posted a couple of days ago regarding indexes which go missing (but still occupy file space). The suggestion to hang tables directly from root rather than from somewhere higher in the tree worked. If tables can only be indexed when they hang directly off root, perhaps that should be documented. (On the other hand, there was also a hint that this might be fixed in later snapshots, so perhaps it isn't a big deal.)

I have a couple of other questions.

1) Does the "pos" attribute of an IsDescription field have any performance implications?

2) If I open a file x with x=tables.openFile(...) then "print x" prints some info for that file, whether from within a script or interactively. Just entering 'x' interactively displays even more useful information. But including the 'x' in a script seems to have no effect. Why does entering the file's symbol only display the file information from an interactive session?

3) When I use a filter with lzo compression I get the following message:

"ERROR: unknown compression type: lzo"

If I use zlib, I get the message

"ERROR: unknown compression type: zlib"

If I use ucl, I do not get an error message. Yet pytables files generated from the same data, with everything else the same except the compression library (lzo, zlib or ucl), have different output file sizes, all of which are significantly smaller than a file generated without compression. So lzo and zlib are in fact used. So why the error message?

For the record, the following interactive session demonstrates that the libraries are available.

~ python
Python 2.4 (#1, Jan 31 2005, 12:54:29)
[GCC 3.3.5 (Debian 1:3.3.5-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tables
>>> print "PyTables version: %s" % tables.__version__
PyTables version: 0.9.1
>>> tinfo = tables.whichLibVersion("zlib")
>>> print "Zlib version: %s" % (tinfo[1])
Zlib version: 1.2.2
>>> tinfo = tables.whichLibVersion("lzo")
>>> print "LZO version: %s (%s)" % (tinfo[1],tinfo[2])
LZO version: 1.08 (Jul 12 2002)
>>> tinfo = tables.whichLibVersion("ucl")
>>> print "UCL version: %s (%s)" % (tinfo[1],tinfo[2])
UCL version: 1.03 (Jul 20 2004)

4) ...And this one is the real mystery. I have a script which creates some tables, writes some data to them, flushes the tables and then reads some data back. I have attached a simplified version of the script. You can see in the script that each table has two keys defined, outerKey and innerKey. The data in the file is presented sorted by outerKey. When I read back, though, I want to get all of the data which has a particular innerKey. (See the read back lines at the end of the script.)

After writing all of the data, the file is around 260 MB (that's with compression). When I read back all of the data with the given keys, only about 23 entries are returned in all. The read time is around 4 seconds, though, or about 170 ms per item. Is this normal? (If I remove the indexed=1 flag from the table declarations, then the readback takes about 50 seconds.) I am running on a pretty fast machine (AMD64, 1.8 GHz, 500 MB of RAM), but the disk is only 4200 rpm (it's a laptop). The index adds about 128 MB to the file size, i.e. the file is about 130 MB w/o the index.

An additional question, related to read rates in the attached script: I will normally want to access the data in order of innerKey, so it would be nice if I could sort the data by innerKey before starting accesses. I have looked around in the numarray and pytables documentation for a way to sort these records, but don't see anything obvious. Do you have any suggestions?

Thanks!

Damon
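
A note on question 2: the difference between typing x at the prompt and having x on a line of a script can be reproduced without PyTables. The following is only a minimal sketch, written for Python 3 rather than the Python 2.4 shown in the session above. FakeFile is a hypothetical stand-in for the object returned by tables.openFile(), and its two strings are invented for illustration. The sketch relies on standard interpreter behavior: input typed at the prompt is compiled in "single" mode, where the value of a bare expression is displayed via its repr, while a script is compiled in "exec" mode, where that value is silently discarded.

```python
import contextlib
import io


class FakeFile:
    """Hypothetical stand-in for the object returned by tables.openFile()."""

    def __str__(self):
        # Used by print.
        return "short summary (what print shows)"

    def __repr__(self):
        # Used when the interactive prompt displays an expression's value.
        return "detailed summary (what the interactive prompt shows)"


def run(source, mode):
    """Compile `source` in the given mode, run it, and return its stdout text."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(compile(source, "<demo>", mode), {"x": FakeFile()})
    return buffer.getvalue()


# "single" is the mode used for input typed at the interactive prompt.
interactive_output = run("x", "single")

# "exec" is the mode used for script files: the bare expression is evaluated
# and its value thrown away, so nothing is written.
script_output = run("x", "exec")

# An explicit print works in a script and goes through __str__.
printed_output = run("print(x)", "exec")
```

Under these assumptions, a script that wants the more detailed display could call print(repr(x)) explicitly.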