[Pytables-users] read rates...

Hi,

I posted a couple of days ago regarding indexes which
go missing (but still occupy file space).  The
suggestion to hang tables directly from root rather
than from somewhere higher in the tree worked.  If
tables can only be indexed if they are hanging off of
root, perhaps that should be documented.  (On the
other hand, there was also a hint that this might be
fixed in later snapshots, so perhaps it isn't a big
deal.)

I have a couple of other questions.

1) Does the "pos" attribute of an IsDescription field
have any performance implications?

2) If I open a file x with x=tables.openFile(...) then
'print x' prints some info for that file, whether from
within a script or interactively.  Just entering 'x'
interactively displays even more useful information. 
But including the 'x' in a script seems to have no
effect.  Why does entering the file's symbol only
display the file information from an interactive
session?
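
The same difference shows up with a plain Python class,
so it may not be specific to PyTables.  Here is a
minimal sketch; FakeFile is a hypothetical stand-in I
made up for illustration, not the real PyTables file
class, and the two strings are placeholders:

```python
class FakeFile(object):
    """Hypothetical stand-in for an open file handle (not a PyTables class)."""

    def __str__(self):
        # Short form: this is what print uses.
        return "short summary"

    def __repr__(self):
        # Detailed form: this is what the interactive prompt echoes.
        return "detailed description"


x = FakeFile()

# In a script, a bare expression is evaluated and its value thrown away,
# so this line displays nothing. Only the interactive prompt echoes it.
x

short_text = str(x)       # the text that print shows
detailed_text = repr(x)   # the text the interactive prompt shows for a bare x

print(short_text)
print(detailed_text)      # explicit repr() gives the detailed form in a script
```

If that is all that is going on, then printing repr(x)
from the script should give the more detailed listing.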

3) When I use a filter with lzo compression I get the
following message:
"ERROR: unknown compression type: lzo"
If I use zlib, I get the message
"ERROR: unknown compression type: zlib"
If I use ucl, I do not get an error message.
Yet PyTables files generated from the same data, with
everything the same except the compression library
(lzo, zlib or ucl), come out at different sizes, all of
which are significantly smaller than a file generated
without compression.  So lzo and zlib are in fact being
used.  Why the error message, then?
For the record, the following interactive session
demonstrates that the libraries are available.
~ python
Python 2.4 (#1, Jan 31 2005, 12:54:29) 
[GCC 3.3.5 (Debian 1:3.3.5-6)] on linux2
Type "help", "copyright", "credits" or "license" for
more information.
>>> import tables
>>> print "PyTables version: %s" % tables.__version__
PyTables version: 0.9.1
>>> tinfo = tables.whichLibVersion("zlib")
>>> print "Zlib version: %s" % (tinfo[1])
Zlib version: 1.2.2
>>> tinfo = tables.whichLibVersion("lzo")
>>> print "LZO version: %s (%s)" % (tinfo[1],tinfo[2])
LZO version: 1.08 (Jul 12 2002)
>>> tinfo = tables.whichLibVersion("ucl")
>>> print "UCL version: %s (%s)" % (tinfo[1],tinfo[2])
UCL version: 1.03 (Jul 20 2004)

4) ...And this one is the real mystery.
I have a script which creates some tables, writes some
data to them, flushes the tables and then reads some
data back.  I have attached a simplified version of
the script.  You can see in the script that each table
has two keys defined, outerKey and innerKey.  The data
in the file is presented sorted by outerKey.  When I
read back though, I want to get all of the data which
has a particular innerKey. (See the read back lines at
the end of the script.)  After writing all of the
data, the file is around 260 MB (that's with
compression).  When I read back all of the data with
the given keys, only about 23 entries are returned in
all.  The read time is around 4 seconds, though, or
170 ms per item.  Is this normal?  (If I remove the
indexed=1 flag from the table declarations, then the
readback takes about 50 seconds.)

I am running on a pretty fast machine (AMD64, 1.8 GHz)
but the disk is only 4200 RPM (it's a laptop) and there
is 500 MB of RAM.  The index adds about 128 MB to the
file size, i.e. the file is about 130 MB w/o the index.
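
For reference, the arithmetic behind those figures is
sketched below.  The inputs are the rounded numbers
quoted in this message (about 23 entries, about 4 s
indexed, about 50 s unindexed, about 260 MB and 128 MB),
so the results are only approximate:

```python
# Rounded figures quoted in this message; all values are approximate.
entries = 23                 # rows returned by the read-back
indexed_read_s = 4.0         # read time with indexed=1
unindexed_read_s = 50.0      # read time without the index
file_with_index_mb = 260.0   # compressed file size including the index
index_mb = 128.0             # space attributed to the index

# Per-item cost of the read-back in each case.
ms_per_item_indexed = indexed_read_s / entries * 1000.0
s_per_item_unindexed = unindexed_read_s / entries

# How much faster the indexed read-back is.
speedup = unindexed_read_s / indexed_read_s

# File size once the index is subtracted.
file_without_index_mb = file_with_index_mb - index_mb

print("%.0f ms per item with the index" % ms_per_item_indexed)
print("%.2f s per item without the index" % s_per_item_unindexed)
print("%.1fx speedup from the index" % speedup)
print("%.0f MB without the index" % file_without_index_mb)
```

So the index buys roughly a factor of 12 in read time
while about doubling the file size.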

An additional question, related to read rates in the
attached script: I will normally want to access the
data in order of innerKey, so it would be nice if I
could sort the data by innerKey before starting
accesses.  I have looked around in the numarray and
pytables documentation for a way to sort these
records, but don't see anything obvious.  Do you have
any suggestions?
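
To make clear what I am after, here is a sketch in
plain Python of the reordering I have in mind.  It only
works on rows already held in memory; the row layout
(outerKey, innerKey, value) and the sample values are
made up for illustration, and it does not use any
PyTables or numarray call:

```python
from operator import itemgetter

# Hypothetical rows as (outerKey, innerKey, value), arriving sorted by outerKey.
rows = [
    (1, 30, "a"),
    (1, 10, "b"),
    (2, 20, "c"),
    (2, 10, "d"),
    (3, 30, "e"),
]

# Reorder by innerKey. Python's sort is stable, so rows that share an
# innerKey stay in their original outerKey order.
by_inner = sorted(rows, key=itemgetter(1))

# After sorting, all rows for one innerKey are adjacent.
wanted_key = 10
matches = [row for row in by_inner if row[1] == wanted_key]

print(by_inner)
print(matches)
```

What I am really looking for is the equivalent of this
for a table that is too large to pull into memory.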

Thanks!
  Damon
