Re: [Pytables-users] read rates...

On Friday 18 February 2005 06:30, damon fasching wrote:
> Hi,
>
> I posted a couple of days ago regarding indexes which
> go missing (but still occupy file space).  The
> suggestion to hang tables directly from root rather
> than from somewhere higher in the tree worked.  If
> tables can only be indexed if they are hanging off of
> root, perhaps that should be documented.  (On the
> other hand, there was also a hint that this might be
> fixed in later snapshots, so perhaps it isn't a big
> deal.)

Yes, that should be fixed in snapshots.

>
> I have a couple of other questions.
>
> 1) Does the "pos" attribute of an IsDescription field
> have any performance implications?

Well, that depends on your exact column description, but I don't
really expect big performance implications.

> 2) If I open a file x with x=tables.openFile(...) then
> print x prints some info for that file, whether from
> within a script or interactively.  Just entering 'x'
> interactively displays even more useful information.
> But including the 'x' in a script seems to have no
> effect.  Why does entering the file's symbol only
> display the file information from an interactive
> session?

Entering 'x' in the Python console effectively prints the output of
repr(x). So do a print repr(x) if you want to do the same
programmatically.
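
As a minimal sketch of the difference: the class below is a hypothetical
stand-in for an opened file (it is not the PyTables File class, and its
output strings are made up) that defines both __str__ and __repr__.

```python
class FakeFile(object):
    """Hypothetical stand-in for an opened file; not the PyTables API."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        # Short form: this is what print uses.
        return "FakeFile(%s)" % self.name

    def __repr__(self):
        # Detailed form: this is what the interactive console echoes.
        return "FakeFile(name=%r, mode='r')" % self.name


x = FakeFile("data.h5")

x                      # in a script: evaluated and discarded, nothing is shown
short_text = str(x)    # the text that print(x) would display
long_text = repr(x)    # the text the console shows for a bare 'x'
print(long_text)       # explicit equivalent of typing 'x' interactively
```

A bare expression statement is only echoed by the interactive console, so
a script has to ask for repr(x) explicitly.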

>
> 3) When I use a filter with lzo compression I get the
> following message:
> "ERROR: unknown compression type: lzo"
> If I use zlib, I get the message
> "ERROR: unknown compression type: zlib"
> If I use ucl, I do not get an error message.

Hmm, I've never had such a report on Unix (nor on Linux, of course). Can
you go to the test directory and run:

$ python test_all.py --show-versions

and send me the results, please? Are you compiling pytables
yourself, or are you using a package? If it is a package, from which
distribution?

> 4) ...And this one is the real mystery.
> I have a script which creates some tables, writes some
> data to them, flushes the tables and then reads some
> data back.  I have attached a simplified version of
> the script.  You can see in the script that each table
> has two keys defined, outerKey and innerKey.  The data
> in the file is presented sorted by outerKey.  When I
> read back though, I want to get all of the data which
> has a particular innerKey. (See the read back lines at
> the end of the script.)  After writing all of the
> data, the file is around 260 MB (that's with
> compression).  When I read back all of the data with
> the given keys, only about 23 entries are returned in
> all.  The read time is around 4 seconds, though, or
> 170 ms per item.  Is this normal?  (If I remove the
> indexed=1 flag from the table declarations, then the
> readback takes about 50 seconds.)

I don't quite understand the question. Do you find this time long or
short? How many entries does your table have? A speed-up of about a
factor of 10 is not so bad when indexing. Also, note that the first
lookup takes significantly more time than subsequent lookups.
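
For reference, the figures quoted above work out as shown in the first
part of this sketch. The second part is only a toy illustration of why a
sorted index beats a full scan; it uses the standard bisect module on
made-up data and is not how the PyTables indexing engine is implemented.

```python
import bisect

# Figures taken from the question above.
read_time_indexed = 4.0    # seconds, with indexed=1
read_time_scan = 50.0      # seconds, without the index
entries_returned = 23

ms_per_entry = 1000.0 * read_time_indexed / entries_returned
speedup = read_time_scan / read_time_indexed

# Toy data (an assumption for illustration): one key per row.
keys = [i % 1000 for i in range(100000)]


def scan_lookup(wanted):
    # Unindexed search: every row has to be inspected.
    return [row for row, key in enumerate(keys) if key == wanted]


# "Index": (key, row) pairs sorted by key, built once up front.
index = sorted((key, row) for row, key in enumerate(keys))
index_keys = [key for key, _ in index]


def indexed_lookup(wanted):
    # Indexed search: two binary searches bound the matching rows.
    lo = bisect.bisect_left(index_keys, wanted)
    hi = bisect.bisect_right(index_keys, wanted)
    return [row for _, row in index[lo:hi]]


print("%.0f ms per entry, %.1fx faster with the index"
      % (ms_per_entry, speedup))
```

Both lookups return the same rows; only the amount of data touched per
query differs.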

[Also, it's worth saying that for PyTables 1.0 Pro we are implementing
a completely revamped indexing engine that will accelerate searches
far better than the 0.9.x implementation, especially for very large
tables. With the new code, we are getting typical speed-ups of 100x
compared with 0.9.x, and that is for tables with a billion rows. That
means lookups in under a tenth of a second for such large beasts.]

> I am running on a pretty fast machine (AMD64, 1.8 GHz)
> but the disk is only 4200 (it's a laptop) with 500 MB
> of RAM.  The index adds about 128 MB to the file size,
> i.e. the file is about 130 MB w/o the index.

With that configuration, the file may well fit entirely in the OS
filesystem cache. However, pytables is designed to efficiently handle
files that exceed available memory as well, with just a little
overhead over the in-core case.

> An additional question related to read rates in the
> attached script, I will normally want to access the
> data in order of innerKey, so it would be nice if I
> could sort the data by innerKey before starting
> accesses.  I have looked around in the numarray and
> pytables documentation for a way to sort these
> records, but don't see anything obvious.  Do you have
> any suggestions?

We are working on implementing "sorted by" and "group by" qualifiers
for the search method. They will likely be included in the forthcoming
pytables Pro.
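
In the meantime, if the rows you need fit in memory, one possible
stopgap is to sort them after reading. This is only a sketch with plain
Python tuples: the (outerKey, innerKey, value) layout and the sample
values are assumptions based on the description of the script, and no
PyTables call is involved.

```python
from operator import itemgetter

# Hypothetical rows as (outerKey, innerKey, value), in outerKey order
# as they were written.
rows = [
    (1, 30, 0.5),
    (1, 10, 1.5),
    (2, 20, 2.5),
    (2, 10, 3.5),
]

# Sort by innerKey first, then by outerKey to break ties.
by_inner = sorted(rows, key=itemgetter(1, 0))

for outer_key, inner_key, value in by_inner:
    print(inner_key, outer_key, value)
```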

Cheers,

-- 
>qo<   Francesc Altet     http://www.carabos.com/
V  V   Cárabos Coop. V.   Enjoy Data
 ""