[Pytables-users] "Random Access Writing" to a Pytable

PyTables Users,

I hope you can take a moment to answer a question about the optimal way to 
use PyTables for my application.

I have as many as 20 million records to read from one file and write to a
PyTables table, and in the original file these records are not in the order
that I want them to be. I could write them out in the same order as they
appear in the original file and use the PyTables commands to select the
records I need, but I am afraid that will take a long time if the records
are not adjacent. I could sort the records in memory, but I am trying to
avoid having all of them in memory. What I want to do is read in a record,
determine where it should go in the output file (according to an index), and
then write the record to the correct spot. I would record the first and
last index of the records that belong to each code, and then retrieve the
block of records as a whole. That way, when I need to retrieve a block of
records, they will all be from the same area of the disk, and I assume that
will be faster.
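
To make the idea concrete, here is a minimal sketch of the placement scheme
described above. It is only an illustration under several assumptions: it
uses a plain fixed-width binary file from the Python standard library rather
than a PyTables table, the record layout (an integer code plus one float) is
hypothetical, and it assumes the input can be read twice (once to count the
records per code, once to place them). The names write_grouped, read_block
and RECORD are made up for this example.

```python
import os
import struct
import tempfile
from collections import Counter

# Hypothetical fixed-width record: int32 code followed by a float64 value.
RECORD = struct.Struct("<id")


def write_grouped(read_records, out_path):
    """Place each record in a contiguous block reserved for its code.

    read_records is a callable returning a fresh iterator of (code, value)
    pairs, so the input can be scanned twice without holding it in memory.
    Returns {code: (first_row, last_row)}.
    """
    # Pass 1: count how many records belong to each code.
    counts = Counter(code for code, _ in read_records())

    # Reserve a contiguous range of row numbers for each code.
    blocks, next_free = {}, 0
    for code in sorted(counts):
        blocks[code] = (next_free, next_free + counts[code] - 1)
        next_free += counts[code]

    # Pass 2: seek to the next open slot in the record's block and write it.
    cursor = {code: first for code, (first, _) in blocks.items()}
    with open(out_path, "wb") as out:
        out.truncate(next_free * RECORD.size)  # pre-size the output file
        for code, value in read_records():
            out.seek(cursor[code] * RECORD.size)
            out.write(RECORD.pack(code, value))
            cursor[code] += 1
    return blocks


def read_block(path, blocks, code):
    """Read all records for one code with a single contiguous read."""
    first, last = blocks[code]
    with open(path, "rb") as f:
        f.seek(first * RECORD.size)
        data = f.read((last - first + 1) * RECORD.size)
    return list(RECORD.iter_unpack(data))


# Small usage example with made-up data in a scrambled order.
sample = [(2, 0.5), (1, 1.5), (2, 2.5), (1, 3.5), (3, 4.5)]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "grouped.bin")
    demo_blocks = write_grouped(lambda: iter(sample), demo_path)
    demo_code_2 = read_block(demo_path, demo_blocks, 2)
```

Only the per-code counts and write cursors are kept in memory, so the memory
cost grows with the number of distinct codes, not with the number of records.
Whether the same pattern is the fastest option inside PyTables is exactly the
open question here.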

Is this possible, and if so, what is the fastest way to do this? Am I
misunderstanding the problem?

M

Michael Lefsky, Assistant Professor
Department of Forest, Rangeland and Watershed Stewardship
Colorado State University, Fort Collins, CO 80523
970/491-0602 970/491-6754 (FAX) le...@cn...