From: Michael L. <le...@cn...> - 2003-10-09 21:37:45
PyTables Users,

I hope you can take a moment to answer a question about the optimal way to use PyTables for my application. I have as many as 20 million records to read from one file and write to a PyTables table, and in the original file these records are not in the order that I want them to be.

I could write them out in the same order as they appear in the original file and use the PyTables commands to select the records I need, but I am afraid that will take a long time if the records are not adjacent. I could order the records in memory, but I am trying to avoid having all of them in memory.

What I want to do is read in a record, determine where it should go in the output file (according to an index), and then write the record to the correct spot. I would record the first and last index of the records that belong with each code, and then retrieve the block of records as a whole. That way, when I need to retrieve a block of records, they will all come from the same area of the disk, and I assume that will be faster.

Is this possible, and if so, what is the fastest way to do this? Am I misunderstanding the problem?

M

Michael Lefsky, Assistant Professor
Department of Forest, Rangeland and Watershed Stewardship
Colorado State University, Fort Collins, CO 80523
970/491-0602
970/491-6754 (FAX)
le...@cn...
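
For illustration, the two-pass layout described in the message can be sketched with the Python standard library alone. This is only a sketch under stated assumptions: it uses a plain binary file of fixed-size records rather than PyTables, and the record layout (`RECORD`) and the helper names `plan_blocks`, `write_grouped`, and `read_block` are hypothetical. Whether a PyTables table supports the same kind of in-place placement is exactly the open question.

```python
import struct
from collections import Counter

# Hypothetical fixed-size record: int32 code + float64 value, no padding (12 bytes).
RECORD = struct.Struct("<id")


def plan_blocks(codes):
    """Pass 1: count records per code and give each code a contiguous block.

    Returns {code: (first, last)} with inclusive record indices.
    """
    counts = Counter(codes)
    blocks, start = {}, 0
    for code in sorted(counts):
        blocks[code] = (start, start + counts[code] - 1)
        start += counts[code]
    return blocks


def write_grouped(records, blocks, path):
    """Pass 2: write each record into the next free slot of its code's block."""
    next_slot = {code: first for code, (first, _last) in blocks.items()}
    with open(path, "wb") as out:
        for code, value in records:
            # Seek to the record's final position; only one record is in memory.
            out.seek(next_slot[code] * RECORD.size)
            out.write(RECORD.pack(code, value))
            next_slot[code] += 1


def read_block(path, blocks, code):
    """Fetch all records for one code with a single contiguous read."""
    first, last = blocks[code]
    with open(path, "rb") as src:
        src.seek(first * RECORD.size)
        data = src.read((last - first + 1) * RECORD.size)
    return list(RECORD.iter_unpack(data))
```

The source file is read twice (once to count, once to place), but memory use stays proportional to the number of distinct codes rather than the number of records. The `blocks` mapping is the "first and last index" table, and it would need to be stored alongside the data.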