From: Francesc A. <fa...@py...> - 2008-11-20 09:29:20
On Wednesday 19 November 2008, AF...@pr... wrote:
> Francesc Alted wrote:
> > On Tuesday 18 November 2008, Eric Bruning wrote:
> >> If extended slicing isn't supported, is there a better way to deal
> >> with sparse indices than:
> >> for row_id, datum in zip(row_ids, coldata):
> >>     col[row_id] = datum
> >
> > Yes, there is. You can make use of the .itersequence() iterator
> > combined with the .update() method:
> >
> > for i, row in enumerate(tbl.itersequence(row_ids)):
> >     row['col'] = coldata[i]
> >     row.update()
> > tbl.flush()
> >
> > This method has the advantage that the update is made by using an
> > internal buffer, so it will be faster in general (especially when
> > you modify a lot of rows) than the one you suggested.
>
> Two questions:
>
> 1. Is there a similar function for Arrays? I don't see one.

No. Buffered I/O, as well as much of the functionality built on top of
it, is only available for Table objects (hence the PyTables name ;-)

> 2. The documentation for Table.itersequence says:
> --
> itersequence(sequence, sort=True)
>
> Iterate over a sequence of row coordinates.
>
> A true value for sort means that the sequence will be sorted so that
> I/O might perform better. If your sequence is already sorted or you
> don't want to sort it, leave this parameter as false. The default is
> not to sort the sequence.
> --
> Is the default sort=True or sort=False? The signature and the last
> sentence disagree.

Yup. That's a bug in the documentation. Fixed in:

http://www.pytables.org/trac/changeset/3925

At any rate, the `sort` argument for `itersequence()` is going to
disappear in the forthcoming 2.1. The reason is that sorting large
sequences can be very time consuming and, besides, it is unclear that
doing so would accelerate the retrieval very much (if at all). Users
who want to experiment with a sorted sequence will have to provide one
themselves.

Cheers,

-- 
Francesc Alted
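
For anyone who wants to try a sorted sequence once the `sort` argument is gone, here is a minimal sketch of pre-sorting the coordinates while keeping the data aligned with them. It assumes `row_ids` and `coldata` are plain Python sequences of equal length; the `sort_together` helper and the sample values are hypothetical, not part of PyTables. The PyTables calls are left as comments because they need an open table `tbl`:

```python
def sort_together(row_ids, coldata):
    """Return row_ids in ascending order, with coldata reordered to match."""
    if len(row_ids) != len(coldata):
        raise ValueError("row_ids and coldata must have the same length")
    # Indices of row_ids in the order that sorts the row coordinates.
    order = sorted(range(len(row_ids)), key=lambda i: row_ids[i])
    return [row_ids[i] for i in order], [coldata[i] for i in order]


# Hypothetical sample data: three sparse row coordinates and their new values.
row_ids = [42, 7, 19]
coldata = [4.2, 0.7, 1.9]
sorted_ids, sorted_data = sort_together(row_ids, coldata)

# With an open table `tbl`, the update loop from this thread would become:
# for i, row in enumerate(tbl.itersequence(sorted_ids)):
#     row['col'] = sorted_data[i]
#     row.update()
# tbl.flush()
```

Whether this actually speeds things up will depend on the data, so it is worth timing both the sorted and unsorted variants.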