From: Tim C. <tc...@op...> - 2002-12-27 23:49:28
On Fri, 2002-12-27 at 12:55, Magnus Lie Hetland wrote:
> Tim Churches <tc...@op...>:
> [snip]
> > Have a look at the discussion on RecordArrays in this overview of
> > Numarray: http://stsdas.stsci.edu/numarray/DesignOverview.html
>
> Sounds interesting.
>
> > However, in the meantime, as you note, it's not too hard to write a
> > class which emulates R/S-Plus data frames. Just store each column in
> > its own Numeric array of the appropriate type
>
> Yeah -- it's just that I'd like to keep a set of columns collected as
> a two-dimensional array, to allow horizontal summing and the like.
> (Not much more complicated, but an extra issue to address.)
>
> > (which might be the PyObject
> > types, which can hold any Python object type),
>
> Hm. Yes. I can't seem to find these anymore. I seem to recall using
> type='o' or something in Numeric, but I can't find the right type
> objects now... (Guess I'm just reading the docs and dir(numeric)
> poorly...) It would be nice if array(['foo']) just worked. Oh, well.

Just like this:

>>> import Numeric
>>> a = Numeric.array(['a','b','c'], typecode=Numeric.PyObject)
>>> a
array([a , b , c ],'O')
>>>

> > By memory-mapping disc-based
> > versions of the Numeric arrays, and using the BsdDb3 record number
> > database format for the string columns, you can even make a disc-based
> > "record array" which can be larger than available RAM+swap.
>
> Sounds quite useful, although quite similar to MetaKit. (I suppose I
> could use some functions from numarray on columns in MetaKit... But
> that might just be too weird -- and it would still just be a
> collection of columns :])

I really like MetaKit's column-based storage, but it just doesn't scale
well (on the author's own admission, and verified empirically): beyond a
few 10**5 records it bogs down terribly, whereas memory-mapped NumPy plus
a BsdDb3 recno database for strings scales well to many tens of millions
of records (or more, but that's as far as I have tested).

Tim C
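
A minimal sketch of the disc-based "record array" described above: numeric
columns kept in memory-mapped array files and the string column in a
Berkeley DB recno file, all addressed by row number. It is written against
today's numpy (numpy.memmap) and the bsddb3 package's legacy rnopen()
interface rather than the 2002-era Numeric module; the DiscFrame class,
file names and column names are illustrative only and do not come from the
original post.

    import numpy as np
    from bsddb3 import rnopen   # dict-like recno database, integer keys from 1

    class DiscFrame:
        """A handful of columns, each stored on disc, addressed by row number."""

        def __init__(self, prefix, nrows):
            self.nrows = nrows
            # Numeric columns live in memory-mapped files, so the frame can be
            # much larger than RAM+swap; pages are faulted in on demand.
            self.age = np.memmap(prefix + '_age.dat', dtype='int32',
                                 mode='w+', shape=(nrows,))
            self.weight = np.memmap(prefix + '_weight.dat', dtype='float64',
                                    mode='w+', shape=(nrows,))
            # The string column goes in a record-number (recno) Berkeley DB file.
            self.name = rnopen(prefix + '_name.db', 'c')

        def set_row(self, i, name, age, weight):
            self.name[i + 1] = name.encode('utf-8')   # recno keys start at 1
            self.age[i] = age
            self.weight[i] = weight

        def get_row(self, i):
            return (self.name[i + 1].decode('utf-8'),
                    int(self.age[i]), float(self.weight[i]))

        def mean_weight(self):
            # Column-wise operations remain plain array arithmetic.
            return float(self.weight.mean())

    frame = DiscFrame('/tmp/people', nrows=3)
    frame.set_row(0, 'alice', 34, 61.5)
    frame.set_row(1, 'bob', 58, 82.0)
    frame.set_row(2, 'carol', 41, 70.2)
    print(frame.get_row(1))       # ('bob', 58, 82.0)
    print(frame.mean_weight())    # 71.23...

Each column scales independently in this layout: the memory-mapped files
are paged in on demand, and the recno file fetches the i-th string without
reading the rest of the column, which is the property Tim is relying on
when he reports scaling to tens of millions of records.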