From: Todd M. <jm...@st...> - 2003-07-16 22:36:22
On Wed, 2003-07-16 at 17:43, Tim Churches wrote:
> On Wed, 2003-07-16 at 05:34, Todd Miller wrote:
> > I am adding arrays of Python objects to numarray and so I am curious
> > about the uses people have found for Numeric's object arrays. If you
> > have found Numeric's object arrays useful, please tell us about what
> > you used them for so that we can make certain that numarray can satisfy
> > the same need.
>
> We use NumPy to store vectors (rank-1 arrays) of numbers representing
> columns in a dataset. The NumPy arrays (which are large and numerous)
> are memory-mapped (using an extension) to disc to conserve real memory.
> However, in some vectors (columns) we need to store variable-length
> strings, and in others, variable-length sequences of integers or floats
> (and possibly even sets in the future). NumPy's object arrays are more
> memory-efficient than Python lists of lists or lists of strings for

Well, right now the prototype actually uses a single list internally as the object store; still, we might beat out lists of lists by a small margin.

> these purposes, and of course they support NumPy functions such as
> take(), which makes life simpler.

The prototype currently uses common code for put/take on strings, object arrays, and soon record arrays. That common code is currently a Python prototype; numarray's numeric arrays use specialized C code for speed.

> But we haven't been able to memory-map
> these object arrays, which is a problem. Is there any prospect of
> numarray supporting memory-mapped arrays of sequences/strings?

numarray supports arrays of fixed-length strings with its chararray module. The default chararray string stripping and padding functions blank-fill unused space, which gives the appearance of variable-length strings. The data buffers of all of numarray's classes that represent primitive data items (numbers, strings, records) can be memory-mapped.
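As a rough illustration of the fixed-length-string idea, here is a sketch that uses only Python's standard `mmap` module, not numarray's chararray API. It assumes a hypothetical 8-byte field width (`WIDTH`) and hypothetical helper names. Each string gets its own slot in a memory-mapped file, is blank-padded on write, and has its padding stripped on read.

```python
import mmap
import tempfile

WIDTH = 8  # hypothetical fixed field width in bytes per string


def write_padded(buf, index, text):
    """Store text in slot `index`, blank-filling the unused bytes."""
    data = text.encode("ascii")
    if len(data) > WIDTH:
        raise ValueError("string longer than field width")
    start = index * WIDTH
    buf[start:start + WIDTH] = data.ljust(WIDTH, b" ")


def read_stripped(buf, index):
    """Read slot `index` and strip the trailing blank padding."""
    start = index * WIDTH
    return buf[start:start + WIDTH].rstrip(b" ").decode("ascii")


def roundtrip(words):
    """Write words into a memory-mapped temp file, then read them back."""
    size = len(words) * WIDTH
    with tempfile.TemporaryFile() as f:
        f.truncate(size)  # size the file before mapping it
        with mmap.mmap(f.fileno(), size) as buf:
            for i, w in enumerate(words):
                write_padded(buf, i, w)
            return [read_stripped(buf, i) for i in range(len(words))]


print(roundtrip(["a", "bb", "numarray"]))
```

One tradeoff of blank padding: a string that really ends in spaces comes back without them, because the reader cannot tell padding from content.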
I think, however, that memory-mapping sequences or arbitrary Python objects isn't going to happen in numarray any time soon; it sounds too much like object persistence.

> I know
> that is a big ask! We have an extension module which stores variable
> length blobs in a single memory-mapped file which might be useful - the
> code could be made available to the numarray project, I think.

I don't understand how your module differs from Python's mmap.

>
> We also use MA extensively (because in the health care domain life is
> full of missing data) - I'll jot down some thoughts on how MA could be
> improved in the next few days.

I'd be very interested in hearing your thoughts on improving MA.

-- 
Todd Miller <jm...@st...>