From: Sasha <nd...@ma...> - 2006-01-12 21:29:23
With Paul's permission I am posting his arguments and my responses.
Numpy.ma will follow Paul's design and there is now a wiki page
dedicated to the effort to make ma work better in numpy.
(See http://projects.scipy.org/scipy/numpy/wiki/MaskedArray).

-- sasha

On 1/12/06, Sasha <nd...@ma...> wrote:
> Paul,
>
> Thank you very much for your insights and once again thanks for all
> the great work that you've done. I've noticed that your reply was not
> posted on any list, do you mind if I forward it to numpy-user? Please
> see more below.
>
> On 1/12/06, Paul F. Dubois <pa...@pf...> wrote:
> > What special values? Are you sure this works on any platform? What, for
> > example, is the special value for integer arrays? For arrays of objects?
> >
> Yes, these are hard questions. For floats nan is an obvious choice and
> IEEE support is getting better on the new hardware. For objects None is
> a fine choice. For integers some may argue for sys.maxint, and given that
> numpy integer arithmetics is already handling overflow a check for maxint will
> not add much complexity. Yet don't get me wrong: I don't see any replacement
> for ma myself.
>
> > How do replaceable mathematical operations make any difference? The
> > fundamental problem is that if array x has special values in some places
> > and array y has them in some other places, how do you create a result
> > that has special values in the correct places AND is of a type for which
> > those special values are still treated as 'missing'. How do you do this?
> >
>
> Replaceable operations would allow one to redefine all operations on integer
> arrays to treat say sys.maxint as invariant and cast it to nan in floating point
> conversions without changing the logic of main line numpy.
>
> > I converted MA to ma but did not have time to flesh out all the
> > differences with the new ndarray. I was hoping the community would do
> > that.
>
> Me too.
> That was the point of my post - to find out the size of the community
> rather than to suggest an alternative.
>
> > I am retired.
> >
> You deserve it.
>
> > It is my belief that the approach you outline is not workable, but
> > perhaps I am not understanding it properly.
> >
> I don't have any workable approach other than enhancing ma to work
> better with numpy. This is what I am doing right now.
>
> > If I, who have thought about this a lot, do not know for sure, what
> > information can you derive from a poll of the general public, who will
> > not think through these issues very carefully?
> >
> I was trying to poll the numpy community to find out how many people actually
> use ma in real projects. This would determine how well tested the new features
> will be and how quickly any bugs will be discovered and fixed.
> Unfortunately, I have not seen a single response saying - I've been using
> MA for X years on Y projects and plan on using it as we upgrade to numpy.
> There were a lot of theoretical discussions and a pointer to a plotting
> library that has recently added MA support, but no testimony from end users.
>
> > I am close to absolutely positive that subclassing won't particularly
> > ease the task.
> >
> I thought about this a little, and I think you are right. Subclassing
> may improve speed a little, but all methods will need to be adapted
> the same way as is done without subclassing.
>
> > For the reason I indicated, I don't care to engage in public discussions
> > of complex technical issues so I have not cc'd this to the group.
> >
> I respect that, but please allow me to forward at least portions of
> this correspondence to the community. Your insights are invaluable.
>
> -- sasha
>
> > Sasha wrote:
> > > MA is intended to be a drop-in replacement for Numeric arrays that can
> > > explicitly handle missing observations. With the recent improvements
> > > to the array object in NumPy, the MA library has fallen behind.
> > > There are more than 50 methods in the ndarray object that are not
> > > present in ma.array.
> > >
> > > I would like to hear from people who work with datasets with missing
> > > observations. Do you use MA? Do you think that, with the support for nan's
> > > and replaceable mathematical operations, missing observations should
> > > be handled in numpy using special values rather than an array of
> > > masks?
> > >
> > > Thanks.
> > >
> > > -- sasha
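
To make the question in this thread concrete, here is a minimal sketch
of the two approaches being compared. It assumes a present-day NumPy
with numpy.ma; the array values and variable names are made up for
illustration and do not come from the correspondence. It shows that a
mask-based result marks an element as missing wherever either operand
was missing, that nan gives the same positions for floats, and that the
mask approach carries over unchanged to integer arrays, which have no
nan.

```python
import numpy as np
import numpy.ma as ma

# Two float arrays with missing observations in different positions
# (illustrative data only).
x = ma.masked_array([1.0, 2.0, 3.0, 4.0], mask=[False, True, False, False])
y = ma.masked_array([10.0, 20.0, 30.0, 40.0], mask=[False, False, True, False])

# Mask-based approach: the result is masked wherever either input is masked.
z = x + y
result_mask = ma.getmaskarray(z)

# Special-value approach for floats: nan propagates through arithmetic.
xf = np.array([1.0, np.nan, 3.0, 4.0])
yf = np.array([10.0, 20.0, np.nan, 40.0])
zf = xf + yf
nan_positions = np.isnan(zf)

# Integer arrays have no nan; the mask approach needs no sentinel value
# and the result keeps an integer dtype.
xi = ma.masked_array([1, 2, 3, 4], mask=[False, True, False, False])
yi = ma.masked_array([10, 20, 30, 40], mask=[False, False, True, False])
zi = xi + yi

print(z)              # masked entries are printed as --
print(nan_positions)  # True where either float operand was nan
print(zi.dtype, zi)
```

A sentinel such as the largest representable integer would need every
operation to check for it explicitly, which is the difficulty Paul
raises above; the mask keeps the "missing" information separate from
the data type.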