From: Michael H. <mh...@al...> - 2001-05-09 23:20:45
Hi,

I've spent several days using the masked arrays that have been added to NumPy recently. They're a great feature and they were just what I needed for the little project I was working on (aside from a few bugs that I found). However, there were a few things about MA that I found inconvenient and/or counterintuitive, so I thought I'd post them to the list while they're fresh in my mind. I'm using Numeric-20.0.0b2.

1. I couldn't find a simple way to tell if all of the cells of a masked array are unmasked. There are times when you fill an array incrementally and you want to convert it to a Numeric array but first make sure that all of the elements have been set. "m.filled()" is a bit dangerous (in my opinion) because it silently fills. The shortest idiom I could think of is

    >>> assert not logical_or.reduce(ravel(MA.getmaskarray(m)))

which isn't very short :-) and is also awkward because it creates a mask array even if m.mask() is None. How about an m.is_unmasked() method, or even giving a special meaning to "m.filled(masked)", namely that it raises an exception if any cells are still masked? (As an optimization, this method could set m.__mask = None to speed up future checks.)

2. I can't reproduce this problem now, but I could swear that the MaskedArray.__str__() method sometimes printed "typecode='O'" if masked.enabled() is true. This would be a byproduct of using Numeric's __str__() method to print the array, at least under the unknown circumstances in which Numeric.__str__() prints the typecode. This confused me for a while.

3. I found the semantics of MA.compress(condition,a,axis=0) to be inconvenient and inconsistent with those of Numeric.compress. MA.compress() squeezes out not only those elements for which condition is false, but also those elements that are masked. This differs from the behavior of Numeric.compress, which always returns an array with the "axis" dimension equal to the number of nonzero elements of "condition".
The real problem, though, is that MA.compress can't be used on a multidimensional array with a nontrivial mask, because squeezing out the masked values is highly unlikely to result in a rectangular matrix. It is nice to be able to squeeze masked values out of a 1-d array, but not at the price of not being able to use compress on a multidimensional array. I suggest giving MA.compress() semantics closer to those of Numeric.compress(), and adding an optional argument or a separate method to cause masked elements to be omitted.

Thanks for a great package!

Yours,
Michael

--
Michael Haggerty
mh...@al...
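Michael's first request, a check that raises instead of silently filling, can be modelled without Numeric. The pure-Python sketch below is only a model under stated assumptions: the data is a flat list, and the mask is either None (nothing masked) or a list of booleans with True meaning "masked". `strict_filled` and `MaskedError` are hypothetical names, and `is_full` is modelled on the m.isfull() that Michael suggests later in the thread; none of them is part of MA.

```python
class MaskedError(Exception):
    """Raised when masked cells would otherwise be silently filled (hypothetical)."""


def is_full(mask):
    """Query with no side effects: True if no cell is masked."""
    return mask is None or not any(mask)


def strict_filled(data, mask):
    """Return a copy of `data` only if no cell is masked; otherwise raise."""
    if not is_full(mask):
        bad = [i for i, masked in enumerate(mask) if masked]
        raise MaskedError("cells still masked at positions %s" % bad)
    return list(data)


# Filling an array incrementally: each cell stays masked until it is set.
values = [0, 0, 0]
mask = [True, True, True]
for i, v in enumerate([10, 20, 30]):
    values[i] = v
    mask[i] = False

print(is_full(mask))                # True
print(strict_filled(values, mask))  # [10, 20, 30]
```

Keeping the query (`is_full`) separate from the converting call (`strict_filled`) avoids the side-effect concern raised in the replies: the query never alters the mask.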
From: Paul F. D. <pa...@pf...> - 2001-05-10 03:04:16
-----Original Message-----
From: num...@li... [mailto:num...@li...]On Behalf Of Michael Haggerty wrote:

1. I couldn't find a simple way to tell if all of the cells of a masked array are unmasked. There are times when you fill an array incrementally and you want to convert it to a Numeric array but first make sure that all of the elements have been set. "m.filled()" is a bit dangerous (in my opinion) because it silently fills. The shortest idiom I could think of is

    >>> assert not logical_or.reduce(ravel(MA.getmaskarray(m)))

which isn't very short :-) and is also awkward because it creates a mask array even if m.mask() is None. How about a m.is_unmasked() method, or even giving a special meaning to "m.filled(masked)", namely that it raises an exception if any cells are still masked. (As an optimization, this method could set m.__mask = None to speed up future checks.)

======

    >>> from MA import *
    >>> x=array([[1,2],[3,4]],mask=[[0,0],[0,0]])
    >>> count(x)
    4
    >>> product(x.shape)
    4
    >>>

So your test could be

    if count(x) < product(x.shape): error...

make_mask(m, flag=1) will make a mask and have it be None if possible. It also accepts an argument of None correctly. So your test could be

    if make_mask(m.mask(),flag=1) is not None:
        error...

You could also consider

    if not Numeric.allclose(m.filled(0), m.filled(1))

or

    m.mask() is not None and not Numeric.alltrue(Numeric.ravel(m.mask())):

Is that enough ways to do it? (TM) (:->

I don't recommend using assert if the test is data-driven, since it won't get executed with python -O. Instead use if...: raise ....

I'm not against is_unmasked but I'm not sure how much it would get used and I don't like the name. I hate query methods with side effects (if you use them in an assert you change the program). A method that replaces the mask with None if possible might make sense. m.unmask()? m.demask()? m.debride()?

=============

2. I can't reproduce this problem now, but I could swear that the MaskedArray.__str__() method sometimes printed "typecode='O'" if masked.enabled() is true. This would be a byproduct of using Numeric's __str__() method to print the array, at least under the unknown circumstances in which Numeric.__str__() prints the typecode. This confused me for a while.

=========

Short of writing my own print routine, I basically have to create something filled with '--', which is of course of type object. That's why you can disable it. (:->

=========

3. I found the semantics of MA.compress(condition,a,axis=0) to be inconvenient and inconsistent with those of Numeric.compress. MA.compress() squeezes out not only those elements for which condition is false, but also those elements that are masked. This differs from the behavior of Numeric.compress, which always returns an array with the "axis" dimension equal to the number of nonzero elements of "condition".

The real problem, though, is that MA.compress can't be used on a multidimensional array with a nontrivial mask, because squeezing out the masked values is highly unlikely to result in a rectangular matrix. It is nice to be able to squeeze masked values out of a 1-d array, but not at the price of not being able to use compress on a multidimensional array. I suggest giving MA.compress() semantics closer to those of Numeric.compress(), and adding an optional argument or a separate method to cause masked elements to be omitted.

======

It has been an interesting project in that there are hundreds of these individual little design questions. Can you propose the semantics you would like in a precise way? Include the case where the condition has masked values.

======

Thanks for a great package!

Yours,
Michael

===

I appreciate the encouragement.

--
Paul
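Paul's first two tests come down to comparing the number of unmasked cells with the total size, or collapsing an all-false mask to None and checking for None. Here is a pure-Python model of both, under the same assumptions as before (flat data; a mask that is None or a list of booleans). `count_unmasked` and `collapse_mask` are hypothetical stand-ins for the behaviour Paul describes for count(x) and make_mask(m, flag=1); they are not MA's code.

```python
from math import prod


def count_unmasked(mask, size):
    """Number of cells that are not masked (models count(x))."""
    if mask is None:
        return size
    return sum(1 for masked in mask if not masked)


def collapse_mask(mask):
    """Return None when nothing is masked, else a copy of the mask.

    Models the 'None if possible' behaviour described for make_mask(m, flag=1).
    """
    if mask is None or not any(mask):
        return None
    return list(mask)


shape = (2, 2)
size = prod(shape)
mask = [False, False, False, False]

# Test 1: fewer unmasked cells than the array size means something is masked.
print(count_unmasked(mask, size) < size)  # False
# Test 2: a mask that does not collapse to None means something is masked.
print(collapse_mask(mask) is not None)    # False
```

In this model "something is masked" is `any(mask)`. An `all(...)` test would only fire when every cell is masked.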
From: Michael H. <mh...@al...> - 2001-05-10 22:11:46
"Paul F. Dubois" <pa...@pf...> writes:
> -----Original Message-----
> Michael Haggerty wrote
> 1. I couldn't find a simple way to tell if all of the cells of a
> masked array are unmasked.
> ======
> So your test could be if count(x) < product(x.shape): error...
>
> So your test could be
> if make_mask(m.mask(),flag=1) is not None:
> error...
>
> You could also consider if not Numeric.allclose(m.filled(0), m.filled(1))
> or
> m.mask() is not None and not Numeric.alltrue(Numeric.ravel(m.mask())):

Shouldn't that be

    m.mask() is not None and Numeric.sometrue(Numeric.ravel(m.mask()))

? ("Proof" that these expressions are nonintuitive.)

> Is that enough ways to do it? (TM) (:->

Frankly, it's too many ways to do it, none of them obvious to the writer or the reader. This is a simple and useful concept and it should have one obvious implementation.

> I'm not against is_unmasked but I'm not sure how much it would get
> used and I don't like the name. I hate query methods with side
> effects (if you use them in an assert you change the program).

In this case the side effect is to change the internal representation of the object without changing its semantics, so I don't find it too objectionable. But omit this optimization if you prefer; the query method would be just as useful even without the side effect.

Because of the relationship with filled(), maybe this query function should be called m.isfull(). There should probably also be an isfull(m) function for the same reason that there is a mask(m) function.

> A method that replaces the mask with None if possible might make
> sense. m.unmask()? m.demask()? m.debride() ?

Of these names, I like m.unmask() the best. I assume that it would set m.__mask=None if possible and throw an exception if not.

On the other hand, while it would be desirable to have a function equivalent (i.e., unmask(m)), this would be awkward because a function should usually not change its argument.
Therefore, I suggest adding a safe analogue of raw_data() that throws an exception if the array has a nontrivial mask and otherwise returns self.__data. E.g. [untested]:

    class MaskedArray:
        [...]
        def data(self):
            """If no values are masked, return self.__data.
            Otherwise raise an exception.
            """
            d = self.__data
            m = self.__mask
            if m is not None and Numeric.sometrue(Numeric.ravel(m)):
                raise MAError, "MaskedArray cannot be converted to array"
            elif d.iscontiguous():
                return d
            else:
                return Numeric.array(d, typecode=d.typecode(), copy=1,
                                     savespace = d.spacesaver())

    def data(a):
        if isinstance(a, MaskedArray):
            return a.data()
        elif isinstance(a, Numeric.ArrayType) and a.iscontiguous():
            return a
        else:
            return Numeric.array(a)

A more obscure name should be chosen since you seem to encourage "from MA import *".

> 3. I found the semantics of MA.compress(condition,a,axis=0) to be
> inconvenient and inconsistent with those of Numeric.compress.
> ======
> It has been an interesting project in that there are hundreds of these
> individual little design questions.
> Can you propose the semantics you would like in a precise way? Include the
> case where the condition has masked values.
> ======

In the simple case where the condition has no masked values, I think compress() should simply pick slices out according to condition, without regard to which cells of x are masked.

When condition is masked, I don't think that there is a sensible interpretation for compress() because a "masked" value in condition means you don't know whether that slice of x should be included or not. Since you can't have an output array of indeterminate shape, I would throw an exception if condition is masked.
Here is my attempt [untested]:

    def compress(condition, x, dimension=-1):
        # data() is defined above; it throws an exception if condition is masked.
        c = data(condition)
        # Use a local name other than "mask" so the mask() function stays visible.
        if mask(x) is None:
            m = None
        else:
            m = Numeric.compress(c, mask(x), dimension)
        return array(Numeric.compress(c, filled(x), dimension), mask=m)

Yours,
Michael

--
Michael Haggerty
mh...@al...
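Michael's proposed semantics (select whole slices by condition, carry the mask of x along, and refuse a masked condition) can be modelled on a 2-D nested list. This is a pure-Python sketch under stated assumptions: `compress_keep_mask`, `MaskedError`, and the `cond_mask` argument are hypothetical, masks are nested lists of booleans or None, and only axis 0 (rows) and axis -1/1 (columns) are handled.

```python
class MaskedError(Exception):
    """Raised when the condition itself contains masked values (hypothetical)."""


def compress_keep_mask(condition, data, mask=None, axis=-1, cond_mask=None):
    """Keep the slices of a 2-D table where condition is true.

    Masked cells of `data` are kept and stay masked, so the result is
    always rectangular. A masked condition raises, because the output
    shape would be unknown.
    """
    if axis not in (0, 1, -1):
        raise ValueError("this sketch only handles 2-D tables")
    if cond_mask is not None and any(cond_mask):
        raise MaskedError("condition has masked values")
    keep = [i for i, c in enumerate(condition) if c]

    def pick(table):
        if axis == 0:
            return [list(table[i]) for i in keep]
        return [[row[i] for i in keep] for row in table]

    new_mask = None if mask is None else pick(mask)
    return pick(data), new_mask


data = [[1, 2, 3], [4, 5, 6]]
mask = [[False, True, False], [False, False, True]]
values, new_mask = compress_keep_mask([1, 0, 1], data, mask)
print(values)    # [[1, 3], [4, 6]]
print(new_mask)  # [[False, False], [False, True]]
```

Because the condition alone decides which slices survive, the shape of the result depends only on the condition, never on where x happens to be masked.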
From: Paul F. D. <pa...@pf...> - 2001-05-11 16:11:08
Thank you for provoking me to think about these issues in MA. Here is the conclusion I have reached. Please let me know what you think of it.

Background: Michael wanted a way to use a masked array as a Numeric array but with assurance that in fact no element was masked, without obscure tests such as count(x) == product(x.shape).

The method __array__(self, typecode=None) is a special (existing) hook for conversion to a Numeric array. Many operations in Numeric, when presented with an object x to be operated upon, such as Numeric.sqrt(x), will call x.__array__ as a final act of desperation in an attempt to convert their argument to a Numeric array. Heretofore it was essentially returning x.filled(). This bothered me, because it was a silent conversion that replaced masked values with the fill value.

Solution:

a. Add a method 'unmask()' which will replace the mask by None if possible. It will not fail.

b. Change MaskedArray.__array__ to work as follows:
   a. self.unmask(), and then
   b. Return the raw data if the mask is now None. Otherwise, throw an MAError.

Example usage:

    >>> from MA import *
    >>> x=arange(10)
    >>> Numeric.array(x)
    [0,1,2,3,4,5,6,7,8,9,]
    >>> x[3]=masked
    >>> Numeric.array(x)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/pcmdi/dubois/linux/lib/python2.1/site-packages/MA/MA.py", line 578, in __array__
        raise MAError, \
    MA.MA.MAError: Cannot convert masked array to Numeric because data is masked in one or more locations.

Merits of this solution:

a. It reads like what it is -- there is no doubt you are converting to a Numeric array when you see Numeric.array.
b. It gives you the full range of options in Numeric.array, such as messing with the typecode.
c. It allows Numeric operations for speed on masked arrays that you know to be masked in name only. No copy of data occurs here unless the typecode needs to be changed.
d. It removes the possibility of a 'dishonest' conversion.
e. No new method or function is required, other than the otherwise-useful unmask().
f. Successive conversions are optimized because once the mask is None, unmask is cheap.

Deficiency: __array__ becomes a query with an internal, albeit safe, side effect. Mitigating this is that __array__ is not a "public" method and would not normally be used in assertions.
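The proposed unmask()/__array__ pair can be modelled with a toy class. In the sketch below, `ToyMaskedArray`, `MASKED`, and `as_plain` are hypothetical names. `as_plain` plays the role of the new `__array__` but is deliberately not called `__array__`, since it returns a plain list rather than a Numeric array. The toy also assumes that assigning an ordinary value to a cell clears that cell's mask.

```python
MASKED = object()  # stands in for MA's `masked` constant in this toy


class MAError(Exception):
    pass


class ToyMaskedArray:
    """Toy model of the proposed behaviour; not the real MA.MaskedArray."""

    def __init__(self, data, mask=None):
        self._data = list(data)
        self._mask = None if mask is None else [bool(m) for m in mask]

    def __setitem__(self, i, value):
        if value is MASKED:
            if self._mask is None:
                self._mask = [False] * len(self._data)
            self._mask[i] = True
        else:
            # Assumption of this toy: storing a value unmasks that cell.
            self._data[i] = value
            if self._mask is not None:
                self._mask[i] = False

    def unmask(self):
        """Replace the mask by None if nothing is masked. Never fails."""
        if self._mask is not None and not any(self._mask):
            self._mask = None

    def as_plain(self):
        """Model of the proposed __array__: unmask, then return data or raise."""
        self.unmask()
        if self._mask is not None:
            raise MAError("data is masked in one or more locations")
        return self._data  # the underlying list itself: no copy is made


x = ToyMaskedArray(range(10))
print(x.as_plain())      # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
x[3] = MASKED
try:
    x.as_plain()
except MAError as err:
    print("MAError:", err)
x[3] = 3                 # the mask is now all false...
print(x.as_plain()[3])   # 3  (...so unmask() reset it to None)
```

Once `unmask()` has replaced an all-false mask with None, later conversions only test `self._mask is not None`, which is the "successive conversions are cheap" point in merit f.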
From: Michael H. <mh...@al...> - 2001-05-11 18:32:02
"Paul F. Dubois" <pa...@pf...> writes:
> The method __array__(self, typecode=None) is a special (existing)
> hook for conversion to a Numeric array. Many operations in Numeric,
> when presented with an object x to be operated upon, such as
> Numeric.sqrt(x), will call x.__array__ as a final act of desperation
> in an attempt to convert their argument to a Numeric
> array. Heretofore it was essentially returning x.filled(). This
> bothered me, because it was a silent conversion that replaced masked
> values with the fill value.

This bothered me too.

> Solution:
>
> a. Add a method 'unmask()' which will replace the mask by None if
> possible. It will not fail.
>
> b. Change MaskedArray.__array__ to work as follows:
> a. self.unmask(), and then
> b. Return the raw data if the mask is now None.
> Otherwise, throw an MAError.

Perfect. Great solution!

Michael

--
Michael Haggerty
mh...@al...
From: Michael H. <mh...@al...> - 2001-05-11 18:00:05
Paul F. Dubois writes:
> def compress (condition, x, dimension=-1):
>     """Select those parts of x for which condition is true.
>     Masked values in condition are considered false.
>     """
>     c = filled(condition, 0)
>     m = getmask(x)
>     if m is not None:
>         m=Numeric.compress(c, m, dimension)
>     d = Numeric.compress(c, filled(x), dimension)
>     return masked_array(d, m)
>
>
> I did want a treatment of masked conditions. Consider:
>
> compress( x > 10, x)

I see your point, but this doesn't generalize to x having more than one dimension. And with my semantics, for your example (assuming 1-d), you could substitute

    compress((x > 10).filled(0), x)

which isn't very obscure.

Moreover, sometimes it is interesting to compress an array in such a way that masked values are carried along as masked values; this cannot be done with the existing compress().

Michael

--
Michael Haggerty
mh...@al...
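Treating masked condition entries as false is the same as filling the condition with false before selecting, which is what Michael's spelling compress((x > 10).filled(0), x) makes explicit. The 1-D pure-Python sketch below models that rule. `filled_list` and `compress_masked_false` are hypothetical helpers, masks are lists of booleans or None, and the example assumes that x > 10 is masked wherever x is masked.

```python
def filled_list(values, mask, fill):
    """Replace masked entries with `fill`; a None mask means nothing is masked."""
    if mask is None:
        return list(values)
    return [fill if m else v for v, m in zip(values, mask)]


def compress_masked_false(condition, cond_mask, data, mask=None):
    """1-D selection where masked condition entries count as false.

    The mask of `data` is carried along for the selected elements.
    """
    keep = [i for i, c in enumerate(filled_list(condition, cond_mask, False)) if c]
    new_mask = None if mask is None else [mask[i] for i in keep]
    return [data[i] for i in keep], new_mask


x = [5, 12, 30, 7]
x_mask = [False, False, True, False]   # 30 is masked
cond = [v > 10 for v in x]             # [False, True, True, False]
# Assumption: the comparison is masked where x is masked, so cond_mask = x_mask.
print(compress_masked_false(cond, x_mask, x, x_mask))  # ([12], [False])
```

The masked 30 is dropped because its condition entry is filled with false, not because the data cell is masked; an unmasked condition would keep masked data cells and carry their mask along.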
From: Chris B. <chr...@ho...> - 2001-05-17 22:17:52
Hi all,

I recently tried to take a slice out of an array, transpose it, and then change it with the putmask function. It didn't work, because the result ends up not being contiguous. This brings up two points:

1) Why does the array passed as the first argument to putmask have to be contiguous? This can be a substantial limitation; I end up making copies when I have no other reason to.

2) The docs say that transpose "returns a new array...". This is confusing. I had assumed that "new array" meant a copy, when in fact it is another array whose data is a reference, much like a slice. We might want to make it a little more clear in the docs what that means.

Another question I have is about how to match array shapes. I often get errors when I try to do something (like putmask) with two arrays, one of shape (n,1) and one of shape (n,). It seems to me that what I want to do is unambiguous, so I should be able to do it. Is the reason I can't do this:

a) The NumPy interface is designed to be unambiguous, and so it is important that the code never silently translate a rank-one array into a rank-two array where one of the dimensions is one (and vice versa), or

b) No one has gotten around to making the code smart enough to do this when it is unambiguous?

-Thanks,

-Chris

--
Christopher Barker, Ph.D.
Chr...@ho...
http://members.home.net/barkerlohmann
Oil Spill Modeling
Water Resources Engineering
Coastal and Fluvial Hydrodynamics
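Chris's two observations are connected. A transpose, like a slice, is a new array object that shares its data, and it can share that data only by walking the buffer with different strides, which is what makes it non-contiguous. The pure-Python model below is a sketch, not Numeric's implementation. `StridedView` and `putmask_view` are hypothetical, and `putmask_view` takes a values table of the same shape as a simplification. The sketch only shows that writing through strides does not in principle require contiguity; it says nothing about why Numeric's putmask imposes that requirement.

```python
class StridedView:
    """A 2-D view onto a shared flat buffer, described by shape and strides."""

    def __init__(self, buf, shape, strides, offset=0):
        self.buf = buf            # shared, never copied
        self.shape = shape        # (rows, cols)
        self.strides = strides    # buffer steps per row and per column
        self.offset = offset

    def _pos(self, i, j):
        return self.offset + i * self.strides[0] + j * self.strides[1]

    def __getitem__(self, ij):
        return self.buf[self._pos(*ij)]

    def __setitem__(self, ij, value):
        self.buf[self._pos(*ij)] = value

    def rows(self, start, stop):
        """Like a[start:stop]: a view of some rows, sharing the buffer."""
        return StridedView(self.buf, (stop - start, self.shape[1]),
                           self.strides, self.offset + start * self.strides[0])

    def transpose(self):
        """Swap shape and strides; no data is copied."""
        return StridedView(self.buf, self.shape[::-1], self.strides[::-1],
                           self.offset)

    def is_contiguous(self):
        """True when rows are laid out back to back (row-major order)."""
        return self.strides == (self.shape[1], 1)


def putmask_view(view, mask, values):
    """Set view[i, j] = values[i][j] wherever mask[i][j] is true."""
    rows, cols = view.shape
    for i in range(rows):
        for j in range(cols):
            if mask[i][j]:
                view[i, j] = values[i][j]


a = StridedView(list(range(12)), (4, 3), (3, 1))   # 4x3 array holding 0..11
t = a.rows(1, 3).transpose()                       # 3x2 view of rows 1-2
print(t.is_contiguous())                           # False
putmask_view(t, [[True, False], [False, False], [False, True]],
             [[-1, 0], [0, 0], [0, -2]])
print(a[1, 0], a[2, 2])                            # -1 -2
```

The writes through the transposed slice show up in the original array, which is the "new array, but the data is a reference" behaviour Chris describes.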