From: Pierre GM <pgm...@ma...> - 2006-10-16 06:35:05
Folks,

I just posted on the scipy/developers zone wiki
(http://projects.scipy.org/scipy/numpy/wiki/MaskedArray) a reimplementation of
the masked_array module, motivated by some problems I ran into while
subclassing MaskedArray.

The main differences from the initial numpy.core.ma package are that
MaskedArray is now a subclass of ndarray and that the _data section can now be
any subclass of ndarray (it should work in most cases, though some tweaking
might be required here and there). Apart from a couple of issues listed below,
the behavior of the new MaskedArray class reproduces the old one. It is quite
likely to be significantly slower, though: I was more interested in a clear
organization than in performance, so I tended to use wrappers liberally. I'm
sure we can improve that rather easily.

The new module, along with a test suite and some utilities, is available here:
http://projects.scipy.org/scipy/numpy/attachment/wiki/MaskedArray/maskedarray.py
http://projects.scipy.org/scipy/numpy/attachment/wiki/MaskedArray/masked_testutils.py
http://projects.scipy.org/scipy/numpy/attachment/wiki/MaskedArray/test_maskedarray.py

Please note that it's still a work in progress (even if it seems to work quite
OK when I use it). Suggestions, comments, improvements and general feedback
are more than welcome!
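[Editorial note: for readers unfamiliar with the masked-array semantics being reimplemented here, a minimal sketch using the numpy.ma interface — names as in current NumPy; the wiki module's API may differ slightly:]

```python
import numpy as np

# Create a masked array: the second element is flagged as missing.
a = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])

# The modern MaskedArray is an ndarray subclass, as Pierre proposes.
assert isinstance(a, np.ndarray)

# Arithmetic propagates the mask rather than the (bogus) masked value.
b = a + 10.0
assert b.mask.tolist() == [False, True, False]

# filled() substitutes an explicit value for the masked slots.
assert a.filled(0.0).tolist() == [1.0, 0.0, 3.0]
```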
From: Michael S. <mic...@gm...> - 2006-10-17 02:08:53
Does this new MA class allow masking of recarray-like arrays? The numpy
(1.0b5) version does not seem to. e.g.

from numpy import *
desc = [('name','S30'),('age',int8),('weight',float32)]
a = array([('Bill',31,260.0),('Fred', 15, 145.0)], dtype=desc)
print a[0]
print a['name']
a2 = ma.array([('Bill',31,260.0),('Fred', 15, 145.0)], dtype=desc,
              mask=ma.nomask)
print a2[0]
print a2['name']
a3 = ma.array([('Bill',31,260.0),('Fred', 15, 145.0)], dtype=desc,
              mask=[[True,False,False],[False,False,False]])
print a3[0]
print a3['name']

-- output --
('Bill', 31, 260.0)
[Bill Fred]
('Bill', 31, 260.0)
[Bill Fred]
Traceback (most recent call last):
  File "D:\eclipse\Table\scripts\testrecarray.py", line 12, in ?
    a3 = ma.array([('Bill',31,260.0),('Fred', 15, 145.0)], dtype=desc, mask=[[True,False,False],[False,False,False]])
  File "C:\Python23\Lib\site-packages\numpy\core\ma.py", line 592, in __init__
    raise MAError, "Mask and data not compatible."
numpy.core.ma.MAError: Mask and data not compatible.

On 10/16/06, Pierre GM <pgm...@ma...> wrote:
> Folks,
> I just posted on the scipy/developers zone wiki
> (http://projects.scipy.org/scipy/numpy/wiki/MaskedArray) a reimplementation
> of the masked_array module, motivated by some problems I ran into while
> subclassing MaskedArray.
> [snip]
>
> -------------------------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> Numpy-discussion mailing list
> Num...@li...
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion
From: Pierre GM <pgm...@ma...> - 2006-10-17 02:43:11
On Monday 16 October 2006 22:08, Michael Sorich wrote:
> Does this new MA class allow masking of recarray-like arrays?

Excellent question! Which is to say, I have no idea... I don't use record
arrays, so I didn't think about testing them.

A first test indicates that it doesn't work either: the mask for a3 is
understood as having size 6, while the data is seen as having size 2 only
(exactly the same problem as in the original ma module). I'm going to poke
around and check whether I can come up with a workaround.

Just to make it clear: you want to be able to mask some fields in some
records, right? Not mask all the fields of a record?

Thanks for the feedback, I'll keep you posted.
From: Michael S. <mic...@gm...> - 2006-10-18 00:19:31
On 10/17/06, Pierre GM <pgm...@ma...> wrote:
> On Monday 16 October 2006 22:08, Michael Sorich wrote:
> > Does this new MA class allow masking of recarray-like arrays?
>
> Excellent question! Which is to say, I have no idea... I don't use record
> arrays, so I didn't think about testing them.
>
> A first test indicates that it doesn't work either: the mask for a3 is
> understood as having size 6, while the data is seen as having size 2 only
> (exactly the same problem as in the original ma module). I'm going to poke
> around and check whether I can come up with a workaround.
> Just to make it clear: you want to be able to mask some fields in some
> records, right? Not mask all the fields of a record?

Yes, that is correct. It would be preferable to have more control over
masking than simply masking an entire record. I view the heterogeneous
arrays as 2D arrays in which it should be possible to mask individual
values.
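[Editorial note: per-field masks of exactly this kind were eventually supported by numpy.ma for structured dtypes, so the behavior Michael asks for can be sketched against a modern NumPy:]

```python
import numpy as np

desc = [('name', 'S30'), ('age', np.int8), ('weight', np.float32)]
a3 = np.ma.array([(b'Bill', 31, 260.0), (b'Fred', 15, 145.0)], dtype=desc,
                 mask=[(True, False, False), (False, False, False)])

# Only Bill's name is masked; his other fields stay usable.
assert np.ma.is_masked(a3['name'][0])
assert a3['name'][1] == b'Fred'
assert a3['age'][0] == 31
```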
From: Pierre GM <pgm...@ma...> - 2006-10-23 12:52:28
Folks,

I updated the alternative implementation of MaskedArray on the wiki, mainly
to correct a couple of bugs.
(http://projects.scipy.org/scipy/numpy/wiki/MaskedArray)

In addition, I attached another file, maskedrecordarray, which introduces a
new class, MaskedRecord, as a subclass of recarray and MaskedArray. An
instance of this class accepts a recarray as data, and uses two masks: the
'recordmask' has as many entries as records in the array, each entry with
the same fields as a record, but of boolean type, indicating whether a field
is masked or not; an entry is flagged as masked in the 'mask' array if at
least one of its fields is masked. The 'mask' object is introduced mostly
for compatibility with MaskedArray; only 'recordmask' is really useful.

A few examples in the file should give you an idea of what can be done. In
particular, you can define a new masked record array as simply as:

a = masked_record([('Alan',29,200.), ('Bill',31,260.0)],
                  dtype=[('name','S30'),('age',int_),('weight',float_)],
                  mask=[(1,0,0), (0,0,0)])

Note that maskedrecordarray is still quite experimental. As I'm not a
regular user of records, I don't really know what should be implemented...

The file can be accessed at
http://projects.scipy.org/scipy/numpy/attachment/wiki/MaskedArray/maskedrecordarray.py

Once again, I need your comments and suggestions! Thanks.

Pierre
From: Michael S. <mic...@gm...> - 2006-10-24 06:50:21
I am currently running numpy rc2 (I haven't tried your reimplementation yet
as I am still using python 2.3). I am wondering whether the new maskedarray
is able to handle construction of arrays from masked scalar values (not sure
if this is the correct term). I ran across a situation recently when I was
picking individual values from a masked array, collecting them in a list and
then subsequently constructing an array with these values. This does not
work if any of the values chosen are masked. See the example below.

On a more general note, I am interested to find out whether there are any
other languages that handle masked/missing data well and, if so, how this is
done. My only experience is with R, which I have found to be quite good
(there is a special value NA that signifies a masked value - this can be
mixed in with non-masked values when defining an array).

from numpy import *
a = ma.array([1,2,3], mask=[True, False, False])
print a[0], type(a[0])
print a[1], type(a[1])
print list(a)
a = ma.array(list(a))

-- output --
-- <class 'numpy.core.ma.MaskedArray'>
2 <type 'numpy.int32'>
[array(data = 999999, mask = True, fill_value=999999) , 2, 3]
C:\Python23\Lib\site-packages\numpy\core\ma.py:604: UserWarning: Cannot
automatically convert masked array to numeric because data is masked in one
or more locations.
  warnings.warn("Cannot automatically convert masked array to "\
Traceback (most recent call last):
  File "D:\eclipse\Table\scripts\testrecarray.py", line 23, in ?
    a = ma.array(list(a))
  File "C:\Python23\Lib\site-packages\numpy\core\ma.py", line 562, in __init__
    c = numeric.array(data, dtype=tc, copy=True, order=order)
TypeError: an integer is required

On 10/16/06, Pierre GM <pgm...@ma...> wrote:
> Folks,
> I just posted on the scipy/developers zone wiki
> (http://projects.scipy.org/scipy/numpy/wiki/MaskedArray) a reimplementation
> of the masked_array module, motivated by some problems I ran into while
> subclassing MaskedArray.
> [snip]
From: Pierre GM <pgm...@gm...> - 2006-10-24 18:09:18
On Tuesday 24 October 2006 02:50, Michael Sorich wrote:
> I am currently running numpy rc2 (I haven't tried your
> reimplementation yet as I am still using python 2.3). I am wondering
> whether the new maskedarray is able to handle construction of arrays
> from masked scalar values (not sure if this is the correct term).

The answer is no, unfortunately: both the new and old implementations fail
at the same point, raising a TypeError. Lists are processed through
numpy.core.numeric.array, which doesn't handle that.

But there's a workaround: create an empty masked array and fill it by hand:

b = list(a)
A = MA.array(N.empty(len(b)))
for (k,v) in enumerate(b):
    A[k] = v

should work in most cases. I guess I could plug something along those lines
into the new implementation...

But there's a problem anyway: if you don't specify a type at the creation of
the empty array, the type will be determined automatically, which may be a
problem if you mix numbers and strings: the masked array is detected as a
string array in that case.
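[Editorial note: the fill-by-hand workaround above generalizes to a small helper. The function name and the sentinel convention below are invented for illustration, and the sketch uses the modern numpy.ma interface:]

```python
import numpy as np

def array_from_list(values, sentinel=None, dtype=float):
    # Hypothetical helper: build a masked array from a list in which
    # `sentinel` entries stand for missing values.
    data = [0 if v is sentinel else v for v in values]
    mask = [v is sentinel for v in values]
    return np.ma.array(data, mask=mask, dtype=dtype)

a = array_from_list([1.0, None, 3.0])
assert a.mask.tolist() == [False, True, False]
assert a.filled(-1.0).tolist() == [1.0, -1.0, 3.0]
```

Passing an explicit dtype sidesteps the pitfall Pierre mentions, where automatic type detection on a mixed list would yield a string or object array.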
From: Michael S. <mic...@gm...> - 2006-11-08 01:11:31
On 10/25/06, Pierre GM <pgm...@gm...> wrote:
> On Tuesday 24 October 2006 02:50, Michael Sorich wrote:
> > I am currently running numpy rc2 (I haven't tried your
> > reimplementation yet as I am still using python 2.3). I am wondering
> > whether the new maskedarray is able to handle construction of arrays
> > from masked scalar values (not sure if this is the correct term).
>
> The answer is no, unfortunately: both the new and old implementations fail
> at the same point, raising a TypeError. Lists are processed through
> numpy.core.numeric.array, which doesn't handle that.

I have finally gotten around to upgrading to python 2.4 and have had a
chance to play with your new version of the MaskedArray. It is great to see
that someone is working on this. I have a few thoughts on masked arrays that
may or may not warrant discussion.

1. It would be nice if the masked_singleton could be passed into a ndarray,
as this would allow it to be passed into the MaskedArray e.g.

import numpy as N
import ma.maskedarray as MA
test = N.array([1,2,MA.masked])
>> ValueError: setting an array element with a sequence

If the masked_singleton was implemented as an object that is not a
MaskedArray (which is a sequence that numpy.array chokes on), then a valid
numpy array with an object dtype could be produced. e.g.

class MaskedScalar:
    def __str__(self):
        return 'masked'

masked = MaskedScalar()
test = N.array([1,2,masked])
print test.dtype, test
>> object [1 2 masked]
print test == masked
>> [False False True]
print test[2] == masked
>> True
print test[2] is masked
>> True

Then it would be possible to alternatively define a masked array as
MA.array([1,2,masked]) or MA.array(N.array([1,2,masked])). In the __init__
of the MaskedArray, if the ndarray has an object dtype, simply calculate the
mask from a==masked.

2. What happens if a masked array is passed into a ndarray or passed into a
MaskedArray with another mask?

test_ma1 = MA.array([1,2,3], mask=[False, False, True])
print test_ma1
>> [1 2 --]
print N.array(test_ma1)
>> [1 2 3]
test_ma2 = MA.array(test_ma1, mask=[True, False, False])
print test_ma2
>> [-- 2 3]

I suppose it depends on whether you are masking useful data, or the masks
represent missing data. In the former case it may make sense to change or
remove the mask. However, in the latter case the original data entered is a
bogus value which should never be unmasked. In that case, when converting to
a ndarray, I think it makes more sense to make an object ndarray with the
missing values containing the masked singleton. Additionally, if the
MaskedArray is holding missing data, it does not make much sense to be able
to pass into the MA constructor both an existing ma and a mask.
From: Pierre GM <pgm...@gm...> - 2006-11-08 10:10:21
Michael,

First of all, thanks for your interest in this new implementation of
MaskedArray, which is basically nothing but an exercise in style.

On Tuesday 07 November 2006 20:11, Michael Sorich wrote:
> 1. It would be nice if the masked_singleton could be passed into a
> ndarray, as this would allow it to be passed into the MaskedArray e.g.
>
> import numpy as N
> import ma.maskedarray as MA
> test = N.array([1,2,MA.masked])
> >> ValueError: setting an array element with a sequence

I like your idea, but not its implementation. If MA.masked_singleton is
defined as an object, as you suggest, then the dtype of the ndarray it is
passed to becomes 'object', as you pointed out, and that is not something
one would naturally expect, as basic numerical functions don't work well
with the 'object' dtype (just try N.sqrt(N.array([1],dtype=N.object)) to see
what I mean). Even if we can construct a mask rather easily at the creation
of the masked array, following your 'a==masked' suggestion, we still need to
get the dtype of the non-masked section, and that doesn't seem trivial...

I guess that a simple solution is to use MA.masked_values. Make sure to use
a numerical value for the masked data, else you'll end up with yet another
object array.

> 2. What happens if a masked array is passed into a ndarray or passed
> into a MaskedArray with another mask?

>>> test_ma1 = MA.array([1,2,3], mask=[False, False, True])
>>> print test_ma1, N.array(test_ma1)
[1 2 --] [1 2 3]
>>> MA.array(test_ma1, mask=[True, False, False])
[-- 2 3]

Let me be clear that my objective was to get an implementation as close to
the original numpy.core.ma as possible, for 'backward compatibility'. I'm
not sure it'd be wise to change it at this point, but that could be
discussed. As you've noticed, when creating a new masked array from an
existing one, the 'mask' argument supersedes the initial mask. That's ideal
when you want to focus on a fraction of the initial data: you just mask what
you don't need, and are still able to retrieve it when you need it.

I agree that this default behavior is a bit strange when you have missing
data: in that case, one would expect the new mask to be a combination of the
'mask' argument and the old mask. A possibility would then be to add a
'keep_mask' flag: a default of False would give the current behavior, a
value of True would force the new mask to be a combination. I think that
feelings are mixed on this list about extra flags, but the users of
maskedarray are only a minority anyway (hopefully, only for the moment).

About the conversion to ndarray: by default, the result should have the same
dtype as the _data section. For this reason, I disagree with your idea of
"(returning) an object ndarray with the missing value containing the masked
singleton". If you really want an object ndarray, you can use the filled
method or the filled function, with your own definition of the filling value
(such as your MaskedScalar).
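[Editorial note: the proposed 'keep_mask' semantics can be sketched as a small helper; the function name is invented for illustration. Incidentally, numpy.ma.array did eventually grow a keep_mask flag, with a default of True:]

```python
import numpy as np

def remask(data, mask, keep_mask=False):
    # Illustration of the proposed flag: with keep_mask=True the new
    # mask is OR-ed with the existing one instead of replacing it.
    old = np.ma.getmaskarray(data)
    new = np.asarray(mask, dtype=bool)
    return np.ma.array(np.ma.getdata(data),
                       mask=(new | old) if keep_mask else new)

test_ma1 = np.ma.array([1, 2, 3], mask=[False, False, True])

# keep_mask=False (the behavior described above): new mask supersedes.
assert remask(test_ma1, [True, False, False]).mask.tolist() == \
    [True, False, False]

# keep_mask=True: old and new masks are combined.
assert remask(test_ma1, [True, False, False], keep_mask=True).mask.tolist() == \
    [True, False, True]
```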
From: A. M. A. <per...@gm...> - 2006-11-08 18:42:09
On 08/11/06, Pierre GM <pgm...@gm...> wrote:
> I like your idea, but not its implementation. If MA.masked_singleton is
> defined as an object, as you suggest, then the dtype of the ndarray it is
> passed to becomes 'object', as you pointed out, and that is not something
> one would naturally expect, as basic numerical functions don't work well
> with the 'object' dtype (just try N.sqrt(N.array([1],dtype=N.object)) to
> see what I mean).
> Even if we can construct a mask rather easily at the creation of the
> masked array, following your 'a==masked' suggestion, we still need to get
> the dtype of the non-masked section, and that doesn't seem trivial...

A good candidate for a "should be masked" marker is NaN. It is supposed to
mean, more or less, "no sensible value". Unfortunately, integer types do not
have such a special value. It's also conceivable that some user might want
to keep NaNs in their array separate from the mask. Finally, on some
hardware, operations with NaN are very slow (so leaving them in the array,
even masked, might not be a good idea).

The reason I suggest this is that in the last major application I had for
numpy, one stage of the problem would occasionally result in NaNs for
certain values, but the best thing I could do was leave them in place to
represent "no data". Switching to a MaskedArray might have been a better
idea, but the NaNs were a rare occurrence.

> About the conversion to ndarray:
> By default, the result should have the same dtype as the _data section.
> For this reason, I disagree with your idea of "(returning) an object
> ndarray with the missing value containing the masked singleton". If you
> really want an object ndarray, you can use the filled method or the filled
> function, with your own definition of the filling value (such as your
> MaskedScalar).

If you've got floating point, you can again fill in NaNs, but you have a
good point about wanting to extract the original values that were masked
out. Depending on what one is doing, one might want one or the other.

A. M. Archibald
From: Tim H. <tim...@ie...> - 2006-11-09 05:20:53
A. M. Archibald wrote:
> A good candidate for a "should be masked" marker is NaN. It is supposed
> to mean, more or less, "no sensible value". Unfortunately, integer types
> do not have such a special value. It's also conceivable that some user
> might want to keep NaNs in their array separate from the mask. Finally,
> on some hardware, operations with NaN are very slow (so leaving them in
> the array, even masked, might not be a good idea).

It has always been my experience (on various flavors of Pentium) that
operating on NaNs is extremely slow. Does anyone know on what hardware NaNs
are *not* slow? Of course it's always possible I just never notice NaNs on
hardware where they aren't slow.

[SNIP]

-tim
From: A. M. A. <per...@gm...> - 2006-11-09 07:27:49
On 08/11/06, Tim Hochberg <tim...@ie...> wrote:
> It has always been my experience (on various flavors of Pentium) that
> operating on NaNs is extremely slow. Does anyone know on what hardware
> NaNs are *not* slow? Of course it's always possible I just never notice
> NaNs on hardware where they aren't slow.

On an Opteron machine I have access to, they appear to be no slower (and
even faster for some transcendental functions) than ordinary floats:

In [13]: a=zeros(1000000)

In [14]: %time for i in xrange(1000): a += 1.1
CPU times: user 6.87 s, sys: 0.00 s, total: 6.87 s
Wall time: 6.87

In [15]: a *= NaN

In [16]: %time for i in xrange(1000): a += 1.1
CPU times: user 6.86 s, sys: 0.00 s, total: 6.86 s
Wall time: 6.87

On my Pentium M, they are indeed significantly slower (three times? I didn't
really do enough testing to say how much). I am actually rather offended by
this unfair discrimination against a citizen in good standing of the IEEE
floating point community.

A. M. Archibald
From: Tim H. <tim...@ie...> - 2006-11-09 18:05:35
A. M. Archibald wrote:
> On 08/11/06, Tim Hochberg <tim...@ie...> wrote:
> > It has always been my experience (on various flavors of Pentium) that
> > operating on NaNs is extremely slow. Does anyone know on what hardware
> > NaNs are *not* slow? Of course it's always possible I just never notice
> > NaNs on hardware where they aren't slow.
>
> On an Opteron machine I have access to, they appear to be no slower
> (and even faster for some transcendental functions) than ordinary
> floats:
> [snip]
> On my Pentium M, they are indeed significantly slower (three times? I
> didn't really do enough testing to say how much). I am actually rather
> offended by this unfair discrimination against a citizen in good
> standing of the IEEE floating point community.

If they're only 3x slower you're doing better than I am. On my Core Duo box
they appear to be nearly 20x slower for both addition and multiplication.
This more or less matches what I recall from earlier boxes.

>>> Timer("a.prod()", "import numpy as np; a = np.ones(4096); a[0] = np.nan").timeit(10000)
5.6495738983585397
>>> Timer("a.prod()", "import numpy as np; a = np.ones(4096)").timeit(10000)
0.34439041833525152
>>> Timer("a.sum()", "import numpy as np; a = np.ones(4096); a[0] = np.nan").timeit(10000)
6.0985655998179027
>>> Timer("a.sum()", "import numpy as np; a = np.ones(4096)").timeit(10000)
0.32354363473564263

I've been told that operations on NaNs are slow because they aren't always
implemented in the FPU hardware. Instead they are trapped and implemented in
software or firmware or something or other. That may well be bogus 42nd-hand
information though.

-tim
From: Pierre GM <pgm...@gm...> - 2006-11-08 21:08:35
> A good candidate for a "should be masked" marker is NaN. It is supposed
> to mean, more or less, "no sensible value".

Which might turn out to be the best indeed. Michael's application would then
look like:

>>> import numpy as N
>>> import maskedarray as MA
>>> maskit = N.nan
>>> test = N.array([1,2,maskit])
>>> test_ma1 = MA.array(test, mask=N.isnan(test))

> Switching to a MaskedArray might have been a better idea, but the NaNs
> were a rare occurrence.

Once again, that's a situation where one would use masked arrays.

> If you've got floating point, you can again fill in NaNs, but you have
> a good point about wanting to extract the original values that were
> masked out. Depending on what one is doing, one might want one or the
> other.

In any case, I think we should stick to the numpy.core.ma default behavior
for backward compatibility. If you really want to distinguish between
several kinds of masks (one for missing data, one for data to discard
temporarily), that could be done by defining a special subclass. But is it
really needed? A smart use of filled and masked_values should do the trick
in most cases.
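[Editorial note: both tricks mentioned here carry over to the current numpy.ma interface; a short sketch:]

```python
import numpy as np

# Masking by sentinel value: entries equal to -999.0 become masked.
x = np.array([1.0, -999.0, 3.0])
mx = np.ma.masked_values(x, -999.0)
assert mx.mask.tolist() == [False, True, False]

# Masking by NaN: build the mask with isnan, as suggested above.
y = np.array([1.0, np.nan, 3.0])
my = np.ma.array(y, mask=np.isnan(y))
assert my.sum() == 4.0  # masked entries are skipped by reductions
```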
From: Paul D. <pfd...@gm...> - 2006-11-09 04:02:02
2 cents from the author of the first folio:

The intent was to allow creation of masked arrays with copy=no, so that the
original data could be retrieved from it if desired. But I was quite, quite
rigorous about NEVER assuming the data in a masked slot made any sense
whatsoever.

The intention was that there are two ways to get a numeric array out of a
masked one:

1. Get the data field
2. m.filled()

(1) is strictly at your own risk.
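[Editorial note: in modern numpy.ma terms the two escape hatches look like this; `.data` plays the role of "the data field":]

```python
import numpy as np

m = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])

# (1) The raw data buffer: masked slots still hold whatever was there,
#     and you must NEVER assume those values make sense.
raw = m.data
assert raw.shape == (3,)

# (2) filled(): masked slots are replaced by an explicit fill value.
safe = m.filled(0.0)
assert safe.tolist() == [1.0, 0.0, 3.0]
```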
From: Eric F. <ef...@ha...> - 2006-11-09 21:50:14
It looks like on my Pentium M multiplication with NaNs is slow, but using a
masked array ranges from slightly faster (with only one value masked) to
twice as slow (with all values masked):

In [15]: Timer("a.prod()", "import numpy as np; aa = np.ones(4096); a = np.ma.masked_greater(aa,0)").timeit(10000)
Out[15]: 9.4012830257415771

In [16]: Timer("a.prod()", "import numpy as np; a = np.ones(4096); a[0] = np.nan").timeit(10000)
Out[16]: 5.5737850666046143

In [17]: Timer("a.prod()", "import numpy as np; a = np.ones(4096)").timeit(10000)
Out[17]: 0.40796804428100586

In [18]: Timer("a.prod()", "import numpy as np; aa = np.ones(4096); aa[0] = 2; a = np.ma.masked_greater(aa,1)").timeit(10000)
Out[18]: 4.1544749736785889

In [19]: Timer("a.prod()", "import numpy as np; a = np.ones(4096); a[:] = np.nan").timeit(10000)
Out[19]: 5.8589630126953125

For transcendentals, NaNs or masks don't make much difference, although
masks are slightly faster than NaNs:

In [20]: Timer("np.sin(a)", "import numpy as np; a = np.ones(4096); a[:] = np.nan").timeit(10000)
Out[20]: 4.5575671195983887

In [21]: Timer("np.ma.sin(a)", "import numpy as np; aa = np.ones(4096); a = np.ma.masked_greater(aa,0)").timeit(10000)
Out[21]: 4.4125270843505859

In [22]: Timer("b=np.sin(a)", "import numpy as np; a = np.ones(4096)").timeit(10000)
Out[22]: 3.5793929100036621

Eric

Tim Hochberg wrote:
> If they're only 3x slower you're doing better than I am. On my Core Duo
> box they appear to be nearly 20x slower for both addition and
> multiplication. This more or less matches what I recall from earlier
> boxes.
> [snip]
> I've been told that operations on NaNs are slow because they aren't
> always implemented in the FPU hardware. Instead they are trapped and
> implemented in software or firmware or something or other. That may well
> be bogus 42nd-hand information though.
From: Christopher B. <Chr...@no...> - 2006-11-09 21:58:34
Tim Hochberg wrote:
> I've been told that operations on NaNs are slow because they aren't
> always implemented in the FPU hardware. Instead they are trapped and
> implemented in software or firmware or something or other.

which still doesn't make sense -- doesn't ANY operation with a NaN return
NaN? How hard could that be?

I share in the disappointment that NaNs are not first class citizens in
common hardware...

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT           (206) 526-6959 voice
7600 Sand Point Way NE     (206) 526-6329 fax
Seattle, WA 98115          (206) 526-6317 main reception

Chr...@no...
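[Editorial note: the propagation rule Chris alludes to is indeed simple at the semantic level - any arithmetic involving a NaN yields NaN, and NaN compares unequal even to itself:]

```python
import math

x = float('nan')

# Arithmetic with NaN propagates NaN.
assert math.isnan(x + 1.0)
assert math.isnan(x * 0.0)

# NaN is the only float unequal to itself - a common portable test.
assert x != x
assert not (x == x)
```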
From: Tim H. <tim...@ie...> - 2006-11-09 22:14:28
Christopher Barker wrote:
> Tim Hochberg wrote:
> > I've been told that operations on NaNs are slow because they aren't
> > always implemented in the FPU hardware. Instead they are trapped and
> > implemented in software or firmware or something or other.
>
> which still doesn't make sense -- doesn't ANY operation with a NaN
> return NaN? How hard could that be?

Well, it's not free. You still have to recognize a particular bit pattern
corresponding to a QNaN and then copy the result in that case. All that
probably burns more silicon for a case that most people don't care about. Do
NaNs make Half-Life run any faster? No, I didn't think so. I suspect that
instead they just look for the case where the exponent is all 1s, in which
case the result is either +-Inf, SNaN or QNaN, and then drop into software
to handle those infrequent cases.

> I share in the disappointment that NaNs are not first class citizens in
> common hardware...

It would be nice to see IEEE 754 supported more faithfully, but there seems
to be more of a problem on the software side than on the hardware side in
practice. I still can't help feeling that using NaNs for mask values would
be abuse even if it weren't slow, and eventually it would reach around and
bite you someplace tender.

-tim
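[Editorial note: Tim's "exponent is all 1s" observation is easy to verify in software. For an IEEE-754 double, the 11 exponent bits are all ones exactly for +-Inf and NaNs (quiet or signaling):]

```python
import struct

def exponent_all_ones(x):
    # Reinterpret the double as its 64-bit pattern and test the
    # 11-bit exponent field (bits 52-62).
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    return (bits >> 52) & 0x7FF == 0x7FF

assert exponent_all_ones(float('nan'))
assert exponent_all_ones(float('inf'))
assert exponent_all_ones(float('-inf'))
assert not exponent_all_ones(1.0)
assert not exponent_all_ones(0.0)
```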
From: Paul D. <pfd...@gm...> - 2006-11-09 23:00:32
|
Disappointed in NaN land? Since the function of old retired persons is to tell youngsters stories around the campfire:

A supercomputer hardware designer told me that when forced to add IEEE arithmetic to his designs, it decreased performance substantially, maybe 25-30%; it wasn't that doing the operations took so much longer, it was that the increased physical space needed for that circuitry pushed the memory farther away. Doubtless this inspired doing some of it in software instead.

No standard for controlling the behaviors exists, either, so you can find out the hard way that underflow-to-zero is being done in software by default, and that you are doing a lot of it. Or that your code doesn't have the same behavior on different platforms.

To my mind, all that was really accomplished was to convince said youngsters that somehow this NaN stuff was the solution to some problem. In reality, computing for 3 days and having it print out 1000 NaNs is not exactly satisfying. I think this stuff was essentially a mistake, in the sense that it is a nice ivory-tower idea that costs more in practice than it is worth. I do not think a properly thought-out and coded algorithm ought to do anything that this stuff is supposed to 'help' with, and if it does do it, the code should stop executing.

Anyway, if I thought it would do the job I wouldn't have written MA in the first place.

Rant off. Nap. Grumble. Stupid kids. (:->

-- Paul
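Paul's "the code should stop executing" behavior is in fact available in numpy: np.seterr lets you turn silent NaN production into a hard FloatingPointError. A sketch (assuming numpy is importable; the exact error message text is implementation detail, not shown):

```python
import numpy as np

# By default, 0/0 on arrays silently yields NaN with a warning.
# Switching the 'invalid' handler to 'raise' makes it abort instead.
old_settings = np.seterr(invalid="raise")
try:
    np.zeros(3) / np.zeros(3)   # 0/0 is an IEEE 'invalid' operation
except FloatingPointError:
    print("trapped: invalid operation raised instead of producing NaN")
finally:
    np.seterr(**old_settings)   # restore the previous error state
```

This is per-process, not a portable IEEE-flags API, but it does give the "stop at the first NaN" discipline Paul asks for.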
From: A. M. A. <per...@gm...> - 2006-11-10 06:43:12
|
On 09/11/06, Paul Dubois <pfd...@gm...> wrote:
> Since the function of old retired persons is to tell youngsters stories
> around the campfire:

I'll pull up a log. But since I'm uppity, and not especially young, I hope you won't mind if I heckle.

> A supercomputer hardware designer told me that when forced to add IEEE
> arithmetic to his designs that it decreased performance substantially, maybe
> 25-30%; it wasn't that doing the operations took so much longer, it was that
> the increased physical space needed for that circuitry pushed the memory
> farther away. Doubtless this inspired doing some of it in software instead.

The goal of IEEE floats is not to be faster but to make correct numerical programming easier. (Numerical analysts can write robust numerical code for almost any kind of float, but the vast majority of numerical code is written by scientists who know the bare minimum or less about numerical analysis, so a good system makes such code work as well as possible.)

The other reason IEEE floats are good is, no disrespect to your hardware designer friend intended, that they were designed with some care. Reimplementations are liable to get some of the finicky details wrong (round-to-even, denormalized numbers, what have you...). I found Kahan's "How Java's Floating-Point Hurts Everyone Everywhere" ( http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf ) very educational on what can go wrong with platform-dependent floating point.

> No standard for controlling the behaviors exists, either, so you can find
> out the hard way that underflow-to-zero is being done in software by
> default, and that you are doing a lot of it. Or that your code doesn't have
> the same behavior on different platforms.

Well, if the platform is IEEE-compliant, you can trust it to behave the same way logically, even if some applications are slower on some systems than others. Or did you mean that there's no standard way to set the various IEEE flags?
That seems like a language/compiler issue to me (which is not to minimize the pain it causes!).

> To my mind, all that was really accomplished was to convince said youngsters
> that somehow this NaN stuff was the solution to some problem. In reality,
> computing for 3 days and having it print out 1000 NaNs is not exactly
> satisfying. I think this stuff was essentially a mistake in the sense that
> it is a nice ivory-tower idea that costs more in practice than it is worth.
> I do not think a properly thought-out and coded algorithm ought to do
> anything that this stuff is supposed to 'help' with, and if it does do it,
> the code should stop executing.

Well, then turn on floating-point exceptions. I find a few NaNs in my calculations relatively benign. If I'm doing a calculation where they'll propagate and destroy everything, I trap them. But it happens just as often that I launch a job, a NaN appears early on, and I come back in the morning to find that instead of aborting, the job has completed except for a few bad data points.

If you want an algorithm that takes advantage of NaNs, I have one. I was simulating light rays around a black hole, generating a map of the bending for various view angles, so I naturally did a lot of exploring of grazing incidence, where the calculations would often yield NaNs. I used the NaNs to fill in the gap between the internal bending of light and the external bending of the light; no extra programming effort required, since that was what came out of the ray tracer anyway. The rendering step just returned NaNs for the image pixels that came from interpolating in the NaN-filled regions, so they came out black; just what I wanted. I wrote the program ignoring what would happen if a NaN appeared, and it came out right. Which is, I think, the goal - to save the programmer having to think too much about numerical-analysis headaches.

> Anyway, if I thought it would do the job I wouldn't have written MA in the
> first place.
It's certainly not always a good solution; in particular, it's hard to control the behaviour of things like where(). There's also no NaN for integer types, which makes MaskedArray more useful there.

Anyway, I am pleased that numpy has both controllable IEEE floats and MaskedArray, and I apologize for a basically off-topic post.

A. M. Archibald |
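The integer-types point is the clearest case for MaskedArray over NaN-as-mask: float arrays can carry NaN as an implicit missing value, but integer arrays cannot. A short comparison sketch (assuming numpy; the example data is made up):

```python
import numpy as np
import numpy.ma as ma

# Integers have no NaN, so a missing value needs an explicit mask.
counts = ma.array([3, 99, 7, 12],
                  mask=[False, True, False, False],   # 99 is "missing"
                  dtype=np.int32)
print(counts.sum())     # 22 -- the masked element is simply ignored

# The float-only alternative: NaN as an implicit mask, with the
# nan-aware reductions doing the ignoring.
vals = np.array([3.0, np.nan, 7.0, 12.0])
print(np.nansum(vals))  # 22.0
```

Both give the same reduction here, but only the masked version works for integer dtypes, and only the masked version distinguishes "missing" from a legitimately computed NaN.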