From: Travis O. <oli...@ie...> - 2006-09-20 02:30:52
|
Charles R Harris wrote: > > Speed depends on the architecture. Float is a trifle slower than double > on my Athlon64, but faster on PPC750. I don't know about other > machines. I think there is a good argument for accumulating in double > and converting to float for output if required. Yes there is. It's just not what NumPy ever does, so it would be an exception in this case and would need a more convincing argument in my opinion. You can always specify the accumulation type yourself with the dtype argument. We are only talking about what the default should be. -Travis |
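The dtype argument Travis mentions can be sketched like this (the float32 array `a` is only an illustration; the keyword is the point):

```python
import numpy as np

a = np.random.rand(1_000_000).astype(np.float32)   # illustrative float32 data
m_default = a.mean()                  # for float32 input the reduction stays in float32
m_double = a.mean(dtype=np.float64)   # explicitly accumulate in double precision
s_double = a.sum(dtype=np.float64)    # the same keyword works for sum and other reductions
```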
From: A. M. A. <per...@gm...> - 2006-09-20 02:09:34
|
On 19/09/06, Tim Hochberg <tim...@ie...> wrote: > I'm still somewhat mystified by the desire to move the nans to one end > of the sorted object. I see two scenarios: It's mostly to have something to do with them other than throw an exception. Leaving them in place while the rest of the array is reshuffled requires a lot of work and isn't particularly better. I mostly presented it as an alternative to throwing an exception. Throwing a Python exception now seems like the most reasonable idea. A. M. Archibald |
From: Tim H. <tim...@ie...> - 2006-09-20 01:45:50
|
Charles R Harris wrote: > > > On 9/19/06, *A. M. Archibald* <per...@gm... > <mailto:per...@gm...>> wrote: > > On 19/09/06, Charles R Harris <cha...@gm... > <mailto:cha...@gm...>> wrote: > > > > > > > > For floats we could use something like: > > > > lessthan(a,b) := a < b || (a == nan && b != nan) > I believe this would have to be some sort of isnan macro since everything compares not equal to nan. I forget the correct spelling at the moment though. > > > > Which would put all the nans at one end and might not add too > much overhead. > > You could put an any(isnan()) out front and run this slower version > only if there are any NaNs (also, you can't use == for NaNs, you have > to use C isNaN). But I'm starting to see the wisdom in simply throwing > an exception, since sorting is not well-defined with NaNs. > > > Looks like mergesort can be modified to sort around the NaNs without > too much trouble if there is a good isnan function available: just > cause the pointers to skip over them. I see that most of the isnan > stuff seems to be in the ufunc source and isn't terribly simple. Could > be broken out into a separate include, I suppose. > > I still wonder if it is worth the trouble. As to raising an exception, > I seem to recall reading somewhere that exception code tends to be > expensive, I haven't done any benchmarks myself. I'm still somewhat mystified by the desire to move the nans to one end of the sorted object. I see two scenarios: 1. The user is not expecting to have nans in the data. In this case moving the nans to the end is not helpful. The resulting array is still not sorted in the sense that a[i-1]<=a[i]<=a[i+1] does not hold and thus is likely to break code that relies on the array being sorted, the most prominent example of which is searchsorted. In this case you really want to raise an exception if possible since no good will come from letting the code continue to run. In this case the time involved in throwing and catching an exception is irrelevant. 2. The user *is* expecting to have nans in the data. This is presumably the case that the sorting-nans-to-the-end idea is aimed at. So far at least the suggested use has been to sort and then strip the nans. I suggest that a better approach is to test for and strip the nans before the sort. For example: # a is an array that may have some nans # you can do this more pithily, but I'm aiming to minimize isnan calls # note that this *sometimes* makes a copy. nanmask = isnan(a) if sometrue(nanmask): a = a[~nanmask] a.sort() #..... I presume that isnan is somewhat more expensive than the basic '<' operator. In the proposed sort to end version we need N*log(N) isnan calls versus just N in the above case. The sort to end case probably won't look any cleaner than the above either since you still need to count the nans to determine how many to strip. Perhaps there's some use for the sort to end behaviour that I'm missing, but the raise an exception behaviour sure looks a lot more appealing to me. Here's a strawman proposal: Sort the array. Then examine numpy.geterr()['invalid']. If it is not 'ignore', then examine sometrue(isnan(thearray)). If the latter is true then raise an error, issue a warning, or call the error-reporting function as appropriate. Note that we always sort the array to be consistent with the behaviour of the ufuncs that proceed even when they end up raising an exception. -tim |
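A self-contained version of the strip-the-nans-first idiom from the message above (a minimal sketch; the example data is made up, and the boolean mask is inverted with `~`):

```python
import numpy as np

a = np.array([3.0, np.nan, 1.0, 2.0, np.nan])   # data that may contain nans
nanmask = np.isnan(a)
if nanmask.any():
    a = a[~nanmask]       # drop the nans before sorting (this makes a copy)
a.sort()                  # in-place sort of the remaining values
# nanmask.sum() tells the caller how many nans were stripped, if it needs to
# report them or append them back at the end
```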
From: Charles R H. <cha...@gm...> - 2006-09-20 00:55:20
|
On 9/19/06, A. M. Archibald <per...@gm...> wrote: > > On 19/09/06, Charles R Harris <cha...@gm...> wrote: > > > > > > > > For floats we could use something like: > > > > lessthan(a,b) := a < b || (a == nan && b != nan) > > > > Which would put all the nans at one end and might not add too much > overhead. > > You could put an any(isnan()) out front and run this slower version > only if there are any NaNs (also, you can't use == for NaNs, you have > to use C isNaN). But I'm starting to see the wisdom in simply throwing > an exception, since sorting is not well-defined with NaNs. Looks like mergesort can be modified to sort around the NaNs without too much trouble if there is a good isnan function available: just cause the pointers to skip over them. I see that most of the isnan stuff seems to be in the ufunc source and isn't terribly simple. Could be broken out into a separate include, I suppose. I still wonder if it is worth the trouble. As to raising an exception, I seem to recall reading somewhere that exception code tends to be expensive, I haven't done any benchmarks myself. Chuck |
From: Sebastian H. <ha...@ms...> - 2006-09-20 00:51:50
|
On Tuesday 19 September 2006 17:17, Travis Oliphant wrote: > Sebastian Haase wrote: > >On Tuesday 19 September 2006 15:48, Travis Oliphant wrote: > >>Sebastian Haase wrote: > > > ><snip> > > > >>>can we please change dtype to default to float64 !? > >> > >>The default is float64 now (as long as you are not using > >>numpy.oldnumeric). > >> > >>I suppose more appropriately, we could reduce over float for integer > >>data-types when calculating the mean as well (since a floating point is > >>returned anyway). > > > >Is now mean() always "reducing over" float64 ? > >The svn note """Log: > >Fix mean, std, and var methods so that they reduce over double data-type > > with integer inputs. > >""" > >makes it sound that a float32 input is stays float32 ? > > Yes, that is true. Only integer inputs are changed because you are > going to get a floating point output anyway. > > >For mean calculation this might introduce large errors - I usually would > >require double-precision for *any* input type ... > > Of course. The system is not fool-proof. I hesitate to arbitrarily > change this. The advantage of using single-precision calculation is > that it is faster. We do rely on the user who expressly requests these > things to be aware of the difficulties. I still would argue that getting a "good" (smaller rounding errors) answer should be the default -- if speed is wanted, then *that* could be still specified by explicitly using dtype=float32 (which would also be a possible choice for int32 input) . In image processing we always want means to be calculated in float64 even though input data is always float32 (if not uint16). Also it is simpler to say "float64 is the default" (full stop.) - instead "float64 is the default unless you have float32" Thanks, Sebastian Haase |
From: A. M. A. <per...@gm...> - 2006-09-20 00:41:33
|
On 19/09/06, Charles R Harris <cha...@gm...> wrote: > > > > For floats we could use something like: > > lessthan(a,b) := a < b || (a == nan && b != nan) > > Which would put all the nans at one end and might not add too much overhead. You could put an any(isnan()) out front and run this slower version only if there are any NaNs (also, you can't use == for NaNs, you have to use C isNaN). But I'm starting to see the wisdom in simply throwing an exception, since sorting is not well-defined with NaNs. A. M. Archibald |
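A pure-Python sketch of the "nans compare greater than everything" ordering under discussion, with `math.isnan` standing in for the C-level isnan test (the list data is illustrative):

```python
import math

def nan_last_key(x):
    # (False, x) for ordinary values, (True, x) for nans, so every nan
    # sorts after every non-nan value
    return (math.isnan(x), x)

data = [2.0, float('nan'), 1.0, float('inf'), 0.5]
print(sorted(data, key=nan_last_key))   # [0.5, 1.0, 2.0, inf, nan]
```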
From: Charles R H. <cha...@gm...> - 2006-09-20 00:37:04
|
On 9/19/06, Travis Oliphant <oli...@ee...> wrote: > > Sebastian Haase wrote: > > >On Tuesday 19 September 2006 15:48, Travis Oliphant wrote: > > > > > >>Sebastian Haase wrote: > >> > >> > ><snip> > > > > > >>>can we please change dtype to default to float64 !? > >>> > >>> > >>The default is float64 now (as long as you are not using > >>numpy.oldnumeric). > >> > >>I suppose more appropriately, we could reduce over float for integer > >>data-types when calculating the mean as well (since a floating point is > >>returned anyway). > >> > >> > >> > > > >Is now mean() always "reducing over" float64 ? > >The svn note """Log: > >Fix mean, std, and var methods so that they reduce over double data-type > with > >integer inputs. > >""" > >makes it sound that a float32 input is stays float32 ? > > > > > Yes, that is true. Only integer inputs are changed because you are > going to get a floating point output anyway. > > >For mean calculation this might introduce large errors - I usually would > >require double-precision for *any* input type ... > > > > > Of course. The system is not fool-proof. I hesitate to arbitrarily > change this. The advantage of using single-precision calculation is > that it is faster. We do rely on the user who expressly requests these > things to be aware of the difficulties. Speed depends on the architecture. Float is a trifle slower than double on my Athlon64, but faster on PPC750. I don't know about other machines. I think there is a good argument for accumulating in double and converting to float for output if required. Chuck |
From: Robert K. <rob...@gm...> - 2006-09-20 00:19:13
|
Sebastian Haase wrote: > (don't know how to say this for complex types !? Are here real and imag > treated separately / independently ?) Yes. For mean(), there's really no alternative. Scalar variance is not a well-defined concept for complex numbers, but treating the real and imaginary parts separately is a sensible and (partially) informative thing to do. Simply applying the formula for estimating variance for real numbers to complex numbers (i.e. change "x" to "z") is a meaningless operation. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco |
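A small sketch of the componentwise treatment Robert describes (the complex array is made up):

```python
import numpy as np

z = np.array([1 + 2j, 3 - 1j, -2 + 0.5j])
print(z.mean())                       # componentwise: mean(z.real) + 1j * mean(z.imag)
print(z.real.var(), z.imag.var())     # real and imaginary spread, reported separately
# Literally substituting z into the real-valued estimator, mean((z - z.mean())**2),
# yields a complex number with no clear interpretation -- the "meaningless" case above.
```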
From: Travis O. <oli...@ee...> - 2006-09-20 00:17:51
|
Sebastian Haase wrote: >On Tuesday 19 September 2006 15:48, Travis Oliphant wrote: > > >>Sebastian Haase wrote: >> >> ><snip> > > >>>can we please change dtype to default to float64 !? >>> >>> >>The default is float64 now (as long as you are not using >>numpy.oldnumeric). >> >>I suppose more appropriately, we could reduce over float for integer >>data-types when calculating the mean as well (since a floating point is >>returned anyway). >> >> >> > >Is now mean() always "reducing over" float64 ? >The svn note """Log: >Fix mean, std, and var methods so that they reduce over double data-type with >integer inputs. >""" >makes it sound that a float32 input is stays float32 ? > > Yes, that is true. Only integer inputs are changed because you are going to get a floating point output anyway. >For mean calculation this might introduce large errors - I usually would >require double-precision for *any* input type ... > > Of course. The system is not fool-proof. I hesitate to arbitrarily change this. The advantage of using single-precision calculation is that it is faster. We do rely on the user who expressly requests these things to be aware of the difficulties. >(don't know how to say this for complex types !? Are here real and imag >treated separately / independently ?) > > There is a complex add performed at a low-level as two separate adds. The addition is performed in the precision requested. -Travis |
From: Sebastian H. <ha...@ms...> - 2006-09-20 00:04:03
|
On Tuesday 19 September 2006 15:48, Travis Oliphant wrote: > Sebastian Haase wrote: <snip> > >can we please change dtype to default to float64 !? > > The default is float64 now (as long as you are not using > numpy.oldnumeric). > > I suppose more appropriately, we could reduce over float for integer > data-types when calculating the mean as well (since a floating point is > returned anyway). > Is now mean() always "reducing over" float64 ? The svn note """Log: Fix mean, std, and var methods so that they reduce over double data-type with integer inputs. """ makes it sound that a float32 input is stays float32 ? For mean calculation this might introduce large errors - I usually would require double-precision for *any* input type ... (don't know how to say this for complex types !? Are here real and imag treated separately / independently ?) Thanks, Sebastian Haase |
From: Travis O. <oli...@ee...> - 2006-09-19 23:53:35
|
Charles R Harris wrote: > Travis, > > Is this intentional? > > In [77]: arange(5, dtype=int)/0 > Out[77]: array([0, 0, 0, 0, 0]) > > It looks deliberate because all zeros are returned, but it might be > better if it raised an exception. As mentioned before, we translate integer division errors into floating-point errors and use the same hardware trapping to trap them if the user requests it. Simulating an "integer-division-by-zero" hardware flag is not trivial as we would have to manage context switching ourselves. So, at least for 1.0, integer and floating-point division by zero are going to be handled the same. -Travis |
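A sketch of the behaviour described above, spelled with the errstate context manager (assuming current NumPy; the integer divide-by-zero is reported through the floating-point 'divide' flag):

```python
import numpy as np

print(np.arange(5) // 0)        # [0 0 0 0 0], with the 'divide' flag set
                                # (a warning under the default error state)
with np.errstate(divide='raise'):
    try:
        np.arange(5) // 0       # the same operation now traps
    except FloatingPointError as err:
        print("trapped:", err)
```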
From: Travis O. <oli...@ee...> - 2006-09-19 23:16:34
|
Charles R Harris wrote: > > > On 9/18/06, *Bill Baxter* <wb...@gm... > <mailto:wb...@gm...>> wrote: > > On 9/19/06, Charles R Harris <cha...@gm... > <mailto:cha...@gm...>> wrote: > > On 9/18/06, Bill Baxter <wb...@gm... > <mailto:wb...@gm...>> wrote: > > > I find myself often wanting both the max and the argmax of an > array. > > > (And same for the other arg* functions) > > > > You have to do something like > > > a = rand(10,5) > > > imax = a.argmax(axis=0) > > > vmax = a[(imax, range(5))] > > > > > I don't generally like overloading return values, the function > starts to > > lose its definition and becomes a bit baroque where simply > changing a > > keyword value can destroy the viability of the following code. > > Agreed. Seems like the only justification is if you get multiple > results from one calculation but only rarely want the extra values. > It doesn't make sense to always return them, but it's also not worth > making a totally different function. > > > > But I can see the utility of what you want. Hmm, this problem > is not unique to argmax. > > Maybe what we need is a general way to extract values, something > like > > > > extract(a, imax, axis=0) > > > > to go along with all the single axis functions. > > Yes, I think that would be easier to remember. > > It should also work for the axis=None case. > imax = a.argmax(axis=None) > v = extract(a, imax, axis=None) > > > It shouldn't be too difficult to jig something up given all the > example code. I can do that, but I would like more input first. The > questions I have are these. > > 1) Should it be done? > 2) Should it be a method? (functions being somewhat deprecated) > 3) What name should it have? > > I think Travis will have to weigh in on this. IIRC, he felt that the > number of methods was getting out of hand. I can support adding a *function* that does both. It can't be named extract (that already exists). There should be one for all the "arg"-like functions. If somebody doesn't add it before 1.0 final, it can wait for 1.0.1 -Travis |
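The max-plus-argmax idiom being discussed, written out for a 2-d array (names and shapes are illustrative):

```python
import numpy as np

a = np.random.rand(10, 5)
imax = a.argmax(axis=0)                    # row index of the maximum in each column
vmax = a[imax, np.arange(a.shape[1])]      # pick out the corresponding values
assert np.all(vmax == a.max(axis=0))       # same result as calling max directly

# axis=None case: argmax returns a flat index into the raveled array
iflat = a.argmax()
vflat = a.ravel()[iflat]
```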
From: Travis O. <oli...@ee...> - 2006-09-19 23:14:06
|
Charles R Harris wrote: > Travis, > > Is this intentional? > > In [77]: arange(5, dtype=int)/0 > Out[77]: array([0, 0, 0, 0, 0]) > > It looks deliberate because all zeros are returned, but it might be > better if it raised an exception. It is deliberate. Numarray introduced it (the only difference being that by default NumPy has division-by-zero errors turned off). It's tied to the way floating-point division by zero is handled. There is a valid argument for having a separate integer-division flag so that you can raise exceptions for integer division but not for floating-point division. I'm open to that change for 1.0rc1. -Travis |
From: Charles R H. <cha...@gm...> - 2006-09-19 23:10:30
|
Travis, Is this intentional? In [77]: arange(5, dtype=int)/0 Out[77]: array([0, 0, 0, 0, 0]) It looks deliberate because all zeros are returned, but it might be better if it raised an exception. Chuck |
From: Martin W. <mar...@gm...> - 2006-09-19 23:07:46
|
On Tuesday 19 September 2006 20:37, Travis Oliphant wrote: > Martin Wiechert wrote: > > Hi list, > > > > Please forgive my ignorance: Is there any difference between npy_intp and > > size_t. Aren't both "ints just big enough to be safe with pointer > > arithmetics even on 64 bit architectures?". > > size_t is unsigned > npy_intp is signed > (!) Thanks again, Travis. > It is basically the same as Py_ssize_t (which is not available until > Python 2.5). Out now! http://www.python.org/2.5 > > -Travis |
From: Charles R H. <cha...@gm...> - 2006-09-19 22:55:59
|
On 9/19/06, Tim Hochberg <tim...@ie...> wrote: > > Travis Oliphant wrote: > > Tim Hochberg wrote: > > > > > >> A. M. Archibald wrote: > >> > >> > >> > >>> On 19/09/06, Tim Hochberg <tim...@ie...> wrote: > >>> > >>> > >>> > >>> > >>> > >>>> I'm not sure where the breakpoint is, but I was seeing failures for > all > >>>> three sort types with N as high as 10000. I suspect that they're all > >>>> broken in the presence of NaNs. I further suspect you'd need some > >>>> punishingly slow n**2 algorithm to be robust in the presence of NaNs. > >>>> > >>>> > >>>> > >>>> > >>> Not at all. Just temporarily make NaNs compare greater than any other > >>> floating-point value and use quicksort (say). You can even do this for > >>> python lists without much trouble. > >>> > >>> > >>> > >>> > >> I misspoke. What I meant here was keeping the behavior that people > think > >> that we already have but don't: NaNs stay in place and everything is > >> sorted around them. And even that's not true, since you could just > >> record where the NaNs are, remove them, sort and put them back. What I > >> was really getting at was, that I'm guessing, and it's just a guess, > >> that (a) none of the fast sorting algorithms do anything sensible > unless > >> special cased and (b) one could come up with a naive n**2 sort that > does > >> do something sensible without special casing (where sensible means > leave > >> the NaNs alone). > >> > >> > >> > >>> That's actually a viable suggestion for numpy's sorting, although it > >>> would be kind of ugly to implement: do a quick any(isnan(A)), and if > >>> not, use the fast stock sorting algorithms; if there is a NaN > >>> somewhere in the array, use a version of the sort that has a tweaked > >>> comparison function so the NaNs wind up at the end and are easy to > >>> trim off. > >>> > >>> But the current situation, silently returning arrays in which the > >>> non-NaNs are unsorted, is really bad. > >>> > >>> > >>> > >>> > >> If your going to do isnan anyway, why not just raise an exception. An > >> array with NaNs in it can't be sorted by any common sense definition of > >> sorting. Any treatment of NaNs is going to be arbitrary, so we might as > >> well make the user specify what they want. "In the face of ambiguity, > >> refuse the temptation to guess" and all that. > >> > >> My favorite solution would be to make sort respect the invalid mode of > >> seterr/geterr. However at the moment it doesn't seem to (in beta4 at > >> least) but neither does add or multiply so those probably need to be > >> looked at again.... > >> > >> > >> > > The geterr/seterr stuff changes how IEEE hardware flags are handled in > > ufuncs. Currently they are not even looked at elsewhere. Are you > > saying that add and multiply don't respect the invalid flag? If not, > > then this might be hardware related. Does the IEEE invalid hardware > > flag get raised on multiplication by nan or only on creation of nan? > It looks like I jumped to conclusions here. I was expecting that with > invalid set to 'raise' that an array that (+-*operations on nans would > raise an exception. It appears that is incorrect -- only operations that > create nans from non-nans will trigger this as you suspected. Similarly, > I expected that over='raise' would cause inf+something to raise an > exception. Again, not true; this raises an exception only when a new inf > is created from non infs. At least on my box. > > Interesting. And a little surprising. 
> > An interesting tidbit (in an array) inf/inf will raise invalid, but > nan/nan will not, it just returns nan. Fun, fun fun! > > > > All the seterr/geterr stuff relies on the hardware flags. We don't do > > any other checking. > > > > Yeah. And just by itself this isn't going to do anything for sort since > comparing nans will not set any flags, it's just that the result will be > problematic.If one wanted to use these flags for this one would have to > use/abuse the result of geterr to trigger an isnan test at the beginning > of sort and then warn, raise or call as appropriate. > > It would probably also not be unreasonable to punt and document sort as > failing in the presence of nans. For floats we could use something like: lessthan(a,b) := a < b || (a == nan && b != nan) Which would put all the nans at one end and might not add too much overhead. Chuck |
From: Travis O. <oli...@ee...> - 2006-09-19 22:48:06
|
Sebastian Haase wrote: >Hello all, >I just had someone from my lab coming to my desk saying: >"My god - SciPy is really stupid .... >An array with only positive numbers claims to have a negative mean !! "? > > > >I was asking about this before ... the reason was of course that her array was >of dtype int32 and had many large values to cause an overflow (wrap >around) . > >Now that the default for axis is None (for all functions having an axis >argument), >can we please change dtype to default to float64 !? > > The default is float64 now (as long as you are not using numpy.oldnumeric). I suppose more appropriately, we could reduce over float for integer data-types when calculating the mean as well (since a floating point is returned anyway). -Travis |
From: Travis O. <oli...@ee...> - 2006-09-19 22:44:50
|
Sebastian Haase wrote: >OK - I'm really sorry !! >I also get 'u' -- I had a typo there ... > >But what is the complete list of kind values ? > > It's in the array interface specification: http://numpy.scipy.org/array_interface.shtml -Travis |
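The kind codes can also be read straight off dtype objects; a quick sketch (the set of builtin scalar types shown is just a sample):

```python
import numpy as np

for t in (np.bool_, np.int32, np.uint64, np.float64, np.complex128,
          np.bytes_, np.str_, np.object_, np.void):
    dt = np.dtype(t)
    print(dt.kind, dt.itemsize)
# kinds seen here: 'b' boolean, 'i' signed integer, 'u' unsigned integer,
# 'f' float, 'c' complex, 'S' bytes, 'U' unicode, 'O' object, 'V' void
```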
From: Sebastian H. <ha...@ms...> - 2006-09-19 22:41:24
|
Hello all, I just had someone from my lab coming to my desk saying: "My god - SciPy is really stupid .... An array with only positive numbers claims to have a negative mean !! "? I was asking about this before ... the reason was of course that her array was of dtype int32 and had many large values to cause an overflow (wrap around) . Now that the default for axis is None (for all functions having an axis argument), can we please change dtype to default to float64 !? It is really a very confusing and shocking result to get a negative mean on all positive values. It has been stated here before that numpy should target the "scientist" rather than the programmers ... I would argue that mean() almost always requires the precision of "double" (float64) to produce usable results. Please consider this change before the 1.0 release ... Thanks, Sebastian Haase |
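A sketch of the failure mode Sebastian describes: forcing the accumulation into 32-bit integers makes the sum wrap around, while accumulating in double gives the expected answer (the values are illustrative):

```python
import numpy as np

a = np.full(100_000, 30_000, dtype=np.int32)   # every value is positive
wrapped = a.sum(dtype=np.int32) / a.size       # 3e9 overflows past 2**31 -> negative "mean"
good = a.mean(dtype=np.float64)                # accumulate in double: 30000.0
print(wrapped, good)
```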
From: Sebastian H. <ha...@ms...> - 2006-09-19 22:31:43
|
OK - I'm really sorry !! I also get 'u' -- I had a typo there ... But what is the complete list of kind values ? -Sebastian On Tuesday 19 September 2006 11:54, Scott Ransom wrote: > > > Can anybody on a 64-bit system confirm? > > > > I'm on 64-bit Debian: > > > > In [11]: arr=N.arange(10,dtype=N.uint) > > > > In [12]: arr.dtype.kind > > Out[12]: 'u' > > > > In [13]: arr.dtype.itemsize > > Out[13]: 4 > > > > In [14]: arr=N.arange(10,dtype=N.long) > > > > In [15]: arr.dtype.kind > > Out[15]: 'i' > > > > In [16]: arr.dtype.itemsize > > Out[16]: 8 > > Ack! That was on the wrong machine (32-bit Debian). Here is the 64-bit > version: > > In [2]: arr=N.arange(10,dtype=N.uint) > > In [3]: arr.dtype.kind > Out[3]: 'u' > > In [4]: arr.dtype.itemsize > Out[4]: 8 > > In [5]: arr=N.arange(10,dtype=N.long) > > In [6]: arr.dtype.kind > Out[6]: 'i' > > In [7]: arr.dtype.itemsize > Out[7]: 8 > > Sorry about that, > > Scott |
From: Tim H. <tim...@ie...> - 2006-09-19 22:12:23
|
Travis Oliphant wrote: > Tim Hochberg wrote: > > >> A. M. Archibald wrote: >> >> >> >>> On 19/09/06, Tim Hochberg <tim...@ie...> wrote: >>> >>> >>> >>> >>> >>>> I'm not sure where the breakpoint is, but I was seeing failures for all >>>> three sort types with N as high as 10000. I suspect that they're all >>>> broken in the presence of NaNs. I further suspect you'd need some >>>> punishingly slow n**2 algorithm to be robust in the presence of NaNs. >>>> >>>> >>>> >>>> >>> Not at all. Just temporarily make NaNs compare greater than any other >>> floating-point value and use quicksort (say). You can even do this for >>> python lists without much trouble. >>> >>> >>> >>> >> I misspoke. What I meant here was keeping the behavior that people think >> that we already have but don't: NaNs stay in place and everything is >> sorted around them. And even that's not true, since you could just >> record where the NaNs are, remove them, sort and put them back. What I >> was really getting at was, that I'm guessing, and it's just a guess, >> that (a) none of the fast sorting algorithms do anything sensible unless >> special cased and (b) one could come up with a naive n**2 sort that does >> do something sensible without special casing (where sensible means leave >> the NaNs alone). >> >> >> >>> That's actually a viable suggestion for numpy's sorting, although it >>> would be kind of ugly to implement: do a quick any(isnan(A)), and if >>> not, use the fast stock sorting algorithms; if there is a NaN >>> somewhere in the array, use a version of the sort that has a tweaked >>> comparison function so the NaNs wind up at the end and are easy to >>> trim off. >>> >>> But the current situation, silently returning arrays in which the >>> non-NaNs are unsorted, is really bad. >>> >>> >>> >>> >> If your going to do isnan anyway, why not just raise an exception. An >> array with NaNs in it can't be sorted by any common sense definition of >> sorting. Any treatment of NaNs is going to be arbitrary, so we might as >> well make the user specify what they want. "In the face of ambiguity, >> refuse the temptation to guess" and all that. >> >> My favorite solution would be to make sort respect the invalid mode of >> seterr/geterr. However at the moment it doesn't seem to (in beta4 at >> least) but neither does add or multiply so those probably need to be >> looked at again.... >> >> >> > The geterr/seterr stuff changes how IEEE hardware flags are handled in > ufuncs. Currently they are not even looked at elsewhere. Are you > saying that add and multiply don't respect the invalid flag? If not, > then this might be hardware related. Does the IEEE invalid hardware > flag get raised on multiplication by nan or only on creation of nan? It looks like I jumped to conclusions here. I was expecting that with invalid set to 'raise' that an array that (+-*operations on nans would raise an exception. It appears that is incorrect -- only operations that create nans from non-nans will trigger this as you suspected. Similarly, I expected that over='raise' would cause inf+something to raise an exception. Again, not true; this raises an exception only when a new inf is created from non infs. At least on my box. Interesting. And a little surprising. An interesting tidbit (in an array) inf/inf will raise invalid, but nan/nan will not, it just returns nan. Fun, fun fun! > > All the seterr/geterr stuff relies on the hardware flags. We don't do > any other checking. > Yeah. 
And just by itself this isn't going to do anything for sort since comparing nans will not set any flags, it's just that the result will be problematic. If one wanted to use these flags for this, one would have to use/abuse the result of geterr to trigger an isnan test at the beginning of sort and then warn, raise, or call as appropriate. It would probably also not be unreasonable to punt and document sort as failing in the presence of nans. -tim |
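A small sketch of the distinction Tim observes, using the errstate context manager: the invalid trap fires when a nan is newly created, not when an existing nan merely propagates:

```python
import numpy as np

nan = np.array([np.nan])
with np.errstate(invalid='raise'):
    nan + 1.0                 # quiet propagation: result is nan, no flag, no exception
    nan / nan                 # likewise just returns nan
    try:
        np.array([np.inf]) / np.array([np.inf])   # creates a new nan -> invalid flag
    except FloatingPointError as err:
        print("raised:", err)
```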
From: Charles R H. <cha...@gm...> - 2006-09-19 22:00:25
|
On 9/19/06, A. M. Archibald <per...@gm...> wrote: > > On 19/09/06, Charles R Harris <cha...@gm...> wrote: > > > If this sort of thing can cause unexpected errors I wonder if it would > be > > worth it to have a global debugging flag that essentially causes isnan > to > > be called before any function applications. > > That sounds very like the IEEE floating-point flags, which would be > extremely useful to have, and which are being wored on, IIRC. Thinking a bit, keeping the values in place isn't easy. Mergesort isn't fixable because values can be moved in front of the nan before it is ever looked at. Nor can it be easily set up to leave all the nans at one end because both a < nan and nan < a return false. Quicksort might be doable with some checks. I mean, what if the selected pivot is a nan? The median of three version used also needs thinking about. Hmm. But I think it is the insertion sort that is messing up the order in mergesort as now nothing will move past the nan even if it has to. That could be fixed, but the nan's would still move around. I think the best thing to do is punt unless the hardware can be set to do something. Chuck |
From: William G. <wil...@ub...> - 2006-09-19 21:54:27
|
Andrew Straw wrote: > William Grant wrote: > >> Hi, >> >> I'm currently attempting to get scipy 0.5.1 into Ubuntu, however it >> currently cannot happen as numpy doesn't build with Python 2.5. I note >> that changeset 3109 >> (http://projects.scipy.org/scipy/numpy/changeset/3109#file1) is meant to >> give 2.5 compatibility, but it is those bits which seem to break it. The >> end of the build log with the errors can be found at >> http://people.ubuntu.com.au/~fujitsu/numpy_error.txt. >> >> Has anybody got any ideas on how to fix this? A number of Ubuntu users >> want scipy 0.5.1, but that can't happen while it won't build with Python >> 2.5. >> >> > AFAIK a number of Ubuntu Dapper users are happily using my .debs with > Python 2.4 available at debs.astraw.com. This includes scipy 0.5.2.dev2197 > > Where are you getting that Ubuntu requires Python 2.5? Ubuntu 6.10 (currently in development) includes Python 2.5. William. |
From: Travis O. <oli...@ee...> - 2006-09-19 21:48:29
|
Tim Hochberg wrote: >A. M. Archibald wrote: > > >>On 19/09/06, Tim Hochberg <tim...@ie...> wrote: >> >> >> >> >>>I'm not sure where the breakpoint is, but I was seeing failures for all >>>three sort types with N as high as 10000. I suspect that they're all >>>broken in the presence of NaNs. I further suspect you'd need some >>>punishingly slow n**2 algorithm to be robust in the presence of NaNs. >>> >>> >>> >>Not at all. Just temporarily make NaNs compare greater than any other >>floating-point value and use quicksort (say). You can even do this for >>python lists without much trouble. >> >> >> >I misspoke. What I meant here was keeping the behavior that people think >that we already have but don't: NaNs stay in place and everything is >sorted around them. And even that's not true, since you could just >record where the NaNs are, remove them, sort and put them back. What I >was really getting at was, that I'm guessing, and it's just a guess, >that (a) none of the fast sorting algorithms do anything sensible unless >special cased and (b) one could come up with a naive n**2 sort that does >do something sensible without special casing (where sensible means leave >the NaNs alone). > > >>That's actually a viable suggestion for numpy's sorting, although it >>would be kind of ugly to implement: do a quick any(isnan(A)), and if >>not, use the fast stock sorting algorithms; if there is a NaN >>somewhere in the array, use a version of the sort that has a tweaked >>comparison function so the NaNs wind up at the end and are easy to >>trim off. >> >>But the current situation, silently returning arrays in which the >>non-NaNs are unsorted, is really bad. >> >> >> >If your going to do isnan anyway, why not just raise an exception. An >array with NaNs in it can't be sorted by any common sense definition of >sorting. Any treatment of NaNs is going to be arbitrary, so we might as >well make the user specify what they want. "In the face of ambiguity, >refuse the temptation to guess" and all that. > >My favorite solution would be to make sort respect the invalid mode of >seterr/geterr. However at the moment it doesn't seem to (in beta4 at >least) but neither does add or multiply so those probably need to be >looked at again.... > > The geterr/seterr stuff changes how IEEE hardware flags are handled in ufuncs. Currently they are not even looked at elsewhere. Are you saying that add and multiply don't respect the invalid flag? If not, then this might be hardware related. Does the IEEE invalid hardware flag get raised on multiplication by nan or only on creation of nan? All the seterr/geterr stuff relies on the hardware flags. We don't do any other checking. -Travis |