From: Travis O. <oli...@ie...> - 2006-01-19 00:09:38
|
Sven Schreiber wrote:
> Hi,
> I've spent a couple of weeks with scipy/numpy and the old-to-new
> transition; now that the transition is over (?) but some confusion is
> remaining (on my side) I feel the need to ask a basic question about
> matlab compatibility in terms of matrix (linear algebra) programming.
>
> Take "eye" and "identity" for example; given that "eye" supposedly
> exists to facilitate transition from matlab to numpy/scipy (correct?),
> I expected eye to be/return a matrix.
The only reason is historical. Numeric always returned an array for eye, not a matrix.
We could return a matrix without difficulty, especially if we put an eye --> identity transition in convertcode.py
-Travis |
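To make the array-versus-matrix distinction concrete, here is a minimal sketch using present-day NumPy (an assumption; the thread is about the 2006 code base). It only shows that eye and identity return plain arrays and that wrapping the result in a matrix changes what * means:

```python
import numpy as np

# eye and identity both return plain ndarrays, not matrices.
I_arr = np.eye(3)
returns_plain_array = type(I_arr) is np.ndarray
same_as_identity = np.array_equal(I_arr, np.identity(3))

A = np.arange(9.0).reshape(3, 3)

# With arrays, * is element-wise, so this keeps only the diagonal of A.
elementwise = I_arr * A

# The matrix product on arrays is spelled @ (or np.dot).
matrix_product = I_arr @ A

# Wrapping both operands as matrices gives the MATLAB-like meaning of *.
matlab_like = np.asmatrix(I_arr) * np.asmatrix(A)
```

With arrays, anyone coming from matlab has to remember to use the matrix product explicitly, which is the source of the confusion Sven describes.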
|
From: Russell E. O. <ro...@ce...> - 2006-01-19 00:01:55
|
We're getting numeric data from a (MySQL) database. We'd like to use numarray or NumPy on the resulting data, but some values may be None. Is there a fast, efficient way to replace None with NaN? I'd hate to use a list comprehension on each data tuple before turning it into an array, but I haven't thought of anything else. numarray.array and numarray.where are both intolerant of None in the input data. -- Russell |
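For readers hitting the same problem today, a minimal sketch, assuming present-day NumPy rather than the 2006 numarray discussed above: asking for a float dtype converts None entries to NaN during array construction, so no per-row list comprehension is needed. The sample rows are made up.

```python
import numpy as np

# Made-up rows, shaped like tuples returned by a database cursor.
rows = [(1.0, None, 3.0), (None, 5.0, 6.0)]

# Requesting a float dtype turns each None into NaN.
arr = np.array(rows, dtype=float)

# Boolean mask of the entries that were None in the input.
missing = np.isnan(arr)
```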
|
From: Fernando P. <Fer...@co...> - 2006-01-18 23:21:47
|
Perry Greenfield wrote:
> It's not a new idea. I raised it some time ago and I don't think it was
I wasn't claiming authorship, sorry if it sounded like that :) In fact, I
remember specifically talking with you about this at scipy'03, in the context
of small array performance issues for the at-the-time-nascent numarray, and
I'm sure similar things have been done many times before. I've had it
floating in my head since I first saw blitz, back in 2001, and blitz probably
got it from... There's nothing really new under the sun ;)
> new then either. I have to believe that if you allowed only Float64
> (and perhaps a complex variant) and used other restrictions then it
> would be much faster for small arrays. One would think it would be much
> easier to implement than Numeric/numarray/numpy... I've always thought
> that those looking for really fast small array performance would be
> better served by something like this. But you'd really have to fight
> off feature creep. ("This almost meets my needs. If it could only do
> xxx")
Couldn't that last issue be well dealt with by the fact that today's numpy is
fairly subclassing-friendly? (which, if I remember correctly, wasn't quite the
case with at least old Numeric).
Cheers,
f
|
|
From: Travis O. <oli...@ee...> - 2006-01-18 22:58:18
|
Fernando Perez wrote:
> David M. Cooke wrote:
>
>> I've done a little bit of work along these lines. I have a module I
>> call vector3 [*] which has 2- and 3-dimensional immutable vectors,
>> using either ints or doubles. It's as fast as I could make it, while
>> keeping it all written in Pyrex. I find it very convenient for
>> anything vector-related. Konrad Hinsen has something similar in the
>> development version of his ScientificPython package.
>>
>> [*] http://arbutus.mcmaster.ca/dmc/software/vector3.html
>>
>> Also, I've done some playing around with an n-dimensional vector
>> type (restricted to doubles). My best attempts make it ~4-5x faster
>> than numpy (and 2x faster than Numeric) for vectors of dimension 10
>> on simple ops like + and *, 2x faster than numpy for dimension 1000,
>> and approaching 1x as you make the vectors larger. Indexing is about
>> 3x faster than numpy, and 1.4x faster than Numeric. So that gives I
>> think some idea of the maximum speed-up possible.
>>
>> I think the speedups mostly come from the utter lack of any
>> polymorphism: it handles vectors of doubles only, and only as
>> contiguous vectors (no strides).
>
> This is excellent, thanks for the pointer. I can see uses for vectors
> (still 1-d, no strides, etc) with more than 3 elements, and perhaps
> fixed-size (no reshaping, no striding) 2-d arrays (matrices), but this
> looks like a good starting point. Sandbox material?
>
With the array interface, these kinds of objects can play very nicely with full ndarrays as well...
-Travis |
|
From: Fernando P. <Fer...@co...> - 2006-01-18 22:51:43
|
David M. Cooke wrote:
> I've done a little bit of work along these lines. I have a module I
> call vector3 [*] which has 2- and 3-dimensional immutable vectors,
> using either ints or doubles. It's as fast as I could make it, while
> keeping it all written in Pyrex. I find it very convenient for
> anything vector-related. Konrad Hinsen has something similar in the
> development version of his ScientificPython package.
>
> [*] http://arbutus.mcmaster.ca/dmc/software/vector3.html
>
> Also, I've done some playing around with an n-dimensional vector
> type (restricted to doubles). My best attempts make it ~4-5x faster
> than numpy (and 2x faster than Numeric) for vectors of dimension 10
> on simple ops like + and *, 2x faster than numpy for dimension 1000,
> and approaching 1x as you make the vectors larger. Indexing is about
> 3x faster than numpy, and 1.4x faster than Numeric. So that gives I
> think some idea of the maximum speed-up possible.
>
> I think the speedups mostly come from the utter lack of any
> polymorphism: it handles vectors of doubles only, and only as
> contiguous vectors (no strides).
This is excellent, thanks for the pointer. I can see uses for vectors
(still 1-d, no strides, etc) with more than 3 elements, and perhaps
fixed-size (no reshaping, no striding) 2-d arrays (matrices), but this
looks like a good starting point. Sandbox material?
Cheers,
f |
|
From: <co...@ph...> - 2006-01-18 22:41:10
|
Fernando Perez <Fer...@co...> writes:
> Travis Oliphant wrote:
>> Andrew Straw wrote:
>>
>>> Here's an idea Fernando and I have briefly talked about off-list,
>>> but which perhaps bears talking about here: Is there speed to be
>>> gained by an alternative, very simple, very optimized ndarray
>>> constructor? The idea would be a special-case constructor with very
>>> limited functionality designed purely for speed. It wouldn't
>>> support (m)any of the fantastic things Travis has done, but would
>>> be useful only in specialized use cases, such as creating indices.
>> The general purpose constructor is
>> PyArray_NewFromDescr(...)
>> I suspect this could be special cased for certain circumstances and
>> the special-case called occasionally. There are checks on the
>> dimensions that could be avoided in certain circumstances (like when
>> we are getting the dimensions from another arrayobject already...)
>> We could also inline the __array_from_strides code...
>> Besides that, I'm not sure what else to optimize...
>
> Just to give some context: this came to my mind inspired by Blitz++'s
> TinyVector and TinyMatrix objects. In Blitz, arrays have compile-time
> rank, but run-time size in all dimensions. Since this introduces some
> overhead, Blitz offers also the Tiny* classes, which are compile-time
> fixed _both_ in rank and in size. This allows a number of
> optimizations to be made on them, at the cost of some flexibility
> lost. Some info on these guys:
>
> http://www.oonumerics.org/blitz/manual/blitz07.html
>
> What Andrew and I discussed was the idea of writing some object which
> would only support the most basic operations: element-wise arithmetic,
> slicing, linear algebra calls on them (matrix-matrix, matrix-vector
> and vector operations), and little else. I'd be OK losing fancy
> indexing, byteswapping, memory-mapping, reshaping, and anything else
> which costs either:
>
> 1. initialization-time CPU cycles
> 2. memory footprint
> 3. runtime element access and arithmetic.
>
> Such objects could be very useful in many contexts. I'd even like an
> immutable version, so they could be used as dictionary keys without
> having to make a tuple out of them. This would allow algorithms which
> use small arrays as multidimensional indices in sparse tree structures
> to be used without the hoops one must jump through today, and with
> higher performance.
I've done a little bit of work along these lines. I have a module I
call vector3 [*] which has 2- and 3-dimensional immutable vectors,
using either ints or doubles. It's as fast as I could make it, while
keeping it all written in Pyrex. I find it very convenient for
anything vector-related. Konrad Hinsen has something similar in the
development version of his ScientificPython package.
[*] http://arbutus.mcmaster.ca/dmc/software/vector3.html
Also, I've done some playing around with an n-dimensional vector
type (restricted to doubles). My best attempts make it ~4-5x faster
than numpy (and 2x faster than Numeric) for vectors of dimension 10
on simple ops like + and *, 2x faster than numpy for dimension 1000,
and approaching 1x as you make the vectors larger. Indexing is about
3x faster than numpy, and 1.4x faster than Numeric. So that gives I
think some idea of the maximum speed-up possible.
I think the speedups mostly come from the utter lack of any
polymorphism: it handles vectors of doubles only, and only as
contiguous vectors (no strides).
--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|co...@ph... |
|
From: Perry G. <pe...@st...> - 2006-01-18 22:37:07
|
It's not a new idea. I raised it some time ago and I don't think it was
new then either. I have to believe that if you allowed only Float64
(and perhaps a complex variant) and used other restrictions then it
would be much faster for small arrays. One would think it would be much
easier to implement than Numeric/numarray/numpy... I've always thought
that those looking for really fast small array performance would be
better served by something like this. But you'd really have to fight
off feature creep. ("This almost meets my needs. If it could only do
xxx")
Perry
On Jan 18, 2006, at 5:00 PM, Fernando Perez wrote:
> Travis Oliphant wrote:
>> Andrew Straw wrote:
>>> Here's an idea Fernando and I have briefly talked about off-list,
>>> but which perhaps bears talking about here: Is there speed to be
>>> gained by an alternative, very simple, very optimized ndarray
>>> constructor? The idea would be a special-case constructor with very
>>> limited functionality designed purely for speed. It wouldn't support
>>> (m)any of the fantastic things Travis has done, but would be useful
>>> only in specialized use cases, such as creating indices.
>> The general purpose constructor is
>> PyArray_NewFromDescr(...)
>> I suspect this could be special cased for certain circumstances and
>> the special-case called occasionally. There are checks on the
>> dimensions that could be avoided in certain circumstances (like when
>> we are getting the dimensions from another arrayobject already...)
>> We could also inline the __array_from_strides code...
>> Besides that, I'm not sure what else to optimize...
>
> Just to give some context: this came to my mind inspired by Blitz++'s
> TinyVector and TinyMatrix objects. In Blitz, arrays have compile-time
> rank, but run-time size in all dimensions. Since this introduces some
> overhead, Blitz offers also the Tiny* classes, which are compile-time
> fixed _both_ in rank and in size. This allows a number of
> optimizations to be made on them, at the cost of some flexibility
> lost. Some info on these guys:
>
> http://www.oonumerics.org/blitz/manual/blitz07.html
>
> What Andrew and I discussed was the idea of writing some object which
> would only support the most basic operations: element-wise arithmetic,
> slicing, linear algebra calls on them (matrix-matrix, matrix-vector
> and vector operations), and little else. I'd be OK losing fancy
> indexing, byteswapping, memory-mapping, reshaping, and anything else
> which costs either:
>
> 1. initialization-time CPU cycles
> 2. memory footprint
> 3. runtime element access and arithmetic.
>
> Such objects could be very useful in many contexts. I'd even like an
> immutable version, so they could be used as dictionary keys without
> having to make a tuple out of them. This would allow algorithms which
> use small arrays as multidimensional indices in sparse tree structures
> to be used without the hoops one must jump through today, and with
> higher performance.
>
> I wonder if giving up reshaping would allow the indexing code to be
> faster, as specialized versions could be hard-coded for each rank,
> with only say ranks 1-4 offered for this kind of object (I know we
> recently had a discussion about large ranks, but this object would be
> geared towards pure performance, and certainly working in 32
> dimensions is a 'flexibility-driven' case, where the generic objects
> are called for).
>
> Note that I had never mentioned this in public, because I think it may
> be a slight specialization that isn't needed early on, and currently
> the library's priority was to get off the ground. But having such
> objects could be very handy, and now that the C API is starting to
> stabilize, maybe someone can play with this as a side project. Once
> they prove their worth, these beasts could be folded as part of the
> official distribution.
>
> I am not really qualified to judge whether there are enough areas for
> optimization where the sacrifices indicated could really pay off, both
> in terms of memory and performance.
>
> Cheers,
>
> f
>
>
> -------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc. Do you grep through log
> files
> for problems? Stop! Download the new AJAX search engine that makes
> searching your log files as easy as surfing the web. DOWNLOAD SPLUNK!
> http://sel.as-us.falkag.net/sel?
> cmd=lnk&kid=103432&bid=230486&dat=121642
> _______________________________________________
> Numpy-discussion mailing list
> Num...@li...
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion
|
|
From: Fernando P. <Fer...@co...> - 2006-01-18 22:28:59
|
Pearu Peterson wrote:
>>I understood this as 'scipy.fftpack.basic.fft overwrote
>>numpy.dft.fftpack.fft'. Does this then not affect the numpy namespace at
>>all?
>
> No! See above.
Ah, OK. Thanks for the clarification, I misunderstood the message.
>>I also would like to propose that, rather than using an environment variable,
>>pkgload() takes a 'verbose=' keyword (or 'quiet='). I think it's much
>>cleaner to say
>>
>>pkgload(quiet=1) or pkgload(verbose=0)
>>
>>than relying on users configuring env. variables for something like this.
>
> pkgload already has a verbose kwd argument. See ?pkgload for more
> information.
What is this '?foo' syntax you speak of? Certainly not python... ;-)
Sorry for not checking, I just don't have new-scipy on my work machine, only at home.
Cheers,
f |
|
From: Pearu P. <pe...@sc...> - 2006-01-18 22:14:18
|
On Wed, 18 Jan 2006, Fernando Perez wrote:
> Mmh, I think I'm confused then: it seemed to me that pkgload() WAS
> overwriting numpy names, from the messages which the environment variable
> controls. Is that not true? Here's a recent thread:
>
> http://aspn.activestate.com/ASPN/Mail/Message/scipy-dev/2974044
pkgload() is overwriting numpy names in the *scipy namespace*. To be explicit,
the following is going on in scipy/__init__.py when pkgload is called,
pseudocode follows:
    from numpy import * # imports fft
    old_fft = fft
    from scipy.fftpack import fft
    print 'Overwriting',old_fft,'with',fft
    del old_fft
And nothing else! So, scipy.fft, that was numpy.fft, is set to
scipy.fftpack.fft. numpy.fft remains the same.
...
> I understood this as 'scipy.fftpack.basic.fft overwrote
> numpy.dft.fftpack.fft'. Does this then not affect the numpy namespace at
> all?
No! See above.
> I also would like to propose that, rather than using an environment variable,
> pkgload() takes a 'verbose=' keyword (or 'quiet='). I think it's much
> cleaner to say
>
> pkgload(quiet=1) or pkgload(verbose=0)
>
> than relying on users configuring env. variables for something like this.
pkgload already has a verbose kwd argument. See ?pkgload for more information.
Pearu |
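The point that rebinding a name in one namespace leaves the other untouched can be checked without scipy at all. A small sketch with two stand-in modules follows; fake_numpy, fake_scipy and fftpack_fft are made-up names for illustration, not the real packages:

```python
import types

# Stand-in for numpy, with its own fft (hypothetical, illustration only).
fake_numpy = types.ModuleType("fake_numpy")
fake_numpy.fft = lambda x: "numpy fft"

# Stand-in for scipy. This assignment plays the role of
# "from numpy import *" inside scipy/__init__.py.
fake_scipy = types.ModuleType("fake_scipy")
fake_scipy.fft = fake_numpy.fft


def fftpack_fft(x):
    """Stand-in for scipy.fftpack.fft."""
    return "scipy.fftpack fft"


# This plays the role of "from scipy.fftpack import fft":
# only the name inside fake_scipy is rebound.
old_fft = fake_scipy.fft
fake_scipy.fft = fftpack_fft

numpy_result = fake_numpy.fft(None)   # still the original function
scipy_result = fake_scipy.fft(None)   # the replacement
```

The "Overwriting" message therefore describes the name scipy.fft only; the function object that numpy exposes is never modified.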
|
From: Fernando P. <Fer...@co...> - 2006-01-18 22:00:14
|
Travis Oliphant wrote:
> Andrew Straw wrote:
>
>>Here's an idea Fernando and I have briefly talked about off-list, but
>>which perhaps bears talking about here: Is there speed to be gained by
>>an alternative, very simple, very optimized ndarray constructor? The
>>idea would be a special-case constructor with very limited
>>functionality designed purely for speed. It wouldn't support (m)any of
>>the fantastic things Travis has done, but would be useful only in
>>specialized use cases, such as creating indices.
>
> The general purpose constructor is
>
> PyArray_NewFromDescr(...)
>
> I suspect this could be special cased for certain circumstances and the
> special-case called occasionally. There are checks on the dimensions
> that could be avoided in certain circumstances (like when we are getting
> the dimensions from another arrayobject already...)
>
> We could also inline the __array_from_strides code...
>
> Besides that, I'm not sure what else to optimize...
Just to give some context: this came to my mind inspired by Blitz++'s
TinyVector and TinyMatrix objects. In Blitz, arrays have compile-time
rank, but run-time size in all dimensions. Since this introduces some
overhead, Blitz offers also the Tiny* classes, which are compile-time
fixed _both_ in rank and in size. This allows a number of
optimizations to be made on them, at the cost of some flexibility
lost. Some info on these guys:
http://www.oonumerics.org/blitz/manual/blitz07.html
What Andrew and I discussed was the idea of writing some object which
would only support the most basic operations: element-wise arithmetic,
slicing, linear algebra calls on them (matrix-matrix, matrix-vector
and vector operations), and little else. I'd be OK losing fancy
indexing, byteswapping, memory-mapping, reshaping, and anything else
which costs either:
1. initialization-time CPU cycles
2. memory footprint
3. runtime element access and arithmetic.
Such objects could be very useful in many contexts. I'd even like an
immutable version, so they could be used as dictionary keys without
having to make a tuple out of them. This would allow algorithms which
use small arrays as multidimensional indices in sparse tree structures
to be used without the hoops one must jump through today, and with
higher performance.
I wonder if giving up reshaping would allow the indexing code to be
faster, as specialized versions could be hard-coded for each rank,
with only say ranks 1-4 offered for this kind of object (I know we
recently had a discussion about large ranks, but this object would be
geared towards pure performance, and certainly working in 32
dimensions is a 'flexibility-driven' case, where the generic objects
are called for).
Note that I had never mentioned this in public, because I think it may
be a slight specialization that isn't needed early on, and currently
the library's priority was to get off the ground. But having such
objects could be very handy, and now that the C API is starting to
stabilize, maybe someone can play with this as a side project. Once
they prove their worth, these beasts could be folded as part of the
official distribution.
I am not really qualified to judge whether there are enough areas for
optimization where the sacrifices indicated could really pay off, both
in terms of memory and performance.
Cheers,
f |
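As an illustration of the "immutable version usable as a dictionary key" idea, here is a pure-Python sketch. Vec3 is a hypothetical class written for this example; it is not part of numpy, vector3 or ScientificPython, and it makes no attempt at the speed discussed in the thread:

```python
class Vec3:
    """Hypothetical immutable 3-vector of floats (illustration only)."""

    __slots__ = ("_xyz",)

    def __init__(self, x, y, z):
        # Bypass our own __setattr__, which forbids assignment.
        object.__setattr__(self, "_xyz", (float(x), float(y), float(z)))

    def __setattr__(self, name, value):
        raise AttributeError("Vec3 is immutable")

    def __getitem__(self, i):
        return self._xyz[i]

    def __add__(self, other):
        a, b = self._xyz, other._xyz
        return Vec3(a[0] + b[0], a[1] + b[1], a[2] + b[2])

    def __eq__(self, other):
        return isinstance(other, Vec3) and self._xyz == other._xyz

    def __hash__(self):
        # Hashing by value is safe because the contents never change.
        return hash(self._xyz)

    def __repr__(self):
        return "Vec3(%r, %r, %r)" % self._xyz


# Used directly as a key in a sparse structure, no tuple() conversion needed.
tree = {Vec3(0, 1, 2): "leaf"}
found = tree[Vec3(0, 1, 2)]
total = Vec3(1, 2, 3) + Vec3(1, 1, 1)
```

Fixing both the size and the element type is what makes value-based hashing and hand-unrolled arithmetic possible, which is the trade-off the message describes.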
|
From: Robert K. <rob...@gm...> - 2006-01-18 20:08:48
|
Ed Schofield wrote:
> Unless I'm doing something very stupid, there seem to be multiple
> sources of evil here. First, numpy's linalg package is available from
> the scipy namespace, which seems like a recipe for Ed's madness, since
> he can't find his pinv2() function. Second, scipy.pkgload('linalg')
> silently fails to make any visible difference. This is probably a
> simple bug.
It's certainly not intended behavior. The process by which pkgload() determines
where it is being called from is messing up. pkgload lives in
numpy._import_tools, but it is also exposed in the scipy namespace, too.
Please file a bug report and assign it to Pearu.
--
Robert Kern
rob...@gm...
"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter
|
|
From: Sasha <nd...@ma...> - 2006-01-18 20:08:42
|
As of svn version 1931, numpy bool_ values are singletons:
>>> from numpy import *
>>> bool_(0) is array([True,False])[1]
This change makes the behavior of bool_ values more similar to python bools
and will allow a much faster implementation of scalar boolean algebra.
In order to allow other modules to take advantage of this property, I
would like to propose several additions to the python and c interfaces.
At the Python level: define True_ and False_ constants.
At the C-API level:
PyArrayScalar_True
PyArrayScalar_False
PyArrayScalar_BoolFromLong
PyArrayScalar_RETURN_TRUE
PyArrayScalar_RETURN_FALSE
PyArrayScalar_RETURN_BOOL_FROM_LONG
Any objections?
-- sasha |
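The singleton property can be checked from Python. This sketch assumes present-day NumPy, where constants named True_ and False_ are available at the Python level; it does not exercise the C-level macros proposed above:

```python
import numpy as np

a = np.array([True, False])

# Every boolean scalar is one of two shared objects, so identity checks work.
same_false = np.bool_(0) is a[1]
same_true = np.bool_(1) is a[0]

# The Python-level constants refer to the same two objects.
named_false = a[1] is np.False_
named_true = a[0] is np.True_
```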
|
From: Robert K. <rob...@gm...> - 2006-01-18 20:06:08
|
Fernando Perez wrote:
> Travis Oliphant wrote:
>
>>>3) pkgload() exists to support the loading of subpackages. It does not reach
>>>into numpy.dft or numpy.linalg at all. It is not relevant to this issue.
>>>
>>>4) There are some places in numpy that use numpy.dual.
>>>
>>>I think we can address all of your concerns by changing #4.
>>
>>This is an accurate assessment. However, I do not want to eliminate
>>number 4 as I've mentioned before. I think there is a place for having
>>functions that can be over-written with better versions. I agree that
>>it could be implemented better, however, with some kind of register
>>function instead of automatically looking in scipy...
>
> Mmh, I think I'm confused then:
So am I, now!
> it seemed to me that pkgload() WAS overwriting
> numpy names, from the messages which the environment variable controls. Is
> that not true? Here's a recent thread:
>
> http://aspn.activestate.com/ASPN/Mail/Message/scipy-dev/2974044
>
> where this was shown:
>
> In [3]: import scipy
> Overwriting fft=<function fft at 0x2000000001474668> from
> scipy.fftpack.basic (was <function fft at 0x2000000001394a28> from
> numpy.dft.fftpack)
> Overwriting ifft=<function ifft at 0x20000000014746e0> from
> scipy.fftpack.basic (was <function inverse_fft at 0x2000000001394aa0> from
> numpy.dft.fftpack)
[~]$ python
Python 2.4.1 (#2, Mar 31 2005, 00:05:10)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1666)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import scipy
>>> scipy.pkgload(verbose=2)
Imports to 'scipy' namespace
----------------------------
__all__.append('io')
import lib -> success
Overwriting lib=<module 'scipy.lib' from '/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/scipy-0.4.4.1307-py2.4-macosx-10.4-ppc.egg/scipy/lib/__init__.pyc'> from /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/scipy-0.4.4.1307-py2.4-macosx-10.4-ppc.egg/scipy/lib/__init__.pyc (was <module 'numpy.lib' from '/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy-0.9.4.1849-py2.4-macosx-10.4-ppc.egg/numpy/lib/__init__.pyc'> from /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy-0.9.4.1849-py2.4-macosx-10.4-ppc.egg/numpy/lib/__init__.pyc)
__all__.append('signal')
__all__.append('interpolate')
__all__.append('lib.lapack')
import cluster -> success
__all__.append('montecarlo')
__all__.append('fftpack')
__all__.append('sparse')
__all__.append('integrate')
__all__.append('optimize')
__all__.append('special')
import lib.blas -> success
__all__.append('linalg')
__all__.append('stats')
>>> import numpy
>>> numpy.lib
<module 'numpy.lib' from '/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy-0.9.4.1849-py2.4-macosx-10.4-ppc.egg/numpy/lib/__init__.pyc'>
>>> scipy.lib
<module 'scipy.lib' from '/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/scipy-0.4.4.1307-py2.4-macosx-10.4-ppc.egg/scipy/lib/__init__.pyc'>
[Ignore the SVN version numbers. They are faked. I was using checkouts from last night.]
> I understood this as 'scipy.fftpack.basic.fft overwrote
> numpy.dft.fftpack.fft'. Does this then not affect the numpy namespace at all?
If it does, then I agree with you that this should change.
> I also would like to propose that, rather than using an environment variable,
> pkgload() takes a 'verbose=' keyword (or 'quiet='). I think it's much cleaner
> to say
>
> pkgload(quiet=1) or pkgload(verbose=0)
>
> than relying on users configuring env. variables for something like this.
It does take a verbose keyword argument.
--
Robert Kern
rob...@gm...
"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter |
|
From: Travis O. <oli...@ee...> - 2006-01-18 19:55:22
|
> Here's an idea Fernando and I have briefly talked about off-list, but
> which perhaps bears talking about here: Is there speed to be gained by
> an alternative, very simple, very optimized ndarray constructor? The
> idea would be a special-case constructor with very limited
> functionality designed purely for speed. It wouldn't support (m)any of
> the fantastic things Travis has done, but would be useful only in
> specialized use cases, such as creating indices.
>
> I'm not familiar enough with what the normal constructor does to know
> if we could implement something, (in C, perhaps) that would do nothing
> but create a simple, contiguous array significantly faster than what
> is currently done. Or does the current constructor create a new
> instance about as fast as possible? I know Travis has optimized it,
> but it's a general purpose constructor, and I'm thinking these extra
> features may take some extra CPU cycles.
I think the indexing code will be slower because it is more
sophisticated than Numeric's. Basically, it has to check for fancy
indexing before defaulting to the old way. I see this as more of a
slow-down than array creation.
It might be possible to improve it --- more eyeballs are always helpful.
But, I'm not sure how at this point.
-Travis |
|
From: Travis O. <oli...@ee...> - 2006-01-18 19:48:11
|
Andrew Straw wrote:
> Here's an idea Fernando and I have briefly talked about off-list, but
> which perhaps bears talking about here: Is there speed to be gained by
> an alternative, very simple, very optimized ndarray constructor? The
> idea would be a special-case constructor with very limited
> functionality designed purely for speed. It wouldn't support (m)any of
> the fantastic things Travis has done, but would be useful only in
> specialized use cases, such as creating indices.
The general purpose constructor is
PyArray_NewFromDescr(...)
I suspect this could be special cased for certain circumstances and the
special-case called occasionally. There are checks on the dimensions
that could be avoided in certain circumstances (like when we are getting
the dimensions from another arrayobject already...)
We could also inline the __array_from_strides code...
Besides that, I'm not sure what else to optimize...
-Travis |
|
From: Andrew S. <str...@as...> - 2006-01-18 19:35:32
|
Paulo J. S. Silva wrote:
>On Wed, 2006-01-18 at 11:15 -0700, Travis Oliphant wrote:
>
>>Will you run these again with the latest SVN version of numpy. I
>>couldn't figure out why a copy was being made on transpose (because it
>>shouldn't have been). Then, I dug deep into the PyArray_FromAny code
>>and found bad logic in when a copy was needed that was causing an
>>inappropriate copy.
>>
>>I fixed that and now wonder how things will change. Because presumably,
>>the dotblas function should handle the situation now...
>>
>
>Good work Travis :-)
>
>Tests      x.T*y     x*y.T       A*x       A*B     A.T*x      half      2in2
>
>Dimension: 5
>Array     0.9000    0.2400    0.2000    0.2600    0.7100    0.9400    1.1600
>Matrix    4.7800    1.5700    0.6200    0.7600    1.0600    3.0400    4.6500
>NumArr    3.2900    0.7400    0.6800    0.7800    8.4800    7.4200   11.6600
>Numeri    1.3300    0.3900    0.3100    0.4200    0.7900    0.6800    0.7600
>Matlab      1.88      0.44      0.41      0.35      0.37      1.20      0.98
>
>Dimension: 50
>Array     9.0000    2.1400    0.5500   18.9500    1.4100    4.2700    4.4500
>Matrix   48.7400    3.9200    1.0100   20.2000    1.8000    6.5000    8.1900
>NumArr   32.3900    2.6800    1.0000   18.9700   13.0300    8.6300   13.0700
>Numeri   13.1000    2.2600    0.6500   18.2700   10.1500    1.0400    3.2600
>Matlab     16.98      1.94      1.07     17.86      0.73      1.57      1.77
>
>Dimension: 500
>Array     1.1400    9.2300    2.0100  168.2700    2.1800    4.0200    4.2900
>Matrix    5.0300    9.3500    2.1500  167.5300    2.1700    4.1100    4.4200
>NumArr    3.4400    9.1000    2.1000  168.7100   21.8400    4.3900    5.8900
>Numeri    1.5800    9.2700    2.0700  167.5600   20.0500    3.4000    4.6800
>Matlab      2.09      6.07      2.17    169.45      2.10      2.56      3.06
>
>Note the 10-fold speed-up for higher dimensions :-)
>
>It looks like numpy now only loses to matlab in small
>dimensions. Probably the problem is the creation of the object that
>represents the transposed array. Matlab's object creation is probably
>very lightweight (they only have matrix objects to deal with).
>This probably explains the behavior of the indexing
>operations too.
>
>Paulo
>
Here's an idea Fernando and I have briefly talked about off-list, but
which perhaps bears talking about here: Is there speed to be gained by
an alternative, very simple, very optimized ndarray constructor? The
idea would be a special-case constructor with very limited functionality
designed purely for speed. It wouldn't support (m)any of the fantastic
things Travis has done, but would be useful only in specialized use
cases, such as creating indices.
I'm not familiar enough with what the normal constructor does to know if
we could implement something, (in C, perhaps) that would do nothing but
create a simple, contiguous array significantly faster than what is
currently done. Or does the current constructor create a new instance
about as fast as possible? I know Travis has optimized it, but it's a
general purpose constructor, and I'm thinking these extra features may
take some extra CPU cycles.
Cheers!
Andrew |
|
From: Sasha <nd...@ma...> - 2006-01-18 19:28:27
|
Oops, bool cannot be subclassed in python, but bool is a subclass of
int, so it makes sense to derive from int_, only which int?
In python bool inherits all number methods from int and overrides only
and, or, and xor. Maybe bool_ can do the same ...
-- sasha
On 1/18/06, Sasha <nd...@ma...> wrote:
> >>> from numpy import *
> >>> isinstance(bool_(True), bool)
> False
> |
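The three facts this message relies on can be verified directly. A minimal sketch, assuming present-day Python 3 and NumPy:

```python
import numpy as np

# bool is a subclass of int in Python.
bool_is_int = issubclass(bool, int)

# A numpy boolean scalar is not an instance of Python's bool.
np_is_py_bool = isinstance(np.bool_(True), bool)

# Python refuses to let bool be used as a base class.
try:
    class MyBool(bool):
        pass
    subclassable = True
except TypeError:
    subclassable = False
```

Here bool_is_int comes out True while np_is_py_bool and subclassable come out False, matching the observations in the thread.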
|
From: Sasha <nd...@ma...> - 2006-01-18 19:21:02
|
>>> from numpy import *
>>> isinstance(bool_(True), bool)
False |
|
From: eric j. <er...@en...> - 2006-01-18 19:03:38
|
In scipy, we talked about having benchmark_xyz methods that could be added to the test classes. These wouldn't be run during unit tests (scipy.test()) but could be run using scipy.benchmark() or something like that. I can't remember if Pearu got the machinery in place, but it seems to me it wouldn't be so hard. You would have to add guards around benchmarks that compare to 3rd party tools, obviously, so that people without them could still run the benchmark suite. Adding a regression process that checks against results from previous builds to flag potential problems when a slowdown is noted would be good -- that is more work.

Anyway, something flagging these "tests" as benchmarks instead of standard correctness tests seems like a good idea.

eric

Fernando Perez wrote:
> Matthew Brett wrote:
>> Hi,
>>
>>> Travis asked me to benchmark numpy versus matlab in some basic linear
>>> algebra operations. Here are the results for matrices/vectors of
>>> dimensions 5, 50 and 500:
>>
>> This is really excellent, thanks. Is there any chance we can make
>> these and other benchmarks part of the pre-release testing? Apart
>> from testing for bottlenecks, if we could show that we were in the
>> ballpark of matlab for speed for each release, this would be very
>> helpful for those of us trying to persuade our matlab colleagues to
>> switch.
>
> +1
>
> It might not be part of test(1), but at test(10) these tests could be
> automatically run, activating each line as each package is found (or
> not) in the system (Numeric, numarray, matlab). This way, people who
> have matlab on their box can even get a real-time check of how their
> fresh-off-svn numpy fares against matlab that day.
>
> cheers,
>
> f
|
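For what it's worth, the benchmark_xyz idea discussed in this message can be sketched in a few lines. This is only an illustration under stated assumptions, not the scipy machinery: `requires`, `run_benchmarks`, the stand-in `dot` routine and the module name `hypothetical_matlab_bridge` are all invented for the example. It relies on the fact that the standard unittest loader only collects methods whose names start with "test".

```python
import importlib.util
import time
import unittest


def requires(module_name):
    """Mark a benchmark as needing an optional third-party module."""
    def mark(func):
        func.required_module = module_name
        return func
    return mark


def dot(x, y):
    # Stand-in for the routine being measured (hypothetical).
    return sum(a * b for a, b in zip(x, y))


class TestDot(unittest.TestCase):
    def test_dot(self):
        # Correctness test: collected by the normal unittest loader.
        self.assertEqual(dot([1, 2, 3], [4, 5, 6]), 32)

    def benchmark_dot(self):
        # Ignored by the unittest loader, which only looks for "test*".
        x = list(range(500))
        dot(x, x)

    @requires("hypothetical_matlab_bridge")
    def benchmark_dot_vs_matlab(self):
        # Would compare against a third-party tool; guarded when absent.
        raise RuntimeError("should not run without the bridge module")


def run_benchmarks(case_class, repeat=100):
    """Time every benchmark_* method; return {name: seconds or None}."""
    results = {}
    for name in sorted(dir(case_class)):
        if not name.startswith("benchmark_"):
            continue
        needed = getattr(getattr(case_class, name), "required_module", None)
        if needed and importlib.util.find_spec(needed) is None:
            results[name] = None  # guard: the optional tool is not installed
            continue
        bench = getattr(case_class(name), name)
        start = time.perf_counter()
        for _ in range(repeat):
            bench()
        results[name] = time.perf_counter() - start
    return results
```

Calling `run_benchmarks(TestDot)` times `benchmark_dot` and reports `None` for the guarded comparison, while an ordinary unittest run still sees only `test_dot`. Comparing the returned timings against stored results from earlier builds would be the extra regression step mentioned above.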
|
From: Paulo J. S. S. <pjs...@im...> - 2006-01-18 19:02:27
|
On Wed, 2006-01-18 at 11:15 -0700, Travis Oliphant wrote:
> Will you run these again with the latest SVN version of numpy. I
> couldn't figure out why a copy was being made on transpose (because it
> shouldn't have been). Then, I dug deep into the PyArray_FromAny code
> and found bad logic in when a copy was needed that was causing an
> inappropriate copy.
>
> I fixed that and now wonder how things will change. Because presumably,
> the dotblas function should handle the situation now...
>

Good work Travis :-)

Tests        x.T*y     x*y.T       A*x       A*B     A.T*x      half      2in2

Dimension: 5
Array       0.9000    0.2400    0.2000    0.2600    0.7100    0.9400    1.1600
Matrix      4.7800    1.5700    0.6200    0.7600    1.0600    3.0400    4.6500
NumArr      3.2900    0.7400    0.6800    0.7800    8.4800    7.4200   11.6600
Numeri      1.3300    0.3900    0.3100    0.4200    0.7900    0.6800    0.7600
Matlab        1.88      0.44      0.41      0.35      0.37      1.20      0.98

Dimension: 50
Array       9.0000    2.1400    0.5500   18.9500    1.4100    4.2700    4.4500
Matrix     48.7400    3.9200    1.0100   20.2000    1.8000    6.5000    8.1900
NumArr     32.3900    2.6800    1.0000   18.9700   13.0300    8.6300   13.0700
Numeri     13.1000    2.2600    0.6500   18.2700   10.1500    1.0400    3.2600
Matlab       16.98      1.94      1.07     17.86      0.73      1.57      1.77

Dimension: 500
Array       1.1400    9.2300    2.0100  168.2700    2.1800    4.0200    4.2900
Matrix      5.0300    9.3500    2.1500  167.5300    2.1700    4.1100    4.4200
NumArr      3.4400    9.1000    2.1000  168.7100   21.8400    4.3900    5.8900
Numeri      1.5800    9.2700    2.0700  167.5600   20.0500    3.4000    4.6800
Matlab        2.09      6.07      2.17    169.45      2.10      2.56      3.06

Note the 10-fold speed-up for higher dimensions :-)

It now looks like numpy only loses to matlab in small dimensions. Probably the problem is the creation of the object that represents the transposed array. Matlab's object creation is probably very lightweight (it only has matrix objects to deal with). This probably explains the behavior of the indexing operations too.

Paulo
|
|
From: Fernando P. <Fer...@co...> - 2006-01-18 19:02:19
|
Travis Oliphant wrote:

>>3) pkgload() exists to support the loading of subpackages. It does not reach
>>into numpy.dft or numpy.linalg at all. It is not relevant to this issue.
>>
>>4) There are some places in numpy that use numpy.dual.
>>
>>I think we can address all of your concerns by changing #4.
>>
>
> This is an accurate assessment. However, I do not want to eliminate
> number 4 as I've mentioned before. I think there is a place for having
> functions that can be over-written with better versions. I agree that
> it could be implemented better, however, with some kind of register
> function instead of automatically looking in scipy...

Mmh, I think I'm confused then: it seemed to me that pkgload() WAS overwriting numpy names, from the messages which the environment variable controls. Is that not true? Here's a recent thread:

http://aspn.activestate.com/ASPN/Mail/Message/scipy-dev/2974044

where this was shown:

In [3]: import scipy
Overwriting fft=<function fft at 0x2000000001474668> from scipy.fftpack.basic (was <function fft at 0x2000000001394a28> from numpy.dft.fftpack)
Overwriting ifft=<function ifft at 0x20000000014746e0> from scipy.fftpack.basic (was <function inverse_fft at 0x2000000001394aa0> from numpy.dft.fftpack)

I understood this as 'scipy.fftpack.basic.fft overwrote numpy.dft.fftpack.fft'. Does this then not affect the numpy namespace at all?

I also would like to propose that, rather than using an environment variable, pkgload() takes a 'verbose=' keyword (or 'quiet='). I think it's much cleaner to say

pkgload(quiet=1)

or

pkgload(verbose=0)

than relying on users configuring env. variables for something like this.

Cheers,

f
|
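The 'verbose=' keyword proposed in this message could look roughly like the sketch below. It is not the real pkgload(): `overwrite_names` is a hypothetical helper that only shows how a keyword argument can control the "Overwriting ..." reports without any environment variable.

```python
def overwrite_names(namespace, replacements, verbose=True):
    """Install replacements into namespace; return the report lines.

    Hypothetical stand-in for the name-overwriting step, written only to
    show the verbose keyword. The real pkgload() may work differently.
    """
    messages = []
    for name, new in replacements.items():
        if name in namespace and namespace[name] is not new:
            messages.append(
                "Overwriting %s=%r (was %r)" % (name, new, namespace[name])
            )
        namespace[name] = new
    if verbose:
        # The caller decides whether anything is printed.
        for line in messages:
            print(line)
    return messages
```

With `verbose=False` the names are still replaced, but nothing is printed; the returned list keeps the information available to callers that want to inspect it.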
|
From: Ed S. <sch...@ft...> - 2006-01-18 19:00:47
|
Robert Kern wrote:
>Fernando Perez wrote:
>
>
>
>>Anyway, I won't belabor this point any longer. I'd just like to hear from
>>others their opinion on this matter, and if a decision is made to go ahead
>>with the overwriting, at least I think the rationale for it should be well
>>justified (and be more than "it's convenient"). The fact that over the last
>>few weeks we've had several surprised questions on this is, to me, an
>>indicator that I'm not the one uncomfortable with this decision.
>>
>>
>
>I haven't followed this discussion in great detail, but I believe the current
>situation is this:
>
>1) If you use numpy.dft and numpy.linalg directly, you will always get the numpy
>versions no matter what else is installed.
>
>2) If you want to optionally use optimized scipy versions if they are available
>and regular numpy versions otherwise, then you use the functions exposed in
>numpy.dual. You do so at your own risk.
>
>3) pkgload() exists to support the loading of subpackages. It does not reach
>into numpy.dft or numpy.linalg at all. It is not relevant to this issue.
>
>4) There are some places in numpy that use numpy.dual.
>
>I think we can address all of your concerns by changing #4.
>
>
I've been battling for half an hour trying to import scipy.linalg. Is
this one of Fernando's scary predictions coming true? I get:
>>> from scipy import linalg
>>> dir(linalg)
['Heigenvalues', 'Heigenvectors', 'LinAlgError', 'ScipyTest',
'__builtins__', '__doc__', '__file__', '__name__', '__path__',
'cholesky', 'cholesky_decomposition', 'det', 'determinant', 'eig',
'eigenvalues', 'eigenvectors', 'eigh', 'eigvals', 'eigvalsh',
'generalized_inverse', 'inv', 'inverse', 'lapack_lite', 'linalg',
'linear_least_squares', 'lstsq', 'pinv', 'singular_value_decomposition',
'solve', 'solve_linear_equations', 'svd', 'test']
>>> linalg.__file__
'/home/schofield/Tools/lib/python2.4/site-packages/numpy/linalg/__init__.pyc'
This is the linalg package from numpy, not scipy. It's missing my
favourite pinv2 function. What is going on?! I've just cleaned
everything out and built from the latest SVN revisions. Using
pkgload('linalg') doesn't seem to help:
>>> import scipy
>>> scipy.pkgload('linalg')
>>> linalg.__file__
Traceback (most recent call last):
File "<stdin>", line 1, in ?
NameError: name 'linalg' is not defined
>>> scipy.linalg.__file__
'/home/schofield/Tools/lib/python2.4/site-packages/numpy/linalg/__init__.pyc'
The only thing that helps is calling pkgload() with no arguments:
>>> import scipy
>>> scipy.pkgload()
Overwriting lib=<module 'scipy.lib' from
'/home/schofield/Tools/lib/python2.4/site-packages/scipy/lib/__init__.pyc'>
from
/home/schofield/Tools/lib/python2.4/site-packages/scipy/lib/__init__.pyc
(was <module 'numpy.lib' from
'/home/schofield/Tools/lib/python2.4/site-packages/numpy/lib/__init__.pyc'>
from
/home/schofield/Tools/lib/python2.4/site-packages/numpy/lib/__init__.pyc)
>>> scipy.linalg.__file__
'/home/schofield/Tools/lib/python2.4/site-packages/scipy/linalg/__init__.pyc'
Unless I'm doing something very stupid, there seem to be multiple
sources of evil here. First, numpy's linalg package is available from
the scipy namespace, which seems like a recipe for Ed's madness, since
he can't find his pinv2() function. Second, scipy.pkgload('linalg')
silently fails to make any visible difference. This is probably a
simple bug. Third, ..., there is no third. But bad things usually
come in threes.
-- Ed
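
The check done by hand above with __file__ can be wrapped in a small helper. This is a sketch using only the standard library; `belongs_to` is a made-up name, and the os.path example is just a convenient stdlib case of a module reachable under one name while actually living somewhere else.

```python
import json.decoder
import os


def belongs_to(module, package_name):
    """Return True if module really lives in package_name.

    The attribute path used to reach a module (scipy.linalg, os.path)
    says nothing about where it was defined; __name__ does.
    """
    name = module.__name__
    return name == package_name or name.startswith(package_name + ".")


# os.path is reachable through os, but it is really posixpath or ntpath.
print(os.path.__name__, belongs_to(os.path, "os"))
# json.decoder genuinely belongs to the json package.
print(json.decoder.__name__, belongs_to(json.decoder, "json"))
```

Applied to the session above, a check such as belongs_to(scipy.linalg, "scipy") would presumably have returned False before the bare pkgload() call, since __file__ pointed into numpy/linalg.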
|
|
From: Christopher H. <ch...@st...> - 2006-01-18 18:46:20
|
Hi Travis,
The following works in numarray but fails in numpy:
In [23]: a1 = numarray.strings.array(['abc','def','xx'])
In [24]: a1
Out[24]: CharArray(['abc', 'def', 'xx'])
In [25]: from numpy.core import char
In [26]: a = char.array(['abc', 'def', 'xx'])
---------------------------------------------------------------------------
exceptions.TypeError Traceback (most
recent call last)
/data/sparty1/dev/pyfits-numpy/test/<console>
/data/sparty1/dev/site-packages/lib/python/numpy/core/defchararray.py in
array(obj, itemsize, copy, unicode, fortran)
321 dtype += str(itemsize)
322
--> 323 if isinstance(obj, str) or isinstance(obj, unicode):
324 if itemsize is None:
325 itemsize = len(obj)
TypeError: isinstance() arg 2 must be a class, type, or tuple of classes
and types
Chris
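
Reading the traceback, the signature shown is array(obj, itemsize, copy, unicode, fortran), so the argument named unicode appears to hide the builtin type of the same name on line 323. The sketch below reproduces the same kind of failure in current Python, where the builtin is str rather than unicode; both function names are invented for the illustration and neither is the numpy code.

```python
def array_shadowed(obj, itemsize=None, str=False):
    # The flag parameter hides the builtin type of the same name, so
    # isinstance() receives a bool as its second argument and raises.
    return isinstance(obj, str)


def array_renamed(obj, itemsize=None, use_str=False):
    # Renaming the flag leaves the builtin visible again.
    return isinstance(obj, str)


try:
    array_shadowed("abc")
except TypeError as exc:
    print("TypeError:", exc)

print(array_renamed("abc"))
```

Renaming the parameter, or saving a reference to the builtin under another name at module level, would be two possible ways around it.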
|
|
From: Robert K. <rob...@gm...> - 2006-01-18 18:31:36
|
Travis Oliphant wrote:
> Robert Kern wrote:

>>4) There are some places in numpy that use numpy.dual.
>>
>>I think we can address all of your concerns by changing #4.

And actually, I think we can eat our cake and have it, too, by providing a way to restrict numpy.dual to only use numpy versions. We won't provide a way to force numpy.dual to only use some_other_version. I think Fernando's examples won't be problematic, then.

> This is an accurate assessment. However, I do not want to eliminate
> number 4 as I've mentioned before. I think there is a place for having
> functions that can be over-written with better versions. I agree that
> it could be implemented better, however, with some kind of register
> function instead of automatically looking in scipy...

Like egg entry_points. Please, let's not reinvent this wheel again.

http://peak.telecommunity.com/DevCenter/PkgResources#entry-points

--
Robert Kern
rob...@gm...

"In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die."
 -- Richard Harter
|
|
From: Travis O. <oli...@ie...> - 2006-01-18 18:19:58
|
Robert Kern wrote:

>Fernando Perez wrote:
>
>>Anyway, I won't belabor this point any longer. I'd just like to hear from
>>others their opinion on this matter, and if a decision is made to go ahead
>>with the overwriting, at least I think the rationale for it should be well
>>justified (and be more than "it's convenient"). The fact that over the last
>>few weeks we've had several surprised questions on this is, to me, an
>>indicator that I'm not the one uncomfortable with this decision.
>>
>
>I haven't followed this discussion in great detail, but I believe the current
>situation is this:
>
>1) If you use numpy.dft and numpy.linalg directly, you will always get the numpy
>versions no matter what else is installed.
>
>2) If you want to optionally use optimized scipy versions if they are available
>and regular numpy versions otherwise, then you use the functions exposed in
>numpy.dual. You do so at your own risk.
>
>3) pkgload() exists to support the loading of subpackages. It does not reach
>into numpy.dft or numpy.linalg at all. It is not relevant to this issue.
>
>4) There are some places in numpy that use numpy.dual.
>
>I think we can address all of your concerns by changing #4.
>

This is an accurate assessment. However, I do not want to eliminate number 4 as I've mentioned before. I think there is a place for having functions that can be over-written with better versions. I agree that it could be implemented better, however, with some kind of register function instead of automatically looking in scipy...

-Travis
|
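A "register function" of the kind mentioned in this message might be sketched as below. This is an assumption-laden illustration, not numpy.dual: `DualRegistry`, its method names and the `defaults_only` switch are all hypothetical. The switch reflects the suggestion elsewhere in the thread that there be a way to restrict lookups to the built-in versions.

```python
class DualRegistry:
    """Hypothetical registry of overridable functions.

    Defaults are always present; a better version replaces a default
    only when some package explicitly registers it.
    """

    def __init__(self):
        self._defaults = {}
        self._overrides = {}
        self.defaults_only = False  # True: ignore every registered override

    def add_default(self, name, func):
        self._defaults[name] = func

    def register(self, name, func):
        # Only names that already have a default can be overridden.
        if name not in self._defaults:
            raise KeyError("no default implementation named %r" % name)
        self._overrides[name] = func

    def get(self, name):
        if not self.defaults_only and name in self._overrides:
            return self._overrides[name]
        return self._defaults[name]


registry = DualRegistry()
registry.add_default("inv", lambda x: ("basic", x))
registry.register("inv", lambda x: ("optimized", x))
print(registry.get("inv")(1))

registry.defaults_only = True
print(registry.get("inv")(1))
```

The point of the design is that nothing is overwritten by scanning for another package at import time: an override exists only because someone called register(), and a user who wants the plain versions can switch overrides off in one place.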