From: Kuzminski, S. R <SKu...@fa...> - 2004-01-26 12:53:59
|
Could you use masked arrays more efficiently in this case? If you create the array so that values >255 and <0 are masked, then they will be excluded from the sum (and from any other operations as well).

Stefan

-----Original Message-----
From: num...@li... [mailto:num...@li...] On Behalf Of Konrad Hinsen
Sent: Monday, January 26, 2004 12:17 AM
To: RJS
Cc: num...@li...
Subject: Re: [Numpy-discussion] efficient sum of "sparse" 2D arrays?

On 26.01.2004, at 07:14, RJS wrote:

> The problem: I have a "stack" of 8, 640 x 480 integer image arrays
> from a FITS cube concatenated into a 3D array, and I want to sum each
> pixel such that the result ignores clipped values (255+); i.e., if two
> images have clipped pixels at (x,y) the result along z will be the sum
> of the other 6.

Memory doesn't seem critical for such small arrays, so you can just do

sum([where(a < 255, a, 0) for a in images])

Konrad. |
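The masked-array suggestion above can be sketched as follows. This is only an illustration under assumptions: it uses `numpy.ma` from NumPy (the later successor to Numeric, not the MA module available when this thread was written), the 2x2 `images` stack is made-up data, and the threshold `>= 255` follows the "255+" wording of the original question.

```python
import numpy as np

# Hypothetical stack of three 2x2 "images"; 255 marks a clipped pixel.
images = np.array([
    [[10, 255], [30, 40]],
    [[1, 2], [255, 4]],
    [[5, 6], [7, 255]],
])

# Mask clipped (>= 255) and negative values so that reductions skip them.
masked = np.ma.masked_where((images >= 255) | (images < 0), images)

# Sum along the stack axis; masked entries contribute nothing.
# filled(0) converts the masked result back to a plain array.
total = masked.sum(axis=0).filled(0)
print(total)
```

The mask travels with the data, so other reductions (for example `masked.mean(axis=0)`) would ignore the clipped pixels in the same way.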
From: Konrad H. <hi...@cn...> - 2004-01-26 08:16:54
|
On 26.01.2004, at 07:14, RJS wrote:

> The problem: I have a "stack" of 8, 640 x 480 integer image arrays
> from a FITS cube concatenated into a 3D array, and I want to sum each
> pixel such that the result ignores clipped values (255+); i.e., if two
> images have clipped pixels at (x,y) the result along z will be the sum
> of the other 6.

Memory doesn't seem critical for such small arrays, so you can just do

sum([where(a < 255, a, 0) for a in images])

Konrad. |
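For readers who want to try the `where`-based one-liner today, here is a minimal sketch. It assumes NumPy in place of Numeric, and the small `images` stack and the names `total_stack` and `valid` are invented for illustration.

```python
import numpy as np

# Hypothetical stack of three 2x2 "images"; 255 marks a clipped pixel.
images = np.array([
    [[10, 255], [30, 40]],
    [[1, 2], [255, 4]],
    [[5, 6], [7, 255]],
])

# The approach from the message: zero out clipped pixels, then add the
# images pixel by pixel with Python's built-in sum().
total = sum(np.where(a < 255, a, 0) for a in images)

# The same result in one call on the whole 3D stack.
total_stack = np.where(images < 255, images, 0).sum(axis=0)

# Number of unclipped contributions per pixel, useful for averaging.
valid = (images < 255).sum(axis=0)
```

Keeping the `valid` count alongside the sum makes it possible to normalize pixels that lost one or more frames to clipping.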
From: RJS <ra...@sa...> - 2004-01-26 06:30:17
|
Hi all, The problem: I have a "stack" of 8, 640 x 480 integer image arrays from a FITS cube concatenated into a 3D array, and I want to sum each pixel such that the result ignores clipped values (255+); i.e., if two images have clipped pixels at (x,y) the result along z will be the sum of the other 6. I'm trying to come up with a pure Numeric way (hopefully so that I can use weave.blitz) to speed up the calculation. I just looked into masked arrays, but I'm not familiar with that module at all. I was guessing someone out there has done this before... Ray |
From: Konrad H. <hi...@cn...> - 2004-01-25 19:20:01
|
On 25.01.2004, at 19:33, Nancy Keuss wrote:

> I will be working a lot with matrices, and I am wondering a few things
> before I get started with NumPy:
>
> 1) Is there a function that performs matrix multiplication?

Yes, Numeric.dot(matrix1, matrix2)

> 2) Is there a function that takes a tensor product, or Kronecker
> product, of two matrices?

Yes, Numeric.multiply.outer(matrix1, matrix2)

> 3) Is it possible to concatenate two matrices together?

Yes: Numeric.concatenate((matrix1, matrix2))

> 4) Is there a way to insert a matrix into a subsection of an already
> existing matrix. For instance, to insert a 2x2 matrix into the upper
> left hand corner of a 4x4 matrix?

Yes: matrix4x4[:2, :2] = matrix2x2

Konrad. |
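The four answers above can be tried out with a short sketch. It assumes NumPy rather than Numeric (the function names `dot`, `multiply.outer` and `concatenate` carry over); `np.kron` and the example matrices `a`, `b` and `big` are additions for illustration. One detail worth seeing in practice: `multiply.outer` of two 2x2 matrices yields a rank-4 array, and the familiar 4x4 Kronecker block matrix is a transposed and reshaped view of it.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[0, 1], [1, 0]])

# 1) Matrix multiplication.
product = np.dot(a, b)

# 2) Tensor (outer) product: shape (2, 2, 2, 2), outer[i, j, k, l] = a[i, j] * b[k, l].
outer = np.multiply.outer(a, b)
# The 4x4 Kronecker block matrix is the same data, reordered and reshaped.
kron = np.kron(a, b)
kron_from_outer = outer.transpose(0, 2, 1, 3).reshape(4, 4)

# 3) Concatenation: along rows by default, along columns with axis=1.
stacked = np.concatenate((a, b))
side_by_side = np.concatenate((a, b), axis=1)

# 4) Insert a 2x2 matrix into the upper left corner of a 4x4 matrix.
big = np.zeros((4, 4), dtype=int)
big[:2, :2] = a
```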
From: Nancy K. <na...@MI...> - 2004-01-25 18:33:29
|
Hi, I will be working a lot with matrices, and I am wondering a few things before I get started with NumPy: 1) Is there a function that performs matrix multiplication? 2) Is there a function that takes a tensor product, or Kronecker product, of two matrices? 3) Is it possible to concatenate two matrices together? 4) Is there a way to insert a matrix into a subsection of an already existing matrix. For instance, to insert a 2x2 matrix into the upper left hand corner of a 4x4 matrix? Thank you very much in advance! Nancy |
From: Francesc A. <fa...@op...> - 2004-01-24 12:47:30
|
On Saturday 24 January 2004 09:23, René Bastian wrote:

> I need your help.
>
> I tried to update numarray-0.4 to numarray-0.8
> I did not get error messages during "install"
> but launching
> python2.3
>
> >>>import numarray
>
> I get the message
> Fatal Python error : Can't import module numarray.libnumarray
>
> Uninstall 0.4 (or 0.8) ?
> How to uninstall numarray ?

Perhaps there is a better way, but try deleting the numarray directory in your Python site-packages directory. In my case, the following does the job:

rm -r /usr/lib/python2.3/site-packages/numarray/

--
Francesc Alted |
From: <rba...@cl...> - 2004-01-24 12:32:42
|
I need your help.

I tried to update numarray-0.4 to numarray-0.8.
I did not get error messages during "install",
but launching
python2.3

>>>import numarray

I get the message
Fatal Python error : Can't import module numarray.libnumarray

Uninstall 0.4 (or 0.8)?
How to uninstall numarray?

Thanks for your answers

--
René Bastian
http://www.musiques-rb.org : Musique en Python |
From: Konrad H. <hi...@cn...> - 2004-01-23 11:58:35
|
On Wednesday 21 January 2004 22:28, Perry Greenfield wrote:

> contributed code as well). You have to remember that how easily
> contributions come depends on what the critical mass is for
> usefulness. For something like numarray or Numeric, that critical
> mass is quite large. Few are interested in contributing when it
> can do very little and an older package exists that can do more.

I also find it difficult in practice to move code from Numeric to Numarray. While the two packages coexist peacefully, any C module that depends on the C API must be compiled for one or the other. Having both available for comparative testing thus means having two separate Python installations. And even with two installations, there is only one PYTHONPATH setting, which makes development under these conditions quite a pain.

If someone has found a way out of that, please tell me!

> many times in the past. Often consensus was hard to achieve.
> We tended to lean towards backward compatibility unless the change
> seemed really necessary. For type coercion and error handling,
> we thought it was. But I don't think we have tried to shield the
> decision making process from the community. I do think the difficulty
> in achieving a sense of consensus is a problem.

I think you did well on this - but then, I happen to share your general philosophy ;-)

Konrad.

--
Konrad Hinsen                            | E-Mail: hi...@cn...
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais |
From: Francesc A. <fa...@op...> - 2004-01-23 09:39:08
|
On Friday 23 January 2004 08:24, Paul Prescod wrote:

> Perry Greenfield wrote:
> > ...
> > We had looked at it at least a couple of times. I don't remember now
> > all the conclusions, but I think one of the problems was that
> > it wasn't as useful when one had to deal with data types not
> > used in python itself (e.g., unsigned int16). I might be wrong
> > about that.
>
> I would guess that the issue is more whether it is natively handled by
> Pyrex than whether it is handled by Python. Is there a finite list of
> these types that Numarray handles? If you have a list I could generate a
> patch to Pyrex that would support them. We could then ask Greg whether
> he could add them to Pyrex core or refactor it so that he doesn't have to.

I think the question was rather whether Pyrex would be able to work with templates (in the C++ sense), i.e. whether it can generate different functions depending on the datatypes passed to them. You can see some previous discussion on that list in:

http://sourceforge.net/mailarchive/forum.php?thread_id=1642778&forum_id=4890

I put the question to Greg, and here is his answer:

http://sourceforge.net/mailarchive/forum.php?thread_id=1645713&forum_id=4890

So it seems that he didn't like the idea of implementing "templates" in Pyrex.

> > Numarray generates a lot of c code directly for the actual
> > array computations. That is neither the slow part, nor the
> > hard part to write. It is the array computation setup that
> > is complicated. Much of that is now in C (and we do worry
> > that it has greatly added to the complexity). Perhaps that
> > part could be better handled by pyrex.
>
> It sounds like it.

Yeah, I'm quite convinced that a mix between Pyrex and the existing solution in numarray for dealing with templates could be worth the effort. At least, some analysis could be done on that aspect.

> > I think some of the remaining overhead has to do with intrinsic
> > python calls, and the differences between the simpler type used
> > for Numeric versus the new style classes used for numarray.
> > Don't hold me to that however.
>
> Pyrex may be able to help with at least one of these. Calls between
> Pyrex-coded functions usually go at C speeds (although method calls may
> be slower).

Well, that should be clarified: that's only true for Pyrex cdef functions (i.e. C functions written in Pyrex). Pyrex functions that can be called from Python take the same time whether they are called from Python or from within the same Pyrex extension. See some timings I did on that subject some time ago:

http://sourceforge.net/mailarchive/message.php?msg_id=3782230

Cheers,

--
Francesc Alted
Departament de Ciències Experimentals
Universitat Jaume I. Castelló de la Plana. Spain |
From: Andrew P. L. Jr. <bs...@al...> - 2004-01-23 09:05:11
|
On Thu, 22 Jan 2004, Travis E. Oliphant wrote:

> What in the world does this mean? SciPy is "mostly Windows"? Yes, there
> is only a binary installer for Windows available currently. But how
> does that make this statement true?
>
> For me SciPy has always been used almost exclusively on Linux. In fact,
> the best plotting support for SciPy (in my mind) is xplt (pygist-based)
> and it works best on Linux.

I was referring to the installers, but I apparently did a thinko and omitted the reference. My apologies.

I did not mean to imply that SciPy runs only on Windows, especially since I run it on FreeBSD. My intent was to comment about Win32 having a "one big lump" installer philosophy vs. the Linux "discrete packages" philosophy and the impact on maintainability of each, i.e. the fact that releases suck up so much energy because of the need to integrate large chunks of code outside of SciPy itself.

-a |
From: Paul P. <pa...@pr...> - 2004-01-23 07:30:35
|
Perry Greenfield wrote: > ... > We had looked at it at least a couple of times. I don't remember now > all the conclusions, but I think one of the problems was that > it wasn't as useful when one had to deal with data types not > used in python itself (e.g., unsigned int16). I might be wrong > about that. I would guess that the issue is more whether it is natively handled by Pyrex than whether it is handled by Python. Is there a finite list of these types that Numarray handles? If you have a list I could generate a patch to Pyrex that would support them. We could then ask Greg whether he could add them to Pyrex core or refactor it so that he doesn't have to. > Numarray generates a lot of c code directly for the actual > array computations. That is neither the slow part, nor the > hard part to write. It is the array computation setup that > is complicated. Much of that is now in C (and we do worry > that it has greatly added to the complexity). Perhaps that > part could be better handled by pyrex. It sounds like it. > I think some of the remaining overhead has to do with intrinsic > python calls, and the differences between the simpler type used > for Numeric versus the new style classes used for numarray. > Don't hold me to that however. Pyrex may be able to help with at least one of these. Calls between Pyrex-coded functions usually go at C speeds (although method calls may be slower). I don't know enough about the new-style, old-style issue to know about whether Pyrex can help with that but I would guess it might because a Pyrex "extension type" is more like a C extension type than a Python instance object. That implies some faster method lookup and calling. Numeric is the exact type of project Pyrex is designed for. And of course it works seamlessly with pre-existing Python and C code so you can selectively port things. Paul Prescod |
From: Travis O. <oli...@ee...> - 2004-01-23 03:32:27
|
Today, I realized that I needed to restate what my intention was in raising the subject to begin with.

First of all, I would like to see everybody transition to Numarray someday. On the other hand, I'm not willing to ignore performance issues just to reach that desirable goal.

I would like to recast my proposal into the framework of helping SciPy transition to Numarray. Basically, I don't think Numarray will be ready to fully support SciPy in less than a year (basically because it probably won't happen until some of us SciPy folks do a bit more work with Numarray).

To help that along I am proposing making a few changes to the Numeric object that SciPy uses so that the array object SciPy expects starts looking more and more like the Numarray object. We have wanted to do this in SciPy and were simply wondering if it would make sense to change the Numeric object or to grab the Numeric code base into SciPy and make changes there.

The feedback from the community has convinced me personally that we should leave Numeric alone and make any changes to something we create inside of SciPy.

There is a lot of concern over having multiple implementations of nd arrays due to potential splitting of tools, etc. But I should think that tools should be coded to an interface (API, methods, data structures) instead of a single implementation, so that the actual underlying object should not matter greatly. I thought that was the point of modular development and object-orientedness ....

Anyone doing coding with numeric arrays already has to distinguish between: Python Imaging Objects, lists of lists, and other array-like objects. I think it is pretty clear that Numeric won't be changing. Thus, anything we do with the Numeric object will be done from the framework of SciPy.

Best regards.

Travis O. |
From: Perry G. <pe...@st...> - 2004-01-23 03:24:41
|
Colin J. Williams writes:

> I have wondered whether the desire to be compatible with Numeric has
> been an inhibitory factor for numarray. It might be interesting to see
> the list of decisions which Eric Jones doesn't like.

There weren't that many. The ones that I remember (and if Eric has time he can fill in the rest) were:

1) Default axis for operations. Some use the last and some use the first depending on context. Eric and Travis wanted to use a consistent rule (I believe last always). I believe that scipy wraps Numeric so that it does just that (remember, the behavior in scipy of Numeric is not quite the same as the distributed Numeric; correct me if I'm wrong).

2) Allowing complex comparisons. Since Python no longer allows these (and it is reasonable to question whether this was right since complex numbers now can no longer be part of a generic python sort), many felt that numarray should be consistent with Python. This isn't a big issue since I had argued that those that wanted to do generic comparisons simply needed to cast it as x.real where the .real attribute was available for all types of arrays, thus using that would always work regardless of the type.

3) Having single-element indexing return a rank-0 array rather than a python scalar. Numeric is quite inconsistent in this regard now. We decided to have numarray always return python scalars (exceptions may be made if Float128 is supported). The argument for rank-0 arrays was that it would support generic programming so that one didn't need to test for the kind of value for many functions (i.e., scalar or array). But the issue of contention was that Eric argued that len(rank-0) == 1 and that (rank-0)[0] give the value, neither of which is correct according to the strict definition of rank-0. We argued that using rank-1 len-1 arrays were really what was needed for that kind of programming. It turned out that the most common need was for the result of reduction operations, so we provided a version of reduce (areduce) which always returned an array result even if the array was 1-d (the result would be a length-1 rank-1 array).

There are others, but I don't recall immediately.

> > It is not the interface but the implementation that started this
> > furor. Travis O.'s suggestion was to back port (much of) the numarray
> > interface to the Numeric code base so that those stuck supporting
> > large code bases (like SciPy) and needing fast small arrays could
> > benefit from the interface enhancements. One or two of them had
> > backward compatibility issues with Numeric, so he asked how it should
> > be handled. Unless some magic porting fairy shows up, SciPy will be a
> > Numeric only tool for the next year or so. This means that users of
> > SciPy either have to forgo some of these features or back port.
>
> Back porting would appear, to this outsider, to be a regression. Is
> there no way of changing numarray so that it has the desired speed for
> small arrays?

If it must be faster than Numeric, I do wonder if that is easily done without greatly complicating the code.

> I am surprised that alltrue() performance is a concern, but it should be
> easy to implement short circuit evaluation so that False responses are,
> on average, handled more quickly. If Boolean arrays are significant,
> in terms of the amount of computer time taken, should they be stored as
> bit arrays? Would there be a pay-off for the added complexity?

Making alltrue fast in numarray would not be hard. Just some work writing a special purpose function to short circuit. I doubt very much bit arrays would be much faster. They would also greatly complicate the code base. It is possible to add them, but I've always felt the reason would be to save memory, not increase speed. They haven't been high priority for us.

Perry Greenfield |
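To make the short-circuit idea concrete, here is a minimal sketch. It is not numarray's or Numeric's implementation: it assumes NumPy, and the helper name `alltrue_short_circuit` and the `chunk` parameter are hypothetical. The array is scanned in blocks so that the loop overhead stays in C for each block, while a false element near the front ends the scan early.

```python
import numpy as np

def alltrue_short_circuit(values, chunk=4096):
    """Return False as soon as a block containing a false element is found.

    Hypothetical helper for illustration only.
    """
    flat = np.asarray(values).ravel()
    for start in range(0, flat.size, chunk):
        # .all() on one block runs in C; a False result stops the scan early.
        if not flat[start:start + chunk].all():
            return False
    return True

# Usage: a large array whose only false element sits near the front.
data = np.ones(1_000_000, dtype=bool)
data[5] = False
early_exit = alltrue_short_circuit(data)            # stops after the first block
full_scan = alltrue_short_circuit(np.ones(10_000))  # has to look at every block
```

The block size is a trade-off: smaller blocks exit sooner on early failures but pay more Python-level loop overhead when every element is true.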
From: Perry G. <pe...@st...> - 2004-01-23 03:08:09
|
Travis Oliphant writes:

> The two major problems I see with Numarray replacing Numeric are
>
> 1) How is UFunc support? Can you create ufuncs in C easily (with a
> single function call or something similar).

Different, but I don't think it is difficult to add ufuncs (and probably easier if many types must be supported, though I doubt that is much of an issue for most mathematical functions which generally are only needed for the float types and perhaps complex).

> 2) Speed for small arrays (array creation is the big one).

This is the much harder issue. I do wonder if it is possible to make numarray any faster than Numeric on this point (or, as others later mention, whether the complexity that it introduces is worth it).

> It is actually quite a common thing to have a loop during which many
> small arrays get created and destroyed. Yes, you can usually make such
> code faster by "vectorizing" (if you can figure out how). But the
> average scientist just wants to (and should be able to) just write a loop.

I'll pick a small bone here. Well, yes, and I could say that a scientist should be able to write loops that iterate over all array elements and expect that they run as fast. But they can't. After all, using an array language within an interpreted language implies that users must cast their problems into array manipulations for it to work efficiently. By using Numeric or numarray they *must* buy into vectorizing at some level. Having said that, it certainly is true that there are problems with small arrays that cannot be easily vectorized by combining into higher dimension arrays. I think the two most common cases are with variable-sized small arrays or where there are iterative algorithms on small arrays that must be iterated many times (though some of these problems can be cast into larger vectors, but often not really easily).

> Regarding speed issues. Actually, there are situations where I am very
> unsatisfied with Numeric's speed performance and so the goal for
> Numarray should not be to achieve some percentage of Numeric's
> performance but to beat it.
>
> Frankly, I don't see how you can get speed that I'm talking about by
> carrying around a lot of extras like byte-swapping support,
> memory-mapping support, record-array support.

You may be right. But then I would argue that if one wants to speed up small array performance, one should really go for big improvements. To do that suggests taking a significantly different approach than either Numeric or numarray. But that's a different topic ;-)

To me, factors of a few are not necessarily worth the trouble (and I wonder how much of the phase space of problems they really help move into feasibility). Yes, if you've written a bunch of programs that use small arrays that are marginally fast enough, then a factor of two slower is painful. But there are many other small array problems that were too slow already that couldn't be done anyway. The ones that weren't marginal will likely still be acceptable. Those that live in the grey zone now are the ones that are most sensitive to the issue. All the rest don't care. I don't have a good feel for how many live in the grey zone. I know some do.

Perry Greenfield |
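As an illustration of "combining into higher dimension arrays", the sketch below compares a Python loop over many small matrix-vector products with a single batched call. It assumes NumPy (`einsum` and `default_rng` did not exist in Numeric or numarray), and the array names and sizes are made up. It only covers the easy case of equally sized small arrays; the variable-sized and iterative cases mentioned above do not batch this simply.

```python
import numpy as np

rng = np.random.default_rng(0)
matrices = rng.random((1000, 3, 3))   # 1000 small 3x3 matrices
vectors = rng.random((1000, 3))       # 1000 matching 3-vectors

# Loop version: one tiny array operation (and one tiny result array)
# per iteration, so per-call overhead dominates.
looped = np.array([m @ v for m, v in zip(matrices, vectors)])

# Batched version: the small arrays are stacked along a leading axis and
# all 1000 products are computed in one call.
batched = np.einsum("nij,nj->ni", matrices, vectors)

same_result = np.allclose(looped, batched)
```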
From: Perry G. <pe...@st...> - 2004-01-23 02:49:54
|
Robert Kern writes: > [snip] > > Tim Hochberg writes: > > The second point is the relative speediness of Numeric at low array > > sizes is the result that nearly all of it is implemented in C, whereas > > much of Numarray is implemented in Python. This results in a larger > > overhead for Numarray, which is why it's slower for small arrays. As I > > understand it, the decision to base most of Numarray in Python was > > driven by maintainability; it wasn't an attempt to optimize > large arrays > > at the expense of small ones. > > Has the numarray team (or anyone else for that matter) looked at using > Pyrex[1] to implement any part of numarray? If not, then that's my next > free-time experiment (i.e. avoiding homework while still looking > productive at the office). > We had looked at it at least a couple of times. I don't remember now all the conclusions, but I think one of the problems was that it wasn't as useful when one had to deal with data types not used in python itself (e.g., unsigned int16). I might be wrong about that. Numarray generates a lot of c code directly for the actual array computations. That is neither the slow part, nor the hard part to write. It is the array computation setup that is complicated. Much of that is now in C (and we do worry that it has greatly added to the complexity). Perhaps that part could be better handled by pyrex. I think some of the remaining overhead has to do with intrinsic python calls, and the differences between the simpler type used for Numeric versus the new style classes used for numarray. Don't hold me to that however. Perry |
From: Travis E. O. <oli...@ee...> - 2004-01-22 23:51:17
|
Andrew P. Lentvorski, Jr. wrote:

> On Thu, 22 Jan 2004, eric jones wrote:
>
> >Speaking from the standpoint of SciPy, all I can say is we've tried to
> >do what you outline here. The effort of releasing the huge load of
> >Fortran/C/C++/Python code across multiple platforms is difficult and
> >takes many hours.
>
> And since SciPy is mostly Windows, the users expect that one click
> installs the universe. Good for customer experience. Bad for
> maintainability which would really like to have independently maintained
> packages with hard API's surrounding them..

What in the world does this mean? SciPy is "mostly Windows"? Yes, there is only a binary installer for Windows available currently. But how does that make this statement true?

For me SciPy has always been used almost exclusively on Linux. In fact, the best plotting support for SciPy (in my mind) is xplt (pygist-based) and it works best on Linux.

-Travis |
From: Andrew P. L. Jr. <bs...@al...> - 2004-01-22 23:02:39
|
On Thu, 22 Jan 2004, eric jones wrote:

> The effort has fallen short of the mark you set. I also wish the
> community was more efficient at pursuing this goal. There are
> fundamental issues. (1) The effort required is large. (2) Free time is
> in short supply. (3) Financial support is difficult to come by for
> library development.

(4) There is no itch to scratch.

Matlab is somewhere about $20,000 (base + a couple of toolboxes) per year for corporations, and something like $500 (or less) for registered students. All of the signal processing packages and stuff are written for Matlab. The time cost of learning a new tool (Python + SciPy + Numeric/numarray) far exceeds the base prices for the average company or person.

However, some companies have to deliver an end product with Matlab embedded. This is *extremely* undesirable; consequently, they are likely to create add-ons and extend the Python interface. However, the progress will likely be slow.

> Speaking from the standpoint of SciPy, all I can say is we've tried to
> do what you outline here. The effort of releasing the huge load of
> Fortran/C/C++/Python code across multiple platforms is difficult and
> takes many hours.

And since SciPy is mostly Windows, the users expect that one click installs the universe. Good for customer experience. Bad for maintainability which would really like to have independently maintained packages with hard API's surrounding them.

> On speed: <excerpt from private mail to Perry>
> Numeric is already too slow -- we've had to recode a number of routines
> in C that I don't think we should have in a recent project.

Then the idea of optimizing numarray is DOA. The best you are going to get is a constant factor speedup in return for vastly complicating maintainability. That's not a good tradeoff for a multi-year open-source project.

> Oh yeah, I have also been surprised at how much of our code uses
> alltrue(), take(), isnan(), etc. The speed of these array manipulation
> methods is really important for us.

That seems ... odd. Scanning an array rather than handling a NaN trap seems like an awful tradeoff (i.e. an O(n) operation repeated every time rather than an O(1) operation activated only on NaN generation, a rare occurrence normally).

> -- code reviews, build help, release help, etc. In fact, I double dare
> ya to ask to manage the next release or the documentation effort.
> okay... triple dare ya.

Shades of, "Take my wife ... please!" ;)

-a |
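The contrast between scanning for NaN and trapping at the point of generation can be sketched like this. The example assumes NumPy, whose `errstate` context manager turns invalid floating-point operations into exceptions; this is only an approximation of the "trap" idea, since NumPy checks the floating-point status flags after each operation rather than installing a hardware trap handler. The helper name `scan_for_nan` and the sample arrays are made up.

```python
import numpy as np

def scan_for_nan(a):
    """O(n) check: look at every element after the fact (hypothetical helper)."""
    return bool(np.isnan(a).any())

x = np.array([1.0, 0.0, 2.0])
y = np.array([1.0, 0.0, 4.0])

# Approach 1: compute first, then scan the whole result for NaN.
with np.errstate(invalid="ignore"):
    ratio = x / y          # 0.0 / 0.0 produces nan
found = scan_for_nan(ratio)

# Approach 2: ask for an exception at the moment an invalid operation
# happens, so no separate pass over the result is needed.
trapped = False
try:
    with np.errstate(invalid="raise"):
        x / y
except FloatingPointError:
    trapped = True
```

With the second approach the cost of detection is paid only when something actually goes wrong, at the price of losing the partial result of the failing operation.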
From: Colin J. W. <cj...@sy...> - 2004-01-22 17:55:50
|
As a relative newcomer to this discussion, I would like to respond on a couple of points. eric jones wrote: > Good thing Duke is beating Maryland as I read, otherwise, mail like > this can make you grumpy. :-) > > Joe Harrington wrote: > [snip] >> THE PATH >> >> Here is what I suggest: >> >> 1. We should identify the remaining open interface questions. Not, >> "why is numeric faster than numarray", but "what should the syntax >> of creating an array be, and of doing different basic operations". >> If numeric and numarray are in agreement on these issues, then we >> can move on, and debate performance and features later. >> >> > ?? I don't get this one. This interface (at least for numarray) is > largely decided. We have argued the points, and Perry et. al. at > STSci made the decisions. I didn't like some of them, and I'm sure > everyone else had at least one thing they wished was changed, but that > is the way this open stuff works. I have wondered whether the desire to be compatible with Numeric has been an inhibitory factor for numarray. It might be interesting to see the list of decisions which Eric Jones doesn't like. > > It is not the interface but the implementation that started this > furor. Travis O.'s suggestion was to back port (much of) the numarray > interface to the Numeric code base so that those stuck supporting > large co debases (like SciPy) and needing fast small arrays could > benefit from the interface enhancements. One or two of them had > backward compatibility issues with Numeric, so he asked how it should > be handled. Unless some magic porting fairy shows up, SciPy will be a > Numeric only tool for the next year or so. This means that users of > SciPy either have to forgo some of these features or back port. Back porting would appear, to this outsider, to be a regression. Is there no way of changing numarray so that it has the desired speed for small arrays? 
> > > On speed: <excerpt from private mail to Perry> > Numeric is already too slow -- we've had to recode a number of > routines in C that I don't think we should have in a recent project. > For us, the goal is not to approach Numeric's speed but to > significantly beat it for all array sizes. That has to be a > possibility for any replacement. Otherwise, our needs (with the > exception of a few features) are already better met by Numeric. I > have some worries about all of the endianness and memory mapped > support that are built into Numarray imposing to much overhead for > speed-ups on small arrays to be possible (this echo's Travis O's > thoughts -- we will happily be proven wrong). None of our current > work needs these features, and paying a price for them is hard to do > with an alternative already there. It is fairly easy to improve its > performance on mathematical by just changing the way the ufunc > operations are coded. With some reasonably simple changes, Numeric > should be comparable (or at least closer) to Numarray speed for large > arrays. Numeric also has a large number of other optimizations that > can be made (memory is zeroed twice in zeros(), asarray was recently > improved significantly for the typical case, etc.). Making these > changes would help our selling of Python and, since we have at least a > years worth of applications that will be on the SciPy/Numeric > platform, it will also help the quality of these applications. > > Oh yeah, I have also been surprised at how much of out code uses > alltrue(), take(), isnan(), etc. The speed of these array > manipulation methods is really important for us. I am surprised that alltrue() performance is a concern, but it should be easy to implement short circuit evaluation so that False responses are, on average, handled more quickly. If Boolean arrays are significant, in terms of the amount of computer time taken, should they be stored as bit arrays? 
Would there be a pay-off for the added complexity? > > [snip] > >> 3. We should collect or implement a very minimal version of the >> featureset, and document it well enough that others like us can do >> simple but real tasks to try it out, without reading source code. >> That documentation should include lists of things that still need >> to be done. >> > Does numarray not provide the basics? >> [snip >> The open source model is successful because it follows closely >> something that has worked for a long time: the scientific method, with >> its community contributions, peer review, open discussion, and >> progress mainly in small steps. Once basic capability is out there, >> we can twiddle with how to improve things behind the scenes. >> >> >> Colin W. |
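The two ideas raised above (short-circuit evaluation for alltrue() and storing Boolean arrays as bits) can be sketched in plain Python. This is only an illustration under stated assumptions, not how Numeric or numarray implement alltrue(); the names alltrue_short_circuit, pack_bits and alltrue_packed are hypothetical and do not exist in either package.

```python
def alltrue_short_circuit(values):
    """Return False at the first falsy element instead of scanning everything."""
    for v in values:
        if not v:
            return False
    return True


def pack_bits(flags):
    """Pack an iterable of booleans into bytes, 8 flags per byte (MSB first).

    Hypothetical helper: the final partial byte is padded with 1-bits so
    that the padding can never be mistaken for a False flag.
    """
    packed = bytearray()
    byte = 0
    count = 0
    for flag in flags:
        byte = (byte << 1) | (1 if flag else 0)
        count += 1
        if count == 8:
            packed.append(byte)
            byte = 0
            count = 0
    if count:
        pad = 8 - count
        byte = (byte << pad) | ((1 << pad) - 1)
        packed.append(byte)
    return bytes(packed)


def alltrue_packed(packed):
    """All flags are True exactly when every packed byte equals 0xFF."""
    return all(b == 0xFF for b in packed)


if __name__ == "__main__":
    flags = [True] * 10 + [False] + [True] * 5
    print(alltrue_short_circuit(flags))      # stops at the first False
    print(alltrue_packed(pack_bits(flags)))  # compares 8 flags per byte
```

The packed form uses one eighth of the memory of one byte per flag and lets a test like alltrue() compare eight flags at a time, at the cost of extra shifting and masking whenever a single element is read or written. Whether that trade pays off would have to be measured.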
From: eric j. <er...@en...> - 2004-01-22 08:04:19
|
Good thing Duke is beating Maryland as I read, otherwise, mail like this can make you grumpy. :-) Joe Harrington wrote: >This is a necessarily long post about the path to an open-source >replacement for IDL and Matlab. While I have tried to be fair to >those who have contributed much more than I have, I have also tried to >be direct about what I see as some fairly fundamental problems in the >way we're going about this. I've given it some section titles so you >can navigate, but I hope that you will read the whole thing before >posting a reply. I fear that this will offend some people, but please >know that I value all your efforts, and offense is not my intent. > > > >THE PAST VS. NOW > >While there is significant and dedicated effort going into >numeric/numarray/scipy, it's becoming clear that we are not >progressing quickly toward a replacement for IDL and Matlab. I have >great respect for all those contributing to the code base, but I think >the present discussion indicates some deep problems. If we don't >identify those problems (easy) and solve them (harder, but not >impossible), we will continue not to have the solution so many people >want. To be convinced that we are doing something wrong at a >fundamental level, consider that Python was the clear choice for a >replacement in 1996, when Paul Barrett and I ran a BoF at ADASS VI on >interactive data analysis environments. That was over 7 years ago. > > > The effort has fallen short of the mark you set. I also wish the community was more efficient at pursuing this goal. There are fundamental issues. (1) The effort required is large. (2) Free time is in short supply. (3) Financial support is difficult to come by for library development. Other potential problems would be a lack of interest and a lack of competence. I do not think many of us suffer from the first. As for competence, the development team beyond the walls of Enthought self selects in open source projects, so we're stuck with what we've got. 
I know most of the people and happen to think they are a talented bunch, so I'll consider us no worse than the average group of PhDs (some consider that a pretty low bar ...). I believe the tasks that go undone (multi-platform support, bi-yearly releases, documentation, etc.) are due more to (2) and (3) above than to some other deep (or shallow) issue. I guess another possibility is organization. This can be improved upon. Thanks to the gracious help of Cal Tech (CACR) and NCBR, the community has gathered at a low cost SciPy workshop at Cal Tech the last couple of years. I believe this is a positive step. Adding this to the newsgroups and mailing lists provides us with a solid framework within which to operate. I still have confidence that we will reach the IDL/Matlab replacement point. We don't have the resources that those products have behind them. We do have a superior language, but without a lot of sweat and toiling at hours of grunt work, we don't stand a chance. As for Enthought's efforts, our success in building applications (scientific and otherwise) has diverted our developers (myself included) away from SciPy as the primary focus. We do continue to develop it and provide significant (for us) financial support to maintain it. I am lucky enough to work with a fine set of software engineers, and I am itching for us to get more time devoted to SciPy. I do believe that we will get the opportunity in the future -- it is just a matter of time. Call me an optimist. >replace IDL or Matlab", the answer was clearly "stable interfaces to >basic numerics and plotting; then we can build it from there following >the open-source model". Work on both these problems was already well >underway then. Now, both the numerical and plotting development >efforts have branched. There is still no stable base upon which to >build. There aren't even packages for popular OSs that people can >install and play with. 
The problem is not that we don't know how to >do numerics or graphics; if anything, we know these things too well. >In 1996, if anyone had told us that in 2004 there would be no >ready-to-go replacement system because of a factor of 4 in small array >creation overhead (on computers that ran 100x as fast as those then >available) or the lack of interactive editing of plots at video >speeds, the response would not have been pretty. How would you have >felt? > >THE PROBLEM > >We are not following the open-source development model. Rather, we >pay lip service to it. Open source's development mantra is "release >early, release often". This means release to the public, for use, a >package that has core capability and reasonably-defined interfaces. > > >Release it in a way that as many people as possible will get it, >install it, use it for real work, and contribute to it. Make the main >focus of the core development team the evaluation and inclusion of >contributions from others. Develop a common vision for the program, >and use that vision to make decisions and keep efforts focused. >Include contributing developers in decision making, but do make >decisions and move on from them. > >Instead, there are no packages for general distribution. The basic >interfaces are unstable, and not even being publicly debated to decide >among them (save for the past 3 days). The core developers seem to >spend most of their time developing, mostly out of view of the >potential user base. I am asked probably twice a week by different >fellow astronomers when an open-source replacement for IDL will be >available. They are mostly unaware that this effort even exists. >However, this indicates that there are at least hundreds of potential >contributors of application code in astronomy alone, as I don't nearly >know everyone. The current efforts look rather more like the GNU >project than Linux. I'm sorry if that hurts, but it is true. 
> > > Speaking from the standpoint of SciPy, all I can say is we've tried to do what you outline here. The effort of releasing the huge load of Fortran/C/C++/Python code across multiple platforms is difficult and takes many hours. I would venture that 90% of the effort on SciPy is with the build system. This means that the exact part of the process that you are discussing is the majority of the effort. We keep a version for Windows up to date because that is what our current clients use. In all the other categories, we do the best we can and ask others to fill the gaps. It is also worth saying that SciPy works quite well for most purposes once built -- we and others use it daily on commercial projects. >I know that Perry's group at STScI and the fine folks at Enthought >will say they have to work on what they are being paid to work on. >Both groups should consider the long term cost, in dollars, of >spending those development dollars 100% on coding, rather than 50% on >coding and 50% on outreach and intake. Linus himself has written only >a small fraction of the Linux kernel, and almost none of the >applications, yet in much less than 7 years Linux became a viable >operating system, something much bigger than what we are attempting >here. He couldn't have done that himself, for any amount of money. >We all know this. > > Elaborate on the outreach idea for me. Enthought (spend money to) provide funding to core developers outside of our company (Travis and Pearu), we (spend money to) give talks at many conferences a year, we (spend a little money to) co-sponsor a 70 person workshop on scientific computing every year, we have an open mailing list, we release most of the general software that we write, in the past I practically begged people to have CVS write access when they provide a patch to SciPy. We even spent a lot of time early on trying to set up the scipy.org site as a collaborative Zope based environment -- an effort that was largely a failure. 
Still we have a functioning largely static site, the mailing list, and CVS. As far as tools, that should be sufficient. It is impossible to argue with the results though. Linus pulled off the OS model, and Enthought and the SciPy community, thus far, have been less successful. If there are suggestions beyond "spend more *time* answering email," I am all ears. Time is the most precious commodity of all these days. Also, SciPy has only been around for 3+ years, so I guess we still have some rope left. I continue to believe it'll happen -- this seems like the perfect project for open source contributions. >THE PATH > >Here is what I suggest: > >1. We should identify the remaining open interface questions. Not, > "why is numeric faster than numarray", but "what should the syntax > of creating an array be, and of doing different basic operations". > If numeric and numarray are in agreement on these issues, then we > can move on, and debate performance and features later. > > ?? I don't get this one. This interface (at least for numarray) is largely decided. We have argued the points, and Perry et al. at STScI made the decisions. I didn't like some of them, and I'm sure everyone else had at least one thing they wished was changed, but that is the way this open stuff works. It is not the interface but the implementation that started this furor. Travis O.'s suggestion was to back port (much of) the numarray interface to the Numeric code base so that those stuck supporting large codebases (like SciPy) and needing fast small arrays could benefit from the interface enhancements. One or two of them had backward compatibility issues with Numeric, so he asked how it should be handled. Unless some magic porting fairy shows up, SciPy will be a Numeric only tool for the next year or so. This means that users of SciPy either have to forgo some of these features or back port. 
On speed: <excerpt from private mail to Perry> Numeric is already too slow -- we've had to recode a number of routines in C that I don't think we should have in a recent project. For us, the goal is not to approach Numeric's speed but to significantly beat it for all array sizes. That has to be a possibility for any replacement. Otherwise, our needs (with the exception of a few features) are already better met by Numeric. I have some worries about all of the endianness and memory mapped support that are built into Numarray imposing too much overhead for speed-ups on small arrays to be possible (this echoes Travis O's thoughts -- we will happily be proven wrong). None of our current work needs these features, and paying a price for them is hard to do with an alternative already there. It is fairly easy to improve its performance on mathematical operations by just changing the way the ufunc operations are coded. With some reasonably simple changes, Numeric should be comparable (or at least closer) to Numarray speed for large arrays. Numeric also has a large number of other optimizations that can be made (memory is zeroed twice in zeros(), asarray was recently improved significantly for the typical case, etc.). Making these changes would help our selling of Python and, since we have at least a year's worth of applications that will be on the SciPy/Numeric platform, it will also help the quality of these applications. Oh yeah, I have also been surprised at how much of our code uses alltrue(), take(), isnan(), etc. The speed of these array manipulation methods is really important for us. >2. We should identify what we need out of the core plotting > capability. Again, not "chaco vs. pyxis", but the list of > requirements (as an astronomer, I very much like Perry's list). > > Yep, we obviously missed on this one. Chaco (and the related libraries) is extremely advanced in some areas but lags in ease-of-use. 
It is primarily written by a talented and experienced computer scientist (Dave Morrill) who likely does not have the perspective of an astronomer. It is clear that areas of the library need to be re-examined, simplified, and improved. Unfortunately, there is not time for us to do that right now, and the internals have proven too complex for others to contribute to in a meaningful way. I do not know when this will be addressed. The sad thing here is that STScI won't be using it. That pains me to no end, and Perry and I have tried to figure out some way to make it work for them. But, it sounds like, at least in the short term, there will be two new additions to the plotting stable. We will work hard though to make the future Chaco solve STScI's problems (and everyone else's) better than it currently does. By the way, there is a lot of Chaco bashing going on. It is worth saying that we use Chaco every day in commercial applications that require complex graphics and heavy interactivity with great success. But, we also have mixed teams of scientists and computer scientists along with the "U Manual" (If I have a question, I ask you -- being Dave) to answer any questions. I continue to believe Chaco's Traits based approach is the only one currently out there that has the chance of improving on Matlab and other plotting packages available. And, while SciPy is moving slowly, Chaco is moving at a frantic development pace and gets new capabilities daily (which is part of the complaints about it). I feel certain in saying that it has more resources tied to its development than the other plotting options out there -- it is just currently being exercised in GUI environments instead of as a day-to-day plotting tool. My advice is dig in, learn traits, and learn Chaco. >3. We should collect or implement a very minimal version of the > featureset, and document it well enough that others like us can do > simple but real tasks to try it out, without reading source code. 
> That documentation should include lists of things that still need > to be done. > > >4. We should release a stand-alone version of the whole thing in the > formats most likely to be installed by users on the four most > popular OSs: Linux, Windows, Mac, and Solaris. For Linux, this > means .rpm and .deb files for Fedora Core 1 and Debian 3.0r2. > Tarballs and CVS checkouts are right out. We have seen that nobody > in the real world installs them. To be most portable and robust, > it would make sense to include the Python interpreter, named such > that it does not stomp on versions of Python in the released > operating systems. Static linking likewise solves a host of > problems and greatly reduces the number of package variants we will > have to maintain. > >5. We should advertize and advocate the result at conferences and > elsewhere, being sure to label it what it is: a first-cut effort > designed to do a few things well and serve as a platform for > building on. We should also solicit and encourage people either to > work on the included TODO lists or to contribute applications. One > item on the TODO list should be code converters from IDL and Matlab > to Python, and compatibility libraries. > >6. We should then all continue to participate in the discussions and > development efforts that appeal to us. We should keep in mind that > evaluating and incorporating code that comes in is in the long run > much more efficient than writing the universe ourselves. > >7. We should cut and package new releases frequently, at least once > every six months. It is better to delay a wanted feature by one > release than to hold a release for a wanted feature. The mountain > is climbed in small steps. > >The open source model is successful because it follows closely >something that has worked for a long time: the scientific method, with >its community contributions, peer review, open discussion, and >progress mainly in small steps. 
Once basic capability is out there, >we can twiddle with how to improve things behind the scenes. > > > Everything here is great -- it is the implementation part that is hard. I am all for it happening though. >IS SCIPY THE WAY? > >The recipe above sounds a lot like SciPy. SciPy began as a way to >integrate the necessary add-ons to numeric for real work. It was >supposed to test, document, and distribute everything together. I am >aware that there are people who use it, but the numbers are small and >they seem to be tightly connected to Enthought for support and >application development. > Not so. The user base is not huge, but I would conservatively venture to say it is in the hundreds to thousands. We are a company of 12 without a single support contract for SciPy. >Enthought's focus seems to be on servicing >its paying customers rather than on moving SciPy development along, > > Continuing to move SciPy along at the pace we initially were would have ended Enthought -- something had to change. It is surprising how important paying customers are to a company. >and I fear they are building an installed customer base on interfaces >that were not intended to be stable. > > Not sure what you mean here, but I'm all for stable interfaces. Huge portions of SciPy's interface haven't changed, and I doubt they will change. I do indeed feel, though, that SciPy is still at a 0.2 release level, so some of the interfaces can change. It would be irresponsible to say otherwise. This is not "intentionally unstable" though... >So, I will raise the question: is SciPy the way? Rather than forking >the plotting and numerical efforts from what SciPy is doing, should we >not be creating a new effort to do what SciPy has so far not >delivered? These are not rhetorical or leading questions. I don't >know enough about the motivations, intentions, > Man this sounds like an interview (or interaction) question. 
Well, we're a company, so we do wish to make money -- otherwise, we'll have to do something else. We also care deeply about science and are passionate about scientific computing. Let's see, what else. We have made most of the things we do open source because we do believe in it in principle and as a good development philosophy. And, even though we all wish SciPy was moving faster, SciPy wouldn't be anywhere close to where it is without Travis Oliphant and Pearu Peterson -- neither of whom would have worked on it had it not been openly available. That alone validates the decision to make it open. I'm not sure what we have done to make someone question our "motivations and intentions" (sounds like a date interrogation), but it is hard to think of malicious ones when you are making the fruits of your labors and dollars freely available. >and resources of the > > Well, we have 12 people, and Pearu and Travis O work with us quite a bit also. The developers here are very good (if I do say so myself), but unfortunately primarily working on other projects at the moment. Besides scientists/computer scientists, we have a technical writer and a human-computer-interface specialist on staff. >folks at Enthought (and elsewhere) to know the answer. I do think >that such a fork will occur unless SciPy's approach changes >substantially. > Enthought has more commitments than we used to. SciPy remains important and core to what we do, it just has to share time with other things. Luckily Pearu and Travis have kept their ears to the ground to help out people on the mailing lists as well as working on the codebase. I'm not sure what our approach has been that would force a fork... It isn't like someone has come and asked to be release manager, offered to keep the web pages up to date, provided peer review of code, etc. and we have turned them away. Almost from the beginning most effort has been provided by a small team (fairly standard for OS stuff). 
We have repeatedly pointed out areas where we need help at the conference and in mail -- code reviews, build help, release help, etc. In fact, I double dare ya to ask to manage the next release or the documentation effort. okay... triple dare ya. Some people (like Konrad, I believe) have philosophical differences with how SciPy is packaged and believe it should be 12 smaller packages instead of one large one. This has its own set of problems obviously, but forking based on this kind of principle would make at least a modicum of sense. Forking because you don't like the pace of the project makes zero sense. Pitch in and solve the problem. The social barriers are very small. The code barriers (build, etc.) are what need to be solved. >The way to decide is for us all to discuss the >question openly on these lists, and for those willing to participate >and contribute effort to declare so openly. I think all that is >needed, either to help SciPy or replace it, is some leadership in the >direction outlined above. I would be interested in hearing, perhaps >from the folks at Enthought, alternative points of view. Why are >there no packages for popular OSs for SciPy 0.2? > Please build them, ask for web credentials, and upload them. Then answer the questions people have about them on the mailing list. It is as simple as that. There is no magic here -- just work. >Why are releases so >infrequent? > Ditto. >If the folks running the show at scipy.org disagree with >many others on these lists, then perhaps those others would like to >roll their own. Or, perhaps stable/testing/unstable releases of the >whole package are in order. > >HOW TO CONTRIBUTE? > >Judging by the number of PhDs in sigs, there are a lot of researchers >on this list. I'm one, and I know that our time for doing core >development or providing the aforementioned leadership is very >limited, if not zero. > Surprisingly, commercial developers have about the same amount of free time. 
> Later we will be in a much better position to >contribute application software. However, there is a way we can >contribute to the core effort even if we are not paid, and that is to >put budget items in grant and project proposals to support the work of >others. > For the academics, supporting a *dedicated* student to maintain SciPy would be a much more cost-effective use of your dollars. Unfortunately, it is hard to get a PhD for supporting SciPy... <begin shameless plugs that somehow seem appropriate here> For companies, national laboratories, etc., supporting development on SciPy (or numarray) directly is a great idea. Projects that we work on in other areas also indirectly support SciPy, Chaco, etc. so get us involved with the development efforts at your company/lab. Other options? Government (NASA, Military, NIH, etc) and national lab people can get SciPy/numarray/Python related SBIR (http://www.acq.osd.mil/sadbu/sbir/) topics that would impact their research/development put on the solicitation list this summer. Email me if you have any questions on this. ASCI people can propose PathForward projects. There are probably numerous other ways to do this. We will have a GSA schedule soon, so government contracting will also work. </end shameless plug> >subcontractors at places like Enthought or STScI. A handful of >contributors would be all we'd need to support someone to produce OS >packages and tutorial documentation (the stuff core developers find >boring) for two releases a year. > > Joe, as you say, things haven't gone as fast as any of us would wish, but it hasn't been for lack of trying. Many of us have put zillions of hours into this. The results are actually quite stable tools. Many people use Numeric/Numarray/SciPy in daily work without problems. But, like Linux in the early years, they still require "geeks" willing to do some amount of meddling to use them. 
Huge resources (developer and financial) have been pumped into Linux to get it to the point it's at today. Anything we can do to increase the participation in building tools and financially supporting those who do build tools, I am all for... I'd love to see releases on 10 platforms and full documentation for the libraries as well as the next person. Whew, and Duke managed to hang on and win. my .01 worth, eric >--jh-- |
From: Paul P. <pa...@pr...> - 2004-01-22 04:21:11
|
Tim Hochberg wrote: >... > > The second point is the relative speediness of Numeric at low array > sizes is the result that nearly all of it is implemented in C, whereas > much of Numarray is implemented in Python. This results in a larger > overhead for Numarray, which is why it's slower for small arrays. As I > understand it, the decision to base most of Numarray in Python was > driven by maintainability; it wasn't an attempt to optimize large arrays > at the expense of small ones. What about Pyrex? If you code Pyrex as if it were exactly Python you won't get much optimization. But if you code it as if it were 90% as maintainable as Python you can often get 90% of the speed of C, which is pretty damn close to having all of the best of both worlds. If you point me to a few key functions in Numarray I could try to recode them in Pyrex and do some benchmarking for you (only if Pyrex is a serious option of course!). Paul Prescod |
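The kind of benchmarking offered above can be sketched with the standard library's timeit module. This is a minimal harness, assuming the two candidates take the same input and return the same result; bench, scale_loop and scale_comprehension are hypothetical names used only for illustration, and plain Python lists stand in for arrays so the example stays self-contained.

```python
import timeit


def bench(func, sizes, repeat=3, number=50):
    """Return {size: best observed seconds per call} for func on a list of floats."""
    results = {}
    for n in sizes:
        data = [float(i) for i in range(n)]
        timer = timeit.Timer(lambda: func(data))
        # Take the minimum over repeats to reduce noise from other processes.
        results[n] = min(timer.repeat(repeat=repeat, number=number)) / number
    return results


def scale_loop(data):
    """Hypothetical candidate A: explicit loop with append."""
    out = []
    for x in data:
        out.append(2.0 * x)
    return out


def scale_comprehension(data):
    """Hypothetical candidate B: list comprehension doing the same work."""
    return [2.0 * x for x in data]


if __name__ == "__main__":
    sizes = [1, 10, 100, 1000]
    for name, func in [("loop", scale_loop), ("comprehension", scale_comprehension)]:
        for n, seconds in bench(func, sizes).items():
            print(f"{name:>13} n={n:<5} {seconds * 1e6:8.2f} us/call")
```

Timing across a range of sizes, rather than at a single size, is what separates fixed per-call overhead from per-element cost, which is the distinction this thread keeps returning to.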
From: Travis E. O. <oli...@ee...> - 2004-01-21 23:59:14
|
I would like to thank the contributors to the discussion as I think one of the problems we have had lately is that people haven't been talking much. Partly because we have some fundamental differences of opinion caused by different goals and partly because we are all busy working on a variety of other pressing projects. The impression has been that Numarray will replace Numeric. I agree with Perry that this has always been less of a consensus and more of a hope. I am more than happy for Numarray to replace Numeric as long as it doesn't mean all my code slows down. I would say the threshold is that my code can't slow down by more than a factor of 10%. If there is a code-base out there (Numeric) that can allow my code to run 10% faster it will get used. I also don't think it's ideal to have multiple N-D arrays running around there, but if they all have the same interface then it doesn't really matter. The two major problems I see with Numarray replacing Numeric are 1) How is UFunc support? Can you create ufuncs in C easily (with a single function call or something similar). 2) Speed for small arrays (array creation is the big one). It is actually quite a common thing to have a loop during which many small arrays get created and destroyed. Yes, you can usually make such code faster by "vectorizing" (if you can figure out how). But the average scientist just wants to (and should be able to) just write a loop. Regarding speed issues. Actually, there are situations where I am very unsatisfied with Numeric's speed performance and so the goal for Numarray should not be to achieve some percentage of Numeric's performance but to beat it. Frankly, I don't see how you can get speed that I'm talking about by carrying around a lot of extras like byte-swapping support, memory-mapping support, record-array support. *Question*: Is there some way to turn on a flag in Numarray so that all of the extra stuff is ignored (i.e. 
create a small-array that looks on a binary level just like a Numeric array)? It would seem to me that this is the only way that the speed issue will go away. Given that 1) Numeric already works and all of my code depends on it, 2) Numarray doesn't seem to have support for general purpose ufunctions (can the scipy.special package be ported to numarray?), 3) Numarray is slower for the common tasks I end up using SciPy for, and 4) I actually understand the Numeric code base quite well, I have a hard time justifying switching over to Numarray. Thanks again for the comments. -Travis O. Konrad Hinsen wrote: > On 21.01.2004, at 19:44, Joe Harrington wrote: > >> This is a necessarily long post about the path to an open-source >> replacement for IDL and Matlab. While I have tried to be fair to > > > You raise many good points here. Some comments: > >> those who have contributed much more than I have, I have also tried to >> be direct about what I see as some fairly fundamental problems in the >> way we're going about this. I've given it some section titles so you > > > I'd say the fundamental problem is that "we" don't exist as a coherent > group. There are a few developer groups (e.g. at STScI and Enthought) who > write code primarily for their own need and then make it available. The > rest of us are what one could call "power users": very interested in the > code, knowledgeable about its use, but not contributing to its > development other than through testing and feedback. > >> THE PROBLEM >> >> We are not following the open-source development model. Rather, we > > > True. But is it perhaps because that model is not so well adapted to our > situation? If you look at Linux (the OpenSource reference), it started > out very differently. It was a fun project, done by hobby programmers > who shared an idea of fun (kernel hacking). Linux was not goal-oriented > in the beginnings. No deadlines, no usability criteria, but lots of > technical challenges. 
> > Our situation is very different. We are scientists and engineers who > want code to get our projects done. We have clear goals, and very > limited means, plus we are mostly someone's employees and thus not free > to do as we would like. On the other hand, our project doesn't provide > the challenges that attract the kind of people who made Linux big. You > don't get into the news by working on NumPy, you don't work against > Microsoft, etc. Computational science and engineering just isn't the > same as kernel hacking. > > I develop two scientific Python libraries myself, more specialized and > thus with a smaller market share, but the situation is otherwise > similar. And I work much like the Numarray people do: I write the code > that I need, and I invest minimal effort in distribution and marketing. > To get the same code developed in the Linux fashion, there would have > to be many more developers. But they just don't exist. I know of three > people worldwide whose competence in both Python/C and in the > application domain is good enough that they could work on the code base. > This is not enough to build a networked development community. The > potential NumPy community is certainly much bigger, but I am not sure it > is big enough. Working on NumPy/Numarray requires the combination of > not-so-frequent competences, plus availability. I am not saying it can't > be done, but it sure isn't obvious that it can be. > >> Release it in a way that as many people as possible will get it, >> install it, use it for real work, and contribute to it. Make the main >> focus of the core development team the evaluation and inclusion of >> contributions from others. Develop a common vision for the program, > > > This requires yet different competences, and thus different people. It > takes people who are good at reading others' code and communicating with > them about it. > Some people are good programmers, some are good scientists, some are > good communicators. 
How many are all of that - *and* available? > >> I know that Perry's group at STScI and the fine folks at Enthought >> will say they have to work on what they are being paid to work on. >> Both groups should consider the long term cost, in dollars, of >> spending those development dollars 100% on coding, rather than 50% on >> coding and 50% on outreach and intake. Linus himself has written only > > > You are probably right. But does your employer think long-term? Mine > doesn't. > >> applications, yet in much less than 7 years Linux became a viable >> operating system, something much bigger than what we are attempting > > > Exactly. We could be too small to follow the Linux way. > >> 1. We should identify the remaining open interface questions. Not, >> "why is numeric faster than numarray", but "what should the syntax >> of creating an array be, and of doing different basic operations". > > > Yes, a very good point. Focus on the goal, not on the legacy code. > However, a technical detail that should not be forgotten here: NumPy and > Numarray have a C API as well, which is critical for many add-ons and > applications. A C API is more closely tied to the implementation than a > Python API. It might thus be difficult to settle on an API and then work > on efficient implementations. > >> 2. We should identify what we need out of the core plotting >> capability. Again, not "chaco vs. pyxis", but the list of >> requirements (as an astronomer, I very much like Perry's list). > > > 100% agreement. For plotting, defining the interface should be easier > (no C stuff). > > Konrad. |
From: Robert K. <rk...@uc...> - 2004-01-21 23:41:07
|
On Wed, Jan 21, 2004 at 04:22:43PM -0700, Tim Hochberg wrote: [snip] > The second point is the relative speediness of Numeric at low array > sizes is the result that nearly all of it is implemented in C, whereas > much of Numarray is implemented in Python. This results in a larger > overhead for Numarray, which is why it's slower for small arrays. As I > understand it, the decision to base most of Numarray in Python was > driven by maintainability; it wasn't an attempt to optimize large arrays > at the expense of small ones. Has the numarray team (or anyone else for that matter) looked at using Pyrex[1] to implement any part of numarray? If not, then that's my next free-time experiment (i.e. avoiding homework while still looking productive at the office). [1] http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/ -- Robert Kern rk...@uc... "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter |
From: Tim H. <tim...@ie...> - 2004-01-21 23:23:01
|
Arthur wrote: [SNIP] > Which, to me, seems like a worthy goal. > > On the other hand, it would seem that the goal of something to move > into the core would be performance optimized at the range of array > size most commonly encountered. Rather than for the extraordinary, > which seems to be the goal of numarray, responding to specific needs > of the numarray development team's applications. I'm not sure where you came up with this, but it's wrong on at least two counts. The first is that, last I heard, the crossover point where Numarray becomes faster than Numeric is about 2000 elements. It would be nice if that becomes smaller, but I certainly wouldn't call it extreme. In fact I'd venture that the majority of cases where numeric operations are a bottleneck would already be faster under Numarray. In my experience, while it's not uncommon to use short arrays, it is rare for them to be a bottleneck. The second point is the relative speediness of Numeric at low array sizes is the result that nearly all of it is implemented in C, whereas much of Numarray is implemented in Python. This results in a larger overhead for Numarray, which is why it's slower for small arrays. As I understand it, the decision to base most of Numarray in Python was driven by maintainability; it wasn't an attempt to optimize large arrays at the expense of small ones. > Has the core Python development team given out clues about their > feelings/requirements for a move of either Numeric or numarray into > the core? I believe that one major requirement was that the numeric community come to a consensus on an array package and be willing to support it in the core. There may be other stuff. > It concerns me that this thread isn't trafficked. I suspect that most of the exchange has taken place on num...@li.... [SNIP] -tim |
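The crossover described in the message above (a package that pays more fixed overhead per call but less per element only wins beyond some array size) can be sketched with a simple linear cost model. This is an illustration, not a measurement: the function names and the nanosecond figures are hypothetical, chosen only so that the break-even point lands at the roughly 2000 elements mentioned in the thread.

```python
def call_cost(overhead_ns, per_element_ns, n):
    """Modelled time in ns for one array operation on n elements."""
    return overhead_ns + per_element_ns * n


def crossover_size(overhead_a, per_element_a, overhead_b, per_element_b):
    """Smallest n at which implementation B is at least as fast as A.

    Assumes B pays more fixed overhead per call but less per element;
    raises ValueError for any other combination.
    """
    if overhead_b <= overhead_a or per_element_b >= per_element_a:
        raise ValueError(
            "expected B to have higher overhead and lower per-element cost"
        )
    extra_overhead = overhead_b - overhead_a
    saving_per_element = per_element_a - per_element_b
    # Ceiling division: B breaks even once its per-element savings
    # have paid back its extra fixed overhead.
    return -(-extra_overhead // saving_per_element)


# Hypothetical figures, NOT measurements of Numeric or numarray.
SMALL_OVERHEAD = (1_000, 10)   # (fixed ns per call, ns per element)
LARGE_OVERHEAD = (11_000, 5)

n_star = crossover_size(*SMALL_OVERHEAD, *LARGE_OVERHEAD)
print(n_star)
```

Under this model, shrinking the fixed overhead of the second implementation moves the crossover toward smaller arrays without touching its large-array speed, which matches the point that the small-array penalty comes from per-call overhead rather than from a deliberate trade-off.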