From: <a.s...@gm...> - 2002-03-18 22:32:14
Konrad Hinsen <hi...@cn...> writes:

> Computational and notational efficiency are rather well separated,
> fortunately. Both the current dot function and an hypothetical matrix

Yes, the only thing they have in common is that both are currently unsatisfactory (for matrix operations) in numpy, at least for my needs. Although I've solved my most pressing performance problems by patching Numeric [1], I'm obviously interested in a more official solution (i.e. one that is maintained by others :)

[...]

[order changed by me]

> a.s...@gm... (A.Schmolck) writes:
> > My impression is that the best path also very much depends on the what the
> > feature aspirations and divisions of labor of numpy/numarray and scipy are
    ^^^^^^^
Darn, I made a confusing mistake -- this should read _future_.

> > going to be. For example, scipy is really aimed at scientific users, which
> > need performance, and are willing to buy it with inconvenience (like the
>
> I see the main difference in distribution philosophy. NumPy is an
> add-on package to Python, which is in turn used by other add-on
> packages in a modular way. SciPy is rather a monolithic
> super-distribution for scientific users.
>
> Personally I strongly favour the modular package approach, and in fact
> I haven't installed SciPy on my system for that reason, although I
> would be interested in some of its components.
[...]
> The same approach as for XML could be used: a slim-line version in the
> standard distribution that could be replaced by a high-efficiency
> extended version for those who care.
[...]

I personally agree with all your points above -- if you have a look at our "dotblas" patch mentioned earlier (see [1]), you will find that it aims to provide exactly that: have dot run anywhere without a hassle, but run (much) faster if the user is willing to install atlas.
My main concern was that the argument should shift away a bit from syntactic and implementation details towards what audiences and needs numpy/numarray are supposed to address and, in this light, how best to strike the balance between convenience for users and maintainers, speed and bloat, generality and efficiency, etc. As an example, adding the dotblas patch [1] to Numeric is, I think, more convenient for the users (granting a few assumptions for the sake of the argument, like that it actually works :) -- it gives users who have atlas better performance, and those who don't won't (or at least shouldn't) notice. It is, however, inconvenient for the maintainers. Whether one should bother including it in this or some other way depends, apart from the obvious question of whether there is a better way to achieve what it does for both groups (like creating a dedicated Matrix class), also on what numpy is really supposed to achieve. I'm not entirely clear on that. For example, I don't know how many numpy users deeply care about their matrix multiplications for big (1000x1000) matrices being 40 times faster.

The monolithic approach is not entirely without its charms (remember Python's "batteries included" jingle?). Apart from convenience factors it also has the not inconsiderable advantage that people use _one_ standard module for a certain thing, rather than 20 different solutions. This certainly helps to improve code quality, not least because someone goes to the trouble of deciding what merits inclusion in the "Big Thing", possibly urging changes but at least almost certainly taking more time for evaluation than an individual programmer who just wants to get a certain job done. It also makes life easier for module writers: they can rely on certain stuff being around (and don't have to reinvent the wheel, another potential improvement to code quality).
As such it makes life easier for maintainers, as does the scipy commandment that you have to install atlas/lapack, full stop (and if it doesn't run on your machine -- well, at least it runs fast for some people, and in this context that might well be better than running slowly for everyone).

So, I think what's good really depends on what you're aiming at; that's why I'd like to know what users and developers think about these matters. My points regarding scipy and numpy/numarray were just one attempt at interpreting what these respective libraries try to, should, or could become. Now, not being a developer for either of them (I've only submitted a few minor patches to scipy), I'm not in a particularly good position to venture such interpretations, but I hoped that it would provoke other, more knowledgeable people to share their opinions and insights on this matter (as indeed you did).

> I'd love to have efficient matrices without having to install the
> whole SciPy package!

Welcome to the linear algebra lobby group ;) Yep, that would be nice, but my impression was that the scipy folks are currently more concerned about performance issues than the numpy/numarray folks, and I could live with either package providing what I want. Ideally, I'd like to see a slim core numarray without any frills (and more streamlined to behave like standard Python containers, e.g. in indexing and type-casting behavior) for the Python core; something more fully featured and efficient for numerics (including matrices!) as a separate package (like the XML example you quote); and then maybe a bigger pre-bundled collection of (ideally rather modular) numerical libraries for really hard-core scientific users (maybe in the spirit of xemacs-packages and sumo tarballs -- no bloat if you don't need it, plenty of features in an instant if you do).
Anyway, is there at least general agreement that there should be some new and wonderful matrix class (plus supporting libraries) somewhere (rather than souping up array)?

alex

Footnotes:
[1] patch for faster dot product in Numeric
    http://www.scipy.org/Members/aschmolck

-- 
Alexander Schmolck
Postgraduate Research Student
Department of Computer Science
University of Exeter
A.S...@gm...
http://www.dcs.ex.ac.uk/people/aschmolc/