From: Tim H. <tim...@ie...> - 2003-02-07 23:09:00
Chris Barker wrote:

>oops, sorry about the blank message.
>
>Paul F Dubois wrote:
>
>>{ CC to GvR just to show why I'm +1 on the if-PEP. I liked this in another
>
>What the heck is the if-PEP ?

PEP 308. It's stirring up a bit of a ruckus on CLP as we speak.

>>Perhaps knowledgeable persons could comment on the feasibility of coding MA
>>(masked arrays) in straight Python and then using Psyco on it?
>
>Is there confusion between Psyco and Pyrex? Psyco runs regular old
>Python bytecode, and individually compiles little pieces of it as needed
>into machine code. As I understand it, this should make loops where the
>inner part is a pretty simple operation very fast.
>
>However, Psyco is pretty new, and I have no idea how robust and stable,
>but certainly not cross platform. As it generates machine code, it needs
>to be carefully ported to each hardware platform, and it currently only
>works on x86.

Psyco seems fairly stable these days. However, it's one of those things
that probably needs a larger cabal of users to shake the bugs out of it.
I still only use it to play around with, because everything I need speed
from I end up doing in Numeric anyway.

>Pyrex, on the other hand, is a "Python-like" language that is translated
>into C, and then the C is compiled. It generates pretty darn platform
>independent code, so it should be able to be used on all platforms.
>
>In regard to your question about MA (and any other similar project): I
>think Psyco has the potential to be the next-generation Python VM, which
>will have much higher performance, and therefore greatly reduce the need
>to write extensions for the sake of performance. I suspect that it could
>do its best with large, multi-dimensional arrays of numbers if there is
>a Python native object of such a type. Psyco, however, is not ready for
>general use on all platforms, so in the foreseeable future, there is a
>need for other ways to get decent performance.
>My suggestion follows:
>
>>It could have been written a lot simpler if performance didn't dictate
>>trying to leverage off Numeric. In straight Python one can imagine an add,
>>for example, that was roughly:
>>   for k in 0<= k < len(a.data):
>>      result.mask[k] = a.mask[k] or b.mask[k]
>>      result.data[k] = a.data[k] if result.mask[k] else a.data[k] + b.data[k]
>
>This looks like it could be written in Pyrex. If Pyrex were suitably
>NumArray aware, then it could work great.
>
>What this boils down to, in both the Pyrex and Psyco options, is that
>having a multi-dimensional homogeneous numeric data type that is "Native"
>Python is a great idea! With Pyrex and/or Psyco, Numeric3 (NumArray2 ?)
>could be implemented by having only the smallest core in C, and the
>rest in Python (or Pyrex)

For Psyco at least you don't need a multidimensional type. You can get
good results with a flat array, in particular array.array. The numbers I
posted earlier showed comparable performance for Numeric and a
multidimensional array type written all in Python and psycoized.

And since I suspect that I'm the mysterious person whose name Paul
couldn't remember, let me say I suspect that MA would be faster in
psycoized Python than what you're doing now, as long as a.data was an
instance of array.array. However, there are at least three problems:

1. Psyco doesn't fully support the floating point type ('f') right now
   (although it does support most of the various integral types in
   addition to 'd').

2. I assume that these masked arrays are multidimensional, so someone
   would have to build the basic multidimensional machinery around
   array.array to make them work. I have a good start on this, but I'm
   not sure when I'm going to have time to work on it more.

3. The biggy, though, is that Psyco only works on x86 machines. What we
   really need to do is to clone Armin.
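To make the flat array.array idea concrete, here is a rough sketch of the masked add from Paul's pseudocode, written against plain array.array objects. The function name `masked_add`, the choice of a 'b' array for the mask, and the four-argument signature are all made up for illustration; this is not how MA is actually organized. It is written in present-day Python syntax, and the Psyco binding step is left out entirely, so it says nothing about the speed one would actually get.

```python
from array import array


def masked_add(a_data, a_mask, b_data, b_mask):
    """Add two flat masked arrays element by element.

    Hypothetical helper: a nonzero mask entry marks that element as
    invalid. Where either input is masked, the result is masked and
    keeps a's data value, as in the quoted pseudocode.
    """
    n = len(a_data)
    if not (len(a_mask) == len(b_data) == len(b_mask) == n):
        raise ValueError("data and mask arrays must have the same length")

    result_data = array('d', a_data)   # start from a copy of a's data
    result_mask = array('b', a_mask)   # one signed byte per element

    for k in range(n):
        masked = a_mask[k] or b_mask[k]
        result_mask[k] = 1 if masked else 0
        if not masked:
            result_data[k] = a_data[k] + b_data[k]
    return result_data, result_mask


a = array('d', [1.0, 2.0, 3.0])
a_mask = array('b', [0, 1, 0])
b = array('d', [10.0, 20.0, 30.0])
b_mask = array('b', [0, 0, 1])

data, mask = masked_add(a, a_mask, b, b_mask)
print(list(data), list(mask))
```

The inner loop touches only 'd' and integer-typed arrays, which are the typecodes described above as working with Psyco; a multidimensional version would still need index arithmetic layered on top of the flat storage.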
>While the Psyco option is the rosy future of Python, Pyrex is here now,
>and maybe adapting it to handle NumArrays well would be easier than
>re-writing a bunch of NumArray in C.

This sounds like you're conflating two different issues.

The first issue is that Numarray is relatively slow for small arrays.
Pyrex may indeed be an easier way to attack this, although I wouldn't
know; I've only looked at it, not tried to use it. However, I think that
this is something that can and should wait. Once use cases of numarray
being _too_ slow for small arrays start piling up, then it will be time
to attack the overhead. Premature optimization is the root of all evil
and all that.

The second issue is how to deal with code that does not vectorize well.
Here Pyrex again might help if it were made Numarray aware. However,
isn't this what scipy.weave already does? Again, I haven't used weave,
but as I understand it, it's another Python-C bridge, but one that's
more geared toward numerics stuff.

-tim