From: Francesc A. <fa...@ca...> - 2006-06-13 17:48:37
Hey, numexpr seems to be back, wow! :-D

On Tuesday 13 June 2006 at 18:56, Tim Hochberg wrote:
> I've finally got around to looking at numexpr again. Specifically, I'm
> looking at Francesc Altet's numexpr-0.2, with the idea of harmonizing
> the two versions. Let me go through his list of enhancements and comment
> (my comments are dedented):

Well, as David already said, he committed most of my additions some days
ago :-)

> - Enhanced performance for strided and unaligned data, especially for
>   lightweight computations (e.g. 'a>10'). With this and the addition of
>   the boolean type, we can get up to 2x better times than previous
>   versions. Also, most of the supported computations go faster than
>   with numpy or numarray, even the simplest ones.
>
> Francesc, if you're out there, can you briefly describe what this
> support consists of? It's been long enough since I was messing with this
> that it's going to take me a while to untangle NumExpr_run, where I
> expect it's lurking, so any hints would be appreciated.

This is easy. When dealing with strided or unaligned vectors, instead of
copying them completely into well-behaved arrays, only the blocks that the
virtual machine needs at a given moment are copied. With this, there is no
need to write a full well-behaved array back into main memory, which can be
an important bottleneck, especially when dealing with large arrays. This
allows a better use of the processor caches because data is fetched and used
only when the VM needs it. Also, I see that David has added support for
byteswapped arrays, which is great!
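To make the block-copy idea more concrete, here is a rough NumPy-only
sketch (just an illustration of the strategy, not the actual C code of the
VM; the function name and block size are invented for the example):

import numpy as np

def blocked_greater_than(a, threshold, block_size=4096):
    # Illustration of the block-copy idea: instead of making one big
    # contiguous copy of a strided/unaligned operand, copy one cache-sized
    # block at a time into a small scratch buffer and operate on that.
    out = np.empty(a.shape[0], dtype=bool)
    buf = np.empty(block_size, dtype=a.dtype)      # reused scratch buffer
    for start in range(0, a.shape[0], block_size):
        stop = min(start + block_size, a.shape[0])
        n = stop - start
        buf[:n] = a[start:stop]                    # copy only this block
        out[start:stop] = buf[:n] > threshold      # compute on contiguous data
    return out

# A strided (non-contiguous) view never gets copied in full:
base = np.arange(100000, dtype=np.int32)
strided = base[::3]
assert np.array_equal(blocked_greater_than(strided, 10), strided > 10)

The VM does essentially the same thing in C on small fixed-size blocks, so
the scratch buffers stay in the processor cache and only the inputs and the
result touch main memory.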
> - Support for both numpy and numarray (use the flag --force-numarray
>   in setup.py).
>
> At first glance this looks like it doesn't make things too messy, so I'm
> in favor of incorporating this.

Yeah, I think you are right. It's just that we need this for our own
things :)

> - Add types for int16, int64 (in 32-bit platforms), float32,
>   complex64 (single prec.)
>
> I have some specific ideas about how this should be accomplished.
> Basically, I don't think we want to support every type in the same way,
> since this is going to make the case statement blow up to an enormous
> size. This may slow things down and at a minimum it will make things
> less comprehensible. My thinking is that we only add casts for the extra
> types and do the computations at high precision. Thus adding two int16
> numbers compiles to two OP_CAST_Ffs followed by an OP_ADD_FFF, and then
> an OP_CAST_fF. The details are left as an exercise to the reader ;-).
> So, adding int16, float32, complex64 should only require the addition of
> 6 casting opcodes plus appropriate modifications to the compiler.
>
> For large arrays, this should have most of the benefits of giving each
> type its own opcode, since the extra memory bandwidth is still small,
> while keeping the interpreter relatively simple.

Yes, I like the idea as well.

> Unfortunately, int64 doesn't fit under this scheme; is it used enough to
> matter? I hate to pile a whole bunch of new opcodes on for something
> that's rarely used.

Uh, I'm afraid so. In PyTables, int64, while being a bit bizarre for some
users (especially on 32-bit platforms), is a type with the same rights as
the others, and we would like to support it in numexpr. In fact, Ivan
Vilata has already implemented this support in our local copy of numexpr,
so perhaps (I say perhaps because we are in the middle of a big project now
and are a bit short on time) we can provide the patch against David's
latest version for your consideration. With this we can solve the problem
of int64 support on 32-bit platforms (and although the VM admittedly gets a
bit more complicated, I really think it is worth the effort).

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"