From: Todd M. <jm...@st...> - 2004-07-01 16:58:04
On Wed, 2004-06-30 at 19:00, Tim Hochberg wrote:
> By this do you mean the "#if PY_VERSION_HEX >= 0x02030000" that is
> wrapped around _ndarray_item? If so, I believe that it *is* getting
> compiled, it's just never getting called.
>
> What I think is happening is that the class NumArray inherits its
> sq_item from PyClassObject. In particular, I think it picks up
> instance_item from Objects/classobject.c. This appears to be fairly
> expensive and, I think, ends up calling tp_as_mapping->mp_subscript.
> Thus, _ndarray's sq_item slot never gets called. All of this is pretty
> iffy since I don't know this stuff very well and I didn't trace it all
> the way through. However, it explains what I've seen thus far.
>
> This is why I ended up using the horrible hack. I'm resetting NumArray's
> sq_item to point to _ndarray_item instead of instance_item. I believe
> that access at the python level goes through mp_subscript, so it
> shouldn't be affected, and only objects at the C level should notice and
> they should just get the faster sq_item. You will notice that there are
> an awful lot of I thinks in the above paragraphs though...

Ugh... Thanks for explaining this.

> >>I then optimized _ndarray_item (code
> >>at end). This halved the execution time of my arbitrary benchmark. This
> >>trick may have horrible, unforeseen consequences so use at your own risk.
> >>
> >
> >Right now the sq_item hack strikes me as somewhere between completely
> >unnecessary and too scary for me! Maybe if python-dev blessed it.
> >
> Yes, very scary. And it occurs to me that it will break subclasses of
> NumArray if they override __getitem__. When these subclasses are
> accessed from C they will see _ndarray's sq_item instead of the
> overridden __getitem__. However, I think I also know how to fix it. But
> it does point out that it is very dangerous and there are probably dark
> corners of which I'm unaware. Asking on Python-List or PyDev would
> probably be a good idea.
>
> The nonscary, but painful, fix would be to rewrite NumArray in C.

Non-scary to whom?

> >This optimization looks good to me.
> >
> Unfortunately, I don't think the optimization to sq_item will affect
> much since NumArray appears to override it with
>
> >>Finally I commented out the __del__ method in numarraycore. This resulted
> >>in an additional speedup of 64% for a total speed up of 240%. Still not
> >>close to 10x, but a large improvement. However, this is obviously not
> >>viable for real use, but it's enough of a speedup that I'll try to see
> >>if there's any way to move the shadow stuff back to tp_dealloc.
> >>
> >
> >FYI, the issue with tp_dealloc may have to do with which mode Python is
> >compiled in, --with-pydebug, or not. One approach which seems like it
> >ought to work (just thought of this!) is to add an extra reference in C
> >to the NumArray instance __dict__ (from NumArray.__init__ and stashed
> >via a new attribute in the PyArrayObject struct) and then DECREF it as
> >the last part of the tp_dealloc.
> >
> That sounds promising.

I looked at this some, and while INCREFing __dict__ may be the right
idea, I forgot that there *is no* Python NumArray.__init__ anymore.
So the INCREF needs to be done in C without doing any getattrs; this
seems to mean calling the private _PyObject_GetDictPtr function to get
a pointer to the __dict__ slot, which can then be dereferenced to get
the __dict__ itself.

> [SNIP]
> >
> >Well, be picking out your beer.
> >
> I was only about half right, so I'm not sure I qualify...

We could always reduce your wages to a 12-pack...

Todd
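
P.S. To make the subclass concern above concrete, here is a rough
pure-Python analogy. It is not numarray code: the class and function
names are made up for illustration, and list stands in for the C base
type. Calling the base type's item getter directly plays the role of C
code that goes straight to a hard-wired sq_item, and it silently skips a
subclass's __getitem__ override.

```python
class Base(list):
    """Hypothetical stand-in for NumArray; inherits list's item access."""


class Doubling(Base):
    """Hypothetical subclass that overrides __getitem__."""

    def __getitem__(self, index):
        return 2 * super().__getitem__(index)


def get_via_normal_lookup(seq, index):
    # Ordinary indexing: dispatches to the most derived __getitem__.
    return seq[index]


def get_via_fixed_slot(seq, index):
    # Analogy for C code calling a hard-wired base-class item getter:
    # the subclass override is never consulted.
    return list.__getitem__(seq, index)


d = Doubling([1, 2, 3])
print(get_via_normal_lookup(d, 1))  # goes through Doubling.__getitem__
print(get_via_fixed_slot(d, 1))     # bypasses the override
```

The two calls disagree for the same object and index, which is the kind
of surprise a subclass author would hit if C callers saw a different
item getter than Python callers.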