From: Travis O. <oli...@ie...> - 2006-01-20 16:41:09
Mark Heslep wrote:
> Travis Oliphant wrote:
>
>> This is actually a bit surprising that opencv can create and fill so
>> quickly. Perhaps they are using optimized SSE functions for the
>> Intel platform, or something?
>> -Travis
>>
> Ah, sorry, I'm an unintentional fraud. Yes, I have Intel's optimization
> library IPP turned on and had forgotten about it. So one more time:
>
> With IPP on, as before (cvUseOptimized returns the number of Cv
> functions available with IPP):
>
>> python -m timeit -s "import opencv.cv as cv; print cv.cvUseOptimized(1); im = cv.cvCreateImage(cv.cvSize(1000,1000), 8, 1)" "cv.cvSet(im, cv.cvRealScalar(7))"
>> 305
>> 305
>> 305
>> 305
>> 305
>> 100 loops, best of 3: 2.24 msec per loop
>
> And without:
>
>> python -m timeit -s "import opencv.cv as cv; print cv.cvUseOptimized(0); im = cv.cvCreateImage(cv.cvSize(1000,1000), 8, 1)" "cv.cvSet(im, cv.cvRealScalar(7))"
>> 0
>> 0
>> 0
>> 0
>> 0
>> 100 loops, best of 3: 6.94 msec per loop
>
> So IPP gives me roughly 3X (6.94 vs. 2.24 msec per loop), which leads me
> to ask about plans for IPP / SSE for NumPy, no offense intended to
> non-Intel users. I believe I recall a post saying that auto code
> generation in NumArray was the roadblock?

There has been some talk of using liboil for this (similar to what
_dotblas does). There could definitely be some gains; I don't see any
roadblock other than time and effort.

In my own tests against a ctypes-wrapped C function that just mallocs
memory and fills it, numpy came out about 3x slower. numpy's scalar fill
function could definitely be made faster; right now it's still pretty
generic.

-Travis
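For anyone wanting to repeat this kind of comparison, here is a minimal sketch. It is not Travis's actual test. It times NumPy's scalar fill on a 1000x1000 uint8 array (matching the 8-bit, 1-channel cvCreateImage call quoted above) against a plain C memset reached through the standard-library ctypes.memset. Assumptions: Python 3 and a modern NumPy; memset stands in for the "malloc and fill" C function and is only equivalent to fill() for contiguous 1-byte element types. The helper names fill_numpy, fill_memset and best_msec are hypothetical, chosen for illustration.

```python
import ctypes
import timeit

import numpy as np

# Same size as cvCreateImage(cvSize(1000,1000), 8, 1): one byte per pixel.
SHAPE = (1000, 1000)


def fill_numpy(a, value):
    """Scalar fill through NumPy's fill() method (hypothetical helper)."""
    a.fill(value)


def fill_memset(a, value):
    """Fill the raw buffer with C memset via ctypes (hypothetical helper).

    memset writes bytes, so this only matches fill() for contiguous
    arrays whose elements are exactly one byte wide.
    """
    if a.dtype.itemsize != 1 or not a.flags.c_contiguous:
        raise ValueError("memset fill needs a contiguous 1-byte array")
    ctypes.memset(a.ctypes.data, value, a.nbytes)


def best_msec(func, a, value, loops=100, repeat=3):
    """Best-of-`repeat` time per call in msec, like `python -m timeit`."""
    times = timeit.repeat(lambda: func(a, value), number=loops, repeat=repeat)
    return min(times) / loops * 1e3


if __name__ == "__main__":
    a = np.empty(SHAPE, dtype=np.uint8)
    for name, func in (("numpy fill", fill_numpy), ("ctypes memset", fill_memset)):
        print("%s: %.3f msec per loop" % (name, best_msec(func, a, 7)))
```

The numbers depend heavily on hardware and NumPy version; later NumPy releases may have faster fill loops, so the 3x gap reported here need not reproduce.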