From: David L G. <Dav...@no...> - 2006-10-09 17:09:56
Tim Hochberg wrote:
> David Goldsmith wrote:
>> Tim Hochberg wrote:
>>> I periodically need to perform various linalg operations on a large
>>> number of small matrices. The normal numpy approach of stacking up the
>>> data into an extra dimension and doing the operations in parallel
>>> doesn't work because the linalg functions are only designed to work on
>>> one matrix at a time. At first glance, this restriction seems harmless
>>> enough, since the underlying LAPACK functions are also designed to work
>>> on one matrix at a time and it seems there would be little to be gained
>>> by parallelizing the Python-level code. However, this turns out not to
>>> be the case.
>>>
>>> I have some code, originally written for numarray but recently ported
>>> to numpy, that rewrites the linalg functions determinant, inverse and
>>> solve_linear_equations[1] to operate on 2D or 3D arrays, treating 3D
>>> arrays as stacks of 2D arrays. For operating on large numbers of small
>>> arrays (for example 1000 2x2 arrays), this approach is over an order of
>>> magnitude faster than the obvious map(inverse, threeDArray) approach.
>>>
>>> So, some questions:
>>>
>>> 1. Is there any interest in this code for numpy.linalg? I'd be willing
>>> to clean it up and work on the other functions in linalg so that they
>>> are all consistent. Since these changes should all be backwards
>>> compatible (unless you're counting on catching an error from one of
>>> the linalg functions), I don't see this as a problem to put in after
>>> 1.0, so there's really no rush on this. I'm only bringing it up now
>>> since I just ported it over to numpy and it's fresh in my mind.
>>>
>> Say "no" to someone offering to make a Python module more feature rich?
>> Blasphemy! :-)
>>
>> But I am very curious as to the source/basis of the performance
>> improvement - did you figure this out?
>>
> Yes and no. Here's what the beginning of linalg.det looks like:
>
>     a = asarray(a)
>     _assertRank2(a)
>     _assertSquareness(a)
>     t, result_t = _commonType(a)
>     a = _fastCopyAndTranspose(t, a)
>     n = a.shape[0]
>     if isComplexType(t):
>         lapack_routine = lapack_lite.zgetrf
>     else:
>         lapack_routine = lapack_lite.dgetrf
>     # ...
>
> There are quite a few function calls here. Since the actual call to
> dgetrf probably takes negligible time for a 2x2 array, the cost of the
> computation is dominated by function call overhead, creating the return
> arrays, and so on. Which is the biggest culprit, I'm not sure; by
> vectorizing, much of that overhead is only incurred once rather than
> hundreds or thousands of times, so it all becomes negligible at that
> point. I didn't bother to track down the worst offender.

Ah, that all makes sense (including not bothering to profile the code).

DG

> -tim

--
HMRD/ORR/NOS/NOAA <http://response.restoration.noaa.gov/emergencyresponse/>
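
A rough illustration of why vectorizing over the stack helps (a minimal
sketch, not Tim's code; the function name batched_det_2x2 is invented for
the example): for a stack of 2x2 matrices the determinant has a closed
form, so the Python-level setup -- argument checking, dtype dispatch,
allocating the result -- is paid once for the whole stack instead of once
per matrix.

    import numpy as np

    def batched_det_2x2(stack):
        # stack has shape (N, 2, 2); the closed-form 2x2 determinant
        # a*d - b*c is evaluated elementwise across the whole stack,
        # so the Python-level overhead is incurred once, not N times.
        stack = np.asarray(stack)
        return (stack[..., 0, 0] * stack[..., 1, 1]
                - stack[..., 0, 1] * stack[..., 1, 0])

    # Per-matrix approach for comparison: every iteration pays the full
    # numpy.linalg.det call overhead, which dominates for 2x2 inputs.
    #   dets = np.array([np.linalg.det(m) for m in stack])

On a stack of roughly 1000 2x2 matrices, paying the setup cost once
instead of 1000 times is consistent with the order-of-magnitude speedup
reported above; the exact ratio will depend on the numpy version and
hardware.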