From: David G. <Dav...@no...> - 2006-10-09 14:32:52
|
Tim Hochberg wrote: > I periodically need to perform various linalg operations on a large > number of small matrices. The normal numpy approach of stacking up the > data into an extra dimension and doing the operations in parallel > doesn't work because the linalg functions are only designed to work on > one matrix at a time. At first glance, this restriction seems harmless > enough since the underlying LAPACK functions are also designed to work > one matrix at a time and it seem that there would be little to be gained > by parallelizing the Python level code. However, this turns out to be > the case. > > I have some code, originally written for numarray, but I recently ported > it over to numpy, that rewrites the linalg function determinant, inverse > and solve_linear_equations[1] to operate of 2 or 3D arrays, treating 3D > arrays as stacks of 2D arrays. For operating on large numbers of small > arrays (for example 1000 2x2 arrays), this approach is over an order of > magnitude faster than the obvious map(inverse, threeDArray) approach. > > So, some questions: > > 1. Is there any interest in this code for numpy.linalg? I'd be > willing to clean it up and work on the other functions in linalg so that > they were all consistent. Since these changes should all be backwards > compatible (unless your counting on catching an error from one of the > linalg functions), I don't see this as a problem to put in after 1.0, so > there's really no rush on this. I'm only bringing it up now since I just > ported it over to numpy and it's fresh in my mind. > Say "no" to someone offering to make a Python module more feature rich? Blasphemy! :-) But I am very curious as to the source/basis of the performance improvement - did you figure this out? DG > 2. If 1., what to do about norm? I think the other functions in > linalg stack naturally, but norm, since it is meant to work on both > vectors and matrices is problematic. Norm already seems a bit > problematic in that, for some values of 'ord', norm can return different > values for a shape [N] and a shape [1,N] array containing the same values: > > >>> linalg.norm(a[:1], 1) > 0.78120069791102054 > >>> linalg.norm(a[0], 1) > 1.4588102446677758 > > My inclination is to introduce stackable 1 and 2D version s of norm, > vnorm and mnorm for lack of better names. Ideally we'd deprecate the use > of norm for ord!=None (Froebenius norm) or at least in cases where the > result depends on the rank [1,N] versus [N]. > > 3. A similar issue arises with dot. One would like to be able to do > stacked matrix products, vector products and matrix-vector products. If > we are making the rest of linalg stackable, linalg would be a sensible > place for a stackable version of dot to live. However, it's immediately > clear how to do this without introducing a whole pile (4) of stacking > dot function to handle the various cases, plus broadcasting, cleanly. > This would require some more thought. Thoughts? > > -tim > > > [1] Those are actually the numarray names, which the current code still > uses, the numpy names are 'det', 'inv' and 'solve' > > [2] Treatment of stacking insolve linear equations is a bit more > complicated, but I'll ignore that for now. > > > ------------------------------------------------------------------------- > Take Surveys. Earn Cash. Influence the Future of IT > Join SourceForge.net's Techsay panel and you'll get the chance to share your > opinions on IT & business topics through brief surveys -- and earn cash > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV > _______________________________________________ > Numpy-discussion mailing list > Num...@li... > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > |