From: A. M. A. <per...@gm...> - 2006-10-04 03:16:18
On 03/10/06, Tim Hochberg <tim...@ie...> wrote:
> I had an idea regarding which axis to operate on first. Rather than
> operate on strictly the longest axis or strictly the innermost axis, a
> hybrid approach could be used. We would operate on the longest axis that
> would not result in the inner loop overflowing the cache. The idea is
> minimize the loop overhead as we do now by choosing the largest axis,
> while at the same time attempting to maintain cache friendliness.

If elements are smaller than cache lines (usually at least eight bytes, I think), we might end up pulling many times as many bytes into the cache as we actually need if we don't loop along axes with small strides first.

Can BLAS be used for some of these operations?

A. M. Archibald
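
P.S. To make the stride concern concrete, here is a rough sketch in plain Python. It is only an illustration, not how numpy chooses its loop order. The function names, the 64-byte cache line, and the 32 KB cache size are all assumptions made up for this example. The first function counts how many cache lines a strided walk touches; the second is one possible reading of the hybrid rule quoted above (longest axis whose inner loop still fits in the cache, falling back to the smallest stride).

```python
def cache_lines_touched(n, stride, itemsize, line_size=64, base=0):
    """Count distinct cache lines touched by n elements spaced `stride` bytes apart.

    line_size=64 is an assumed cache line size, not a measured one.
    """
    lines = set()
    for i in range(n):
        start = base + i * stride
        end = start + itemsize - 1
        # An element may straddle more than one cache line.
        lines.update(range(start // line_size, end // line_size + 1))
    return len(lines)


def pick_axis(shape, strides, itemsize, cache_bytes=32 * 1024, line_size=64):
    """Hypothetical hybrid rule: the longest axis whose inner loop fits in cache.

    Falls back to the axis with the smallest stride if no axis fits.
    cache_bytes is an assumed cache size.
    """
    fitting = [
        ax
        for ax in range(len(shape))
        if cache_lines_touched(shape[ax], abs(strides[ax]), itemsize, line_size)
        * line_size
        <= cache_bytes
    ]
    if fitting:
        return max(fitting, key=lambda ax: shape[ax])
    return min(range(len(shape)), key=lambda ax: abs(strides[ax]))
```

With one-byte elements, 64 contiguous elements touch a single 64-byte line, while the same 64 elements at a stride of 1024 bytes touch 64 lines, so 4096 bytes are fetched to use 64. For a C-ordered array of shape (100000, 16) holding 8-byte elements (strides 128 and 8), the long axis does not fit in the assumed cache, so the sketch picks the short, small-stride axis instead.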