From: Roy S. <roy...@ic...> - 2009-11-19 19:36:06
On Thu, 19 Nov 2009, Jed Brown wrote:

> 1. Interlace fields in vector problems, so e.g. deformation dofs in
> elasticity would be ordered [u0,v0,w0,u1,v1,w1,...].  This allows better
> L1 reuse (better hardware prefetch because there are fewer pointers to
> track, you only pay for the latency of a cache miss once for all 3
> fields rather than once per field).  It also allows blocked matrix
> formats (BAIJ) which are smaller (one column index per block rather than
> one per entry, thus
>
>   sizeof(MatScalar) + sizeof(PetscInt)/(bs*bs)
>
> per entry instead of
>
>   sizeof(MatScalar) + sizeof(PetscInt)

There's something in the PETSc FAQ about: "the AIJ format automatically
searches for matching rows and thus still takes advantage of the natural
blocks in your matrix to obtain good performance" - does that at least
imply that we'd be getting a 1/bs improvement?

libMesh would be using SBAIJ by now, except that problems which are
sufficiently complex to need the improvement often don't have constant
block size.  Mixed finite elements with different polynomial degree used
to be the only offender, but now per-subdomain variables make things
still more complicated.

That's a shame, too.  One application I'm working on right now is going
to end up with roughly a half dozen variables on a big subdomain and one
or two dozen on a much bigger subdomain.  Storing block structure alone
would have been nice.
---
Roy
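
As a rough illustration of the per-entry storage estimates quoted above, the
arithmetic can be tabulated with a short sketch.  The byte sizes (8 for
MatScalar, 4 for PetscInt) are assumptions that depend on how PETSc is
configured, the function names are hypothetical, and the "row-shared" case
is only the 1/bs scenario asked about in the reply, not a statement of what
AIJ actually stores.  Row-pointer overhead is ignored, as in the quoted
formulas.

```python
# Sketch only: per-entry storage for the matrix formats discussed above.
# Assumed sizes (these depend on PETSc configure options):
SIZEOF_MATSCALAR = 8  # bytes, assuming double-precision real scalars
SIZEOF_PETSCINT = 4   # bytes, assuming 32-bit indices


def bytes_per_entry_aij(scalar=SIZEOF_MATSCALAR, index=SIZEOF_PETSCINT):
    """One column index stored per scalar entry."""
    return scalar + index


def bytes_per_entry_baij(bs, scalar=SIZEOF_MATSCALAR, index=SIZEOF_PETSCINT):
    """One column index stored per bs-by-bs block of entries."""
    return scalar + index / (bs * bs)


def bytes_per_entry_row_shared(bs, scalar=SIZEOF_MATSCALAR, index=SIZEOF_PETSCINT):
    """Hypothetical 1/bs case: one column index shared by bs matching rows."""
    return scalar + index / bs


if __name__ == "__main__":
    print("bs    AIJ   row-shared   BAIJ")
    for bs in (1, 2, 3, 6):
        print(f"{bs:2d}  {bytes_per_entry_aij():5.2f}  "
              f"{bytes_per_entry_row_shared(bs):10.2f}  "
              f"{bytes_per_entry_baij(bs):5.2f}")
```

Under these assumed sizes, bs = 3 gives 8 + 4/9 bytes per entry for BAIJ
against 12 for AIJ, so the index savings shrink quickly once bs is more
than 2 or 3 and the scalar values dominate.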