From: Jed Brown <jed@59...>  2009-11-19 20:02:22

Roy Stogner wrote:
> On Thu, 19 Nov 2009, Jed Brown wrote:
>
>> 1. Interlace fields in vector problems, so e.g. deformation dofs in
>> elasticity would be ordered [u0,v0,w0,u1,v1,w1,...]. This allows better
>> L1 reuse (better hardware prefetch because there are fewer pointers to
>> track, you only pay for the latency of a cache miss once for all 3
>> fields rather than once per field). It also allows blocked matrix
>> formats (BAIJ) which are smaller (one column index per block rather
>> than one per entry, thus
>>
>>   sizeof(MatScalar) + sizeof(PetscInt)/(bs*bs)
>>
>> per entry instead of
>>
>>   sizeof(MatScalar) + sizeof(PetscInt)
>
> There's something in the PETSc FAQ about:
>
> "the AIJ format automatically searches for matching rows and thus
> still takes advantage of the natural blocks in your matrix to obtain
> good performance" -- does that at least imply that we'd be getting a
> 1/bs improvement?

These are the inodes; you benefit from them any time you have consecutive
rows with identical nonzero pattern (and it's worth occasionally storing
zeros so that this is more common).

> libMesh would be using SBAIJ by now, except that problems which are
> sufficiently complex to need the improvement often don't have constant
> block size. Mixed finite elements with different polynomial degree
> used to be the only offender, but now per-subdomain variables make
> things still more complicated.

That could be tricky, but you might be happy with just computing a
low-bandwidth ordering and assembling into that, e.g.
MatSetLocalToGlobalMapping, then insert with MatSetValuesLocal. The
ordering should cluster rows with identical sparsity pattern, so you
would benefit from inodes. Note that incomplete factorization for mixed
FEM is very sensitive to ordering, so it can be desirable for, e.g.,
pressure unknowns to always follow velocity unknowns rather than be
interlaced where they minimize bandwidth.
Black-box preconditioners are really bad for mixed FEM (mostly due to the
resulting system being indefinite), so I don't assemble those Jacobians,
just the blocks that will be needed for preconditioning. The only viable
alternative is to use exotic DD (Schwarz and Schur) schemes where you
carefully choose a coarse level and subdomains so that they remove
low-energy modes and satisfy the necessary divergence theorem, then use a
direct subdomain solver and a redundant or approximate coarse-level
solver. Such schemes have yet to be made generic (they are tied to a
specific discretization of a specific physics), though I have some ideas
for implementing BDDC/FETI-DP in a somewhat more flexible way than I've
yet seen. (Alas, I have a thesis to finish, so I won't be working on it
soon.)

> That's a shame, too. One application I'm working on right now is
> going to end up with roughly a half dozen variables on a big subdomain
> and one or two dozen on a much bigger subdomain. Storing block
> structure alone would have been nice.

Sounds like a lot of storage. Are the blocks dense? Have you considered
preconditioning with a sparser representation?

Jed