From: Renato P. <re...@gm...> - 2018-06-28 20:32:26
> By duplicated, you just mean the multiple DoFs "at" (in a Lagrangian
> evaluation sense) each node?

Correct.

> And integration points are only on sides, so you don't have to worry
> about edges/nodes where more than two elements meet?

Correct.

> 1) You're not using a solution vector, you're using a vector with
> different indexing that's merely calculated *from* a solution vector,
> so since you have that expensive calculation/transformation anyway
> then you aren't stuck using the solution vector directly for
> efficiency purposes.

Correct.

> 2) Plus, it sounds like you don't *need* arbitrary T here - you're
> looking at jumps in values of type Number, so you can still use
> vectors of type Number, right?

I also need to identify where the jumps happen. I could use a class of my
own, a map indexed by the aperture, or a pair; what matters is that it is
indexed by a Number, that is true.

> The good news is that libMesh does have a class, Parallel::Sort, which
> can do what you want (it does a bin sort on arbitrary data, so you
> could create e.g. a local vector<pair<Number,original_index_type>>,
> sort by that, do another parallel communication step afterwards, and
> end up with exactly the information you want in a scalable way.

That is awesome.

> The bad news is that this class isn't perfectly documented, is only
> used in one place in the library, and the original developer isn't
> currently active, so it may be quite difficult to figure out.

That is not so awesome.

> If I were you I'd start by doing things the slow easy way: after each
> processor makes its own local vector, just allgather those into a
> giant serial vector and sort that on each rank. *Then* try to use the
> Parallel::Sort interface, once you have something to use for debugging
> and a fallback while you do.

This is the serial approach I was talking about. I have other trouble
ahead; I wouldn't say performance is the *major* issue right now.

It seems that we are converging... which takes me to a last question.
How can I do this (I am not an MPI guy at all, so please be patient)?

> ... after each processor makes its own local vector, just allgather
> those into a giant serial vector ...

Thanks!

Renato
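
[Editorial note: a minimal sketch of the allgather-then-sort fallback
described in the quoted advice, using libMesh's
Parallel::Communicator::allgather(), which concatenates each rank's local
vector into one identical serial vector on every rank. The function name
gather_and_sort_jumps and the choice to pair each jump value with a
dof_id_type are illustrative assumptions, not something from this thread.]

    // A minimal sketch, not from the thread: gather every rank's local
    // (jump value, id) pairs into one identical serial vector on each
    // rank, then sort it serially.  Tagging each jump with a dof_id_type
    // is an assumption about how you record where the jump happens.
    #include "libmesh/libmesh_common.h"  // Number, libmesh_real()
    #include "libmesh/parallel.h"        // Parallel::Communicator
    #include "libmesh/id_types.h"        // dof_id_type

    #include <algorithm>
    #include <utility>
    #include <vector>

    using namespace libMesh;

    void gather_and_sort_jumps
      (const Parallel::Communicator & comm,
       std::vector<std::pair<Number, dof_id_type>> & jumps)
    {
      // allgather() with identical_buffer_sizes=false concatenates the
      // local vectors from all ranks, in rank order, back into `jumps`
      // on every rank: the "giant serial vector" of the quoted advice.
      comm.allgather(jumps, /* identical_buffer_sizes = */ false);

      // Serial sort of the gathered data on each rank.  libmesh_real()
      // keeps this compiling in complex-Number builds, where operator<
      // is not defined for std::complex.
      std::sort(jumps.begin(), jumps.end(),
                [](const std::pair<Number, dof_id_type> & a,
                   const std::pair<Number, dof_id_type> & b)
                { return libmesh_real(a.first) < libmesh_real(b.first); });
    }

Called as, e.g., gather_and_sort_jumps(system.comm(), my_local_jumps);
the Communicator wrapper handles the variable per-rank buffer sizes, so
no raw MPI_Allgatherv calls are needed.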