From: Roy S. <roy...@od...> - 2020-08-28 16:45:45
On Fri, 28 Aug 2020, John Peterson wrote:

> On Fri, Aug 21, 2020 at 9:51 AM Nikhil Vaidya <nik...@gm...>
> wrote:
>
>> I need to print the sparse matrices (Petsc) and vectors involved in my
>> calculations to file using print_matlab(). I have observed that the
>> matrices and vectors that are written to the matlab scripts in serial and
>> parallel runs are not identical. Is this actually the case or am I missing
>> something?
>>
>
> By "not identical" I guess you mean that they don't match in all digits,
> but are they at least "close"? It's normal to have floating point
> differences between serial and parallel runs, but they should be due to
> different orders of operations and therefore of order 10-100 * machine
> epsilon.

To expand: it's even normal to have floating point differences between
different parallel runs. Both MPI reductions and threading pool
algorithms typically operate on an "I'll begin summing the first data I
see ready" basis for efficiency, and "the first data I see ready" depends
on how loaded each CPU and network device is, meaning the reductions are
practically done in random order. IIRC there's even a funny bit in the
MPI standard where they find a very polite and professional way to
rephrase "If you don't like it then why don't you go write your own
reduction code!?"

And if you're using PETSc? You're probably not even using the same
algorithm in serial vs parallel; the default (for performance /
robustness reasons) is Block Jacobi (between processors) + ILU0 (within a
processor), so the very definition of your preconditioner depends on your
partitioning. This is a much bigger issue than the order of operations
problem.

Because of it, if you want to be able to do testing on different
processor counts (or partitioner settings or solver algorithm choices or
preconditioner algorithm choices), you can't safely assert that a "gold"
regression test standard will be repeatable to a tolerance any better
than your solver tolerance (or even equal to your solver tolerance,
thanks to conditioning issues).
---
Roy
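
The two points above (summation order changes the floating point result, and a "gold" comparison should use a tolerance tied to the solver tolerance rather than exact equality) can be illustrated with a small standalone sketch. This is plain Python, not libMesh or PETSc code; the helper names `sum_in_order` and `matches_gold` and the tolerance values are hypothetical and chosen only for illustration.

```python
import math


def sum_in_order(values, order):
    """Accumulate values one at a time in the given index order,
    the way a reduction might if data arrived in that order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


def matches_gold(computed, gold, rel_tol, abs_tol=0.0):
    """Compare computed values against stored reference ("gold") values
    entry by entry, using a tolerance instead of exact equality.
    rel_tol would normally be no tighter than the linear solver
    tolerance (an assumption of this sketch, following the advice above)."""
    if len(computed) != len(gold):
        return False
    return all(
        math.isclose(c, g, rel_tol=rel_tol, abs_tol=abs_tol)
        for c, g in zip(computed, gold)
    )


values = [0.1, 0.2, 0.3]

# Same three numbers, two different accumulation orders.
s_forward = sum_in_order(values, [0, 1, 2])
s_reverse = sum_in_order(values, [2, 1, 0])

print(s_forward == s_reverse)      # False: the results differ in the last bit
print(abs(s_forward - s_reverse))  # on the order of machine epsilon

# Exact comparison fails; a tolerance-based comparison passes.
print(matches_gold([s_forward], [s_reverse], rel_tol=0.0))
print(matches_gold([s_forward], [s_reverse], rel_tol=1e-8))
```

With a preconditioner that changes with the partitioning, the differences between runs can be far larger than this last-bit discrepancy, which is why the tolerance in such a check would be set from the solver tolerance (possibly loosened further for conditioning) and not from machine epsilon.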