From: Roy S. <roy...@ic...> - 2011-09-16 20:47:14

If I switch ex14's output format from ExodusII to Nemesis, and if I
set output_intermediate=true in ex14.in, and if I run on 4 processors,
then I get a crash at the first output step.

METHOD=dbg shows me *where* the out-of-bounds vector access is, but I
don't know the Nemesis_IO_Helper code well enough to figure out what's
going wrong. Based on the failure conditions I'd assume it's another
code-doesn't-work-with-more-processors-than-coarse-elements bug, but
since there are some if cases in the Nemesis code which specifically
try to test for such bugs, maybe it's something more complex.

Anyway, I'm not going to have time to look into it more deeply myself
soon, so I figured I'd throw it out to the list in case anyone more
familiar with the Exodus/Nemesis code wants to see if it's replicable
and/or fixable.
---
Roy
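
The "more processors than coarse elements" failure class mentioned above can be illustrated with a toy model. This is only a sketch, not the Nemesis_IO_Helper code: `toy_partition`, `count_empty_ranks`, and `first_local_elem` are hypothetical names, and the 2-element, 4-rank numbers are illustrative rather than taken from ex14. The point is that some ranks end up with no local elements, so any unguarded `front()` or `[0]` on their local list is an out-of-bounds access.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy partitioner (hypothetical, not libMesh code): deal n_elem coarse
// elements out to n_procs ranks round-robin.
std::vector<std::vector<std::size_t>>
toy_partition(std::size_t n_elem, std::size_t n_procs)
{
  assert(n_procs > 0);
  std::vector<std::vector<std::size_t>> local(n_procs);
  for (std::size_t e = 0; e < n_elem; ++e)
    local[e % n_procs].push_back(e);
  return local;
}

// Number of ranks that received no elements at all.
std::size_t
count_empty_ranks(const std::vector<std::vector<std::size_t>>& local)
{
  std::size_t n = 0;
  for (const auto& ids : local)
    if (ids.empty())
      ++n;
  return n;
}

// Guarded lookup: returns false and leaves `first` untouched when the
// rank owns nothing, instead of calling front() on an empty vector.
bool first_local_elem(const std::vector<std::size_t>& ids,
                      std::size_t& first)
{
  if (ids.empty())
    return false;
  first = ids.front();
  return true;
}
```

With `toy_partition(2, 4)`, ranks 2 and 3 own nothing, which is the situation the existing "if cases" in the Nemesis code are presumably meant to catch.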
From: John P. <jwp...@gm...> - 2011-09-16 20:55:18

On Fri, Sep 16, 2011 at 2:47 PM, Roy Stogner <roy...@ic...> wrote:
>
> If I switch ex14's output format from ExodusII to Nemesis, and if I
> set output_intermediate=true in ex14.in, and if I run on 4 processors,
> then I get a crash at the first output step.

Parallel mesh on I assume?

> METHOD=dbg shows me *where* the out-of-bounds vector access is, but I
> don't know the Nemesis_IO_Helper code well enough to figure out what's
> going wrong. Based on the failure conditions I'd assume it's another
> code-doesn't-work-with-more-processors-than-coarse-elements bug, but
> since there are some if cases in the Nemesis code which specifically
> try to test for such bugs, maybe it's something more complex.
>
> Anyway, I'm not going to have time to look into it more deeply myself
> soon, so I figured I'd throw it out to the list in case anyone more
> familiar with the Exodus/Nemesis code wants to see if it's replicable
> and/or fixable.

I'll see if I can at least replicate it some time this afternoon.

--
John
From: Roy S. <roy...@ic...> - 2011-09-16 20:57:32

On Fri, 16 Sep 2011, John Peterson wrote:

> On Fri, Sep 16, 2011 at 2:47 PM, Roy Stogner <roy...@ic...> wrote:
>>
>> If I switch ex14's output format from ExodusII to Nemesis, and if I
>> set output_intermediate=true in ex14.in, and if I run on 4 processors,
>> then I get a crash at the first output step.
>
> Parallel mesh on I assume?

That was the original reason I switched around the output, but the
problem manifests on SerialMesh.

> I'll see if I can at least replicate it some time this afternoon.

Thanks,
---
Roy
From: John P. <jwp...@gm...> - 2011-09-16 21:05:10

On Fri, Sep 16, 2011 at 2:57 PM, Roy Stogner <roy...@ic...> wrote:
>
> On Fri, 16 Sep 2011, John Peterson wrote:
>
>> On Fri, Sep 16, 2011 at 2:47 PM, Roy Stogner <roy...@ic...>
>> wrote:
>>>
>>> If I switch ex14's output format from ExodusII to Nemesis, and if I
>>> set output_intermediate=true in ex14.in, and if I run on 4 processors,
>>> then I get a crash at the first output step.
>>
>> Parallel mesh on I assume?
>
> That was the original reason I switched around the output, but the
> problem manifests on SerialMesh.

OK, I admit I never thought the Nemesis stuff would be used with a
SerialMesh and never tested any of it with that.

Your patch of r4753 opened up this can of worms...maybe we can "fix"
it with 'git revert' ;-)

--
John
From: Roy S. <roy...@ic...> - 2011-09-16 21:14:21

On Fri, 16 Sep 2011, John Peterson wrote:

> OK, I admit I never thought the Nemesis stuff would be used with a
> SerialMesh and never tested any of it with that.
>
> Your patch of r4753 opened up this can of worms...maybe we can "fix"
> it with 'git revert' ;-)

Ha!

Yeah, if you can't get the same results out of ParallelMesh then put
it on the back burner. I didn't expect Nemesis to necessarily work
with SerialMesh right away; with that patch I just wanted to remove
any obvious obstacles.

Although, I suspect any bug we run into on a SerialMesh ought to also
be triggerable by serializing a ParallelMesh and then trying to do
output before re-deleting remote elements. Some of my earliest
ParallelMesh debugging was done with the data kept in such a
"like-SerialMesh-but-slower" mode.
---
Roy
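
The "like-SerialMesh-but-slower" state described above can be pictured with a small toy model. This is a sketch under stated assumptions, not libMesh's mesh classes: `ToyElem`, `ToyMesh`, `serialized_copy`, and `delete_remote` are hypothetical names, and ghost elements are ignored. It only shows that, while serialized, a rank stores more elements than it owns, so output code that sizes a buffer by one count and indexes by the other could plausibly step out of bounds; whether that is what happens in the Nemesis writer is not established in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a mesh element: an id plus its owning rank.
struct ToyElem
{
  std::size_t id;
  unsigned int processor_id;
};

// Hypothetical stand-in for one rank's view of the mesh.
struct ToyMesh
{
  unsigned int rank;
  std::vector<ToyElem> elems; // everything this rank currently stores

  // Elements owned by this rank, regardless of how many are stored.
  std::size_t n_local() const
  {
    return static_cast<std::size_t>(
      std::count_if(elems.begin(), elems.end(),
                    [this](const ToyElem& e)
                    { return e.processor_id == rank; }));
  }

  // Drop every element this rank does not own (toy: no ghosts kept).
  void delete_remote()
  {
    elems.erase(std::remove_if(elems.begin(), elems.end(),
                               [this](const ToyElem& e)
                               { return e.processor_id != rank; }),
                elems.end());
    assert(elems.size() == n_local());
  }
};

// Serialized state: the rank holds a copy of every element in the mesh.
ToyMesh serialized_copy(const std::vector<ToyElem>& all, unsigned int rank)
{
  return ToyMesh{rank, all};
}
```

In this model, running an output routine between `serialized_copy` and `delete_remote` corresponds to the window Roy describes: stored and owned element counts disagree until the remote elements are deleted again.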