From: Roy S. <roy...@ic...> - 2010-02-23 22:00:28
On Tue, 23 Feb 2010, Andre Luis Rossa wrote:

> I've run libMesh with METHOD=dbg and I've found out what's wrong when
> the program tries to access the system's additional vector using more
> than one processor (unfortunately I'm not sure how to fix it yet).
>
> The out-of-range error occurs because, apparently, the additional
> vector is dimensioned only for the local (processor domain) nodes.
>
> I've generated a very small, simple mesh (2x2x1 Hex8) and run it with
> only 2 processors.
>
> The error occurs when processor 1 tries to compute something for an
> element whose nodes lie on the interface between the two processors'
> domains (i.e., a node that belongs to both domains). The dof map
> returns a global index number that belongs to processor one's index
> range.
>
> As I've verified, the system's "current_solution" is dimensioned for
> the global number of nodes, so it doesn't have this problem.
>
> So, is there a way to avoid this index error? Is there something like
> an "update()" method (as for the solution vector) for this case? Or
> should the vector be reallocated for the global number of nodes before
> using it?

That's right. System::add_vector() by default creates a vector of type
PARALLEL. You'll want to reinitialize that vector as GHOSTED or SERIAL
if you need each processor to be able to directly access ghost DoF
values or all values, respectively.
---
Roy
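A minimal sketch of what Roy describes - adding a vector and then
re-initializing it with ghost indices - might look like the following.
The init() arguments mirror the call quoted later in this thread, so
treat the exact signature as version-dependent; "my_extra_vector" is
just an illustrative name.

    // Sketch: re-initialize an added vector as GHOSTED so that ghost
    // DoF values can be read directly on each processor.
    NumericVector<Number> & vec = system.add_vector("my_extra_vector");

    vec.init(system.solution->size(),              // global size
             system.solution->local_size(),        // locally owned entries
             system.get_dof_map().get_send_list(), // ghost indices
             GHOSTED);                             // parallel type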
From: Roy S. <roy...@ic...> - 2010-02-26 23:20:06
On Fri, 26 Feb 2010, Andre Luis Rossa wrote:

>> You'll want to reinitialize that vector as GHOSTED or SERIAL if you
>> need each processor to be able to directly access ghost DoF values
>> or all values, respectively.
>
> Roy,
> I haven't found a method to reinitialize the additional vector as you
> told me to do. I've tried NumericVector::init(const unsigned int,
> const unsigned int, const bool) and NumericVector::init(const
> unsigned int, const bool), passing the global mesh numbers, but it
> hasn't worked.

That should have worked to give you a SERIAL vector. But bear in mind
that no matter what kind of vector you build, you'll have to handle
synchronizing it across processors yourself. That is currently a bit
of a hack in libMesh - see the "temporary parallel vector" code in
system_projection.C for an example.
---
Roy
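Per Roy's comment, the two-argument init(N, fast) with the global size
should yield a SERIAL vector; a hedged sketch of building one and
synchronizing it by hand (names illustrative):

    // Sketch: gather every entry of the distributed solution into a
    // SERIAL copy held on each processor.
    AutoPtr<NumericVector<Number> > serial_vec =
      NumericVector<Number>::build();
    serial_vec->init(system.solution->size(), false); // full size => SERIAL

    system.solution->localize(*serial_vec); // all-to-all gather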
From: Roy S. <roy...@ic...> - 2010-04-14 14:01:20
On Tue, 13 Apr 2010, Boyce Griffith wrote:

> system.solution->init(system.solution->size(),
>                       system.solution->local_size(),
>                       ghost_dofs, GHOSTED);

Why create your own ghost_dofs variable rather than just passing in
system.get_dof_map().get_send_list()? How do you construct ghost_dofs?

> [0] /Users/griffith/sfw/libmesh/include/numerics/petsc_vector.h,
> line 787, compiled Apr 13 2010 at 16:41:48
>
> libmesh_assert(n_local == 0 || n_local == n || !ghost.empty());
>
> If I comment out this assertion, everything seems to work OK.
>
> Am I doing something wrong here?

Possibly. If you're passing in an empty ghost_dofs in a non-serial
computation, that's almost certainly a mistake - under what conditions,
other than the degenerate cases in the assertion, does one processor
really need *no* data from any other?

It could also be we've got an overzealous assertion, if there's a
possibility I missed.
---
Roy
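In code, Roy's suggestion is Boyce's init() call with the hand-built
ghost list swapped for the DofMap's send_list:

    system.solution->init(system.solution->size(),
                          system.solution->local_size(),
                          system.get_dof_map().get_send_list(),
                          GHOSTED);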
From: Boyce G. <gri...@ci...> - 2010-04-14 14:13:09
On 4/14/10 10:01 AM, Roy Stogner wrote:
>
> On Tue, 13 Apr 2010, Boyce Griffith wrote:
>
>> system.solution->init(system.solution->size(),
>>                       system.solution->local_size(),
>>                       ghost_dofs, GHOSTED);
>
> Why create your own ghost_dofs variable rather than just passing in
> system.get_dof_map().get_send_list()? How do you construct
> ghost_dofs?

I am not sure I understand exactly which DOFs wind up in
get_send_list().

>> [0] /Users/griffith/sfw/libmesh/include/numerics/petsc_vector.h,
>> line 787, compiled Apr 13 2010 at 16:41:48
>>
>> libmesh_assert(n_local == 0 || n_local == n || !ghost.empty());
>>
>> If I comment out this assertion, everything seems to work OK.
>>
>> Am I doing something wrong here?
>
> Possibly. If you're passing in an empty ghost_dofs in a non-serial
> computation, that's almost certainly a mistake - under what
> conditions, other than the degenerate cases in the assertion, does
> one processor really need *no* data from any other?
>
> It could also be we've got an overzealous assertion, if there's a
> possibility I missed.

What I am trying to do is compute the interpolation of the solution at
the quadrature points directly. I am probably being a moron, but what
I'm doing is something like:

    NumericVector<Real> V; // schematic: a vector whose entries are
                           // readable at every local-element DOF

    MeshBase::const_element_iterator el =
      mesh.active_local_elements_begin();
    const MeshBase::const_element_iterator end_el =
      mesh.active_local_elements_end();
    for ( ; el != end_el; ++el)
      {
        const Elem* const elem = *el;
        fe->reinit(elem);
        dof_map.dof_indices(elem, dof_indices);
        for (unsigned int qp = 0; qp < qrule.n_points(); ++qp)
          {
            double V_qp = 0.0;
            for (unsigned int i = 0; i < phi.size(); ++i)
              V_qp += V(dof_indices[i])*phi[i][qp];

            // do stuff with V_qp
          }
      }

For this to work, I obviously need access to the nodal values at each
node of each local element. So what I do is set up the ghost DOFs to
be whatever global DOFs are needed on each processor but are not
already local to that processor. If all of the global DOFs associated
with all of the local elements are local to the processor, then the
ghost-DOF list associated with that process will be empty. This seems
like a plausible use case to me, but it results in this assertion
failure.

Is there a better way to go about this?

Thanks,

-- Boyce
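For context, the ghost-DOF collection Boyce describes - every DOF on a
local element that the processor does not own - could be built along
these lines. This is a sketch only; it assumes the mesh, dof_map, and
dof_indices variables from the snippet above.

    std::set<unsigned int> ghost_set;
    std::vector<unsigned int> dof_indices;

    MeshBase::const_element_iterator el =
      mesh.active_local_elements_begin();
    const MeshBase::const_element_iterator end_el =
      mesh.active_local_elements_end();
    for ( ; el != end_el; ++el)
      {
        // Collect all global DOF indices on this local element and
        // keep the ones outside our locally owned index range.
        dof_map.dof_indices(*el, dof_indices);
        for (unsigned int i = 0; i < dof_indices.size(); ++i)
          if (dof_indices[i] <  dof_map.first_dof() ||
              dof_indices[i] >= dof_map.end_dof())
            ghost_set.insert(dof_indices[i]);
      }

    std::vector<unsigned int> ghost_dofs(ghost_set.begin(),
                                         ghost_set.end());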
From: Derek G. <fri...@gm...> - 2010-04-14 14:03:19
Wait... all you want is a ghosted solution vector? I thought that was
supposed to be the default when using PETSc now... can anyone correct
me?

Derek

On Apr 13, 2010, at 7:05 PM, Boyce Griffith wrote:

> Hi, Folks --
>
> I am trying to set up a ghosted solution vector for a
> LinearImplicitSystem.
>
> My incomplete understanding of how this works is that, on each
> processor, I make a list of the off-processor global DOFs which are
> needed, and then I do something like
>
>     system.solution->init(system.solution->size(),
>                           system.solution->local_size(),
>                           ghost_dofs, GHOSTED);
>
> When I do this, an assertion failure results:
>
>     [0] /Users/griffith/sfw/libmesh/include/numerics/petsc_vector.h,
>     line 787, compiled Apr 13 2010 at 16:41:48
>
> which corresponds to
>
>     // If the mesh is disjoint, the following assertion will fail.
>     // If the mesh is not disjoint, every processor will either have
>     // all the dofs, none of the dofs, or some non-zero dofs at the
>     // boundary between processors.
>     libmesh_assert(n_local == 0 || n_local == n || !ghost.empty());
>
> If I comment out this assertion, everything seems to work OK.
>
> Am I doing something wrong here?
>
> Thanks,
>
> -- Boyce
From: Boyce G. <gri...@ci...> - 2010-04-14 14:17:29
Doesn't appear to be...

-- Boyce

On 4/14/10 10:02 AM, Derek Gaston wrote:
> Wait... all you want is a ghosted solution vector? I thought that was
> supposed to be the default when using PETSc now... can anyone correct
> me?
>
> [...]
From: Tim K. <tim...@ce...> - 2010-04-14 14:32:54
Dear Derek,

On Wed, 14 Apr 2010, Derek Gaston wrote:

> Wait... all you want is a ghosted solution vector? I thought that was
> supposed to be the default when using PETSc now... can anyone correct
> me?

No, as far as I remember, only System::current_local_solution is
ghosted by default, not System::solution. The question is, why would
anybody want a ghosted System::solution when a ghosted copy of it is
already available? As far as I remember, Roy planned to remove this
vector duplication (which is often confusing for users) some day.

Best Regards,

Tim
From: Derek G. <fri...@gm...> - 2010-04-14 14:21:51
On Apr 14, 2010, at 8:14 AM, Tim Kroeger wrote:

> No, as far as I remember, only System::current_local_solution is
> ghosted by default, not System::solution.

Ah yes... good point. I was remembering when current_local_solution
got turned into a ghosted vector.

> The question is, why would anybody want a ghosted System::solution
> when a ghosted copy of it is already available?

Hah - completely agree.

> As far as I remember, Roy planned to remove this vector duplication
> (which is often confusing for users) some day.

Yes... even some libMesh veterans still screw it up (like I just did!)
;-) Hopefully someday... but that would mean someone would need to
make ghosted vectors work for all the other linear/nonlinear solver
packages... and so far that hasn't happened.

Derek
From: Tim K. <tim...@ce...> - 2010-04-14 14:32:15
Dear Derek,

On Wed, 14 Apr 2010, Derek Gaston wrote:

>> As far as I remember, Roy planned to remove this vector duplication
>> (which is often confusing for users) some day.
>
> Hopefully someday... but that would mean someone would need to make
> ghosted vectors work for all the other linear/nonlinear solver
> packages...

Not necessarily. For those solver packages,
System::current_local_solution is a serial vector right now, isn't it?
So when the duplication is removed, the remaining vector (whatever its
name would be) could just as well be a serial vector. That would not
blow up anybody's code, since the memory required for a serial vector,
although it can be a lot, is always less than that for a serial vector
*plus* a parallel vector.

The more difficult question is how to do this without API changes,
i.e., without requiring users to change their code.

Best Regards,

Tim
From: Roy S. <roy...@ic...> - 2010-04-14 14:36:54
On Wed, 14 Apr 2010, Tim Kroeger wrote:

> Not necessarily. For those solver packages,
> System::current_local_solution is a serial vector right now, isn't
> it? So when the duplication is removed, the remaining vector
> (whatever its name would be) could just as well be a serial vector.
> That would not blow up anybody's code, since the memory required for
> a serial vector, although it can be a lot, is always less than that
> for a serial vector *plus* a parallel vector.

Memory-wise, removing the duplication right now could be a good thing,
but we'd have to figure out how to do it right CPU-wise. We don't want
Trilinos to think it needs to *solve* for a serial vector, for example.

> The more difficult question is how to do this without API changes,
> i.e., without requiring users to change their code.

This too.
---
Roy
From: Roy S. <roy...@ic...> - 2010-04-14 14:45:46
On Wed, 14 Apr 2010, Boyce Griffith wrote:

> I am not sure I understand exactly which DOFs wind up in
> get_send_list().

Non-local DoFs on local elements, non-local DoFs on elements which
neighbor local elements or meet local elements at at least one node,
and non-local DoFs which are dependencies (through constraint
equations) of any local or other ghost DoF.

In other words, a superset of the ghost_dofs you're creating, which
only includes the first category. In our case that assertion is fine;
in yours it should always fail on processor 0, and could potentially
fail on any processor except the highest-ranked.

So our assertion is overzealous... but on the other hand, although
what you're doing isn't a bug, it's probably unnecessary - could you
just use current_local_solution? That should have a superset of the
data you need.
---
Roy
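Concretely, Roy's suggestion reduces Boyce's earlier snippet to reading
from the already-ghosted vector; a sketch reusing Boyce's variable
names:

    // current_local_solution already holds the send_list ghost values
    // after System::update(), so no hand-built ghost list is needed.
    const NumericVector<Number> & V = *system.current_local_solution;

    for (unsigned int qp = 0; qp < qrule.n_points(); ++qp)
      {
        Number V_qp = 0.;
        for (unsigned int i = 0; i < phi.size(); ++i)
          V_qp += V(dof_indices[i]) * phi[i][qp];

        // do stuff with V_qp
      }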
From: Boyce G. <gri...@ci...> - 2010-04-14 14:53:34
On 4/14/10 10:45 AM, Roy Stogner wrote:

> [...]
>
> So our assertion is overzealous... but on the other hand, although
> what you're doing isn't a bug, it's probably unnecessary - could you
> just use current_local_solution? That should have a superset of the
> data you need.

Sure, I can probably just use current_local_solution. Is
current_local_solution somehow synchronized "automatically" with
solution, or am I responsible for that?

Thanks,

-- Boyce
From: Roy S. <roy...@ic...> - 2010-04-15 22:42:44
On Wed, 14 Apr 2010, Roy Stogner wrote:

> On Wed, 14 Apr 2010, Boyce Griffith wrote:
>
>> I am not sure I understand exactly which DOFs wind up in
>> get_send_list().
>
> [...]
>
> So our assertion is overzealous...

And I'm removing it. We're actually going to want to use a vector with
a similarly restricted ghost-DoF send_list in System::project_vector.
---
Roy
From: Boyce G. <gri...@ci...> - 2010-04-14 15:06:15
On 4/14/10 10:45 AM, Roy Stogner wrote:

> [...]
>
> So our assertion is overzealous... but on the other hand, although
> what you're doing isn't a bug, it's probably unnecessary - could you
> just use current_local_solution? That should have a superset of the
> data you need.

I think I'm starting to understand a little better how this works...

If I collect the needed ghost DOFs and use

    solution->localize(system->current_local_solution, my_ghost_dofs);

it looks like only the requested ghost values are communicated. Is
that correct? Or should I just stick with using

    solution->localize(system->current_local_solution);

?

Thanks!

-- Boyce
From: David K. <dknez@MIT.EDU> - 2010-04-14 15:13:08
Boyce Griffith wrote:

> If I collect the needed ghost DOFs and use
>
>     solution->localize(system->current_local_solution, my_ghost_dofs);
>
> it looks like only the requested ghost values are communicated. Is
> that correct? Or should I just stick with using
>
>     solution->localize(system->current_local_solution);
>
> ?

current_local_solution is updated just by calling system.update().
This is done automatically when you do system.solve(), so you can
typically just use current_local_solution without having to do
anything special.

Dave
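A sketch of the usual pattern David describes (some_dof is an
illustrative index):

    // solve() calls update() internally, so current_local_solution is
    // ready to read immediately afterwards.
    system.solve();
    const Number u = (*system.current_local_solution)(some_dof);

    // After changing system.solution by hand, re-synchronize
    // explicitly:
    system.update();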
From: Tim K. <tim...@ce...> - 2010-04-14 15:17:30
Dear Boyce,

On Wed, 14 Apr 2010, Boyce Griffith wrote:

> If I collect the needed ghost DOFs and use
>
>     solution->localize(system->current_local_solution, my_ghost_dofs);
>
> it looks like only the requested ghost values are communicated. Is
> that correct? Or should I just stick with using
>
>     solution->localize(system->current_local_solution);

A lot of synchronization is done automatically, but I am not quite
sure when. As a hint, have a look at what System::update() does and
where it is called. I would not be surprised if you didn't have to
care about this at all, but of course that depends on what exactly you
are doing and where you are doing it. Also, I think the worst that can
happen is that you have to call System::update() manually at some
point in your code.

Best Regards,

Tim
From: Boyce G. <gri...@ci...> - 2010-04-14 15:28:00
On 4/14/10 11:17 AM, Tim Kroeger wrote:

> [...]
>
> Also, I think the worst that can happen is that you have to call
> System::update() manually at some point in your code.

Thanks! I think I'm starting to understand how this stuff all fits
together.

Best,

-- Boyce
From: Boyce G. <gri...@ci...> - 2010-04-15 01:38:31
Hi, Folks --

I am trying to put together a libMesh-based AMR solver which has some
variables I would like to treat as piecewise constants. I think the
way to do this in libMesh is to set the order to CONSTANT and the FE
type to MONOMIAL.

When I do this, I get segfaults in EquationSystems::reinit() when
libMesh and my application code are compiled with optimizations
enabled. Similar segfaults do not seem to occur when I compile in
debugging mode, even though I am using the same PETSc build in both
cases.

The error almost always occurs on process 0 in VecScatterBegin:

    (gdb) where
    #0  0x93ef144e in __semwait_signal ()
    #1  0x93ef12cf in nanosleep$UNIX2003 ()
    #2  0x93f46e71 in sleep$UNIX2003 ()
    #3  0x00410efb in PetscSleep ()
    #4  0x003dc732 in PetscAttachDebugger ()
    #5  0x003dd2b1 in PetscAttachDebuggerErrorHandler ()
    #6  0x003de1ba in PetscError ()
    #7  0x003e2658 in PetscDefaultSignalHandler ()
    #8  0x003e2844 in PetscSignalHandler_Private ()
    #9  <signal handler called>
    #10 0x0036b625 in VecScatterBegin_1 ()
    #11 0x00353609 in VecScatterBegin ()
    #12 0x00f0f040 in PetscVector<double>::localize ()
    #13 0x00ff74e9 in System::project_vector ()
    #14 0x00ff7aec in System::project_vector ()
    #15 0x00fd4c73 in System::restrict_vectors ()
    #16 0x00fd1c61 in System::prolong_vectors ()
    #17 0x00f931ba in EquationSystems::reinit ()
    #18 0x0000ce92 in main ()
    #19 0x00002a16 in start ()

The point in the computation at which the segfault occurs seems to
depend on the number of processors. If I switch the interpolation
order to FIRST, I also get segfaults in reinit(), but at different
points in the computation (generally later). I think it also happens
with SECOND, although I am having trouble reproducing the error in
that case right now...

Also, in a few runs, instead of a segfault, I get:

    [0]PETSC ERROR: VecDestroy_Seq() line 628 in
    src/vec/vec/impls/seq/bvec2.c Likely memory corruption in heap

However, this is hard to reproduce. I forgot to save the stack trace
when this happened, but I think it is happening at the same place in
the code.

Am I doing something wrong here? If I switch the piecewise-constant
variables to linear Lagrange FEs, everything seems to run fine.

Thanks!

-- Boyce

PS: Tomorrow I am going to work on getting this running with
valgrind...
From: Roy S. <roy...@ic...> - 2010-04-15 15:27:41
On Wed, 14 Apr 2010, Boyce Griffith wrote:

> I am trying to put together a libMesh-based AMR solver which has some
> variables I would like to treat as piecewise constants. I think the
> way to do this in libMesh is to set the order to CONSTANT and the FE
> type to MONOMIAL.

Right. The only question is which system you put them in - the same
system as your other variables if they vary and you want them to do so
in a fully-coupled way, a separate ImplicitSystem (or subclass) if they
vary and you want that variation to be decoupled or one-way coupled,
or a separate ExplicitSystem if they are constant (or if they vary in
a non-stiff fashion).

> When I do this, I get segfaults in EquationSystems::reinit() when
> libMesh and my application code are compiled with optimizations
> enabled. Similar segfaults do not seem to occur when I compile in
> debugging mode, even though I am using the same PETSc build in both
> cases.

What version and configuration of PETSc?

> The error almost always occurs on process 0 in VecScatterBegin:

Yup.

> However, this is hard to reproduce.

Yes and no. I can't reproduce it with every version of PETSc, but I
*can* reproduce it with ex21 of libMesh, even in debugging mode. I'd
been hoping that there was just some problem with one of my PETSc
builds. But if you're seeing the same bug, then odds are it's a bug in
libMesh and it's just only triggering segfaults in some PETSc builds.

> Am I doing something wrong here?

Probably not; if we're both seeing the same problem, then I suspect
we've got a bug with DG in the SVN head. I'm surprised Lorenzo hasn't
run into it first, though.

> PS: Tomorrow I am going to work on getting this running with
> valgrind...

I'd appreciate it. Also, if you do manage to reproduce the problem in
a simple test case, let me know. ex21 is currently hard-coded to use a
3D L-shaped domain, the error doesn't occur until after a few
refinement steps, and by that time the size of the send_list (where
the problem most likely is) is up in the thousands and hard to examine
manually.
---
Roy
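For reference, declaring such a piecewise-constant variable might look
like the sketch below; an ExplicitSystem is shown per Roy's last
option, and the system and variable names are illustrative:

    // A piecewise-constant field: CONSTANT order, MONOMIAL family,
    // in a system that is not solved for implicitly.
    ExplicitSystem & aux =
      equation_systems.add_system<ExplicitSystem>("aux");
    aux.add_variable("pressure", CONSTANT, MONOMIAL);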
From: Boyce G. <gri...@ci...> - 2010-04-15 15:42:26
On 4/15/10 11:27 AM, Roy Stogner wrote:
> What version and configuration of PETSc?

PETSc 3.0.0-p9 running on OS X 10.5 using gcc 4.4.2 and OpenMPI 1.3.3.
PETSc was compiled at optimization level -O3 using --with-debugging=0.

>> However, this is hard to reproduce.
>
> Yes and no. I can't reproduce it with every version of PETSc, but I
> *can* reproduce it with ex21 of libMesh, even in debugging mode. I'd
> been hoping that there was just some problem with one of my PETSc
> builds. But if you're seeing the same bug, then odds are it's a bug
> in libMesh and it's just only triggering segfaults in some PETSc
> builds.

I should have been more explicit. I always seem to get an error at
approximately the same point in the computation. It is usually a
segfault, but very occasionally it is a memory corruption error which
is detected by PETSc. Whether it is one or the other seems to be
random, with a segfault apparently being much more likely than a
memory corruption error detected by PETSc.

> I'd appreciate it. Also, if you do manage to reproduce the problem
> in a simple test case, let me know. ex21 is currently hard-coded to
> use a 3D L-shaped domain, the error doesn't occur until after a few
> refinement steps, and by that time the size of the send_list (where
> the problem most likely is) is up in the thousands and hard to
> examine manually.

Where is the relevant send_list - within EquationSystems::reinit()?
This error is cropping up after several refinement steps for me too
(usually many), but the total number of nodes in the problem is not
too huge, so it might be a little easier to diagnose.

-- Boyce
From: Roy S. <roy...@ic...> - 2010-04-15 15:58:04
On Thu, 15 Apr 2010, Boyce Griffith wrote:

> PETSc 3.0.0-p9 running on OS X 10.5 using gcc 4.4.2 and OpenMPI
> 1.3.3.
>
> PETSc was compiled at optimization level -O3 using --with-debugging=0.

So the problem is OS-, compiler-, and MPI-implementation-independent.
There's still a slim chance it's PETSc (I've only reproduced it on
3.0.0 and couldn't get it to crop up in a brief test with 3.1), but
I'd bet it's us.

> Where is the relevant send_list - within EquationSystems::reinit()?

It's constructed by a DofMap function that gets called from there, and
it's used by the failing localize() that also gets called from in
there.

> This error is cropping up after several refinement steps for me too
> (usually many), but the total number of nodes in the problem is not
> too huge, so it might be a little easier to diagnose.

Thanks. I'm going to add some more parameters to ex21.in and see if
this still crops up in 2D or 1D. This was a low priority when I
thought it might just be a bug in my PETSc install, but if it's not?
I'd been hoping to release libMesh 0.6.5 soon, and this needs to be
fixed first.
---
Roy
From: Derek G. <der...@in...> - 2010-04-15 16:04:48
On Apr 15, 2010, at 9:57 AM, Roy Stogner wrote:

> So the problem is OS-, compiler-, and MPI-implementation-independent.
> There's still a slim chance it's PETSc (I've only reproduced it on
> 3.0.0 and couldn't get it to crop up in a brief test with 3.1), but
> I'd bet it's us.

We still use PETSc 2.3.3 around here... and haven't run into this
problem (and we do use constant monomials and adaptivity quite a bit).
That's just one data point... but it could be some interaction between
libMesh and PETSc 3.0.

Derek
From: Roy S. <roy...@ic...> - 2010-04-15 18:18:06
On Thu, 15 Apr 2010, Derek Gaston wrote:

> We still use PETSc 2.3.3 around here... and haven't run into this
> problem (and we do use constant monomials and adaptivity quite a
> bit). That's just one data point... but it could be some interaction
> between libMesh and PETSc 3.0.

It might be triggered by PETSc in some subtle way, but I'm certain now
it's not being caused by PETSc - we've got a corrupt entry being
generated at the end of the send_list.

Still need to figure out how. If you haven't triggered it, and I can't
trigger it with any CG work, then my best guess is that I typoed
Lorenzo's latest DG send_list patch when integrating it recently.
---
Roy
From: Roy S. <roy...@ic...> - 2010-04-15 23:21:35
On Thu, 15 Apr 2010, Roy Stogner wrote:

> It might be triggered by PETSc in some subtle way, but I'm certain
> now it's not being caused by PETSc - we've got a corrupt entry being
> generated at the end of the send_list.
>
> Still need to figure out how. If you haven't triggered it, and I
> can't trigger it with any CG work, then my best guess is that I
> typoed Lorenzo's latest DG send_list patch when integrating it
> recently.

My best guess was *way* off. There are a few problem fixes that I'll
be committing in a minute, but the real killer looks like it's in the
projection send_list construction used by System::project_vector. We
were getting invalid_id added to the list during adaptive coarsening
of discontinuous elements, and PETSc naturally can react badly when
told to scatter index -1.

The same bug could affect adaptive coarsening of high-order
non-Lagrange continuous elements, but I suspect nobody would have
noticed it, since that manifestation would just have added a little
extra projection error to elements you'd already decided to coarsen
anyway.
---
Roy
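For readers chasing similar scatter crashes, a sanity check along
these lines can catch a corrupt send_list before PETSc does; this is
an illustrative debugging aid, not libMesh code:

    // Verify no sentinel or out-of-range index slipped into the ghost
    // list before it is handed to a vector scatter.
    const std::vector<unsigned int> & send_list =
      system.get_dof_map().get_send_list();

    for (unsigned int i = 0; i < send_list.size(); ++i)
      libmesh_assert(send_list[i] != DofObject::invalid_id &&
                     send_list[i] <  system.n_dofs());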
From: Lorenzo B. <bot...@gm...> - 2010-04-15 18:29:18
Hi all,

I don't have any experience with piecewise CONSTANTs; my first
approximation is usually FIRST order. The dof_map patch should only
be considered when using a coupling_matrix.

Lorenzo

On 15 Apr 2010, at 20:17, Roy Stogner wrote:

> It might be triggered by PETSc in some subtle way, but I'm certain
> now it's not being caused by PETSc - we've got a corrupt entry being
> generated at the end of the send_list.
>
> [...]