From: Kirk, B. (JSC-EG311) <ben...@na...> - 2009-04-17 11:09:34
Attachments:
petsc_vector_nonsense
Tim,

Please find attached a patch which addresses the ridiculous implementation of adding a scalar to a PetscVector pointed out last week... Let me know if it works for you; if so, I'll submit it.

-Ben
From: Tim K. <tim...@ce...> - 2009-04-17 12:20:19
Dear Ben,

On Fri, 17 Apr 2009, Kirk, Benjamin (JSC-EG311) wrote:
> Please find attached a patch which addresses the ridiculous implementation
> of adding a scalar to a PetscVector pointed out last week... Let me know if
> it works for you, if so I'll submit it.

Actually, I'm not using that function anywhere in my code, so there is no easy way for me to test it. (Remember that Jed was the one who pointed this out initially.)

On the other hand, there are some more possibly inefficient things in that file. For instance, PetscVector::insert() calls PetscVector::set() for each index, and I suppose it would be better to use VecSetValues() instead. It's difficult to decide how far one should go at the moment with optimizing the PetscVector class.

Anyway, I looked over your patch and it seems correct to me; in particular, it coincides with Jed's suggestion. Since Jed seems to know PETSc very well, this could be considered reason enough to submit the patch right now. Other optimizations should probably wait until we know whether this is really a bottleneck.

Before I run the application with the PETSc log output option that Jed suggested, I would like the remaining bug to be fixed, since one never knows what that implies. I'm currently waiting for Roy to report whether he can reproduce the bug; I guess that he has been too busy so far.

Best Regards,

Tim

--
Dr. Tim Kroeger
tim...@me...    Phone +49-421-218-7710
tim...@ce...    Fax   +49-421-218-4236
Fraunhofer MEVIS, Institute for Medical Image Computing
Universitaetsallee 29, 28359 Bremen, Germany
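Tim's remark about PetscVector::insert() is about call granularity: one library call per index versus one call for a whole array of indices, which is what PETSc's VecSetValues(x, ni, ix, y, mode) accepts. The sketch below is only an illustration under that assumption: MockBackend is a hypothetical stand-in for a PETSc Vec (no PETSc calls are made) that counts how often the caller crosses into the "library". It is not the libMesh implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a PETSc Vec. It counts how often the caller
// crosses into the "library", which is where per-call overhead lives.
struct MockBackend {
  std::vector<double> data;
  std::size_t calls = 0;

  explicit MockBackend(std::size_t n) : data(n, 0.0) {}

  // One index per call, like calling set() once per entry.
  void set_one(std::size_t i, double v) {
    ++calls;
    data[i] = v;
  }

  // Many indices per call, in the spirit of VecSetValues().
  void set_many(const std::vector<std::size_t>& idx,
                const std::vector<double>& vals) {
    assert(idx.size() == vals.size());
    ++calls;
    for (std::size_t k = 0; k < idx.size(); ++k) data[idx[k]] = vals[k];
  }
};

// Per-index insertion: one backend call per entry.
void insert_per_index(MockBackend& b, const std::vector<std::size_t>& idx,
                      const std::vector<double>& vals) {
  assert(idx.size() == vals.size());
  for (std::size_t k = 0; k < idx.size(); ++k) b.set_one(idx[k], vals[k]);
}

// Batched insertion: a single backend call for the whole element vector.
void insert_batched(MockBackend& b, const std::vector<std::size_t>& idx,
                    const std::vector<double>& vals) {
  b.set_many(idx, vals);
}
```

Both paths produce the same vector contents; only the number of crossings differs (four calls versus one for a four-entry element vector), which is the overhead a batched VecSetValues() call avoids.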
From: Roy S. <roy...@ic...> - 2009-04-17 14:41:11
On Fri, 17 Apr 2009, Tim Kroeger wrote:
> I would like the remaining bug to be fixed, since one never knows
> what that implies. I'm currently waiting for Roy to report whether
> he can reproduce the bug; I guess that he has been too busy so far.

Sorry about the delay; you guess right. The prodding was helpful.

I did have time to set up your test program and get it started this morning. Not sure if/when it's going to finish, though. I'd like to run all 24 processes on the same box for ease of debugging, but it's currently halfway through the refinement steps and using 6GB memory out of 4GB available RAM. We've got a couple 16GB nodes here, but they're even busier than I am through most of the week lately. But if the 4GB server proves insufficient, I'll see if I can monopolize one of the big guys tomorrow or Sunday.

---
Roy
From: Tim K. <tim...@ce...> - 2009-04-22 06:18:44
Dear Roy,

On Fri, 17 Apr 2009, Roy Stogner wrote:
> I did have time to set up your test program and get it started this
> morning. Not sure if/when it's going to finish, though.

Did it reveal anything yet?

> I'd like to run all 24 processes on the same box for
> ease of debugging, but it's currently halfway through the refinement
> steps and using 6GB memory out of 4GB available RAM.

You mean it was already swapping? I see, that's going to slow it down considerably.

By the way: Nobody seems to have checked in my patch that I sent to the list last week (April 15) (nor has anybody stated that my patch could cause problems).

Best Regards,

Tim
From: Roy S. <roy...@ic...> - 2009-04-24 22:21:32
On Wed, 22 Apr 2009, Tim Kroeger wrote:
> On Fri, 17 Apr 2009, Roy Stogner wrote:
>
>> I did have time to set up your test program and get it started this
>> morning. Not sure if/when it's going to finish, though.
>
> Did it reveal anything yet?

On the big machine I get the same error, and relatively quickly. I'm going to glance over the data structures in the debugger tonight, but unless I can find an obvious underlying cause I won't have much time to spend on it for a while.

---
Roy
From: Roy S. <roy...@ic...> - 2009-05-01 23:50:51
On Fri, 24 Apr 2009, Roy Stogner wrote:
> On Wed, 22 Apr 2009, Tim Kroeger wrote:
>
>> On Fri, 17 Apr 2009, Roy Stogner wrote:
>>
>>> I did have time to set up your test program and get it started this
>>> morning. Not sure if/when it's going to finish, though.
>>
>> Did it reveal anything yet?
>
> On the big machine I get the same error, and relatively quickly. I'm
> going to glance over the data structures in the debugger tonight, but
> unless I can find an obvious underlying cause I won't have much time
> to spend on it for a while.

The old version of gdb on that machine is killing me: even with METHOD=dbg it can't seem to find many libMesh methods, can't seem to call the ones it does find, and sometimes crashes or returns incorrect results from calls it does make. If anyone knows how to walk through the libstdc++ std::map data structure to find a specific entry, let me know. This didn't work for me:

http://help.lockergnome.com/linux/GDB-capabilities-exploring-STL-classes--ftopict279673.html

I've hassled the sysadmins to try and get a newer gdb to use.

For now, examining what data structures I could directly, I've at least got a start on the problem. It's the same symptom as last time (constraint application failing because a constraining DoF isn't semilocal) but it's not the same cause. Last time we were trying to satisfy the correct constraint equation but we missed adding the constraining DoF to the send_list. This time the constraint equation itself apparently came out wrong: it's trying to use one incorrect DoF index.

And that's probably not the root of the problem. I presume that the odd choice of "24.01" in the build_cube parameters was necessary to reproduce the problem? Then the real bug has to be in one of the hacks where we use nodal coordinates to identify a node. I think most of my own sins in that regard are limited to ParallelMesh. Ben does a bit of that in MeshCommunication that we might look at. But the most likely culprit is probably find_neighbors().

I've got to run right now, and won't get a chance to really look at this again until next Wednesday. Sorry about the delays.

And thanks again for all the ghosted vectors work. Regardless of how long it takes us to make it efficient, it's already been invaluable as a debugging tool.

---
Roy
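Roy's suspicion is that some code identifies nodes by their coordinates. A minimal sketch of why that can be fragile, assuming a hypothetical key that snaps each coordinate to a grid of spacing 1/scale (this is not libMesh code, and whether "24.01" triggers anything like it is only a hypothesis in the thread): two copies of the same point that differ only by floating-point roundoff can land on opposite sides of a rounding boundary and receive different keys.

```cpp
#include <cassert>
#include <cmath>
#include <tuple>

// Hypothetical coordinate-based node identity: snap each coordinate to a
// grid of spacing 1/scale and use the integer triple as the key.
using NodeKey = std::tuple<long long, long long, long long>;

NodeKey coordinate_key(double x, double y, double z, double scale) {
  assert(scale > 0.0);
  return NodeKey(std::llround(x * scale), std::llround(y * scale),
                 std::llround(z * scale));
}
```

With scale 4, x = 0.125 maps to exactly 0.5 and rounds up to 1, while x = 0.125 - 2^-50 (the "same" point up to roundoff) rounds down to 0, so the two copies get different keys. Points away from a rounding boundary agree, which is why such bugs tend to show up only for particular geometries.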
From: Tim K. <tim...@ce...> - 2009-05-04 06:53:38
Dear Roy,

On Fri, 1 May 2009, Roy Stogner wrote:
> I presume that the odd choice of "24.01" in
> the build_cube parameters was necessary to reproduce the problem?

Actually, for me there is no reason to believe this. The "24.01" is used for a different reason, having to do with some of the internals of the application. I have not tested whether the crash depends on this. (In fact, for the previous bug, as you might remember, I mixed up the arguments of build_cube() once again, hence unwittingly using a completely different geometry than in the application, which did not influence the bug. The mere fact that this time I got the argument order right, and hence use the same geometry as in the application, is no reason to believe that the bug *depends* on the geometry.)

> I've got to run right now, and won't get a chance to really look at
> this again until next Wednesday. Sorry about the delays.

Doesn't matter; it's not as urgent any more as it was some time ago. (-:

> And thanks again for all the ghosted vectors work. Regardless of how
> long it takes us to make it efficient, it's already been invaluable as
> a debugging tool.

I am pleased to hear that!

Best Regards,

Tim
From: Tim K. <tim...@ce...> - 2009-05-14 14:00:41
Dear Roy,

On Mon, 4 May 2009, Tim Kroeger wrote:
>> I've got to run right now, and won't get a chance to really look at
>> this again until next Wednesday. Sorry about the delays.
>
> Doesn't matter; it's not as urgent any more as it was some time ago.
> (-:

Well, to prevent any possible misunderstanding, I would like to add that by this sentence I didn't mean that it has become totally irrelevant for me. Although there is currently no short-term deadline pressure for me on this item, I would appreciate having the ghosted vectors working before the next short-term deadline pressure arises.

If there is any sensible task that I could do to assist you in finding the bug, please let me know.

Best Regards,

Tim
From: Roy S. <roy...@ic...> - 2009-05-14 23:14:54
On Thu, 14 May 2009, Tim Kroeger wrote:
> Although there is currently no short-term deadline pressure for me on this
> item, I would appreciate having the ghosted vectors working before the next
> short-term deadline pressure arises. If there is any sensible task that I
> could do to assist you in finding the bug, please let me know.

If I could think of anything, I'd have mentioned it by now, I'm afraid.

Right now the debug cycle is glacial, but I'm not sure how to avoid:

Recompiling. I can't get gdb or idb to walk through our DofConstraints structure properly (can't find function X, can't cast to unknown type Y, etc.), and that's left me using std::cerr as a major debugging tool.

Rerunning. This bug depends on the precise mesh partitioning, which we redo whenever we load a new file, so I can't just save the failing mesh and restart from that; I have to restart from the beginning and walk through all the dozens of AMR/C steps... in dbg or devel mode, if I want to be able to use gdb at all on the result.

Work. Your typical debug cycle doesn't include "Spend days giving and listening to talks" or "Fix and rerun sensitivity analyses with a different code", but the last few weeks have been swamped for me.

It looks like add_constraints_to_send_list didn't quite do what it was supposed to, because this is another version of the same bug: a hanging face node has four dependencies, one of which is a hanging edge node with two dependencies, and the processor with the face node somehow isn't getting the farther grand-dependency added to its send_list. It'll be another run or two until I'm certain of that, though, and probably a few more before I've figured out why it's happening and how to fix it.

---
Roy
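The missing grand-dependency Roy describes is a transitive-closure problem: when a constraint row contains another constrained dof, that dof's own dependencies must be followed too. A minimal sketch of what a send list is meant to capture, assuming a hypothetical map-of-maps constraint storage (not libMesh's DofConstraints type or its add_constraints_to_send_list implementation):

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical constraint storage:
// constrained dof -> (constraining dof -> coefficient).
using ConstraintRow = std::map<unsigned, double>;
using Constraints = std::map<unsigned, ConstraintRow>;

// Collect every dof that `dof` depends on, directly or through chains of
// constraints (grand-dependencies included), into `deps`.
void collect_dependencies(const Constraints& constraints, unsigned dof,
                          std::set<unsigned>& deps) {
  const auto it = constraints.find(dof);
  if (it == constraints.end()) return;  // unconstrained: no dependencies
  for (const auto& entry : it->second) {
    assert(entry.first != dof);         // a dof must not constrain itself
    // Recurse only on first insertion; this also stops on accidental cycles.
    if (deps.insert(entry.first).second)
      collect_dependencies(constraints, entry.first, deps);
  }
}
```

In the shape Roy describes, a face node (dof 10 below) depends on four dofs, one of which (dof 20, a hanging edge node) is itself constrained by dofs 4 and 5. A send list built only from dof 10's direct row would contain 20 but miss 4 and 5; the recursive closure picks them up.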
From: Tim K. <tim...@ce...> - 2009-05-15 06:20:48
Dear Roy,

On Thu, 14 May 2009, Roy Stogner wrote:
> Right now the debug cycle is glacial, but I'm not sure how to avoid:
> [...]

Thank you very much for your interim report. I just wanted to make sure that things have not been forgotten. I understand that it takes time to find this bug.

Sorry if my mail made you feel defensive.

Best Regards,

Tim
From: Roy S. <roy...@ic...> - 2009-05-15 11:57:25
On Fri, 15 May 2009, Tim Kroeger wrote:
> On Thu, 14 May 2009, Roy Stogner wrote:
>
>> Right now the debug cycle is glacial, but I'm not sure how to avoid:
[...]
>
> Thank you very much for your interim report. I just wanted to make sure that
> things have not been forgotten. I understand that it takes time to find this
> bug.
>
> Sorry if my mail made you feel defensive.

Not defensive at you, just exasperated at gdb. My attempts to get a proper debugger's-eye-view of complex STL-tree-based classes have been both numerous and fruitless.

---
Roy
From: Roy S. <roy...@ic...> - 2009-05-16 13:26:17
On Thu, 14 May 2009, Roy Stogner wrote:
> It'll be another run or two until I'm certain of that, though, and
> probably a few more before I've figured out why it's happening and
> how to fix it.

Got it!

The reason this bug was so hard to find is that, in some sense, it's more of a design mistake than a bug! enforce_constraints_exactly() works correctly with serial vectors; ghosted vectors are correctly behaving as their API specifies... the two correct behaviors just weren't 100% compatible yet, is all.

When I wrote enforce_constraints_exactly(), for some reason (possibly because we weren't yet recursively expanding constraints and so a more straightforward method wasn't yet possible, possibly because I was still learning how the constraint system worked), I made it work by building the constraint matrix C for each element, looping over the rows that correspond to local constrained dofs, and setting

  vglobal_i = sum(C_ij*vlocal_j)

If you'll forgive my abused notation: The problem here is that we've got an element with dof a on processor 1 and dof b on processor 2, the latter of which depends on dof c on processor 3. Because dof c isn't a constraint dependency on processor 1, that processor doesn't have it in the send_list. This means that vlocal_c is inaccurately 0 on a processor 1 serial vector, but we don't really care, because C_ac is 0 too and the inaccuracy in vlocal_c doesn't propagate to vglobal_a. But accessing vlocal_c, even to multiply it by 0, throws a libmesh_error() on a ghosted vector!

Anyway, an immediate fix is simple: just skip accumulating indices where C_ij==0.0. I've committed that to SVN, and on my machine it works to take the test case you sent all the way to completion. In the long term we'll want to change enforce_constraints_exactly to just loop over local dofs and directly use the constraint rows, but I'd like to make sure this old bug is fixed before I risk mucking things up and adding new bugs. ;-)

---
Roy
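The failure and the fix Roy describes can be reproduced in miniature. In this sketch all names are hypothetical: MockGhostedVector stores only the entries this processor owns or ghosts and throws on any other access (mirroring the libmesh_error() a ghosted vector raises, where a serial vector would silently return 0), and constrained_value() evaluates one row of vglobal_i = sum(C_ij*vlocal_j) with and without skipping exact zeros.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <vector>

// Hypothetical ghosted vector: only owned and ghosted (send_list) entries
// exist locally; reading any other index is an error, not a silent 0.
class MockGhostedVector {
public:
  void add_entry(std::size_t global_index, double value) {
    values_[global_index] = value;
  }

  double operator()(std::size_t global_index) const {
    const auto it = values_.find(global_index);
    if (it == values_.end()) throw std::out_of_range("index not in send_list");
    return it->second;
  }

private:
  std::map<std::size_t, double> values_;
};

// One constrained row: sum_j C_ij * v_j over the element's dofs.
// With skip_zeros == false every entry is read, even where C_ij == 0.
double constrained_value(const std::vector<double>& C_row,
                         const std::vector<std::size_t>& elem_dofs,
                         const MockGhostedVector& v, bool skip_zeros) {
  assert(C_row.size() == elem_dofs.size());
  double sum = 0.0;
  for (std::size_t j = 0; j < C_row.size(); ++j) {
    // Exact comparison on purpose: a structural zero means dof j is not a
    // dependency of this row, so it may legitimately be absent locally.
    if (skip_zeros && C_row[j] == 0.0) continue;
    sum += C_row[j] * v(elem_dofs[j]);
  }
  return sum;
}
```

In the test, dof 2 plays the role of Roy's dof c: it is not in the local vector, and its coefficient in the row is 0. Reading it anyway throws; skipping the zero coefficient gives the same value a serial vector would have produced.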
From: John P. <jwp...@gm...> - 2009-05-16 14:56:35
On Sat, May 16, 2009 at 8:26 AM, Roy Stogner <roy...@ic...> wrote:
> Got it!
> [...]
> Anyway, an immediate fix is simple: just skip accumulating indices
> where C_ij==0.0. [...]

Way to go Roy!

I'd like to personally double your libmesh developer salary this month ;-)

--
John
From: Kirk, B. (JSC-EG311) <ben...@na...> - 2009-05-16 15:11:01
> Way to go Roy!
>
> I'd like to personally double your libmesh developer salary this month ;-)

I second that.

When we are all in Austin the week after next, let's decide on a libMesh-0.7.0 release date. We also need to devise some bonus incentive program for meeting the milestone? ;-)

-Ben
From: Tim K. <tim...@ce...> - 2009-05-18 06:48:00
Dear Roy,

On Sat, 16 May 2009, Roy Stogner wrote:
> Got it! [...]

Great work! I'll let the application run today and see whether any new bugs emerge. (-:

> Anyway, an immediate fix is simple: just skip accumulating indices
> where C_ij==0.0. I've committed that to SVN, and on my machine it
> works to take the test case you sent all the way to completion.

I would strongly suggest adding a comment at that position in the code. Otherwise, if you (or somebody else) later decide that -Wfloat-equal should be enabled, you'll get a warning at this point, and since you might have forgotten the reason, you'll be tempted to remove that seemingly useless piece of code.

Best Regards,

Tim
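One way to leave the kind of comment Tim asks for, while keeping -Wfloat-equal quiet at that single spot, is to isolate the exact comparison in a small helper wrapped in GCC's diagnostic pragmas (Clang also defines __GNUC__ and accepts them). The helper name is hypothetical, and the comment wording is only a suggestion, not the text of the patch Tim later sent.

```cpp
#include <cassert>

#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wfloat-equal"
#endif

// Returns true only for an exact, structural zero coefficient.
//
// DO NOT replace this with a tolerance test, and do not delete it as
// "useless": skipping exact zeros is what keeps ghosted vectors from being
// read at dofs that are not in this processor's send_list. A tiny but
// nonzero coefficient is a real dependency and must still be accumulated.
inline bool is_structural_zero(double coefficient) {
  return coefficient == 0.0;
}

#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif
```

Keeping the pragma scope to one small function means the rest of the file still gets -Wfloat-equal warnings if that flag is ever enabled.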
From: Tim K. <tim...@ce...> - 2009-05-26 06:35:04
Attachments:
patch
Dear Roy,

On Mon, 18 May 2009, Tim Kroeger wrote:
>> Got it! [...]
>
> Great work! I'll let the application run today and see whether any
> new bugs emerge. (-:

When I first tested it, it kept crashing at the same point as before. It took me quite a while to find out what I did wrong: I compiled using

  METHO=devel make

and linked against the devel version. Seems right, doesn't it? Well, if you look closely, you see that I left out a "D", so I actually compiled the optimized version.

Feature request for your build system: If the user has ever compiled a version other than optimized, "make" with an unset "METHOD" variable should produce an error message and not work.

After I had fixed my compile statement, the next run crashed at a different (later) point. I conjecture that this was an out-of-memory failure, so I restarted it on a larger number of nodes (same number of CPUs, though). It then finally ran through without a crash.

By the way, the temporal scalability is also quite bad. You might remember that this bug emerged from a test of how my application behaves on a larger number of CPUs. The runtimes for 8 CPUs on 3 nodes (copied from my mail of April 8, 2009) were:

  no-ghosted-1 : 11:34:10
  no-ghosted-2 : 11:35:54
  ghosted-1    : 17:25:28
  ghosted-2    : 16:33:23

Now, the new result for 24 CPUs on 4 nodes (3 nodes ran out of memory) is:

  ghosted      : 10:42:34

Not really nice. In particular, it is not substantially faster than the unghosted version on a comparable number of nodes. I'll produce the log output that Jed suggested and see what he says. Perhaps there is some easy way to make it faster.

>> Anyway, an immediate fix is simple: just skip accumulating indices
>> where C_ij==0.0. I've committed that to SVN, and on my machine it
>> works to take the test case you sent all the way to completion.
>
> I would strongly suggest adding a comment at that position in the
> code. Otherwise, if you (or somebody else) later decide that
> -Wfloat-equal should be enabled, you'll get a warning at this point,
> and since you might have forgotten the reason, you'll be tempted to
> remove that seemingly useless piece of code.

I guess that you didn't have the time yet to write such a comment, so I wrote it for you (see attachment); you just have to check it in.

Best Regards,

Tim
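Tim's wall-clock timings can be turned into a speedup and a parallel efficiency. A small sketch with illustrative helper names (to_seconds, speedup and efficiency are not part of any library); note that the runs also differ in node count (3 versus 4 nodes), so the ratio mixes the effect of more CPUs with other changes.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Parse a wall-clock time "hh:mm:ss" into seconds.
long to_seconds(const std::string& hms) {
  int h = 0, m = 0, s = 0;
  const int fields = std::sscanf(hms.c_str(), "%d:%d:%d", &h, &m, &s);
  assert(fields == 3);
  (void)fields;  // silence unused-variable warnings when NDEBUG is set
  return 3600L * h + 60L * m + s;
}

// Speedup going from run time t0 to run time t1.
double speedup(long t0, long t1) {
  return static_cast<double>(t0) / static_cast<double>(t1);
}

// Parallel efficiency: speedup divided by the increase in process count.
double efficiency(long t0, int p0, long t1, int p1) {
  return speedup(t0, t1) * p0 / p1;
}
```

For ghosted-1 (62728 s on 8 CPUs) against the 24-CPU run (38554 s), the speedup is about 1.63 and the efficiency about 0.54. Compared with no-ghosted-1 (41650 s on 8 CPUs), the 24-CPU ghosted run is only about 8% faster.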
From: Roy S. <roy...@ic...> - 2009-04-22 23:09:25
On Wed, 22 Apr 2009, Tim Kroeger wrote:
> On Fri, 17 Apr 2009, Roy Stogner wrote:
>
>> I did have time to set up your test program and get it started this
>> morning. Not sure if/when it's going to finish, though.
>
> Did it reveal anything yet?

It left the system hosed for long enough that I had to kill it. I'll be able to try again on a bigger system this weekend.

> By the way: Nobody seems to have checked in my patch that I sent to the list
> last week (April 15) (nor has anybody stated that my patch could cause
> problems).

It's one of the many things backing up in my inbox right now, sorry.

---
Roy
From: Roy S. <roy...@ic...> - 2009-04-22 23:29:18
On Wed, 22 Apr 2009, Roy Stogner wrote:
> On Wed, 22 Apr 2009, Tim Kroeger wrote:
>
>> By the way: Nobody seems to have checked in my patch that I sent to the
>> list last week (April 15) (nor has anybody stated that my patch could cause
>> problems).
>
> It's one of the many things backing up in my inbox right now, sorry.

Patch looks good. I'm in a rush now, but I'll add it late tonight or tomorrow.

One question first: can anyone think of a better name than constrain_nothing()? The analogy to our other constrain_* functions makes sense, but it seems odd outside that context to have a method essentially named "do_nothing" that does modify its inputs.

I can't think of any better name myself, so I'll commit the new method as constrain_nothing(). But if someone *can* think of a more descriptive name, let me know so we can change it while the API's still only got one user. ;-)

---
Roy
From: Tim K. <tim...@ce...> - 2009-04-23 06:34:10
Dear Roy,

On Wed, 22 Apr 2009, Roy Stogner wrote:
> One question first: can anyone think of a better name than
> constrain_nothing()? The analogy to our other constrain_* functions
> makes sense, but it seems odd outside that context to have a method
> essentially named "do_nothing" that does modify its inputs.

Well, of course you are right. But actually, I find this is only a symptom of the fact that a user might not expect the constrain_*() methods to modify their dof_indices argument. At least, that was true for me for quite a long time, and it caused a number of programming errors in my applications. I learned this when I implemented the constrain_dyad_matrix() function. Since that time I understand that these methods *have* to do this.

What I want to say is this: First, having a method called "constrain_nothing()" might make the innocent user look into the details earlier and prevent him from making mistakes. Second, the whole constraining mechanism could be reworked completely. I'm thinking about something like this:

  start_constraining(const old_row_dofs,
                     const old_col_dofs,
                     new_row_dofs,
                     new_col_dofs);

  constrain_vector(const old_row_dofs,
                   const new_row_dofs,
                   vector);

  constrain_matrix(const old_row_dofs,
                   const old_col_dofs,
                   const new_row_dofs,
                   const new_col_dofs,
                   matrix);

  constrain_dyad_matrix(const old_row_dofs,
                        const old_col_dofs,
                        const new_row_dofs,
                        const new_col_dofs,
                        v,
                        w);

Or, perhaps better, have a "Constraining" class that is created locally by the user and holds all required information (including the constraint matrix).

Any opinions?

Best Regards,

Tim
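A minimal sketch of the "Constraining" class idea, under simplifying assumptions: homogeneous constraints that are already recursively expanded, and a dense element-level matrix C with old_i = sum_j C_ij new_j. The class name follows Tim's suggestion, but every member is hypothetical and this is not libMesh's API. It builds C once, keeps the old and new dof lists separately (so the caller's dof_indices are never expanded in place), and applies f_new = C^T f_old; a matrix version would reuse the same C to form C^T K C.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical sketch of a "Constraining" object: build the element
// constraint matrix C once and reuse it. Assumes homogeneous constraints
// that are already recursively expanded; not libMesh's actual API.
class Constraining {
public:
  using Row = std::map<unsigned, double>;  // constraining dof -> coefficient

  Constraining(const std::vector<unsigned>& old_dofs,
               const std::map<unsigned, Row>& constraints)
      : old_dofs_(old_dofs) {
    // New dofs: unconstrained old dofs plus every constraining dof.
    for (unsigned d : old_dofs_) {
      const auto it = constraints.find(d);
      if (it == constraints.end()) {
        add_new(d);
      } else {
        for (const auto& e : it->second) add_new(e.first);
      }
    }
    // C(i, j) expresses old dof i in terms of new dof j.
    C_.assign(old_dofs_.size(), std::vector<double>(new_dofs_.size(), 0.0));
    for (std::size_t i = 0; i < old_dofs_.size(); ++i) {
      const auto it = constraints.find(old_dofs_[i]);
      if (it == constraints.end()) {
        C_[i][index_of(old_dofs_[i])] = 1.0;
      } else {
        for (const auto& e : it->second) C_[i][index_of(e.first)] = e.second;
      }
    }
  }

  // The caller's dof list is copied, never expanded in place.
  const std::vector<unsigned>& old_dofs() const { return old_dofs_; }
  const std::vector<unsigned>& new_dofs() const { return new_dofs_; }

  // f_new = C^T f_old.
  std::vector<double> constrain_vector(const std::vector<double>& f_old) const {
    assert(f_old.size() == old_dofs_.size());
    std::vector<double> f_new(new_dofs_.size(), 0.0);
    for (std::size_t i = 0; i < C_.size(); ++i)
      for (std::size_t j = 0; j < new_dofs_.size(); ++j)
        f_new[j] += C_[i][j] * f_old[i];
    return f_new;
  }

private:
  void add_new(unsigned d) {
    if (std::find(new_dofs_.begin(), new_dofs_.end(), d) == new_dofs_.end())
      new_dofs_.push_back(d);
  }

  std::size_t index_of(unsigned d) const {
    const auto pos = std::find(new_dofs_.begin(), new_dofs_.end(), d);
    assert(pos != new_dofs_.end());
    return static_cast<std::size_t>(pos - new_dofs_.begin());
  }

  std::vector<unsigned> old_dofs_;
  std::vector<unsigned> new_dofs_;
  std::vector<std::vector<double>> C_;
};
```

For an element with dofs {0, 1, 2} where dof 1 hangs as 0.5*u0 + 0.5*u3, the new dofs are {0, 3, 2}, and the element vector {1, 2, 3} becomes {2, 1, 3}: half of entry 1 is distributed to each of its two dependencies.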
From: Roy S. <roy...@ic...> - 2009-04-23 15:47:35
On Thu, 23 Apr 2009, Tim Kroeger wrote:
> On Wed, 22 Apr 2009, Roy Stogner wrote:
>
>> One question first: can anyone think of a better name than
>> constrain_nothing()? The analogy to our other constrain_* functions makes
>> sense, but it seems odd outside that context to have a method essentially
>> named "do_nothing" that does modify its inputs.
>
> Well, of course you are right. But actually, I find this is only a symptom
> of the fact that a user might not expect the constrain_*() methods to modify
> their dof_indices argument.

Good point.

> At least, that was true for me for quite a long time, and it caused a
> number of programming errors in my applications. I learned this
> when I implemented the constrain_dyad_matrix() function. Since that
> time I understand that these methods *have* to do this.

Well, they have to get an expanded dof_indices vector. They don't necessarily have to expand their argument rather than making a copy or keeping new indices in a separate vector. Presumably we just chose the most efficient semantics over the more intuitive ones.

> What I want to say is this: First, having a method called
> "constrain_nothing()" might make the innocent user look into the
> details earlier and prevent him from making mistakes.

constrain_nothing() it is, then.

> Second, the whole constraining mechanism could be reworked
> completely. I'm thinking about something like this:
>
> [...]
>
> Or, perhaps better, have a "Constraining" class that is created locally by
> the user and holds all required information (including the constraint
> matrix).
>
> Any opinions?

Probably a Constraining class; otherwise we'd have to regenerate that constraint matrix over and over again, right? And while a more intuitive constraint API would be nice, we've got something that works now, so it's not a high priority for me. Patches would be welcomed and (eventually...) included, though.

---
Roy
From: Tim K. <tim...@ce...> - 2009-04-24 06:15:57
Dear Roy,

On Thu, 23 Apr 2009, Roy Stogner wrote:
> constrain_nothing() it is, then.

Okay, thank you.

>> Second, the whole constraining mechanism could be reworked
>> completely. [...]
>
> Probably a Constraining class; otherwise we'd have to regenerate that
> constraint matrix over and over again, right?

Yes, you are right.

> And while a more
> intuitive constraint API would be nice, we've got something that works
> now, so it's not a high priority for me.

The same applies to me.

> Patches would be welcomed and (eventually...) included, though.

I'll keep that in mind for the unlikely case that I should some day feel that I have nothing to do. (-:

Best Regards,

Tim
From: Roy S. <roy...@ic...> - 2009-05-14 18:27:32
On Sat, 9 May 2009, Lorenzo Botti wrote:
> This code produces the problem during the reinit after the second solve. Hope
> it can help.
>
> Without coarsening it seems that all works fine!

Here's what I get running the code on the libMesh svn head with METHOD=dbg:

  *** Warning, This Code is Deprecated! src/base/libmesh.C, line 356, compiled May 14 2009 at 11:28:14 ***
  Beginning Solve 0
  Number of elements: 219
  assembling elliptic dg system... done
  System has: 768 degrees of freedom.
  Linear solver converged at step: 21, final residual: 4.1012e-12
  L2-Error is: 0.00666744
  H1-Error is: 0.144821
  Beginning Solve 1
  Number of elements: 827
  assembling elliptic dg system... done
  System has: 2896 degrees of freedom.
  Linear solver converged at step: 30, final residual: 1.0506e-11
  L2-Error is: 0.00264921
  H1-Error is: 0.102276
  Beginning Solve 2
  Number of elements: 3003
  assembling elliptic dg system... done
  System has: 10512 degrees of freedom.
  Linear solver converged at step: 50, final residual: 2.03714e-11
  L2-Error is: 0.0016323
  H1-Error is: 0.070119
  *** Warning, This Code is Deprecated! src/base/libmesh.C, line 366, compiled May 14 2009 at 11:28:14 ***

Since your code hadn't updated the libMesh::init() calls to use a LibMeshInit object instead, is it safe for me to assume you're running with an older libMesh version? Which one?

Would you try checking out the current SVN version and see if you can reproduce the problem there?

---
Roy
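The two deprecation warnings bracket the run because the old style pairs an explicit initialization call at startup with a matching shutdown call at the end; libMesh's example programs instead create a LibMeshInit object at the top of main(), typically written as LibMeshInit init (argc, argv);. The sketch below shows only the general RAII pattern behind that change, using a hypothetical demo namespace rather than libMesh itself: construction initializes, and destruction finalizes when the object leaves scope, including on early return or exception.

```cpp
#include <cassert>

namespace demo {  // hypothetical library, not libMesh

// Global "library state" standing in for whatever init/shutdown manage.
inline int& init_count() {
  static int n = 0;
  return n;
}
inline void init() { ++init_count(); }
inline void close() { --init_count(); }

// RAII wrapper: construction initializes, destruction finalizes.
class ScopedInit {
public:
  ScopedInit() { init(); }
  ~ScopedInit() { close(); }
  ScopedInit(const ScopedInit&) = delete;             // exactly one owner
  ScopedInit& operator=(const ScopedInit&) = delete;
};

}  // namespace demo
```

The design choice is that forgetting the shutdown call becomes impossible: it happens automatically when the guard object goes out of scope.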
From: Jed B. <je...@59...> - 2009-05-15 12:07:12
Attachments:
signature.asc
Roy Stogner wrote:
> My attempts to get a proper debugger's-eye-view of complex
> STL-tree-based classes have been both numerous and fruitless.

Do you have any experience with

http://sourceware.org/gdb/wiki/ProjectArcher

I haven't used it because I work primarily in C, but it should help with this.

Jed
From: Roy S. <roy...@ic...> - 2009-05-15 12:49:02
On Fri, 15 May 2009, Jed Brown wrote:
> Roy Stogner wrote:
>
>> My attempts to get a proper debugger's-eye-view of complex
>> STL-tree-based classes have been both numerous and fruitless.
>
> Do you have any experience with
>
> http://sourceware.org/gdb/wiki/ProjectArcher

No, thank you! I'd tried a couple different macro sets that were supposed to work on top of vanilla gdb, but never anything that changed the source itself. I'll give this a shot.

---
Roy
From: Jed B. <je...@59...> - 2009-05-26 11:07:38
Attachments:
signature.asc
Tim Kroeger wrote:
> Not really nice. In particular, it is not substantially faster than the
> unghosted version on a comparable number of nodes.

It is important to know what preconditioners are being used (preconditioners always change in parallel, though not as much when there is a coarse level, as in multigrid). Also, memory performance (especially bandwidth) is usually the overwhelming issue for implicit solvers (e.g. you are very lucky to get 4% of peak FPU performance for MatVec on Core 2 Quad). Thus using more cores frequently does not help, and you need more sockets to improve performance. Network latency is also a factor, but if your subdomains are big enough that you are having memory issues, it should not be an issue (rather, if it is an issue then we have faulty algorithms at play). There may well be more going on here, but it's important to consider these issues.

> I'll produce the log output that Jed suggested and see what he says.
> Perhaps there is some easy way to make it faster.

FWIW, I always run nontrivial jobs with -log_summary. It does not add measurable overhead, and that output is really useful. If the run takes longer than the effort required to ignore that output (i.e. the run is more than a few seconds), it is worthwhile to have that profiling info.

Jed
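Jed's bandwidth argument can be made concrete with a compressed sparse row (CSR) matrix-vector product. In this sketch (a generic CSR layout, not PETSc's internal format), each stored nonzero costs 2 flops (one multiply, one add) but requires streaming an 8-byte value and a column index from memory. Even before counting traffic for x and y, that caps the arithmetic intensity of the matrix stream at about 2/12, roughly 0.17 flop per byte with 4-byte int indices, so a core whose peak flop rate far exceeds its share of memory bandwidth times 0.17 spends most of its time waiting on memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compressed sparse row (CSR) matrix, a common layout for assembled
// finite-element operators.
struct CsrMatrix {
  std::size_t n_rows = 0;
  std::vector<std::size_t> row_ptr;  // size n_rows + 1
  std::vector<int> col;              // column index per nonzero
  std::vector<double> val;           // value per nonzero
};

// y = A x. Each nonzero costs 2 flops but streams one double value and
// one int column index from memory.
std::vector<double> matvec(const CsrMatrix& A, const std::vector<double>& x) {
  assert(A.row_ptr.size() == A.n_rows + 1);
  std::vector<double> y(A.n_rows, 0.0);
  for (std::size_t i = 0; i < A.n_rows; ++i)
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
      y[i] += A.val[k] * x[static_cast<std::size_t>(A.col[k])];
  return y;
}

// Upper bound on flops per byte of matrix data: 2 / (value + index bytes).
double matrix_arithmetic_intensity() {
  return 2.0 / static_cast<double>(sizeof(double) + sizeof(int));
}
```

Adding cores that share the same memory bus does not raise this bound, which is why Jed points to more sockets (more memory bandwidth) rather than more cores.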