From: Manav Bhatia <bhatiamanav@gm...>  2008-10-18 18:04:14

Hi,

For my application requiring the solution of a nonlinear transient system, I am doing a reinit of the fe object per elem per nonlinear iteration. For larger systems this has started to be a major CPU time expense.

I am now considering saving one fe per elem in memory so that I do not have to do these reinits. Of course, I will be committing a considerable amount of memory as well.

I am writing to ask if anyone has tried this, and could share his/her experiences or comment on this.

Thanks,
Manav
From: John Peterson <jwpeterson@gm...>  2008-10-18 19:56:17

On Sat, Oct 18, 2008 at 1:04 PM, Manav Bhatia <bhatiamanav@...> wrote:

> Hi,
>
> For my application requiring the solution of a nonlinear transient
> system, I am doing a reinit of the fe object per elem per nonlinear
> iteration. For larger systems this has started to be a major CPU time
> expense.
>
> I am now considering saving one fe per elem in memory so that I do
> not have to do these reinits. Of course, I will be committing a
> considerable amount of memory as well.
>
> I am writing to ask if anyone has tried this, and could share his/
> her experiences or comment on this.

I think it depends which part of reinit is taking up the most time. If it is the derivative calculations then I'm not sure how much info you can really cache. I'm assuming here you have elements with non-affine maps...

If you have a hybrid mesh (a mixture of geometric element types) you will probably gain some performance if you loop over all elements of a given type instead of switching back and forth repeatedly between geometric element types.

--
John
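The suggestion to visit all elements of one geometric type before moving to the next can be sketched as follows. This is a minimal illustration in plain C++, not libMesh code: ElemKind, Elem, count_kind_switches and group_by_kind are hypothetical names, and the "switch count" is only a proxy for the per-type setup work an FE object might repeat.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; these are not libMesh types.
enum class ElemKind { Tri3, Quad4, Tet4 };

struct Elem {
    int id;
    ElemKind kind;
};

// Count how often consecutive elements differ in kind. Each change marks a
// place where per-kind FE setup would have to be redone.
std::size_t count_kind_switches(const std::vector<Elem>& elems) {
    std::size_t switches = 0;
    for (std::size_t i = 1; i < elems.size(); ++i)
        if (elems[i].kind != elems[i - 1].kind) ++switches;
    return switches;
}

// Return a copy ordered so that elements of one kind are contiguous.
// stable_sort keeps the original relative order within each kind.
std::vector<Elem> group_by_kind(std::vector<Elem> elems) {
    std::stable_sort(elems.begin(), elems.end(),
                     [](const Elem& a, const Elem& b) { return a.kind < b.kind; });
    // With three kinds, at most two switches can remain after grouping.
    assert(count_kind_switches(elems) <= 2);
    return elems;
}
```

For an interleaved list such as Tri3, Quad4, Tri3, Quad4, Tet4 the switch count drops from 4 to 2 after grouping. Whether that translates into measurable savings depends on how expensive the per-type setup actually is.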
From: Manav Bhatia <bhatiamanav@gm...>  2008-10-18 20:13:41

I reinit a fe object with only the same elem kind. Hence, that aspect of reallocating memory space due to changing elem types does not seem to be a problem.

Why would non-affine maps affect the memory footprint regarding caching of information?

Since my mesh geometry does not change during the course of the computation, I think storing the N, dN/dx, dN/dy and dN/dz for the different quadrature points should work. (I do not need second order derivatives). I can either do this external to the fe object (by wrapping it in some code that gets this information for each element and stores it), or I can simply store the entire fe object for each element and initialize it only once.

I am curious about this: doesn't any nonlinear transient computation require this information per element per iteration? Am I the first one to consider caching this information? How do the CPU/memory overheads for your problems work out?

Regards,
Manav

On Oct 18, 2008, at 3:56 PM, John Peterson wrote:

> On Sat, Oct 18, 2008 at 1:04 PM, Manav Bhatia
> <bhatiamanav@...> wrote:
>> Hi,
>>
>> For my application requiring the solution of a nonlinear transient
>> system, I am doing a reinit of the fe object per elem per nonlinear
>> iteration. For larger systems this has started to be a major CPU time
>> expense.
>>
>> I am now considering saving one fe per elem in memory so that I do
>> not have to do these reinits. Of course, I will be committing a
>> considerable amount of memory as well.
>>
>> I am writing to ask if anyone has tried this, and could share his/
>> her experiences or comment on this.
>
> I think it depends which part of reinit is taking up the most time.
> If it is the derivative calculations then I'm not sure how much info
> you can really cache. I'm assuming here you have elements with
> non-affine maps...
>
> If you have a hybrid mesh (a mixture of geometric element types) you
> will probably gain some performance if you loop over all elements of a
> given type instead of switching back and forth repeatedly between
> geometric element types.
>
> --
> John
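The memory side of the caching proposal can be estimated before writing any caching code. The sketch below assumes the cache holds N plus one derivative per spatial dimension for every shape function at every quadrature point of every element, stored as doubles. The function name fe_cache_bytes and the example mesh sizes are hypothetical and do not come from the thread.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to cache N and its dim spatial derivatives (dN/dx, ...) for
// every shape function at every quadrature point of every element.
// All arguments are illustrative inputs, not values taken from the thread.
std::uint64_t fe_cache_bytes(std::uint64_t n_elem, std::uint64_t n_qp,
                             std::uint64_t n_shape, std::uint64_t dim) {
    assert(dim >= 1 && dim <= 3);
    const std::uint64_t values_per_point = n_shape * (1 + dim);
    return n_elem * n_qp * values_per_point * sizeof(double);
}
```

As a worked case, one million 8-node hexahedra with 8 quadrature points each in 3D give 1,000,000 x 8 x 8 x 4 x 8 bytes, about 2 GB with 8-byte doubles, before counting quadrature weights or any container overhead.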
From: Benjamin Kirk <benjamin.kirk@na...>  2008-10-18 21:25:00

> I reinit a fe object with only the same elem kind. Hence, that aspect
> of reallocating memory space due to changing elem types does not seem
> to be a problem.

What I was specifically asking is where code like

  AutoPtr<FEBase> fe (FEBase::build(dim, fe_type));

sits.

The issue is that if it is inside your assemble() function then you are building new FE object(s) at each call to assemble, which will happen at least (n_nl_iterations*n_time_steps) times. Even if this is outside your element loop (as it should be), each time that new object is created all the vectors it contains are empty, and they must have some memory allocation which goes on in the first call to reinit().

If they persist between calls to assemble (by putting them in main or something) then you avoid that aspect.

Also, if you don't need second derivative support have you configured the library without it? (That is actually the default.) That will help without changing any of your code...

Ben
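The allocation effect described here can be illustrated without libMesh. The sketch below uses a hypothetical MockFE class whose reinit() counts how often its internal storage has to grow. It is a stand-in for the idea only: the real FE object holds many more vectors, and the two assemble functions are invented names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an FE object: it owns per-quadrature-point
// storage that has to be allocated the first time reinit() is called.
class MockFE {
public:
    void reinit(std::size_t n_qp) {
        assert(n_qp > 0);
        if (phi_.capacity() < n_qp) ++n_allocations_;  // growth means allocation
        phi_.assign(n_qp, 0.0);
    }
    std::size_t n_allocations() const { return n_allocations_; }

private:
    std::vector<double> phi_;
    std::size_t n_allocations_ = 0;
};

// Variant A: the FE object is built inside assemble(), so every call starts
// from empty vectors. Returns the allocations this single call caused.
std::size_t assemble_with_local_fe(std::size_t n_elem, std::size_t n_qp) {
    MockFE fe;
    for (std::size_t e = 0; e < n_elem; ++e) fe.reinit(n_qp);
    return fe.n_allocations();
}

// Variant B: the caller owns the FE object and passes it in, so storage
// sized during an earlier call is reused.
void assemble_with_persistent_fe(MockFE& fe, std::size_t n_elem, std::size_t n_qp) {
    for (std::size_t e = 0; e < n_elem; ++e) fe.reinit(n_qp);
}
```

Calling Variant A ten times allocates ten times in this mock, while ten calls of Variant B on one long-lived MockFE allocate once. Whether that difference matters in a real run depends on how the allocation cost compares with the element loop itself, which only profiling can answer.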
From: Manav Bhatia <bhatiamanav@gm...>  2008-10-18 21:47:37

hmmm..... actually, that might be it. My FEBase::build is inside the assembly routine. I will move this out of the routine and update you on how things change.

Thanks,
Manav

On Oct 18, 2008, at 5:24 PM, Benjamin Kirk wrote:

>> I reinit a fe object with only the same elem kind. Hence, that aspect
>> of reallocating memory space due to changing elem types does not seem
>> to be a problem.
>
> What I was specifically asking is where code like
>
> AutoPtr<FEBase> fe (FEBase::build(dim, fe_type));
>
> Sits.
>
> The issue is if it is inside your assemble() function then you are
> building a new FE object(s) at each call to assemble, which will
> happen at least (n_nl_iterations*n_time_steps). Even if this is
> outside your element loop (as it should be) then each time that new
> object is created all the vectors it contains are empty, and they must
> have some memory allocation which goes on in the first call to
> reinit().
>
> If they persist between calls to assemble (by putting them in main or
> something) then you avoid that aspect.
>
> Also, if you don't need second derivative support have you configured
> the library without it? (That is actually the default.) That will
> help without changing any of your code...
>
> Ben
From: John Peterson <jwpeterson@gm...>  2008-10-18 20:55:14

---------- Forwarded message ----------
From: John Peterson <jwpeterson@...>
Date: Sat, Oct 18, 2008 at 3:54 PM
Subject: Re: [Libmesh-users] fe reinit
To: Manav Bhatia <bhatiamanav@...>

On Sat, Oct 18, 2008 at 3:13 PM, Manav Bhatia <bhatiamanav@...> wrote:

>
> Why would non-affine maps affect the memory footprint regarding caching of
> information?

It doesn't affect the memory, it affects the amount of computation which needs to be done during reinit. (When the map is affine, the Jacobian need only be computed at one quadrature point, and reused everywhere. Roy has implemented this "has_affine_map" optimization.) I brought this up in regard to your original question about time performance, in trying to figure out what reinit is spending the most time doing.

> Since my mesh geometry does not change during the course of the computation,
> I think storing the N, dN/dx, dN/dy and dN/dz for the different quadrature
> points should work. (I do not need second order derivatives). I can either

This only works if all your elements have exactly the same shape. If that's the case, Roy has also implemented a "cached_nodes_still_fit" optimization that should already be doing what you suggest.

If reinit is taking a long time, we need to see which of these optimizations is not in effect, and why.

--
John
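The affine-map shortcut mentioned above can be illustrated on a four-node quadrilateral with a bilinear map. This is a sketch under stated assumptions, not libMesh's implementation: Point2, Quad4, det_jacobian, is_affine and fill_det_jacobians are hypothetical names, and only the Jacobian determinant is computed, not the full inverse map.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point2 { double x, y; };
using Quad4 = std::array<Point2, 4>;  // nodes in counterclockwise order

// Jacobian determinant of the bilinear map from [-1,1]^2 at (xi, eta).
double det_jacobian(const Quad4& q, double xi, double eta) {
    static const double sx[4] = {-1.0, 1.0, 1.0, -1.0};
    static const double sy[4] = {-1.0, -1.0, 1.0, 1.0};
    double dxdxi = 0.0, dxdeta = 0.0, dydxi = 0.0, dydeta = 0.0;
    for (int i = 0; i < 4; ++i) {
        const double dNdxi  = 0.25 * sx[i] * (1.0 + sy[i] * eta);
        const double dNdeta = 0.25 * sy[i] * (1.0 + sx[i] * xi);
        dxdxi  += q[i].x * dNdxi;
        dxdeta += q[i].x * dNdeta;
        dydxi  += q[i].y * dNdxi;
        dydeta += q[i].y * dNdeta;
    }
    return dxdxi * dydeta - dxdeta * dydxi;
}

// A bilinear quad has an affine map exactly when it is a parallelogram,
// i.e. its diagonals share a midpoint: node0 + node2 == node1 + node3.
bool is_affine(const Quad4& q, double tol = 1e-12) {
    return std::abs(q[0].x + q[2].x - q[1].x - q[3].x) < tol &&
           std::abs(q[0].y + q[2].y - q[1].y - q[3].y) < tol;
}

// Fill detJ at every quadrature point and return how many Jacobian
// evaluations were actually performed.
std::size_t fill_det_jacobians(const Quad4& q, const std::vector<Point2>& qps,
                               std::vector<double>& detJ) {
    assert(!qps.empty());
    if (is_affine(q)) {
        // Constant Jacobian: evaluate once and reuse at every point.
        detJ.assign(qps.size(), det_jacobian(q, qps[0].x, qps[0].y));
        return 1;
    }
    detJ.resize(qps.size());
    for (std::size_t p = 0; p < qps.size(); ++p)
        detJ[p] = det_jacobian(q, qps[p].x, qps[p].y);
    return qps.size();
}
```

For a 2 x 1 rectangle the determinant is 0.5 at every point, so one evaluation suffices. For a trapezoid the determinant varies with eta and each quadrature point needs its own evaluation, which is the extra reinit cost for non-affine elements.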
From: Derek Gaston <friedmud@gm...>  2008-10-18 21:14:47

On Oct 18, 2008, at 2:13 PM, Manav Bhatia wrote:

> I am curious about this: doesn't any nonlinear transient computation
> require this information per element per iteration? Am I the first one
> to consider caching this information? How do the CPU/memory overheads
> for your problems work out?

This _really_ depends on your application. For a lot of my applications that I've solved fully implicit with Newton's method and fairly large timesteps... solving the linear systems takes up a LOT more time than reinit(). On the other hand if you're solving either explicitly or using very small timesteps so your linear systems are pretty easy to solve... I could see how reinit could start taking up more solve time.

Currently, I do everything Jacobian Free... which means I'm evaluating my residual _millions_ of times (even during the actual linear solve). In this case I'm solving fairly large (6 or 7 variables) systems, so doing one FE reinit to evaluate the residual for 7 equations isn't too bad. Currently we're spending about 50 to 60 percent of our runtime just evaluating residuals... and only about 15% of our runtime is coming from reinit()-like activities (and that includes recalculating the value and gradient of all of our variables at every quadrature point).

In my case though... a lot of our computational time goes into calculating highly nonlinear material properties... some of which even have Fourier series to evaluate at every quadrature point. Along with very tight coupling between the variables that also has to be evaluated.

What I'm getting at here is that there are lots of ways to spend computational time... and it is _very_ application dependent. I'm personally not worried about how long FE::reinit() is taking because I know for a fact that it has been uber-optimized by Roy (and others) and I'm confident that all of the calculations happening in there are absolutely necessary.

All of that said... I am interested to hear about it if you get this FE caching stuff to work...

Derek
From: Roy Stogner <roystgnr@ic...>  2008-10-19 02:57:21

On Sat, 18 Oct 2008, Derek Gaston wrote:

> What I'm getting at here is that there are lots of ways to spend
> computational time... and it is _very_ application dependent.

This is definitely true.

> I'm personally not worried about how long FE::reinit() is taking
> because I know for a fact that it has been uberoptimized by Roy

Whereas this is a contemptible lie. ;)

I've added a few optimizations that were enough to get rid of the worst reinit cost in my own applications, but it's still definitely not as fast as it could be. One problem I recall specifically is that we've got some shape function and master derivative evaluations that are done with loops of scalar-returning function calls where a single vector-filling function call would do.

---
Roy
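The difference between the two calling styles mentioned above can be shown on 1D quadratic Lagrange shape functions with nodes at -1, 0 and 1. This is an illustrative sketch only: shape and shapes are hypothetical names, the node ordering is chosen for simplicity and need not match libMesh's, and no timing claim is made here.

```cpp
#include <array>
#include <cassert>

// Scalar-returning form: one call per (shape function, point) pair.
double shape(unsigned i, double xi) {
    assert(i < 3);
    switch (i) {
        case 0:  return 0.5 * xi * (xi - 1.0);
        case 1:  return 1.0 - xi * xi;
        default: return 0.5 * xi * (xi + 1.0);
    }
}

// Vector-filling form: one call per point fills all three values and
// reuses the shared subexpression xi * xi.
void shapes(double xi, std::array<double, 3>& phi) {
    const double xi2 = xi * xi;
    phi[0] = 0.5 * (xi2 - xi);
    phi[1] = 1.0 - xi2;
    phi[2] = 0.5 * (xi2 + xi);
}
```

Both forms return the same values. The vector-filling form makes one call per point instead of three and can share intermediate work, which is the kind of saving described above. How much it helps in practice would have to be measured.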