Tim Kroeger wrote:
> Dear Jed and Lorenzo,
>
> On Wed, 10 Jun 2009, Jed Brown wrote:
>
>> Tim Kroeger wrote:
>>
>>> Please find attached the log output now.
>>
>> First, I'm confused about why all the detailed output is missing. See
>> attached for the sort of output I expect. In particular, there is
>> fine-grained output on vector operations and solver internals with
>> details about load balancing.
>
> I have no idea what happend here. I should note that I supplied the
> option "-log_summary" using PetscOptionsSetValue rather than directly on
> the command line, but that shouldn't be the point, should it?
This is not okay, that just registers the option in the options
database, but the profiling functions don't check there (for performance
reasons). The option either needs to be present at PetscInitialize,
i.e. any of
1. Passed in on the command-line
2. Specified in the file that is the third argument of PetscInitialize
3. Put in $HOME/.petscrc
4. Put in $PWD/.petscrc
OR, if you must, call PetscLogBegin() to start logging and make sure
that -log_summary is in the options database by the time PetscFinalize
is called.
>> I see about 30% of the time spent in assemble which about what I'd
>> expect, but solve time is relatively low, especially when compared to
>> some of the FE ops. I don't recall, is a significant part of your code
>> explicit?
>
> There are some explicit parts, but it wouldn't expect them to be
> significant.
>
> When I watch the output while the application is running, it feels as if
> it is "most of the time" either inside assemble/solve of one of the
> implicit parts or inside EquationSystems::reinit().
According to the libmesh profiling, the sum of these steps is only half
the runtime. It should be essentially everything (especially since
function evaluation is not a separate step). But, it appears that the
final profiling output for, say, assemble(), does not include sub-events
(so there is no way for us to know the total amount of time spent in
"assembly" without defining PerfLog objects for the various stages).
This is different than PETSc's convention so keep that in mind.
>> inverse_map is getting called a ton.
>
> Yes. I use a PointLocator, but that is at then end, after the main
> computation has finished. The PointLocator is used very heavily then,
> and this part of the code runs in serial (only on core #0) and takes
> about 35 minutes.
>
> I see that this contradicts the fact that over 90 minutes are spent
> inside inverse_map(). I have currently no idea where else I am using that.
It's also called in assembly somewhere and seems to be a significant
bottleneck. Roy et al., could you remind me why this is required?
>>> As I see it, there are two obvious things that I could do. That is (1)
>>> update to mpich2 1.1, and (2) recompile PETSc with
>>> "--with-debugging=no". Would you recommend to do so?
>>
>> 1. Time for MPI_Barrier/MPI_Send looks a bit high, that might be
>> improved with ch3:nemesis, but would mainly be due to network
>> configuration.
>>
>> 2. This can make a significant difference on the PETSc operations, but
>> it's not clear that much time is being spent in PETSc anyway. If you do
>> rebuild, maybe you could also upgrade to PETSc-3.0.0.
>
> Okay, so I won't rebuild now but keep this in mind.
>
>>> On the other hand, a main issue was that the ghosted code is by about a
>>> factor of 1.5 slower than the non-ghosted code. That shouldn't have to
>>> do with debugging options, because they were enabled for both versions.
>>
>> Do you have a log from an equivalent non-ghosted run?
>
> Good point. (-:
>
> I don't have one right now. Actually, running on 24 cores on 4 nodes
> won't be possible without ghosted (due to memory shortage). I will
> perform two computations (one ghosted and one non-ghosted) with 8 cores
> on 4 nodes and send you the logs. That will probably be tomorrow.
Great.
Jed
|