The following example highlights how to measure thread contributions to a running loop. The main thing to notice is how it uses separate parallel and do directives in order to allow the timing calls to occur in the gap after becoming multi-threaded and before the loop starts. Any other way would cause different behaviour. For example, placing the timer calls before the parallel would lead to a single timer capturing the end to end time of the loop oblivious to any parallel activity. The timer calls could also be moved into the loop body and as such inside the do directives, and in this case the timing is per thread, but it only includes an accumulation of the loop bodies run by each, the loop overhead will be left out. This wouldn’t be ideal as there might be a large number of calls in some cases and if the loop was light as below the overhead to the user would be noticeable. Note that the PMTM_create_timer call could not be moved inside the loop as only one create is allowed per timer.
:::fortran
program example
use PMTM
use MPI
implicit none
integer :: err_code
type(pmtm_timer) :: loop_timer
integer :: loop_idx
real :: res
integer , parameter :: N = 10000
call MPI_Init(err_code)
! These calls should only occur once, hence they appear before the parallel region.
call PMTM_init("example_file_", "Example Application", & err_code)
call PMTM_parameter_output(PMTM_DEFAULT_INSTANCE , &
"Loop Count", PMTM_OUTPUT_ALWAYS , .false., N, err_code)
! Launch the parallel region. Most importantly the loop_timer needs to be "private".
!$omp parallel default(none), private(err_code , loop_timer), reduction(+:res)
! Running parallel, create a timer in each thread and start it counting. So, if
! OMP_NUM_THREADS=16 for example, there would be 16 timers running after these two
! calls.
call PMTM_create_timer(PMTM_DEFAULT_GROUP , "Loop Timer", PMTM_TIMER_ALL , err_code)
call PMTM_timer_start(loop_timer)
! Run the parallel loop.
!$omp do
do loop_idx = 1, N
res = res + loop_idx
end do
!$omp end do
! Stop all the timers
call PMTM_timer_stop(loop_timer)
!$omp end parallel
! Only one thread will exist here since the "end parallel" will have stopped all but
! one. Call finalize in series which will tidy up and report.
call PMTM_finalize(err_code)
call MPI_finalize(err_code)
end program