Dear Ben and Roy,
I get very different run times with "METHOD=pro" and "METHOD=dbg". You can
find the details in the following tables. With "dbg", the problem is always
there. With "pro", however, the problem disappears. Any advice? In both
cases I ran the code on a slave node. Thanks a lot.
in METHOD=pro:
---------------------------------------------------------------------------------------------------------------
| libMesh Performance: Alive time=22.2695, Active time=14.2938                                                |
---------------------------------------------------------------------------------------------------------------
| Event                           nCalls    Total Time  Avg Time    Total Time  Avg Time    % of Active Time  |
|                                           w/o Sub     w/o Sub     With Sub    With Sub    w/o S    With S   |
|-------------------------------------------------------------------------------------------------------------|
|                                                                                                             |
|                                                                                                             |
| DofMap                                                                                                      |
|   add_neighbors_to_send_list()  3         0.0997      0.033244    0.1065      0.035511    0.70     0.75     |
|   build_constraint_matrix()     33576     0.0186      0.000001    0.0186      0.000001    0.13     0.13     |
|   cnstrn_elem_mat_vec()         33576     0.0101      0.000000    0.0101      0.000000    0.07     0.07     |
|   compute_sparsity()            3         0.2933      0.097754    0.4115      0.137154    2.05     2.88     |
|   create_dof_constraints()      3         0.1100      0.036674    0.1637      0.054568    0.77     1.15     |
|   distribute_dofs()             3         0.1971      0.065701    0.6517      0.217241    1.38     4.56     |
|   dof_indices()                 424766    0.4175      0.000001    0.4175      0.000001    2.92     2.92     |
|   enforce_constraints_exactly() 2         0.0055      0.002742    0.0055      0.002742    0.04     0.04     |
|   old_dof_indices()             67152     0.0644      0.000001    0.0644      0.000001    0.45     0.45     |
|   prepare_send_list()           3         0.0015      0.000506    0.0015      0.000506    0.01     0.01     |
|   reinit()                      3         0.3766      0.125545    0.3766      0.125545    2.63     2.63     |
|                                                                                                             |
| FE                                                                                                          |
|   compute_affine_map()          161783    0.3852      0.000002    0.3852      0.000002    2.70     2.70     |
|   compute_face_map()            64674     0.1622      0.000003    0.1622      0.000003    1.13     1.13     |
|   compute_shape_functions()     161783    0.1218      0.000001    0.1218      0.000001    0.85     0.85     |
|   init_face_shape_functions()   54129     0.2135      0.000004    0.2135      0.000004    1.49     1.49     |
|   init_shape_functions()        115011    0.9611      0.000008    0.9611      0.000008    6.72     6.72     |
|   inverse_map()                 519958    1.4425      0.000003    1.4425      0.000003    10.09    10.09    |
|                                                                                                             |
| GMVIO                                                                                                       |
|   write_nodal_data()            1         0.1555      0.155485    0.1555      0.155485    1.09     1.09     |
|                                                                                                             |
| JumpErrorEstimator                                                                                          |
|   estimate_error()              2         1.0333      0.516627    3.9642      1.982106    7.23     27.73    |
|                                                                                                             |
| LocationMap                                                                                                 |
|   find()                        50456     0.0286      0.000001    0.0286      0.000001    0.20     0.20     |
|   init()                        4         0.0226      0.005662    0.0226      0.005662    0.16     0.16     |
|                                                                                                             |
| Mesh                                                                                                        |
|   contract()                    2         0.0185      0.009264    0.0462      0.023103    0.13     0.32     |
|   find_neighbors()              3         0.5844      0.194807    0.6276      0.209214    4.09     4.39     |
|   read()                        1         0.2718      0.271756    0.2718      0.271756    1.90     1.90     |
|   renumber_nodes_and_elem()     8         0.1015      0.012692    0.1015      0.012692    0.71     0.71     |
|                                                                                                             |
| MeshCommunication                                                                                           |
|   broadcast_bcs()               1         0.0012      0.001206    0.0330      0.033009    0.01     0.23     |
|   broadcast_mesh()              1         0.0422      0.042237    0.0451      0.045126    0.30     0.32     |
|   compute_hilbert_indices()     4         1.6419      0.410467    1.6419      0.410467    11.49    11.49    |
|   find_global_indices()         4         0.1052      0.026299    1.7979      0.449477    0.74     12.58    |
|   parallel_sort()               4         0.0153      0.003821    0.0470      0.011757    0.11     0.33     |
|                                                                                                             |
| MeshRefinement                                                                                              |
|   _coarsen_elements()           4         0.0276      0.006894    0.0278      0.006938    0.19     0.19     |
|   _refine_elements()            4         0.1460      0.036506    0.2486      0.062145    1.02     1.74     |
|   add_point()                   50456     0.0483      0.000001    0.0890      0.000002    0.34     0.62     |
|   make_coarsening_compatible()  5         0.1862      0.037243    0.1862      0.037243    1.30     1.30     |
|   make_refinement_compatible()  5         0.0291      0.005817    0.0309      0.006181    0.20     0.22     |
|                                                                                                             |
| MetisPartitioner                                                                                            |
|   partition()                   3         0.3619      0.120617    1.7711      0.590353    2.53     12.39    |
|                                                                                                             |
| Parallel                                                                                                    |
|   allgather()                   16        0.0735      0.004591    0.0735      0.004591    0.51     0.51     |
|   broadcast()                   13        0.0346      0.002663    0.0346      0.002663    0.24     0.24     |
|   gather()                      3         0.0001      0.000029    0.0001      0.000029    0.00     0.00     |
|   max()                         30        0.0736      0.002454    0.0736      0.002454    0.52     0.52     |
|   min()                         16        0.0107      0.000668    0.0107      0.000668    0.07     0.07     |
|   probe()                       26        0.0213      0.000818    0.0213      0.000818    0.15     0.15     |
|   receive()                     26        0.0033      0.000128    0.0246      0.000947    0.02     0.17     |
|   send()                        26        0.0035      0.000136    0.0035      0.000136    0.02     0.02     |
|   send_receive()                34        0.0004      0.000012    0.0286      0.000842    0.00     0.20     |
|   sum()                         20        0.1321      0.006607    0.1321      0.006607    0.92     0.92     |
|   wait()                        26        0.0000      0.000001    0.0000      0.000001    0.00     0.00     |
|                                                                                                             |
| Partitioner                                                                                                 |
|   set_node_processor_ids()      3         0.1087      0.036232    0.1282      0.042741    0.76     0.90     |
|   set_parent_processor_ids()    3         0.0313      0.010417    0.0313      0.010417    0.22     0.22     |
|                                                                                                             |
| PetscLinearSolver                                                                                           |
|   solve()                       3         3.2520      1.083999    3.2520      1.083999    22.75    22.75    |
|                                                                                                             |
| ProjectVector                                                                                               |
|   operator()                    2         0.0847      0.042333    0.1667      0.083344    0.59     1.17     |
|                                                                                                             |
| System                                                                                                      |
|   assemble()                    3         0.6752      0.225067    1.6206      0.540184    4.72     11.34    |
|   project_vector()              2         0.0870      0.043506    0.3003      0.150132    0.61     2.10     |
---------------------------------------------------------------------------------------------------------------
| Totals:                         1737648   14.2938                                         100.00            |
---------------------------------------------------------------------------------------------------------------
in METHOD=dbg
---------------------------------------------------------------------------------------------------------------
| libMesh Performance: Alive time=970.489, Active time=958.407                                                |
---------------------------------------------------------------------------------------------------------------
| Event                           nCalls    Total Time  Avg Time    Total Time  Avg Time    % of Active Time  |
|                                           w/o Sub     w/o Sub     With Sub    With Sub    w/o S    With S   |
|-------------------------------------------------------------------------------------------------------------|
|                                                                                                             |
|                                                                                                             |
| DofMap                                                                                                      |
|   add_neighbors_to_send_list()  3         0.4857      0.161916    0.5219      0.173970    0.05     0.05     |
|   build_constraint_matrix()     33576     0.1788      0.000005    0.1788      0.000005    0.02     0.02     |
|   cnstrn_elem_mat_vec()         33576     0.1048      0.000003    0.1048      0.000003    0.01     0.01     |
|   compute_sparsity()            3         16.2741     5.424711    16.9691     5.656377    1.70     1.77     |
|   create_dof_constraints()      3         0.4682      0.156067    0.6576      0.219187    0.05     0.07     |
|   distribute_dofs()             3         1.5506      0.516868    3.0141      1.004706    0.16     0.31     |
|   dof_indices()                 424766    2.6192      0.000006    2.6192      0.000006    0.27     0.27     |
|   enforce_constraints_exactly() 2         0.0223      0.011134    0.0223      0.011134    0.00     0.00     |
|   old_dof_indices()             67152     0.4018      0.000006    0.4018      0.000006    0.04     0.04     |
|   prepare_send_list()           3         0.4080      0.135996    0.4080      0.135996    0.04     0.04     |
|   reinit()                      3         1.1455      0.381827    1.1455      0.381827    0.12     0.12     |
|                                                                                                             |
| FE                                                                                                          |
|   compute_affine_map()          161783    7.5528      0.000047    7.5528      0.000047    0.79     0.79     |
|   compute_face_map()            64674     3.3548      0.000052    3.3548      0.000052    0.35     0.35     |
|   compute_shape_functions()     161783    14.1906     0.000088    14.1906     0.000088    1.48     1.48     |
|   init_face_shape_functions()   54129     1.8255      0.000034    1.8255      0.000034    0.19     0.19     |
|   init_shape_functions()        115011    11.1598     0.000097    11.1598     0.000097    1.16     1.16     |
|   inverse_map()                 519958    3.8413      0.000007    3.8413      0.000007    0.40     0.40     |
|                                                                                                             |
| GMVIO                                                                                                       |
|   write_nodal_data()            1         0.5128      0.512829    0.5128      0.512829    0.05     0.05     |
|                                                                                                             |
| JumpErrorEstimator                                                                                          |
|   estimate_error()              2         5.3186      2.659298    33.5234     16.761681   0.55     3.50     |
|                                                                                                             |
| LocationMap                                                                                                 |
|   find()                        50456     0.1874      0.000004    0.1874      0.000004    0.02     0.02     |
|   init()                        4         0.1253      0.031314    0.1253      0.031314    0.01     0.01     |
|                                                                                                             |
| Mesh                                                                                                        |
|   contract()                    2         0.1673      0.083652    0.2817      0.140854    0.02     0.03     |
|   find_neighbors()              3         7.1693      2.389767    7.6922      2.564061    0.75     0.80     |
|   read()                        1         1.5032      1.503193    1.5032      1.503193    0.16     0.16     |
|   renumber_nodes_and_elem()     8         0.4307      0.053843    0.4307      0.053843    0.04     0.04     |
|                                                                                                             |
| MeshCommunication                                                                                           |
|   broadcast_bcs()               1         0.0165      0.016476    0.0202      0.020218    0.00     0.00     |
|   broadcast_mesh()              1         0.2634      0.263384    0.2666      0.266577    0.03     0.03     |
|   compute_hilbert_indices()     4         0.9906      0.247642    0.9906      0.247642    0.10     0.10     |
|   find_global_indices()         4         746.7912    186.697788  837.7610    209.440249  77.92    87.41    |
|   parallel_sort()               4         44.3904     11.097589   45.6040     11.400992   4.63     4.76     |
|                                                                                                             |
| MeshRefinement                                                                                              |
|   _coarsen_elements()           4         0.1212      0.030298    0.1414      0.035350    0.01     0.01     |
|   _refine_elements()            4         0.5802      0.145040    1.1861      0.296525    0.06     0.12     |
|   add_point()                   50456     0.2761      0.000005    0.5098      0.000010    0.03     0.05     |
|   make_coarsening_compatible()  11        1.9512      0.177386    1.9512      0.177386    0.20     0.20     |
|   make_refinement_compatible()  11        0.3092      0.028110    0.3611      0.032829    0.03     0.04     |
|                                                                                                             |
| MetisPartitioner                                                                                            |
|   partition()                   3         2.4345      0.811508    693.1930    231.064329  0.25     72.33    |
|                                                                                                             |
| Parallel                                                                                                    |
|   allgather()                   16        0.0325      0.002033    0.0325      0.002033    0.00     0.00     |
|   broadcast()                   13        0.0066      0.000511    0.0066      0.000511    0.00     0.00     |
|   gather()                      3         0.0001      0.000037    0.0001      0.000037    0.00     0.00     |
|   max()                         267       0.3244      0.001215    0.3244      0.001215    0.03     0.03     |
|   min()                         467       10.4474     0.022371    10.4474     0.022371    1.09     1.09     |
|   probe()                       26        29.8630     1.148579    29.8630     1.148579    3.12     3.12     |
|   receive()                     26        0.0065      0.000250    29.8696     1.148832    0.00     3.12     |
|   send()                        26        14.6244     0.562477    14.6244     0.562477    1.53     1.53     |
|   send_receive()                34        0.0025      0.000073    44.4968     1.308729    0.00     4.64     |
|   sum()                         20        1.2742      0.063712    1.2742      0.063712    0.13     0.13     |
|   wait()                        26        0.0001      0.000004    0.0001      0.000004    0.00     0.00     |
|                                                                                                             |
| Partitioner                                                                                                 |
|   set_node_processor_ids()      3         0.9364      0.312139    1.1704      0.390143    0.10     0.12     |
|   set_parent_processor_ids()    3         0.1398      0.046587    0.1398      0.046587    0.01     0.01     |
|                                                                                                             |
| PetscLinearSolver                                                                                           |
|   solve()                       3         3.9902      1.330075    3.9911      1.330380    0.42     0.42     |
|                                                                                                             |
| ProjectVector                                                                                               |
|   operator()                    2         0.5253      0.262668    0.9876      0.493799    0.05     0.10     |
|                                                                                                             |
| System                                                                                                      |
|   assemble()                    3         15.0199     5.006632    33.1146     11.038210   1.57     3.46     |
|   project_vector()              2         2.0911      1.045574    3.3312      1.665623    0.22     0.35     |
---------------------------------------------------------------------------------------------------------------
| Totals:                         1738348   958.4074                                        100.00            |
---------------------------------------------------------------------------------------------------------------
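
To put a number on the difference, here is a minimal Python sketch (not part
of libMesh; the dictionary names and the helper `slowdown` are only
illustrative). It divides a few "Total Time w/o Sub" entries from the dbg
table by the matching pro entries. The values are copied by hand from the
two tables above, and the choice of events is mine.

```python
# "Total Time w/o Sub" in seconds, copied by hand from the two tables above.
# The selection of events is illustrative, not exhaustive.
pro = {
    "find_global_indices": 0.1052,
    "parallel_sort": 0.0153,
    "probe": 0.0213,
    "send": 0.0035,
    "solve": 3.2520,
}
dbg = {
    "find_global_indices": 746.7912,
    "parallel_sort": 44.3904,
    "probe": 29.8630,
    "send": 14.6244,
    "solve": 3.9902,
}


def slowdown(pro_times, dbg_times):
    """Return dbg/pro time ratio for every event listed in pro_times."""
    return {name: dbg_times[name] / pro_times[name] for name in pro_times}


ratios = slowdown(pro, dbg)

# Print the events from largest to smallest slowdown.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24s} {ratio:10.1f}x")
```

For these entries, the events that involve communication
(find_global_indices, parallel_sort, probe, send) slow down by factors in
the thousands, while the PETSc solve takes nearly the same time in both
builds.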
Regards,
Yujie
On Wed, Jan 27, 2010 at 10:42 AM, Kirk, Benjamin (JSC-EG311) <
benjamin.kirk-1@...> wrote:
> >> When I sent the following email to libmesh mail list. I met one
> >> problem because of the size of the email. Could you give me some
> >> advice regarding this problem? thanks a lot.
> >
> > It looks like it made it through eventually; just a little late.
>
> I had to approve it based on size, and it was originally sent late US time
> so I didn't get to it until this morning. This is the second approval I've
> had to make in 24 hours, I'll see if there is
>
> > I'm not sure if you'll get an answer, though. Ben is the one
> > responsible for find_global_indices, and he's swamped with other
> > things right now. It does a parallel sort, which can be very
> > sensitive to MPI implementation.
>
> > It only gets used for I/O and the cost should scale more slowly than
> > solves, though; for large implicit 2D/3D problems it shouldn't be an
> > issue even on inefficient MPI implementations.
>
> Yes, this issue is bizarre indeed. The code does not even do that much
> communication there... You might want to compile with METHOD=pro and run it
> through gprof; that will give you a finer-grained picture of what the
> issue may actually be.
>
> Can you confirm that the problem doesn't exist on one processor? What are
> the details of the mesh you are using??
>
> -Ben
>
>