From: John P. <pet...@cf...> - 2005-08-23 15:21:05
Hi,

Does anybody (Ben) have a suggestion on calling dof_indices less, or possibly
optimizing it in some way? This is a performance log for a 2D code, where
dof_indices is called about 9.7 million times comprising about 9% of the
active run time. I've looked at the function itself, and it's
extremely clean, there aren't any N^2 algorithms hiding inside it --
hell each call to the function is barely measurable! So I think the
answer may be to call it less, but I'm not sure how that can
possibly be done...suggestions?

-J

 -------------------------------------------------------------------------------
| libMesh Performance: Alive time=589.784, Active time=758.435                  |
 -------------------------------------------------------------------------------
| Event                           nCalls    Total       Avg         Percent of  |
|                                           Time        Time        Active Time |
|-------------------------------------------------------------------------------|
|                                                                               |
| DofMap                                                                        |
|   build_constraint_matrix()     931538    13.6470     0.000015    1.80        |
|   cnstrn_elem_mat_vec()         931538    17.9481     0.000019    2.37        |
|   compute_sparsity()            446       9.7677      0.021901    1.29        |
|   create_dof_constraints()      446       14.1753     0.031783    1.87        |
|   distribute_dofs()             446       0.6981      0.001565    0.09        |
|   dof_indices()                 9699248   69.6233     0.000007    9.18        |
|   old_dof_indices()             3573438   25.4853     0.000007    3.36        |
|   reinit()                      446       3.3871      0.007594    0.45        |
|                                                                               |
| ErrorVector                                                                   |
|   mean()                        444       0.0138      0.000031    0.00        |
|   variance()                    222       0.0068      0.000031    0.00        |
|                                                                               |
| FE                                                                            |
|   compute_face_map()            459128    3.9193      0.000009    0.52        |
|   compute_map()                 1767056   18.5769     0.000011    2.45        |
|   compute_shape_functions()     1767056   13.8564     0.000008    1.83        |
|   init_face_shape_functions()   334762    2.3655      0.000007    0.31        |
|   init_shape_functions()        706688    13.0474     0.000018    1.72        |
|   inverse_map()                 2664273   23.9439     0.000009    3.16        |
|                                                                               |
| KellyErrorEstimator                                                           |
|   boundary integrals            24094     3.3071      0.000137    0.44        |
|   construct side                245368    1.4483      0.000006    0.19        |
|   dof_indices()                 245368    7.6013      0.000031    1.00        |
|   fe_e->reinit()                245368    29.3907     0.000120    3.88        |
|   fe_f->reinit()                245368    14.4939     0.000059    1.91        |
|   inverse_map()                 245368    8.9831      0.000037    1.18        |
|   jump integral                 245368    2.5304      0.000010    0.33        |
|   side->hmax()                  245368    1.5651      0.000006    0.21        |
|   std::sqrt()                   222       0.0041      0.000018    0.00        |
|                                                                               |
| Mesh                                                                          |
|   contract()                    222       0.0645      0.000290    0.01        |
|   find_neighbors()              223       1.1143      0.004997    0.15        |
|   renumber_nodes_and_elem()     223       0.0647      0.000290    0.01        |
|                                                                               |
| MeshRefinement                                                                |
|   _coarsen_elements()           444       0.0646      0.000146    0.01        |
|   _refine_elements()            444       1.3837      0.003116    0.18        |
|   add_point()                   86448     0.5349      0.000006    0.07        |
|   make_coarsening_compatible()  1131      0.6501      0.000575    0.09        |
|   make_refinement_compatible()  1131      0.0405      0.000036    0.01        |
|   update_nodes_map()            444       0.2081      0.000469    0.03        |
|                                                                               |
| MeshTools::Generation                                                         |
|   build_cube()                  1         0.0003      0.000316    0.00        |
|                                                                               |
| System                                                                        |
|   assemble()                    1892      277.6570    0.146753    36.61       |
|   project_vector()              2442      120.3143    0.049269    15.86       |
|   solve()                       1892      56.5525     0.029890    7.46        |
 -------------------------------------------------------------------------------
| Totals:                         2.47e+07  758.4355                100.00      |
 -------------------------------------------------------------------------------
From: Roy S. <roy...@ic...> - 2005-08-23 15:50:23
On Tue, 23 Aug 2005, John Peterson wrote:

> Does anybody (Ben) have a suggestion on calling dof_indices less, or possibly
> optimizing it in some way? This is a performance log for a 2D code, where
> dof_indices is called about 9.7 million times comprising about 9% of the
> active run time. I've looked at the function itself, and it's
> extremely clean, there aren't any N^2 algorithms hiding inside it --
> hell each call to the function is barely measurable! So I think the
> answer may be to call it less, but I'm not sure how that can
> possibly be done...suggestions?

Well, half of the dof_indices and old_dof_indices calls are occurring
during the project_vector functions, which currently get called at
least twice as often as necessary. We need to implement an
"add_unprojected_vector" version of "add_vector" which marks the new
vector (i.e. the rhs vector) to be zeroed instead of projected at each
remeshing, and we need to redo the arguments to project_vector so that
it can be used to project a bunch of vectors at once rather than one
at a time.

That shouldn't be too complicated and it'll cut 25% or more out of the
dof_indices time - I've just been putting it off because I didn't
realize the inefficiency was causing noticeable problems.
---
Roy
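The "project a bunch of vectors at once" idea can be illustrated with a toy C++ sketch. This is not libMesh code: `ToyDofMap`, `transfer`, `project_one_at_a_time` and `project_batched` are hypothetical names invented for the illustration, and the "projection" is just a copy of old values. The only point is how many element-level index lookups each loop ordering performs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> Vec;

// Toy DOF map (hypothetical, not libMesh's DofMap): element e owns the
// contiguous indices [e*dofs_per_elem, (e+1)*dofs_per_elem).
struct ToyDofMap {
  std::size_t dofs_per_elem;
  std::size_t n_calls;  // how many times dof_indices() has been called

  explicit ToyDofMap(std::size_t d) : dofs_per_elem(d), n_calls(0) {}

  std::vector<std::size_t> dof_indices(std::size_t elem) {
    ++n_calls;
    std::vector<std::size_t> di(dofs_per_elem);
    for (std::size_t i = 0; i < dofs_per_elem; ++i)
      di[i] = elem * dofs_per_elem + i;
    return di;
  }
};

// Stand-in for the real projection: copy the old values at the given indices.
void transfer(const std::vector<std::size_t>& di, const Vec& old_v, Vec& new_v) {
  for (std::size_t i : di) new_v[i] = old_v[i];
}

// One full sweep over the elements per vector: n_vectors * n_elem lookups.
std::size_t project_one_at_a_time(std::size_t n_elem, std::size_t dofs_per_elem,
                                  const std::vector<Vec>& old_vs,
                                  std::vector<Vec>& new_vs) {
  assert(old_vs.size() == new_vs.size());
  ToyDofMap dof_map(dofs_per_elem);
  for (std::size_t v = 0; v < old_vs.size(); ++v)
    for (std::size_t e = 0; e < n_elem; ++e)
      transfer(dof_map.dof_indices(e), old_vs[v], new_vs[v]);
  return dof_map.n_calls;
}

// One sweep over the elements, reusing each lookup for every vector:
// n_elem lookups regardless of how many vectors are projected.
std::size_t project_batched(std::size_t n_elem, std::size_t dofs_per_elem,
                            const std::vector<Vec>& old_vs,
                            std::vector<Vec>& new_vs) {
  assert(old_vs.size() == new_vs.size());
  ToyDofMap dof_map(dofs_per_elem);
  for (std::size_t e = 0; e < n_elem; ++e) {
    const std::vector<std::size_t> di = dof_map.dof_indices(e);
    for (std::size_t v = 0; v < old_vs.size(); ++v)
      transfer(di, old_vs[v], new_vs[v]);
  }
  return dof_map.n_calls;
}
```

In this toy, both functions produce the same vectors; only the returned lookup count differs. How much a real code would gain depends on what else the projection does per element, which the sketch does not model.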
From: John P. <pet...@cf...> - 2005-08-23 16:00:24
Roy Stogner writes:
> On Tue, 23 Aug 2005, John Peterson wrote:
>
> That shouldn't be too complicated and it'll cut 25% or more out of the
> dof_indices time - I've just been putting it off because I didn't
> realize the inefficiency was causing noticeable problems.

Sounds good, 25% off of dof_indices would be awesome actually. If you
implement those changes to project_vector, I can run some test
code and get you more performance numbers.

-John
From: John P. <pet...@cf...> - 2005-08-23 16:42:42
Roy Stogner writes:
> On Tue, 23 Aug 2005, John Peterson wrote:
>
> > Does anybody (Ben) have a suggestion on calling dof_indices less, or possibly
> > optimizing it in some way? This is a performance log for a 2D code, where
> > dof_indices is called about 9.7 million times comprising about 9% of the
> > active run time. I've looked at the function itself, and it's
> > extremely clean, there aren't any N^2 algorithms hiding inside it --
> > hell each call to the function is barely measurable! So I think the
> > answer may be to call it less, but I'm not sure how that can
> > possibly be done...suggestions?
>
> Well, half of the dof_indices and old_dof_indices calls are occurring
> during the project_vector functions, which currently get called at
> least twice as often as necessary. We need to implement an
> "add_unprojected_vector" version of "add_vector" which marks the new
> vector (i.e. the rhs vector) to be zeroed instead of projected at each
> remeshing, and we need to redo the arguments to project_vector so that
> it can be used to project a bunch of vectors at once rather than one
> at a time.
>
> That shouldn't be too complicated and it'll cut 25% or more out of the
> dof_indices time - I've just been putting it off because I didn't
> realize the inefficiency was causing noticeable problems.

As an aside, my results are a bit exaggerated by the fact that I was
also doing a predictor-corrector timestepping scheme, so about three
more vectors were being projected at each timestep than there normally
would have been for Crank-Nicolson.

-John
From: Roy S. <roy...@ic...> - 2005-08-23 16:52:26
On Tue, 23 Aug 2005, John Peterson wrote:

> As an aside, my results are a bit exaggerated by the fact that I was
> also doing a predictor-corrector timestepping scheme, so there were about
> three extra vectors being projected at each timestep than there normally
> would have been for Crank-Nicolson.

Okay, you can now pass "false" to add_vector to tell the system not to
project it. If your extra vectors don't need to be saved through mesh
refinements, make sure to change the invocation for them.

I haven't changed project_vector() to do multiple vectors at once yet,
though - that'll be a little more involved and a lower priority. If
your extra vectors definitely need to be projected, let me know and
I'll fix project_vector() next.
---
Roy
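To make the effect of that flag concrete, here is a small self-contained C++ sketch. It is a toy model, not the libMesh `System` class: `ToySystem`, `remesh` and `get_vector` are hypothetical, and the "projection" merely keeps the old entries. Only the idea of an optional boolean on `add_vector` that defaults to projecting comes from the message above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy system (hypothetical). Each registered vector remembers whether it
// should be carried across a remesh or simply zeroed.
class ToySystem {
public:
  explicit ToySystem(std::size_t n_dofs) : n_dofs_(n_dofs) {}

  // Register a zero-initialized vector. Passing projections == false marks
  // it as scratch data that does not need to survive mesh refinement.
  std::vector<double>& add_vector(const std::string& name,
                                  bool projections = true) {
    Entry& e = vectors_[name];
    e.values.assign(n_dofs_, 0.0);
    e.project = projections;
    return e.values;
  }

  const std::vector<double>& get_vector(const std::string& name) const {
    return vectors_.at(name).values;
  }

  // Change the number of DOFs. Returns how many vectors took the
  // (in a real code, expensive) projection path.
  std::size_t remesh(std::size_t new_n_dofs) {
    assert(new_n_dofs > 0);
    std::size_t n_projected = 0;
    for (auto& kv : vectors_) {
      Entry& e = kv.second;
      if (e.project) {
        e.values.resize(new_n_dofs, 0.0);  // toy projection: keep old entries
        ++n_projected;
      } else {
        e.values.assign(new_n_dofs, 0.0);  // cheap path: resize and zero
      }
    }
    n_dofs_ = new_n_dofs;
    return n_projected;
  }

private:
  struct Entry {
    std::vector<double> values;
    bool project;
  };
  std::size_t n_dofs_;
  std::map<std::string, Entry> vectors_;
};
```

In this model, registering a right-hand side or predictor vector with `false` means `remesh` only "projects" the vectors whose contents actually need to be preserved, which is the saving discussed in the thread.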