From: 강신성 <ss...@pu...> - 2021-06-29 02:16:10
|
Hello, all.

I have a question about a non-compliant output problem in libMesh. I looked into "rb_evaluation.C" and found the following code on line 292:

"RB_output_error_bounds[n] = abs_error_bound * eval_output_dual_norm(n, mu);"

My questions are:
1) Can we address a non-compliant output problem in libMesh?
2) In the above code, is it correct that "eval_output_dual_norm(n, mu)" is the dual norm of the *dual* residual?

I look forward to hearing from you.

Best regards,
Shinseong Kang

------------------------------------------------------------
Shinseong Kang
Graduate Student
Pusan National University, South Korea
Tel.: +82-51-510-3052
H.P.: +82-10-9770-6595
E-mail: ss...@pu...
------------------------------------------------------------
|
From: John P. <jwp...@gm...> - 2021-06-28 18:45:23
|
Hi Edgar,

Including your previous results, the ex4 patch alone still has the fastest "Active" time on average:

# previous results
upstream: AVG= 1.51775
ex4 patch: AVG= 1.29828

# current results
virtual function patch: AVG= 1.52382
virtual function + ex4 patches: AVG= 1.6248

But since you rebuilt libmesh between then and now, I'm not sure we should really compare the two. Another thing to mention is that running the test in parallel is probably counterproductive to our goals since:

1.) It reduces the overall active time, and the shorter the duration of the thing you are trying to time, the more it is affected by the timing code itself.
2.) It will introduce more variability in the results.

Currently, the coefficient of variation (stddev divided by mean) for these results is on the order of 15-18%, a fair bit larger than the differences in the times themselves. If you are interested in investigating this further, I would suggest that you re-check the previous results on your current libmesh build, but I'd also run everything in serial to try and reduce the variation to the point where one of the four possible versions is statistically faster than all the others...

-- John

On Sun, Jun 27, 2021 at 10:03 PM edgar <edg...@cr...> wrote: > On 2021-06-18 21:45, John Peterson wrote: > > Your compiler flags are definitely far more advanced/aggressive than > > mine, > > I cannot take credit for that, really. I only modified the -O2 to -O3, > made sure that -funroll_loops was there and customised to my processor > (amdfam10). All the other flags come directly from the Makefile provided > by libMesh. > > > which are just on the default of -O2. However, I think what we should > > conclude from your results is that there is something slower than it > > needs > > to be with DenseMatrix::resize(), not that we should move the > > DenseMatrix > > creation/destruction inside the loop over elements.
What I tried (see > > attached patch or the "dense_matrix_resize_no_virtual" branch in my > > fork) > > is avoiding the virtual function call to DenseMatrix::zero() which is > > currently made from DenseMatrix::resize(). In my testing, this change > > did > > not seem to make much of a difference but I'm curious about what you > > would > > get with your compiler args, this patch, and the unpatched ex4. > > There _is_ something consistently different for sure. I only ran the > case with `mpirun -np 4' and `-n 40'. The difference of the sums of > times is in the order of 1 second. For five tests of this size and my > rather limited system, I would say that your change yields marginally > faster computation, and should be used. In which case, my modifications > should be avoided. > > In the interest of completeness, I need to say that I had to rebuild > libMesh, because of compilation errors. I don't quite remember what > version it is right now, but it is not the updated master branch (due to > some issues that I am having with my Internet connection). Although this > may not affect the comparison, it should be noted. > > The results are shown below and in examples/introduction/sums.org > > #+name: tbl-results > #+caption: The first two columns correspond to the (patched) original > code. The last pair are the results with my modification (also with > patch). In each case, the first of the columns is alive time, and the > second one is active time. Data was copied from the .bz2 files. 
> | 3.65205 | 1.292 | 3.63248 | 1.31057 | > | 4.82533 | 1.76303 | 5.31107 | 1.95794 | > | 5.05955 | 1.84457 | 5.26696 | 1.964 | > | 3.86126 | 1.40952 | 3.53834 | 1.29313 | > | 3.58892 | 1.30998 | 4.369 | 1.59834 | > > #+caption: calculate the sums of each column > #+begin_src python :var data=tbl-results > ex4_alive = sum((I[0] for I in data)) > ex4_active = sum((I[1] for I in data)) > ex4_mod_alive = sum((I[2] for I in data)) > ex4_mod_active = sum((I[3] for I in data)) > return [["ex4_alive", "ex4_active", "ex4_mod_alive", > "ex4_mod_active"], > None, > [ex4_alive, ex4_active, ex4_mod_alive, ex4_mod_active]] > #+end_src > > #+RESULTS: > | ex4_alive | ex4_active | ex4_mod_alive | ex4_mod_active | > |-----------+--------------------+--------------------+----------------| > | 20.98711 | 7.6190999999999995 | 22.117849999999997 | 8.12398 | -- John |
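The coefficient of variation mentioned in this message can be checked directly from the "Active" columns of the quoted table. The following is only a sketch: it assumes the population standard deviation (the default of `numpy.std`) is the estimator that was used, which the message does not state, and the variable names are made up for illustration.

```python
from statistics import mean, pstdev

# "Active" times (seconds) copied from the quoted results table.
# Column 2: virtual function patch only; column 4: virtual function + ex4 patches.
virtual_patch = [1.292, 1.76303, 1.84457, 1.40952, 1.30998]
virtual_plus_ex4_patch = [1.31057, 1.95794, 1.964, 1.29313, 1.59834]


def coefficient_of_variation(samples):
    """Standard deviation divided by the mean (population stddev is an assumption)."""
    return pstdev(samples) / mean(samples)


for label, samples in [
    ("virtual function patch", virtual_patch),
    ("virtual function + ex4 patches", virtual_plus_ex4_patch),
]:
    cv = coefficient_of_variation(samples)
    print(f"{label}: AVG = {mean(samples):.5f}, CV = {cv:.1%}")
```

Comparing the printed CV with the relative gap between the two averages gives a rough idea of whether five runs are enough to separate the variants.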
From: Vikram G. <vik...@gm...> - 2021-06-28 04:52:25
|
Direct modification of the matrix (in general) to apply such a constraint will need consideration of the basis type and order, and how these map to the dofs. I am not sure this will be that easy to do, but I might be wrong. Vikram Garg vikramvgarg.github.io/ On Sun, Jun 27, 2021 at 9:00 PM Renato Poli <re...@gm...> wrote: > I see your point. I was resisting to use the penalty while looking for a > more elegant solution. > > Just to get a feeling from your side: I thought on adding master-slave > constraints through constrain rows, before system assembly. Does that sound > reasonable? > > Thanks, > Renato > > Em dom., 27 de jun. de 2021 22:44, Vikram Garg <vik...@gm...> > escreveu: > >> Hi Renato, >> I was suggesting that you use the scalar variable as the >> unknown uniform displacement on the boundary. I think it should be okay to >> add it as a variable that exists on the entire domain, as long as you >> specify the penalization only on the boundary. >> >> It looks like the penalty method has already been used to apply such >> constraints on the displacements, see page 21 here: >> https://www.osti.gov/servlets/purl/1463026 >> >> Thanks. >> Vikram Garg >> >> vikramvgarg.github.io/ >> >> >> On Sun, Jun 27, 2021 at 7:19 PM Renato Poli <re...@gm...> wrote: >> >>> Hi, thanks for the answer: >>> >>> @edgar: it is a FORCE (Neumann) while DISPLACEMENTS (primary variables) >>> must be tied together. (I am trying to solve Mandel's problem in >>> poroelasticity). Please let me know if it is clear. >>> >>> @Vikram: I did not get exactly the use of the SCALAR variable. It is a >>> single value in the whole domain, is that correct? Should I add this >>> variable _only_ in the boundary domain? I am digging into the >>> systems_of_equations_ex3, let's see if it gets clearer in the next few >>> days. 
>>> >>> Thanks, >>> Renato >>> >>> On Sun, Jun 27, 2021 at 8:38 PM Vikram Garg <vik...@gm...> >>> wrote: >>> >>>> You might need a Lagrange multiplier formulation that sets a state >>>> variable >>>> in an entire subdomain or boundary to a single unknown scalar. Example >>>> 3 in >>>> systems of equations might help, it shows how SCALAR variables can be >>>> used. >>>> You could incorporate the scalar variable into the weak form via a >>>> penalty >>>> method. >>>> >>>> Vikram Garg >>>> >>>> vikramvgarg.github.io/ >>>> >>>> >>>> On Sun, Jun 27, 2021 at 6:10 PM edgar <edg...@cr...> wrote: >>>> >>>> > On 2021-06-27 23:01, Renato Poli wrote: >>>> > > I'd like to add a force to a whole boundary. The constraint is that >>>> the >>>> > > whole boundary must have the same displacement. >>>> > > ---8<--- snip >>>> > >>>> > For the sake of disambiguation: do you want to impose displacement or >>>> > force? The displacement of a boundary is not only dependent on the >>>> force >>>> > it receives. >>>> > >>>> > I am guessing that the developers (regular user here! hi!) would like >>>> to >>>> > know. >>>> > >>>> > >>>> > _______________________________________________ >>>> > Libmesh-users mailing list >>>> > Lib...@li... >>>> > https://lists.sourceforge.net/lists/listinfo/libmesh-users >>>> > >>>> >>>> _______________________________________________ >>>> Libmesh-users mailing list >>>> Lib...@li... >>>> https://lists.sourceforge.net/lists/listinfo/libmesh-users >>>> >>> |
From: edgar <edg...@cr...> - 2021-06-28 03:24:44
|
On 2021-06-18 21:45, John Peterson wrote:
> Your compiler flags are definitely far more advanced/aggressive than
> mine,

I cannot take credit for that, really. I only modified the -O2 to -O3, made sure that -funroll-loops was there and customised to my processor (amdfam10). All the other flags come directly from the Makefile provided by libMesh.

> which are just on the default of -O2. However, I think what we should
> conclude from your results is that there is something slower than it needs
> to be with DenseMatrix::resize(), not that we should move the DenseMatrix
> creation/destruction inside the loop over elements. What I tried (see
> attached patch or the "dense_matrix_resize_no_virtual" branch in my fork)
> is avoiding the virtual function call to DenseMatrix::zero() which is
> currently made from DenseMatrix::resize(). In my testing, this change did
> not seem to make much of a difference but I'm curious about what you would
> get with your compiler args, this patch, and the unpatched ex4.

There _is_ something consistently different for sure. I only ran the case with `mpirun -np 4' and `-n 40'. The difference of the sums of times is on the order of 1 second. For five tests of this size and my rather limited system, I would say that your change yields marginally faster computation, and should be used. In which case, my modifications should be avoided.

In the interest of completeness, I need to say that I had to rebuild libMesh, because of compilation errors. I don't quite remember what version it is right now, but it is not the updated master branch (due to some issues that I am having with my Internet connection). Although this may not affect the comparison, it should be noted.

The results are shown below and in examples/introduction/sums.org

#+name: tbl-results
#+caption: The first two columns correspond to the (patched) original code. The last pair are the results with my modification (also with patch). In each case, the first of the columns is alive time, and the second one is active time. Data was copied from the .bz2 files.
| 3.65205 | 1.292   | 3.63248 | 1.31057 |
| 4.82533 | 1.76303 | 5.31107 | 1.95794 |
| 5.05955 | 1.84457 | 5.26696 | 1.964   |
| 3.86126 | 1.40952 | 3.53834 | 1.29313 |
| 3.58892 | 1.30998 | 4.369   | 1.59834 |

#+caption: calculate the sums of each column
#+begin_src python :var data=tbl-results
  ex4_alive = sum((I[0] for I in data))
  ex4_active = sum((I[1] for I in data))
  ex4_mod_alive = sum((I[2] for I in data))
  ex4_mod_active = sum((I[3] for I in data))
  return [["ex4_alive", "ex4_active", "ex4_mod_alive", "ex4_mod_active"],
          None,
          [ex4_alive, ex4_active, ex4_mod_alive, ex4_mod_active]]
#+end_src

#+RESULTS:
| ex4_alive | ex4_active         | ex4_mod_alive      | ex4_mod_active |
|-----------+--------------------+--------------------+----------------|
|  20.98711 | 7.6190999999999995 | 22.117849999999997 |        8.12398 |
|
From: edgar <edg...@cr...> - 2021-06-28 02:41:53
|
On 2021-06-28 00:18, Renato Poli wrote: > Hi, thanks for the answer: > > @edgar: it is a FORCE (Neumann) while DISPLACEMENTS (primary variables) > must be tied together ... Please let me know if it is clear. Totally. Thanks. I wish I could do more :) . > Thanks, > Renato |
From: Renato P. <re...@gm...> - 2021-06-28 00:22:09
|
Hi, thanks for the answer: @edgar: it is a FORCE (Neumann) while DISPLACEMENTS (primary variables) must be tied together. (I am trying to solve Mandel's problem in poroelasticity). Please let me know if it is clear. @Vikram: I did not get exactly the use of the SCALAR variable. It is a single value in the whole domain, is that correct? Should I add this variable _only_ in the boundary domain? I am digging into the systems_of_equations_ex3, let's see if it gets clearer in the next few days. Thanks, Renato On Sun, Jun 27, 2021 at 8:38 PM Vikram Garg <vik...@gm...> wrote: > You might need a Lagrange multiplier formulation that sets a state variable > in an entire subdomain or boundary to a single unknown scalar. Example 3 in > systems of equations might help, it shows how SCALAR variables can be used. > You could incorporate the scalar variable into the weak form via a penalty > method. > > Vikram Garg > > vikramvgarg.github.io/ > > > On Sun, Jun 27, 2021 at 6:10 PM edgar <edg...@cr...> wrote: > > > On 2021-06-27 23:01, Renato Poli wrote: > > > I'd like to add a force to a whole boundary. The constraint is that the > > > whole boundary must have the same displacement. > > > ---8<--- snip > > > > For the sake of disambiguation: do you want to impose displacement or > > force? The displacement of a boundary is not only dependent on the force > > it receives. > > > > I am guessing that the developers (regular user here! hi!) would like to > > know. > > > > > > _______________________________________________ > > Libmesh-users mailing list > > Lib...@li... > > https://lists.sourceforge.net/lists/listinfo/libmesh-users > > > > _______________________________________________ > Libmesh-users mailing list > Lib...@li... > https://lists.sourceforge.net/lists/listinfo/libmesh-users > |
From: Vikram G. <vik...@gm...> - 2021-06-27 23:38:18
|
You might need a Lagrange multiplier formulation that sets a state variable in an entire subdomain or boundary to a single unknown scalar. Example 3 in systems of equations might help, it shows how SCALAR variables can be used. You could incorporate the scalar variable into the weak form via a penalty method. Vikram Garg vikramvgarg.github.io/ On Sun, Jun 27, 2021 at 6:10 PM edgar <edg...@cr...> wrote: > On 2021-06-27 23:01, Renato Poli wrote: > > I'd like to add a force to a whole boundary. The constraint is that the > > whole boundary must have the same displacement. > > ---8<--- snip > > For the sake of disambiguation: do you want to impose displacement or > force? The displacement of a boundary is not only dependent on the force > it receives. > > I am guessing that the developers (regular user here! hi!) would like to > know. > > > _______________________________________________ > Libmesh-users mailing list > Lib...@li... > https://lists.sourceforge.net/lists/listinfo/libmesh-users > |
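The penalty idea in this message can be illustrated on a toy problem that has nothing to do with libMesh itself. The sketch below assumes two independent springs (stiffnesses `k1`, `k2`, both hypothetical) whose free ends play the role of the boundary DOFs, one extra scalar unknown `s` for the shared displacement, and a penalty term that ties each boundary DOF to `s`. The total force is applied to the scalar equation. As the penalty grows, all three unknowns should approach `force / (k1 + k2)`.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n + 1):
                a[row][k] -= factor * a[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(a[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (a[row][n] - tail) / a[row][row]
    return x


def tied_springs_penalty(k1, k2, force, penalty):
    """Unknowns are (u1, u2, s): two boundary displacements and the shared scalar.

    Stationarity of
        k1*u1^2/2 + k2*u2^2/2 + penalty*((u1 - s)^2 + (u2 - s)^2)/2 - force*s
    gives the 3x3 system assembled here.
    """
    matrix = [
        [k1 + penalty, 0.0, -penalty],
        [0.0, k2 + penalty, -penalty],
        [-penalty, -penalty, 2.0 * penalty],
    ]
    rhs = [0.0, 0.0, force]
    return solve_linear_system(matrix, rhs)


u1, u2, s = tied_springs_penalty(k1=1.0, k2=3.0, force=8.0, penalty=1.0e6)
print(f"u1 = {u1:.6f}, u2 = {u2:.6f}, s = {s:.6f}")
```

The usual trade-off applies: a larger penalty enforces the tie more tightly but worsens the conditioning of the system.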
From: edgar <edg...@cr...> - 2021-06-27 23:09:59
|
On 2021-06-27 23:01, Renato Poli wrote: > I'd like to add a force to a whole boundary. The constraint is that the > whole boundary must have the same displacement. > ---8<--- snip For the sake of disambiguation: do you want to impose displacement or force? The displacement of a boundary is not only dependent on the force it receives. I am guessing that the developers (regular user here! hi!) would like to know. |
From: Renato P. <re...@gm...> - 2021-06-27 23:02:19
|
Hi, I'd like to add a force to a whole boundary. The constraint is that the whole boundary must have the same displacement. It seems desirable to add the force to an extra node, and then constrain the whole boundary to this node through add_constraint_row. Is that so? Do you see any better approach? Can you point to some example code to guide me? Thanks, Renato |
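The algebra behind tying a set of DOFs to one extra "master" unknown can be sketched in a few lines of plain Python. This is not libMesh code and does not use its constraint API. It assumes a toy stiffness matrix for two boundary DOFs, a transformation `transform` with `u_i = transform[i] * u_master`, and a total force applied to the master. All names are hypothetical.

```python
def reduce_with_master(stiffness, transform):
    """Return the scalar T^T K T when every DOF is slaved to a single master."""
    n = len(transform)
    return sum(
        transform[i] * stiffness[i][j] * transform[j]
        for i in range(n)
        for j in range(n)
    )


# Toy data: two uncoupled boundary DOFs with spring stiffnesses 1 and 3.
stiffness = [[1.0, 0.0], [0.0, 3.0]]
transform = [1.0, 1.0]  # both boundary DOFs equal the master displacement
total_force = 8.0       # force applied to the master unknown

k_reduced = reduce_with_master(stiffness, transform)
u_master = total_force / k_reduced
displacements = [t * u_master for t in transform]
print(k_reduced, u_master, displacements)
```

Unlike a penalty formulation, this elimination enforces the constraint exactly and keeps the conditioning of the reduced system unchanged.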
From: edgar <edg...@cr...> - 2021-06-19 02:54:38
|
On 2021-06-18 21:45, John Peterson wrote: > On Thu, Jun 10, 2021 at 5:55 PM edgar <edg...@cr...> wrote: > >> On 2021-06-10 19:27, John Peterson wrote: >> > I recorded the "Active time" for the "Matrix Assembly Performance" >> > PerfLog >> > in introduction_ex4 running "./example-opt -d 3 -n 40" for both the >> > original codepath and your proposed change, averaging the results over >> > 5 >> > runs. The results were: >> > >> > Original code, "./example-opt -d 3 -n 40" >> > import numpy as np >> > np.mean([3.91801, 3.93206, 3.94358, 3.97729, 3.90512]) = 3.93 >> > >> > Patch, "./example-opt -d 3 -n 40" >> > import numpy as np >> > np.mean([4.10462, 4.06232, 3.95176, 3.92786, 3.97992]) = 4.00 >> > >> > so I'd say the original code path is marginally (but still >> > statistically >> > significantly) faster, although keep in mind that matrix assembly is >> > only >> > about 21% of the total time for this example while the solve is about >> > 71%. >> >> Superinteresting, I am sending you my benchmarks. I must say that I >> had >> initially run only 2 benchmarks, and both came out faster with the >> modifications. Now, I found that >> - The original code is more efficient with `-n 40' >> - The modified code is more efficient with `-n 15' and `mpirun -np 4' >> - That I ran the 5-test trial several times and some times, the >> original >> code was more efficient with `-n 15', but the first and second run >> with >> the modified code were always faster (my computer heating up?) >> >> The gains are really marginal in any case. It would be interesting to >> run with -O3... (I just did [1]). >> It seems that the differences are now a little bit more substantial, >> and >> that the modified code would be faster. I hope not to have made any >> mistakes. >> >> The code and the benchmarks are in the attached file. 
>> - examples >> |- introduction >> |- ex4 (original code) >> |- output_*_.txt.bz2 (running -n 40 with -O2) >> |- output_15_*_.txt.bz2 (running -n 15 with -O2) >> |- output_40_O3_*_.txt.bz2 (running -n 40 with -O3) >> |- ex4_mod (modified code) >> |- output_*_.txt.bz2 (running -n 40 with -O2) >> |- output_15_*_.txt.bz2 (running -n 15 with -O2) >> |- output_40_O3_*_.txt.bz2 (running -n 40 with -O3) >> >> >> [1] I manually compiled like this (added -O3 instead of -O2; disregard >> the CCFLAGS et al): >> >> $ mpicxx -std=gnu++17 -DNDEBUG -march=amdfam10 -O3 >> > > > Your compiler flags are definitely far more advanced/aggressive than > mine, > which are just on the default of -O2. However, I think what we should > conclude from your results is that there is something slower than it > needs > to be with DenseMatrix::resize(), not that we should move the > DenseMatrix > creation/destruction inside the loop over elements. What I tried (see > attached patch or the "dense_matrix_resize_no_virtual" branch in my > fork) > is avoiding the virtual function call to DenseMatrix::zero() which is > currently made from DenseMatrix::resize(). In my testing, this change > did > not seem to make much of a difference but I'm curious about what you > would > get with your compiler args, this patch, and the unpatched ex4. I will surely test it. I will have more time next week. Sorry for the delay. |
From: John P. <jwp...@gm...> - 2021-06-18 21:45:41
|
On Thu, Jun 10, 2021 at 5:55 PM edgar <edg...@cr...> wrote: > On 2021-06-10 19:27, John Peterson wrote: > > I recorded the "Active time" for the "Matrix Assembly Performance" > > PerfLog > > in introduction_ex4 running "./example-opt -d 3 -n 40" for both the > > original codepath and your proposed change, averaging the results over > > 5 > > runs. The results were: > > > > Original code, "./example-opt -d 3 -n 40" > > import numpy as np > > np.mean([3.91801, 3.93206, 3.94358, 3.97729, 3.90512]) = 3.93 > > > > Patch, "./example-opt -d 3 -n 40" > > import numpy as np > > np.mean([4.10462, 4.06232, 3.95176, 3.92786, 3.97992]) = 4.00 > > > > so I'd say the original code path is marginally (but still > > statistically > > significantly) faster, although keep in mind that matrix assembly is > > only > > about 21% of the total time for this example while the solve is about > > 71%. > > Superinteresting, I am sending you my benchmarks. I must say that I had > initially run only 2 benchmarks, and both came out faster with the > modifications. Now, I found that > - The original code is more efficient with `-n 40' > - The modified code is more efficient with `-n 15' and `mpirun -np 4' > - That I ran the 5-test trial several times and some times, the original > code was more efficient with `-n 15', but the first and second run with > the modified code were always faster (my computer heating up?) > > The gains are really marginal in any case. It would be interesting to > run with -O3... (I just did [1]). > It seems that the differences are now a little bit more substantial, and > that the modified code would be faster. I hope not to have made any > mistakes. > > The code and the benchmarks are in the attached file. 
> - examples > |- introduction > |- ex4 (original code) > |- output_*_.txt.bz2 (running -n 40 with -O2) > |- output_15_*_.txt.bz2 (running -n 15 with -O2) > |- output_40_O3_*_.txt.bz2 (running -n 40 with -O3) > |- ex4_mod (modified code) > |- output_*_.txt.bz2 (running -n 40 with -O2) > |- output_15_*_.txt.bz2 (running -n 15 with -O2) > |- output_40_O3_*_.txt.bz2 (running -n 40 with -O3) > > > [1] I manually compiled like this (added -O3 instead of -O2; disregard > the CCFLAGS et al): > > $ mpicxx -std=gnu++17 -DNDEBUG -march=amdfam10 -O3 > Your compiler flags are definitely far more advanced/aggressive than mine, which are just on the default of -O2. However, I think what we should conclude from your results is that there is something slower than it needs to be with DenseMatrix::resize(), not that we should move the DenseMatrix creation/destruction inside the loop over elements. What I tried (see attached patch or the "dense_matrix_resize_no_virtual" branch in my fork) is avoiding the virtual function call to DenseMatrix::zero() which is currently made from DenseMatrix::resize(). In my testing, this change did not seem to make much of a difference but I'm curious about what you would get with your compiler args, this patch, and the unpatched ex4. -- John |
From: edgar <edg...@cr...> - 2021-06-10 22:55:59
|
On 2021-06-10 19:27, John Peterson wrote:
> I recorded the "Active time" for the "Matrix Assembly Performance" PerfLog
> in introduction_ex4 running "./example-opt -d 3 -n 40" for both the
> original codepath and your proposed change, averaging the results over 5
> runs. The results were:
>
> Original code, "./example-opt -d 3 -n 40"
> import numpy as np
> np.mean([3.91801, 3.93206, 3.94358, 3.97729, 3.90512]) = 3.93
>
> Patch, "./example-opt -d 3 -n 40"
> import numpy as np
> np.mean([4.10462, 4.06232, 3.95176, 3.92786, 3.97992]) = 4.00
>
> so I'd say the original code path is marginally (but still statistically
> significantly) faster, although keep in mind that matrix assembly is only
> about 21% of the total time for this example while the solve is about
> 71%.

Superinteresting, I am sending you my benchmarks. I must say that I had initially run only 2 benchmarks, and both came out faster with the modifications. Now, I found that
- The original code is more efficient with `-n 40'
- The modified code is more efficient with `-n 15' and `mpirun -np 4'
- That I ran the 5-test trial several times and sometimes the original code was more efficient with `-n 15', but the first and second run with the modified code were always faster (my computer heating up?)

The gains are really marginal in any case. It would be interesting to run with -O3... (I just did [1]).
It seems that the differences are now a little bit more substantial, and that the modified code would be faster. I hope not to have made any mistakes.

The code and the benchmarks are in the attached file.
- examples
  |- introduction
     |- ex4 (original code)
        |- output_*_.txt.bz2 (running -n 40 with -O2)
        |- output_15_*_.txt.bz2 (running -n 15 with -O2)
        |- output_40_O3_*_.txt.bz2 (running -n 40 with -O3)
     |- ex4_mod (modified code)
        |- output_*_.txt.bz2 (running -n 40 with -O2)
        |- output_15_*_.txt.bz2 (running -n 15 with -O2)
        |- output_40_O3_*_.txt.bz2 (running -n 40 with -O3)

[1] I manually compiled like this (added -O3 instead of -O2; disregard the CCFLAGS et al):

$ mpicxx -std=gnu++17 -DNDEBUG -march=amdfam10 -O3 -felide-constructors -funroll-loops -fstrict-aliasing -Wdisabled-optimization -fopenmp -I/usr/include -I/usr/include/curl -I -I/usr/include -I/usr/include/eigen3 -I/usr/include/vtk -I/usr/local/petsc/linux-c-opt/include -I/usr/local/petsc/linux-c-opt//include -I/usr/include/superlu -I/usr/local/include -I/usr/include/scotch -I/usr/include/tirpc -c exact_solution.C -o exact_solution.x86_64-pc-linux-gnu.opt.o

$ mpicxx -std=gnu++17 -DNDEBUG -march=amdfam10 -O3 -felide-constructors -funroll-loops -fstrict-aliasing -Wdisabled-optimization -fopenmp -I/usr/include -I/usr/include/curl -I -I/usr/include -I/usr/include/eigen3 -I/usr/include/vtk -I/usr/local/petsc/linux-c-opt/include -I/usr/local/petsc/linux-c-opt//include -I/usr/include/superlu -I/usr/local/include -I/usr/include/scotch -I/usr/include/tirpc -c introduction_ex4.C -o introduction_ex4.x86_64-pc-linux-gnu.opt.o

$ mpicxx -std=gnu++17 -march=amdfam10 -O3 -felide-constructors -funroll-loops -fstrict-aliasing -Wdisabled-optimization -fopenmp exact_solution.x86_64-pc-linux-gnu.opt.o introduction_ex4.x86_64-pc-linux-gnu.opt.o -o example-opt -Wl,-rpath -Wl,/usr/lib -Wl,-rpath -Wl,/lib -Wl,-rpath -Wl,/usr/lib -Wl,-rpath -Wl,/usr/local/petsc/linux-c-opt/lib -Wl,-rpath -Wl,/usr/local/lib -Wl,-rpath -Wl,/usr/include/scotch -Wl,-rpath -Wl,/usr/lib/openmpi -Wl,-rpath -Wl,/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0 /usr/lib/libHYPRE.so -L/usr/lib -lmesh_opt -ltimpi_opt -L/lib -L/usr/local/petsc/linux-c-opt/lib -L/usr/local/lib -L/usr/include/scotch -L/usr/lib/openmpi -L/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0 -lhdf5_cpp -lcurl -lnlopt -lglpk -lvtkIOCore -lvtkCommonCore -lvtkCommonDataModel -lvtkFiltersCore -lvtkIOXML -lvtkImagingCore -lvtkIOImage -lvtkImagingMath -lvtkIOParallelXML -lvtkParallelMPI -lvtkParallelCore -lvtkCommonExecutionModel -lpetsc -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu -lfftw3_mpi -lfftw3 -llapack -lblas -lopenblas -lesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lbz2 -lcgns -lmedC -lmed -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lmetis -lz -lOpenCL -lyaml -lhwloc -lX11 -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgcc_s -lpthread -lquadmath -lstdc++ -ldl -ltirpc -fopenmp
|
From: John P. <jwp...@gm...> - 2021-06-10 19:28:20
|
On Thu, Jun 10, 2021 at 12:05 PM edgar <edg...@cr...> wrote: > On 2021-06-10 16:00, John Peterson wrote: > > On Wed, Jun 9, 2021 at 9:11 PM edgar <edg...@cr...> wrote: > > > >> Hi, > >> > >> I am humbly sharing something which I think would improve the > >> documentation and the logic of examples 3 and 4 a bit. I think that > >> this > >> would apply to other examples as well. (I was planning to keep > >> learning > >> from the examples, and have a more substantial contribution at the > >> end, > >> but it has been like a month since I last touched libMesh, and I it > >> seems that I am going to be very busy in the next couple of months). > >> > >> Thanks. > > > > > > Hi Edgar, > > > > I agree with the updates to the code comments in both files, so thanks > > for > > those. In the ex4 diff, it looks like you move the Ke, Fe declarations > > from > > outside the for-loop over elements to inside? This is not likely to be > > an > > optimization, though, since creating that storage once and "resizing" > > it > > many times in the loop avoids dynamic memory allocations... the resizes > > are > > no-ops if the same Elem type is used at each iteration of the for-loop. > > If > > you have some performance profiling for this example that suggests > > otherwise, I'd be happy to take a look. > > In all honesty, John, I did run a performance log on them, and the > modification was faster, but I don't have it anymore. As I implied, my > intention was to implement the changes in most examples, but I just > haven't had the time. I can reproduce the logs, but I don't know when I > will have the time for that :S (sorry :( ). I guess that the reduced > time comes from the compiler recognising the variable as short-lived > within the loop and avoiding the resizing of the matrices for each loop. 
> I recorded the "Active time" for the "Matrix Assembly Performance" PerfLog in introduction_ex4 running "./example-opt -d 3 -n 40" for both the original codepath and your proposed change, averaging the results over 5 runs. The results were: Original code, "./example-opt -d 3 -n 40" import numpy as np np.mean([3.91801, 3.93206, 3.94358, 3.97729, 3.90512]) = 3.93 Patch, "./example-opt -d 3 -n 40" import numpy as np np.mean([4.10462, 4.06232, 3.95176, 3.92786, 3.97992]) = 4.00 so I'd say the original code path is marginally (but still statistically significantly) faster, although keep in mind that matrix assembly is only about 21% of the total time for this example while the solve is about 71%. -- John |
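One way to put a number on "marginally faster" is to compare the difference of the two means against the run-to-run scatter. The sketch below uses only the ten timings quoted in this message and computes Welch's t statistic with the standard library. The choice of Welch's test is an assumption, since the message does not say how significance was judged, and with five runs per variant any such test has limited power.

```python
from math import sqrt
from statistics import mean, variance

# "Active time" values (seconds) quoted in the message, five runs each.
original = [3.91801, 3.93206, 3.94358, 3.97729, 3.90512]
patched = [4.10462, 4.06232, 3.95176, 3.92786, 3.97992]


def welch_t(a, b):
    """Welch's t statistic for mean(b) - mean(a), using sample variances."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / standard_error


t_stat = welch_t(original, patched)
print(f"mean(original) = {mean(original):.4f}")
print(f"mean(patched)  = {mean(patched):.4f}")
print(f"Welch t        = {t_stat:.3f}")
```

The statistic can then be compared against a t distribution with the Welch-Satterthwaite degrees of freedom to decide whether the gap is larger than the noise.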
From: edgar <edg...@cr...> - 2021-06-10 17:05:27
|
On 2021-06-10 16:00, John Peterson wrote: > On Wed, Jun 9, 2021 at 9:11 PM edgar <edg...@cr...> wrote: > >> Hi, >> >> I am humbly sharing something which I think would improve the >> documentation and the logic of examples 3 and 4 a bit. I think that >> this >> would apply to other examples as well. (I was planning to keep >> learning >> from the examples, and have a more substantial contribution at the >> end, >> but it has been like a month since I last touched libMesh, and I it >> seems that I am going to be very busy in the next couple of months). >> >> Thanks. > > > Hi Edgar, > > I agree with the updates to the code comments in both files, so thanks > for > those. In the ex4 diff, it looks like you move the Ke, Fe declarations > from > outside the for-loop over elements to inside? This is not likely to be > an > optimization, though, since creating that storage once and "resizing" > it > many times in the loop avoids dynamic memory allocations... the resizes > are > no-ops if the same Elem type is used at each iteration of the for-loop. > If > you have some performance profiling for this example that suggests > otherwise, I'd be happy to take a look. In all honesty, John, I did run a performance log on them, and the modification was faster, but I don't have it anymore. As I implied, my intention was to implement the changes in most examples, but I just haven't had the time. I can reproduce the logs, but I don't know when I will have the time for that :S (sorry :( ). I guess that the reduced time comes from the compiler recognising the variable as short-lived within the loop and avoiding the resizing of the matrices for each loop. It may take some days before I reply. |
From: John P. <jwp...@gm...> - 2021-06-10 16:01:20
|
On Wed, Jun 9, 2021 at 9:11 PM edgar <edg...@cr...> wrote:
> Hi,
>
> I am humbly sharing something which I think would improve the
> documentation and the logic of examples 3 and 4 a bit. I think that this
> would apply to other examples as well. (I was planning to keep learning
> from the examples, and have a more substantial contribution at the end,
> but it has been like a month since I last touched libMesh, and it
> seems that I am going to be very busy in the next couple of months).
>
> Thanks.

Hi Edgar,

I agree with the updates to the code comments in both files, so thanks for those. In the ex4 diff, it looks like you move the Ke, Fe declarations from outside the for-loop over elements to inside? This is not likely to be an optimization, though, since creating that storage once and "resizing" it many times in the loop avoids dynamic memory allocations... the resizes are no-ops if the same Elem type is used at each iteration of the for-loop. If you have some performance profiling for this example that suggests otherwise, I'd be happy to take a look.

--
John |
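The allocation argument above can be illustrated with a small, self-contained sketch. It deliberately does not use libMesh: `ToyDenseMatrix` is a hypothetical stand-in built on `std::vector`, and the allocation counter is an illustrative device, so this shows only the general mechanism (a reused buffer keeps its capacity, a freshly constructed one must allocate), not the measured behaviour of the actual example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an element matrix; NOT libMesh's DenseMatrix.
class ToyDenseMatrix {
public:
    explicit ToyDenseMatrix(std::size_t& allocation_counter)
        : allocations_(allocation_counter) {}

    // Resize to m x n and zero all entries. std::vector::resize only
    // reallocates when the requested size exceeds the current capacity.
    void resize(std::size_t m, std::size_t n) {
        if (m * n > data_.capacity())
            ++allocations_;  // the vector has to obtain a new buffer
        data_.resize(m * n);
        std::fill(data_.begin(), data_.end(), 0.0);
        rows_ = m;
        cols_ = n;
    }

    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        return data_[i * cols_ + j];
    }

private:
    std::size_t& allocations_;
    std::size_t rows_ = 0;
    std::size_t cols_ = 0;
    std::vector<double> data_;
};

// Matrix declared once outside the element loop and resized inside it.
std::size_t allocations_hoisted(std::size_t n_elem, std::size_t n_dofs) {
    std::size_t allocations = 0;
    ToyDenseMatrix Ke(allocations);
    for (std::size_t e = 0; e < n_elem; ++e) {
        Ke.resize(n_dofs, n_dofs);
        Ke(0, 0) = 1.0;  // placeholder for the real element integration
    }
    return allocations;
}

// Matrix declared inside the element loop, so it is rebuilt every iteration.
std::size_t allocations_in_loop(std::size_t n_elem, std::size_t n_dofs) {
    std::size_t allocations = 0;
    for (std::size_t e = 0; e < n_elem; ++e) {
        ToyDenseMatrix Ke(allocations);
        Ke.resize(n_dofs, n_dofs);
        Ke(0, 0) = 1.0;  // placeholder for the real element integration
    }
    return allocations;
}
```

For a fixed `n_dofs`, `allocations_hoisted` reports a single buffer allocation while `allocations_in_loop` reports one per element. How much that costs in wall-clock time depends on the allocator and the rest of the loop body, which is why a profile of the real example is what settles the question.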
From: edgar <edg...@cr...> - 2021-06-10 02:11:20
|
Hi, I am humbly sharing something which I think would improve the documentation and the logic of examples 3 and 4 a bit. I think that this would apply to other examples as well. (I was planning to keep learning from the examples, and have a more substantial contribution at the end, but it has been like a month since I last touched libMesh, and it seems that I am going to be very busy in the next couple of months). Thanks. |
From: John P. <jwp...@gm...> - 2021-05-13 14:39:26
|
On Wed, May 12, 2021 at 12:50 PM edgar <edg...@cr...> wrote:
> On 2021-05-12 16:57, edgar wrote:
> > On 2021-05-10 15:59, John Peterson wrote:
> >> On Thu, May 6, 2021 at 9:48 PM edgar <edg...@cr...> wrote:
> >> We renumbered our examples once (many years ago) so it's possible that
> >> references to "ex13" that you see have simply never been updated
> >> properly.
> >> If you can point us to them I'll take a look.
> >
> > I see. I found these:
> >
> > ┌────
> > │ find . -type f -exec grep --color -nH --null -e 'example 13' \{\} +
> > └────
> >
> > ┌────
> > │ ./fem_system/ex1/fem_system_ex1.C\026:// example 13 can be solved using the
> > │ ./systems_of_equations/ex3/systems_of_equations_ex3.C\025:// example 13 can be solved using a scalar Lagrange multiplier
> > │ ./vector_fe/ex2/vector_fe_ex2.C\026:// example 13 can be solved using the
> > └────
>
> I found another one of these in fem_system_ex3.C:
>
> // This is just Systems of Equations Example 6 recast.
>
> In that same example, it reads:
>
> // Declare the system "Navier-Stokes" and its variables.
>
> But the system is named "Linear Elasticity". I also think that these two
> can be merged into 1:
>
> #+begin_src diff
> --- a/examples/fem_system/ex3/fem_system_ex3.C 2021-03-22 18:33:18.000000000 -0600
> +++ b/examples/fem_system/ex3/fem_system_ex3.C 2021-05-12 12:41:14.847399776 -0500
> @@ -172,7 +172,7 @@
>    // Create an equation systems object.
>    EquationSystems equation_systems (mesh);
>
> -  // Declare the system "Navier-Stokes" and its variables.
> +  // Declare the system "Linear Elasticity" and its variables.
>    ElasticitySystem & system =
>      equation_systems.add_system<ElasticitySystem> ("Linear Elasticity");
>
> @@ -195,10 +195,10 @@
>        a_system->add_variable("u_accel", FIRST, LAGRANGE);
>        a_system->add_variable("v_accel", FIRST, LAGRANGE);
>        a_system->add_variable("w_accel", FIRST, LAGRANGE);
> -    }
>
> -  if (time_solver == std::string("newmark"))
> -    system.time_solver = libmesh_make_unique<NewmarkSolver>(system);
> +      system.time_solver = libmesh_make_unique<NewmarkSolver>(system);
> +    }
> +
>
>    else if( time_solver == std::string("euler") )
>      {
> #+end_src

Thanks for pointing those out, I pushed a fix for both.

--
John |
From: John P. <jwp...@gm...> - 2021-05-13 14:18:47
|
On Wed, May 12, 2021 at 11:58 AM edgar <edg...@cr...> wrote:
> On 2021-05-10 15:59, John Peterson wrote:
> > On Thu, May 6, 2021 at 9:48 PM edgar <edg...@cr...> wrote:
> > We renumbered our examples once (many years ago) so it's possible that
> > references to "ex13" that you see have simply never been updated
> > properly.
> > If you can point us to them I'll take a look.
>
> I see. I found these:
>
> ┌────
> │ find . -type f -exec grep --color -nH --null -e 'example 13' \{\} +
> └────
>
> ┌────
> │ ./fem_system/ex1/fem_system_ex1.C\026:// example 13 can be solved using the
> │ ./systems_of_equations/ex3/systems_of_equations_ex3.C\025:// example 13 can be solved using a scalar Lagrange multiplier
> │ ./vector_fe/ex2/vector_fe_ex2.C\026:// example 13 can be solved using the
> └────

Thanks, these have now been updated to refer to "systems_of_equations_ex2" which is what "example 13" became.

--
John |
From: Tobias M. <tob...@un...> - 2021-05-12 21:57:56
|
On 5/12/21 11:41 PM, John Peterson wrote:
>
> On Wed, May 12, 2021 at 4:25 PM Tobias Moehle <tob...@un...> wrote:
>
>     Dear all,
>
>     I hope that someone has experience or some good ideas to help me out: I
>     am using a setup where the finite element basis is augmented by a
>     function which contains the most rapidly changing features of the
>     solution to allow for a considerably coarser grid.
>
>     However, when it comes to printing of the solution, I am stuck:
>     - In the best case, I'd like to use a refined mesh for printing. To
>     represent the solution reasonably well, I have tried to copy the mesh
>     into another one which I refine. My initial idea was to use a child
>     class to MeshFunction to project the original solution with the
>     augmentation function onto the finer grid; but since the
>     "MeshFunction::operator()" is not virtual, one cannot override it.
>
> It looks like those operator() functions aren't explicitly marked
> virtual, but they definitely are! (Also, they are marked "override",
> which is another clue that they are virtual.) So your approach of
> subclassing MeshFunction sounds like a reasonable one to me... I will
> push a commit adding the virtual keyword where it's missing.

Oh, that sounds very good! I actually had the impression that it is rather a case as discussed here:
https://stackoverflow.com/questions/21075922/function-overriding-in-c-works-without-virtual/21076030#21076030
just that the base class will be called because indirection is used here!?

Then I will follow this route and hopefully also get the refinement part going.

Many thanks for the fast answer!

> --
> John |
From: John P. <jwp...@gm...> - 2021-05-12 21:41:44
|
On Wed, May 12, 2021 at 4:25 PM Tobias Moehle <tob...@un...> wrote:
> Dear all,
>
> I hope that someone has experience or some good ideas to help me out: I
> am using a setup where the finite element basis is augmented by a
> function which contains the most rapidly changing features of the
> solution to allow for a considerably coarser grid.
>
> However, when it comes to printing of the solution, I am stuck:
> - In the best case, I'd like to use a refined mesh for printing. To
> represent the solution reasonably well, I have tried to copy the mesh
> into another one which I refine. My initial idea was to use a child
> class to MeshFunction to project the original solution with the
> augmentation function onto the finer grid; but since the
> "MeshFunction::operator()" is not virtual, one cannot override it.

It looks like those operator() functions aren't explicitly marked virtual, but they definitely are! (Also, they are marked "override", which is another clue that they are virtual.) So your approach of subclassing MeshFunction sounds like a reasonable one to me... I will push a commit adding the virtual keyword where it's missing.

--
John |
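The language rule behind this answer can be shown in isolation. In the sketch below every class name is hypothetical (it does not reproduce libMesh's headers or the real `MeshFunction` interface); it only demonstrates that a member function overriding a virtual function is itself virtual even when the `virtual` keyword is omitted, and that `override` makes the compiler check this.

```cpp
#include <cassert>

// Hypothetical base interface with an explicitly virtual call operator.
struct FunctionBase {
    virtual ~FunctionBase() = default;
    virtual double operator()(double x) const = 0;
};

// No `virtual` keyword here, but operator() is still virtual because it
// overrides FunctionBase::operator(). `override` would be a compile error
// if there were nothing virtual to override.
struct MeshFunctionLike : FunctionBase {
    double operator()(double x) const override { return 2.0 * x; }
};

// A subclass that adds a user-supplied term on top of the parent's value.
struct AugmentedFunction : MeshFunctionLike {
    double operator()(double x) const override {
        return MeshFunctionLike::operator()(x) + enrichment(x);
    }
    // Placeholder for the rapidly varying augmentation function.
    static double enrichment(double x) { return x * x; }
};

// Code that only knows about the intermediate class still reaches the
// most-derived override through dynamic dispatch.
double evaluate(const MeshFunctionLike& f, double x) { return f(x); }

// Small self-check of the dispatch behaviour.
void check_dispatch() {
    AugmentedFunction augmented;
    MeshFunctionLike plain;
    assert(evaluate(augmented, 3.0) == 15.0);  // 2*3 + 3*3
    assert(evaluate(plain, 3.0) == 6.0);
}
```

The situation in the linked Stack Overflow question differs in that the base-class function there is not virtual at all, so a call through a base reference would bind statically to the base version.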
From: Tobias M. <tob...@un...> - 2021-05-12 21:25:23
|
Dear all,

I hope that someone has experience or some good ideas to help me out: I am using a setup where the finite element basis is augmented by a function which contains the most rapidly changing features of the solution to allow for a considerably coarser grid.

However, when it comes to printing of the solution, I am stuck:
- In the best case, I'd like to use a refined mesh for printing. To represent the solution reasonably well, I have tried to copy the mesh into another one which I refine. My initial idea was to use a child class to MeshFunction to project the original solution with the augmentation function onto the finer grid; but since the "MeshFunction::operator()" is not virtual, one cannot override it.
- For the time being, I am also happy to plot the solution on the coarser grid, hoping that the representation is not completely spoiled. However, I don't see a simple way to augment the solution vector by a user-specified function? I have tried to get this working manually, but at least in parallel mode it always ended in a mess of functions and some contributions that were not available to the processor...

Many thanks in advance,
Tobias |
From: edgar <edg...@cr...> - 2021-05-12 17:50:26
|
On 2021-05-12 16:57, edgar wrote:
> On 2021-05-10 15:59, John Peterson wrote:
>> On Thu, May 6, 2021 at 9:48 PM edgar <edg...@cr...> wrote:
>> We renumbered our examples once (many years ago) so it's possible that
>> references to "ex13" that you see have simply never been updated
>> properly.
>> If you can point us to them I'll take a look.
>
> I see. I found these:
>
> ┌────
> │ find . -type f -exec grep --color -nH --null -e 'example 13' \{\} +
> └────
>
> ┌────
> │ ./fem_system/ex1/fem_system_ex1.C\026:// example 13 can be solved using the
> │ ./systems_of_equations/ex3/systems_of_equations_ex3.C\025:// example 13 can be solved using a scalar Lagrange multiplier
> │ ./vector_fe/ex2/vector_fe_ex2.C\026:// example 13 can be solved using the
> └────

I found another one of these in fem_system_ex3.C:

// This is just Systems of Equations Example 6 recast.

In that same example, it reads:

// Declare the system "Navier-Stokes" and its variables.

But the system is named "Linear Elasticity". I also think that these two can be merged into 1:

#+begin_src diff
--- a/examples/fem_system/ex3/fem_system_ex3.C 2021-03-22 18:33:18.000000000 -0600
+++ b/examples/fem_system/ex3/fem_system_ex3.C 2021-05-12 12:41:14.847399776 -0500
@@ -172,7 +172,7 @@
   // Create an equation systems object.
   EquationSystems equation_systems (mesh);

-  // Declare the system "Navier-Stokes" and its variables.
+  // Declare the system "Linear Elasticity" and its variables.
   ElasticitySystem & system =
     equation_systems.add_system<ElasticitySystem> ("Linear Elasticity");

@@ -195,10 +195,10 @@
       a_system->add_variable("u_accel", FIRST, LAGRANGE);
       a_system->add_variable("v_accel", FIRST, LAGRANGE);
       a_system->add_variable("w_accel", FIRST, LAGRANGE);
-    }

-  if (time_solver == std::string("newmark"))
-    system.time_solver = libmesh_make_unique<NewmarkSolver>(system);
+      system.time_solver = libmesh_make_unique<NewmarkSolver>(system);
+    }
+

   else if( time_solver == std::string("euler") )
     {
#+end_src |
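The second hunk of the diff above merges two consecutive branches. The sketch below illustrates why that is behaviour-preserving, under the assumption (not visible in the diff context) that the block closed by the removed `}` was itself guarded by the same `"newmark"` test. The `Solver` enum, the `Setup` struct, and the function names are all hypothetical stand-ins for the libMesh objects in the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical tags standing in for the time-solver objects in the example.
enum class Solver { None, Newmark, Euler };

struct Setup {
    bool accel_variables_added = false;  // stands in for the *_accel add_variable calls
    Solver solver = Solver::None;
};

// Assumed shape before the patch: the "newmark" test appears twice in a row.
Setup configure_split(const std::string& time_solver) {
    Setup s;
    if (time_solver == "newmark") {
        s.accel_variables_added = true;
    }

    if (time_solver == "newmark")
        s.solver = Solver::Newmark;
    else if (time_solver == "euler")
        s.solver = Solver::Euler;
    return s;
}

// Shape after the patch: a single "newmark" branch does both jobs.
Setup configure_merged(const std::string& time_solver) {
    Setup s;
    if (time_solver == "newmark") {
        s.accel_variables_added = true;
        s.solver = Solver::Newmark;
    }
    else if (time_solver == "euler")
        s.solver = Solver::Euler;
    return s;
}

// True when both shapes give the same setup for the given option string.
bool same_result(const std::string& time_solver) {
    const Setup a = configure_split(time_solver);
    const Setup b = configure_merged(time_solver);
    return a.accel_variables_added == b.accel_variables_added &&
           a.solver == b.solver;
}

// Self-check over the option values that appear in the diff, plus one other.
void check_equivalence() {
    assert(same_result("newmark"));
    assert(same_result("euler"));
    assert(same_result("something_else"));
}
```

The merge is safe here because nothing between the two tests changes `time_solver`; if the first block could modify the tested value, the two shapes would no longer be interchangeable.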
From: edgar <edg...@cr...> - 2021-05-12 16:58:01
|
On 2021-05-10 15:59, John Peterson wrote:
> On Thu, May 6, 2021 at 9:48 PM edgar <edg...@cr...> wrote:
> We renumbered our examples once (many years ago) so it's possible that
> references to "ex13" that you see have simply never been updated
> properly.
> If you can point us to them I'll take a look.

I see. I found these:

┌────
│ find . -type f -exec grep --color -nH --null -e 'example 13' \{\} +
└────

┌────
│ ./fem_system/ex1/fem_system_ex1.C\026:// example 13 can be solved using the
│ ./systems_of_equations/ex3/systems_of_equations_ex3.C\025:// example 13 can be solved using a scalar Lagrange multiplier
│ ./vector_fe/ex2/vector_fe_ex2.C\026:// example 13 can be solved using the
└──── |
From: John P. <jwp...@gm...> - 2021-05-10 15:59:41
|
On Thu, May 6, 2021 at 9:48 PM edgar <edg...@cr...> wrote:
> Hi!
>
> There is a small typo in
> examples/miscellaneous/ex13/miscellaneous_ex13.C. It reads:
>
> Miscellaneous Example 12

Thanks for letting us know, I fixed the documentation in libmesh commit 8c2c45793.

> I am guessing that it should be 13. Is this the example 13 to which many
> other examples refer? Thanks!

We renumbered our examples once (many years ago) so it's possible that references to "ex13" that you see have simply never been updated properly. If you can point us to them I'll take a look.

--
John |
From: edgar <edg...@cr...> - 2021-05-07 02:48:37
|
Hi!

There is a small typo in examples/miscellaneous/ex13/miscellaneous_ex13.C. It reads:

Miscellaneous Example 12

I am guessing that it should be 13. Is this the example 13 to which many other examples refer? Thanks! |