From: Fackler, P. <fac...@or...> - 2023-01-20 17:01:08
Any progress on this? Any info/help needed? Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Fackler, Philip <fac...@or...> Sent: Thursday, December 8, 2022 09:07 To: Junchao Zhang <jun...@gm...> Cc: xol...@li... <xol...@li...>; pet...@mc... <pet...@mc...>; Blondel, Sophie <sbl...@ut...>; Roth, Philip <ro...@or...> Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Great! Thank you! Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang <jun...@gm...> Sent: Wednesday, December 7, 2022 18:47 To: Fackler, Philip <fac...@or...> Cc: xol...@li... <xol...@li...>; pet...@mc... <pet...@mc...>; Blondel, Sophie <sbl...@ut...>; Roth, Philip <ro...@or...> Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Hi, Philip, I could reproduce the error. I need to find a way to debug it. Thanks. /home/jczhang/xolotl/test/system/SystemTestCase.cpp(317): fatal error: in "System/PSI_1": absolute value of diffNorm{0.19704848134353209} exceeds 1e-10 *** 1 failure is detected in the test module "Regression" --Junchao Zhang On Tue, Dec 6, 2022 at 10:10 AM Fackler, Philip <fac...@or...<mailto:fac...@or...>> wrote: I think it would be simpler to use the develop branch for this issue. But you can still just build the SystemTester. Then (if you changed the PSI_1 case) run: ./test/system/SystemTester -t System/PSI_1 -- -v (No need for multiple MPI ranks) Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang <jun...@gm...<mailto:jun...@gm...>> Sent: Monday, December 5, 2022 15:40 To: Fackler, Philip <fac...@or...<mailto:fac...@or...>> Cc: xol...@li...<mailto:xol...@li...> <xol...@li...<mailto:xol...@li...>>; pet...@mc...<mailto:pet...@mc...> <pet...@mc...<mailto:pet...@mc...>>; Blondel, Sophie <sbl...@ut...<mailto:sbl...@ut...>>; Roth, Philip <ro...@or...<mailto:ro...@or...>> Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. I configured with xolotl branch feature-petsc-kokkos, and typed `make` under ~/xolotl-build/. Though there were errors, a lot of *Tester were built. [ 62%] Built target xolotlViz [ 63%] Linking CXX executable TemperatureProfileHandlerTester [ 64%] Linking CXX executable TemperatureGradientHandlerTester [ 64%] Built target TemperatureProfileHandlerTester [ 64%] Built target TemperatureConstantHandlerTester [ 64%] Built target TemperatureGradientHandlerTester [ 65%] Linking CXX executable HeatEquationHandlerTester [ 65%] Built target HeatEquationHandlerTester [ 66%] Linking CXX executable FeFitFluxHandlerTester [ 66%] Linking CXX executable W111FitFluxHandlerTester [ 67%] Linking CXX executable FuelFitFluxHandlerTester [ 67%] Linking CXX executable W211FitFluxHandlerTester Which Tester should I use to run with the parameter file benchmarks/params_system_PSI_2.txt? And how many ranks should I use? Could you give an example command line? Thanks. 
--Junchao Zhang On Mon, Dec 5, 2022 at 2:22 PM Junchao Zhang <jun...@gm...<mailto:jun...@gm...>> wrote: Hello, Philip, Do I still need to use the feature-petsc-kokkos branch? --Junchao Zhang On Mon, Dec 5, 2022 at 11:08 AM Fackler, Philip <fac...@or...<mailto:fac...@or...>> wrote: Junchao, Thank you for working on this. If you open the parameter file for, say, the PSI_2 system test case (benchmarks/params_system_PSI_2.txt), simply add -dm_mat_type aijkokkos -dm_vec_type kokkos` to the "petscArgs=" field (or the corresponding cusparse/cuda option). Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang <jun...@gm...<mailto:jun...@gm...>> Sent: Thursday, December 1, 2022 17:05 To: Fackler, Philip <fac...@or...<mailto:fac...@or...>> Cc: xol...@li...<mailto:xol...@li...> <xol...@li...<mailto:xol...@li...>>; pet...@mc...<mailto:pet...@mc...> <pet...@mc...<mailto:pet...@mc...>>; Blondel, Sophie <sbl...@ut...<mailto:sbl...@ut...>>; Roth, Philip <ro...@or...<mailto:ro...@or...>> Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Hi, Philip, Sorry for the long delay. I could not get something useful from the -log_view output. Since I have already built xolotl, could you give me instructions on how to do a xolotl test to reproduce the divergence with petsc GPU backends (but fine on CPU)? Thank you. --Junchao Zhang On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip <fac...@or...<mailto:fac...@or...>> wrote: ------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------ Unknown Name on a named PC0115427 with 1 processor, by 4pf Wed Nov 16 14:36:46 2022 Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a GIT Date: 2022-10-28 14:39:41 +0000 Max Max/Min Avg Total Time (sec): 6.023e+00 1.000 6.023e+00 Objects: 1.020e+02 1.000 1.020e+02 Flops: 1.080e+09 1.000 1.080e+09 1.080e+09 Flops/sec: 1.793e+08 1.000 1.793e+08 1.793e+08 MPI Msg Count: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Msg Len (bytes): 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 6.0226e+00 100.0% 1.0799e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors) CpuToGpu Count: total number of CPU to GPU copies per processor CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor) GpuToCpu Count: total number of GPU to CPU copies per processor GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor) GPU %F: percent flops on GPU in this event ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F ------------------------------------------------------------------------------------------------------------------------ --------------------------------------- --- Event Stage 0: Main Stage BuildTwoSided 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 DMCreateMat 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFSetGraph 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFSetUp 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFPack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFUnpack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecDot 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecMDot 775 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecNorm 1728 1.0 nan nan 1.92e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecScale 1983 1.0 nan nan 6.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecCopy 780 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecSet 4955 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecAXPY 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecAYPX 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecAXPBYCZ 643 1.0 nan nan 1.79e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecWAXPY 502 1.0 nan nan 5.58e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecMAXPY 1159 1.0 nan nan 3.68e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecScatterBegin 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 2 5.14e-03 0 0.00e+00 0 VecScatterEnd 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecReduceArith 380 1.0 nan nan 4.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan 
-nan 0 0.00e+00 0 0.00e+00 100 VecReduceComm 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecNormalize 965 1.0 nan nan 1.61e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 TSStep 20 1.0 5.8699e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 97100 0 0 0 97100 0 0 0 184 -nan 2 5.14e-03 0 0.00e+00 54 TSFunctionEval 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 63 1 0 0 0 63 1 0 0 0 -nan -nan 1 3.36e-04 0 0.00e+00 100 TSJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 97 MatMult 1930 1.0 nan nan 4.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 41 0 0 0 1 41 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatMultTranspose 1 1.0 nan nan 3.44e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatSolve 965 1.0 nan nan 5.04e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSOR 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatLUFactorSym 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatLUFactorNum 190 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 11 0 0 0 1 11 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatScale 190 1.0 nan nan 3.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatAssemblyBegin 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatAssemblyEnd 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatGetRowIJ 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatCreateSubMats 380 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatGetOrdering 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatZeroEntries 379 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSetPreallCOO 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSetValuesCOO 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSetUp 760 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSolve 190 1.0 5.8052e-01 1.0 9.30e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 86 0 0 0 10 86 0 0 0 1602 -nan 1 4.80e-03 0 0.00e+00 46 KSPGMRESOrthog 775 1.0 nan nan 2.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 SNESSolve 71 1.0 5.7117e+00 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 0.0e+00 95 99 0 0 0 95 99 0 0 0 188 -nan 1 4.80e-03 0 0.00e+00 53 SNESSetUp 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SNESFunctionEval 573 1.0 nan nan 2.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 60 2 0 0 0 60 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 SNESJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 97 SNESLineSearch 190 1.0 nan nan 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00 53 10 0 0 0 53 10 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 PCSetUp 570 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 11 0 0 0 2 11 0 0 0 -nan -nan 0 
0.00e+00 0 0.00e+00 0 PCApply 965 1.0 nan nan 6.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00 8 57 0 0 0 8 57 0 0 0 -nan -nan 1 4.80e-03 0 0.00e+00 19 KSPSolve_FS_0 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSolve_FS_1 965 1.0 nan nan 1.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 15 0 0 0 2 15 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 --- Event Stage 1: Unknown ------------------------------------------------------------------------------------------------------------------------ --------------------------------------- Object Type Creations Destructions. Reports information only for process 0. --- Event Stage 0: Main Stage Container 5 5 Distributed Mesh 2 2 Index Set 11 11 IS L to G Mapping 1 1 Star Forest Graph 7 7 Discrete System 2 2 Weak Form 2 2 Vector 49 49 TSAdapt 1 1 TS 1 1 DMTS 1 1 SNES 1 1 DMSNES 3 3 SNESLineSearch 1 1 Krylov Solver 4 4 DMKSP interface 1 1 Matrix 4 4 Preconditioner 4 4 Viewer 2 1 --- Event Stage 1: Unknown ======================================================================================================================== Average time to get PetscTime(): 3.14e-08 #PETSc Option Table entries: -log_view -log_view_gpu_times #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with 64 bit PetscInt Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8 Configure options: PETSC_DIR=/home/4pf/repos/petsc PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries --prefix=/home/4pf/build/petsc/cuda-no-tpls/install --with-64-bit-indices --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3 --with-kokkos-dir=/home/4pf/build/kokkos/cuda/install --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install ----------------------------------------- Libraries compiled on 2022-11-01 21:01:08 on PC0115427 Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35 Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install Using PETSc arch: ----------------------------------------- Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -O3 ----------------------------------------- Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include -I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include -I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include ----------------------------------------- Using C linker: mpicc Using libraries: -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib -L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc -Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib -L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib -Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib -L/home/4pf/build/kokkos/cuda/install/lib -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels -lkokkoscontainers -lkokkoscore -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl ----------------------------------------- Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory 
________________________________ From: Junchao Zhang <jun...@gm...<mailto:jun...@gm...>> Sent: Tuesday, November 15, 2022 13:03 To: Fackler, Philip <fac...@or...<mailto:fac...@or...>> Cc: xol...@li...<mailto:xol...@li...> <xol...@li...<mailto:xol...@li...>>; pet...@mc...<mailto:pet...@mc...> <pet...@mc...<mailto:pet...@mc...>>; Blondel, Sophie <sbl...@ut...<mailto:sbl...@ut...>>; Roth, Philip <ro...@or...<mailto:ro...@or...>> Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Can you paste -log_view result so I can see what functions are used? --Junchao Zhang On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip <fac...@or...<mailto:fac...@or...>> wrote: Yes, most (but not all) of our system test cases fail with the kokkos/cuda or cuda backends. All of them pass with the CPU-only kokkos backend. Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang <jun...@gm...<mailto:jun...@gm...>> Sent: Monday, November 14, 2022 19:34 To: Fackler, Philip <fac...@or...<mailto:fac...@or...>> Cc: xol...@li...<mailto:xol...@li...> <xol...@li...<mailto:xol...@li...>>; pet...@mc...<mailto:pet...@mc...> <pet...@mc...<mailto:pet...@mc...>>; Blondel, Sophie <sbl...@ut...<mailto:sbl...@ut...>>; Zhang, Junchao <jc...@mc...<mailto:jc...@mc...>>; Roth, Philip <ro...@or...<mailto:ro...@or...>> Subject: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Hi, Philip, Sorry to hear that. It seems you could run the same code on CPUs but not no GPUs (with either petsc/Kokkos backend or petsc/cuda backend, is it right? --Junchao Zhang On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users <pet...@mc...<mailto:pet...@mc...>> wrote: This is an issue I've brought up before (and discussed in-person with Richard). I wanted to bring it up again because I'm hitting the limits of what I know to do, and I need help figuring this out. The problem can be reproduced using Xolotl's "develop" branch built against a petsc build with kokkos and kokkos-kernels enabled. Then, either add the relevant kokkos options to the "petscArgs=" line in the system test parameter file(s), or just replace the system test parameter files with the ones from the "feature-petsc-kokkos" branch. See here the files that begin with "params_system_". Note that those files use the "kokkos" options, but the problem is similar using the corresponding cuda/cusparse options. I've already tried building kokkos-kernels with no TPLs and got slightly different results, but the same problem. Any help would be appreciated. Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory |
From: Fackler, P. <fac...@or...> - 2023-01-20 16:55:52
The following is the log_view output for the ported case using 4 MPI tasks. **************************************************************************************************************************************************************** *** WIDEN YOUR WINDOW TO 160 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** **************************************************************************************************************************************************************** ------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------ Unknown Name on a named iguazu with 4 processors, by 4pf Fri Jan 20 11:53:04 2023 Using Petsc Release Version 3.18.3, unknown Max Max/Min Avg Total Time (sec): 1.447e+01 1.000 1.447e+01 Objects: 1.229e+03 1.003 1.226e+03 Flops: 5.053e+09 1.217 4.593e+09 1.837e+10 Flops/sec: 3.492e+08 1.217 3.174e+08 1.269e+09 MPI Msg Count: 1.977e+04 1.067 1.895e+04 7.580e+04 MPI Msg Len (bytes): 7.374e+07 1.088 3.727e+03 2.825e+08 MPI Reductions: 2.065e+03 1.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 1.4471e+01 100.0% 1.8371e+10 100.0% 7.580e+04 100.0% 3.727e+03 100.0% 2.046e+03 99.1% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors) CpuToGpu Count: total number of CPU to GPU copies per processor CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor) GpuToCpu Count: total number of GPU to CPU copies per processor GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor) GPU %F: percent flops on GPU in this event ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F ------------------------------------------------------------------------------------------------------------------------ --------------------------------------- --- Event Stage 0: Main Stage BuildTwoSided 257 1.0 nan nan 0.00e+00 0.0 4.4e+02 8.0e+00 2.6e+02 1 0 1 0 12 1 0 1 0 13 -nan -nan 0 0.00e+00 0 0.00e+00 0 BuildTwoSidedF 210 1.0 nan nan 0.00e+00 0.0 1.5e+02 4.2e+04 2.1e+02 1 0 0 2 10 1 0 0 2 10 -nan -nan 0 0.00e+00 0 0.00e+00 0 DMCreateMat 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00 10 0 0 0 0 10 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFSetGraph 69 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFSetUp 47 1.0 nan nan 0.00e+00 0.0 7.3e+02 2.1e+03 4.7e+01 0 0 1 1 2 0 0 1 1 2 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFBcastBegin 222 1.0 nan nan 0.00e+00 0.0 2.3e+03 1.9e+04 0.0e+00 0 0 3 16 0 0 0 3 16 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFBcastEnd 222 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFReduceBegin 254 1.0 nan nan 0.00e+00 0.0 1.5e+03 1.2e+04 0.0e+00 0 0 2 6 0 0 0 2 6 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFReduceEnd 254 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFFetchOpBegin 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFFetchOpEnd 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFPack 8091 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFUnpack 8092 1.0 nan nan 4.78e+04 1.5 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecDot 60 1.0 nan nan 4.30e+06 1.2 0.0e+00 0.0e+00 6.0e+01 0 0 0 0 3 0 0 0 0 3 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecMDot 398 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+02 0 0 0 0 19 0 0 0 0 19 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecNorm 641 1.0 nan nan 4.45e+07 1.2 0.0e+00 0.0e+00 6.4e+02 1 1 0 0 31 1 1 0 0 31 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecScale 601 1.0 nan nan 2.08e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecCopy 3735 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecSet 2818 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecAXPY 123 1.0 nan nan 8.68e+06 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 
-nan -nan 0 0.00e+00 0 0.00e+00 100 VecAYPX 6764 1.0 nan nan 1.90e+08 1.2 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecAXPBYCZ 2388 1.0 nan nan 1.83e+08 1.2 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecWAXPY 60 1.0 nan nan 4.30e+06 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecMAXPY 681 1.0 nan nan 1.36e+08 1.2 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecAssemblyBegin 7 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecAssemblyEnd 7 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecPointwiseMult 4449 1.0 nan nan 6.06e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecScatterBegin 7614 1.0 nan nan 0.00e+00 0.0 7.1e+04 2.9e+03 1.3e+01 0 0 94 73 1 0 0 94 73 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecScatterEnd 7614 1.0 nan nan 4.78e+04 1.5 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecReduceArith 120 1.0 nan nan 8.60e+06 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecReduceComm 60 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+01 0 0 0 0 3 0 0 0 0 3 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecNormalize 401 1.0 nan nan 4.09e+07 1.2 0.0e+00 0.0e+00 4.0e+02 0 1 0 0 19 0 1 0 0 20 -nan -nan 0 0.00e+00 0 0.00e+00 100 TSStep 20 1.0 1.2908e+01 1.0 5.05e+09 1.2 7.6e+04 3.7e+03 2.0e+03 89 100 100 98 96 89 100 100 98 97 1423 -nan 0 0.00e+00 0 0.00e+00 99 TSFunctionEval 140 1.0 nan nan 1.00e+07 1.2 1.1e+03 3.7e+04 0.0e+00 1 0 1 15 0 1 0 1 15 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 TSJacobianEval 60 1.0 nan nan 1.67e+07 1.2 4.8e+02 3.7e+04 6.0e+01 2 0 1 6 3 2 0 1 6 3 -nan -nan 0 0.00e+00 0 0.00e+00 87 MatMult 4934 1.0 nan nan 4.16e+09 1.2 5.1e+04 2.7e+03 4.0e+00 15 82 68 49 0 15 82 68 49 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatMultAdd 1104 1.0 nan nan 9.00e+07 1.2 8.8e+03 1.4e+02 0.0e+00 1 2 12 0 0 1 2 12 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatMultTranspose 1104 1.0 nan nan 9.01e+07 1.2 8.8e+03 1.4e+02 1.0e+00 1 2 12 0 0 1 2 12 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatSolve 368 0.0 nan nan 3.57e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSOR 60 1.0 nan nan 3.12e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatLUFactorSym 2 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatLUFactorNum 2 1.0 nan nan 4.24e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatConvert 8 1.0 nan nan 0.00e+00 0.0 8.0e+01 1.2e+03 4.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatScale 66 1.0 nan nan 1.48e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 99 MatResidual 1104 1.0 nan nan 1.01e+09 1.2 1.2e+04 2.9e+03 0.0e+00 4 20 16 12 0 4 20 16 12 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatAssemblyBegin 590 1.0 nan nan 0.00e+00 0.0 1.5e+02 4.2e+04 2.0e+02 1 0 0 2 10 1 0 0 2 10 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatAssemblyEnd 590 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+02 2 0 0 0 7 2 0 0 0 7 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatGetRowIJ 2 0.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatCreateSubMat 122 1.0 nan nan 0.00e+00 0.0 6.3e+01 1.8e+02 1.7e+02 2 0 0 0 8 2 0 0 0 8 -nan -nan 0 
0.00e+00 0 0.00e+00 0 MatGetOrdering 2 0.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatCoarsen 3 1.0 nan nan 0.00e+00 0.0 5.0e+02 1.3e+03 1.2e+02 0 0 1 0 6 0 0 1 0 6 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatZeroEntries 61 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatAXPY 6 1.0 nan nan 1.37e+06 1.2 0.0e+00 0.0e+00 1.8e+01 1 0 0 0 1 1 0 0 0 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatTranspose 6 1.0 nan nan 0.00e+00 0.0 2.2e+02 2.9e+04 4.8e+01 1 0 0 2 2 1 0 0 2 2 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatMatMultSym 4 1.0 nan nan 0.00e+00 0.0 2.2e+02 1.7e+03 2.8e+01 0 0 0 0 1 0 0 0 0 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatMatMultNum 4 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatPtAPSymbolic 5 1.0 nan nan 0.00e+00 0.0 6.2e+02 5.2e+03 4.4e+01 3 0 1 1 2 3 0 1 1 2 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatPtAPNumeric 181 1.0 nan nan 0.00e+00 0.0 3.3e+03 1.8e+04 0.0e+00 56 0 4 21 0 56 0 4 21 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatGetLocalMat 185 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSetPreallCOO 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSetValuesCOO 60 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSetUp 483 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 2.2e+01 0 0 0 0 1 0 0 0 0 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSolve 60 1.0 1.1843e+01 1.0 4.91e+09 1.2 7.3e+04 2.9e+03 1.2e+03 82 97 97 75 60 82 97 97 75 60 1506 -nan 0 0.00e+00 0 0.00e+00 99 KSPGMRESOrthog 398 1.0 nan nan 7.97e+07 1.2 0.0e+00 0.0e+00 4.0e+02 1 2 0 0 19 1 2 0 0 19 -nan -nan 0 0.00e+00 0 0.00e+00 100 SNESSolve 60 1.0 1.2842e+01 1.0 5.01e+09 1.2 7.5e+04 3.6e+03 2.0e+03 89 99 100 96 95 89 99 100 96 96 1419 -nan 0 0.00e+00 0 0.00e+00 99 SNESSetUp 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SNESFunctionEval 120 1.0 nan nan 3.01e+07 1.2 9.6e+02 3.7e+04 0.0e+00 1 1 1 13 0 1 1 1 13 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 SNESJacobianEval 60 1.0 nan nan 1.67e+07 1.2 4.8e+02 3.7e+04 6.0e+01 2 0 1 6 3 2 0 1 6 3 -nan -nan 0 0.00e+00 0 0.00e+00 87 SNESLineSearch 60 1.0 nan nan 6.99e+07 1.2 9.6e+02 1.9e+04 2.4e+02 1 1 1 6 12 1 1 1 6 12 -nan -nan 0 0.00e+00 0 0.00e+00 100 PCSetUp_GAMG+ 60 1.0 nan nan 3.53e+07 1.2 5.2e+03 1.4e+04 4.3e+02 62 1 7 25 21 62 1 7 25 21 -nan -nan 0 0.00e+00 0 0.00e+00 96 PCGAMGCreateG 3 1.0 nan nan 1.32e+06 1.2 2.2e+02 2.9e+04 4.2e+01 1 0 0 2 2 1 0 0 2 2 -nan -nan 0 0.00e+00 0 0.00e+00 0 GAMG Coarsen 3 1.0 nan nan 0.00e+00 0.0 5.0e+02 1.3e+03 1.2e+02 1 0 1 0 6 1 0 1 0 6 -nan -nan 0 0.00e+00 0 0.00e+00 0 GAMG MIS/Agg 3 1.0 nan nan 0.00e+00 0.0 5.0e+02 1.3e+03 1.2e+02 0 0 1 0 6 0 0 1 0 6 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMGProl 3 1.0 nan nan 0.00e+00 0.0 7.8e+01 7.8e+02 4.8e+01 0 0 0 0 2 0 0 0 0 2 -nan -nan 0 0.00e+00 0 0.00e+00 0 GAMG Prol-col 3 1.0 nan nan 0.00e+00 0.0 5.2e+01 5.8e+02 2.1e+01 0 0 0 0 1 0 0 0 0 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 GAMG Prol-lift 3 1.0 nan nan 0.00e+00 0.0 2.6e+01 1.2e+03 1.5e+01 0 0 0 0 1 0 0 0 0 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMGOptProl 3 1.0 nan nan 3.40e+07 1.2 5.8e+02 2.4e+03 1.1e+02 1 1 1 0 6 1 1 1 0 6 -nan -nan 0 0.00e+00 0 0.00e+00 100 GAMG smooth 3 1.0 nan nan 2.85e+05 1.2 1.9e+02 1.9e+03 3.0e+01 0 0 0 0 1 0 0 0 0 1 -nan -nan 0 0.00e+00 0 0.00e+00 43 PCGAMGCreateL 3 
1.0 nan nan 0.00e+00 0.0 4.8e+02 6.5e+03 8.0e+01 3 0 1 1 4 3 0 1 1 4 -nan -nan 0 0.00e+00 0 0.00e+00 0 GAMG PtAP 3 1.0 nan nan 0.00e+00 0.0 4.5e+02 7.1e+03 2.7e+01 3 0 1 1 1 3 0 1 1 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 GAMG Reduce 1 1.0 nan nan 0.00e+00 0.0 3.6e+01 3.7e+01 5.3e+01 0 0 0 0 3 0 0 0 0 3 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMG Gal l00 60 1.0 nan nan 0.00e+00 0.0 1.1e+03 1.4e+04 9.0e+00 46 0 1 6 0 46 0 1 6 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMG Opt l00 1 1.0 nan nan 0.00e+00 0.0 4.8e+01 1.7e+02 7.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMG Gal l01 60 1.0 nan nan 0.00e+00 0.0 1.6e+03 2.9e+04 9.0e+00 13 0 2 16 0 13 0 2 16 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMG Opt l01 1 1.0 nan nan 0.00e+00 0.0 7.2e+01 4.8e+03 7.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMG Gal l02 60 1.0 nan nan 0.00e+00 0.0 1.1e+03 1.2e+03 1.7e+01 0 0 1 0 1 0 0 1 0 1 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCGAMG Opt l02 1 1.0 nan nan 0.00e+00 0.0 7.2e+01 2.2e+02 7.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCSetUp 182 1.0 nan nan 3.53e+07 1.2 5.3e+03 1.4e+04 7.7e+02 64 1 7 27 37 64 1 7 27 38 -nan -nan 0 0.00e+00 0 0.00e+00 96 PCSetUpOnBlocks 368 1.0 nan nan 4.24e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCApply 60 1.0 nan nan 4.85e+09 1.2 7.3e+04 2.9e+03 1.1e+03 81 96 96 75 54 81 96 96 75 54 -nan -nan 0 0.00e+00 0 0.00e+00 99 KSPSolve_FS_0 60 1.0 nan nan 3.12e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSolve_FS_1 60 1.0 nan nan 4.79e+09 1.2 7.2e+04 2.9e+03 1.1e+03 81 95 96 75 54 81 95 96 75 54 -nan -nan 0 0.00e+00 0 0.00e+00 100 --- Event Stage 1: Unknown ------------------------------------------------------------------------------------------------------------------------ --------------------------------------- Object Type Creations Destructions. Reports information only for process 0. 
--- Event Stage 0: Main Stage Container 14 14 Distributed Mesh 9 9 Index Set 120 120 IS L to G Mapping 10 10 Star Forest Graph 87 87 Discrete System 9 9 Weak Form 9 9 Vector 761 761 TSAdapt 1 1 TS 1 1 DMTS 1 1 SNES 1 1 DMSNES 3 3 SNESLineSearch 1 1 Krylov Solver 11 11 DMKSP interface 1 1 Matrix 171 171 Matrix Coarsen 3 3 Preconditioner 11 11 Viewer 2 1 PetscRandom 3 3 --- Event Stage 1: Unknown ======================================================================================================================== Average time to get PetscTime(): 3.82e-08 Average time for MPI_Barrier(): 2.2968e-06 Average time for zero size MPI_Send(): 3.371e-06 #PETSc Option Table entries: -log_view #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with 64 bit PetscInt Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8 Configure options: PETSC_DIR=/home2/4pf/petsc PETSC_ARCH=arch-kokkos-serial --prefix=/home2/4pf/.local/serial --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cudac=0 --with-cuda=0 --with-shared-libraries --with-64-bit-indices --with-debugging=0 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --with-kokkos-dir=/home2/4pf/.local/serial --with-kokkos-kernels-dir=/home2/4pf/.local/serial --download-f2cblaslapack ----------------------------------------- Libraries compiled on 2023-01-06 18:21:31 on iguazu Machine characteristics: Linux-4.18.0-383.el8.x86_64-x86_64-with-glibc2.28 Using PETSc directory: /home2/4pf/.local/serial Using PETSc arch: ----------------------------------------- Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -fstack-protector -fvisibility=hidden -O3 ----------------------------------------- Using include paths: -I/home2/4pf/.local/serial/include ----------------------------------------- Using C linker: mpicc Using libraries: -Wl,-rpath,/home2/4pf/.local/serial/lib -L/home2/4pf/.local/serial/lib -lpetsc -Wl,-rpath,/home2/4pf/.local/serial/lib64 -L/home2/4pf/.local/serial/lib64 -Wl,-rpath,/home2/4pf/.local/serial/lib -L/home2/4pf/.local/serial/lib -lkokkoskernels -lkokkoscontainers -lkokkoscore -lf2clapack -lf2cblas -lm -lX11 -lquadmath -lstdc++ -ldl ----------------------------------------- --- Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Zhang, Junchao <jc...@mc...> Sent: Tuesday, January 17, 2023 17:25 To: Fackler, Philip <fac...@or...>; xol...@li... <xol...@li...>; pet...@mc... <pet...@mc...> Cc: Mills, Richard Tran <rt...@an...>; Blondel, Sophie <sbl...@ut...>; Roth, Philip <ro...@or...> Subject: [EXTERNAL] Re: Performance problem using COO interface Hi, Philip, Could you add -log_view and see what functions are used in the solve? Since it is CPU-only, perhaps with -log_view of different runs, we can easily see which functions slowed down. --Junchao Zhang ________________________________ From: Fackler, Philip <fac...@or...> Sent: Tuesday, January 17, 2023 4:13 PM To: xol...@li... <xol...@li...>; pet...@mc... 
<pet...@mc...> Cc: Mills, Richard Tran <rt...@an...>; Zhang, Junchao <jc...@mc...>; Blondel, Sophie <sbl...@ut...>; Roth, Philip <ro...@or...> Subject: Performance problem using COO interface In Xolotl's feature-petsc-kokkos branch I have ported the code to use petsc's COO interface for creating the Jacobian matrix (and the Kokkos interface for interacting with Vec entries). As the attached plots show for one case, while the code for computing the RHSFunction and RHSJacobian perform similarly (or slightly better) after the port, the performance for the solve as a whole is significantly worse. Note: This is all CPU-only (so kokkos and kokkos-kernels are built with only the serial backend). The dev version is using MatSetValuesStencil with the default implementations for Mat and Vec. The port version is using MatSetValuesCOO and is run with -dm_mat_type aijkokkos -dm_vec_type kokkos. The port/def version is using MatSetValuesCOO and is run with -dm_vec_type kokkos (using the default Mat implementation). So, this seems to be due be a performance difference in the petsc implementations. Please advise. Is this a known issue? Or am I missing something? Thank you for the help, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory |
From: Zhang, J. <jc...@mc...> - 2023-01-17 22:57:53
Hi, Philip,

Could you add -log_view and see what functions are used in the solve? Since it is CPU-only, perhaps with -log_view of different runs, we can easily see which functions slowed down.

--Junchao Zhang

________________________________
From: Fackler, Philip <fac...@or...>
Sent: Tuesday, January 17, 2023 4:13 PM
To: xol...@li... <xol...@li...>; pet...@mc... <pet...@mc...>
Cc: Mills, Richard Tran <rt...@an...>; Zhang, Junchao <jc...@mc...>; Blondel, Sophie <sbl...@ut...>; Roth, Philip <ro...@or...>
Subject: Performance problem using COO interface

In Xolotl's feature-petsc-kokkos branch I have ported the code to use petsc's COO interface for creating the Jacobian matrix (and the Kokkos interface for interacting with Vec entries). As the attached plots show for one case, while the code for computing the RHSFunction and RHSJacobian performs similarly (or slightly better) after the port, the performance for the solve as a whole is significantly worse.

Note: This is all CPU-only (so kokkos and kokkos-kernels are built with only the serial backend). The dev version is using MatSetValuesStencil with the default implementations for Mat and Vec. The port version is using MatSetValuesCOO and is run with -dm_mat_type aijkokkos -dm_vec_type kokkos. The port/def version is using MatSetValuesCOO and is run with -dm_vec_type kokkos (using the default Mat implementation).

So, this seems to be due to a performance difference in the petsc implementations. Please advise. Is this a known issue? Or am I missing something?

Thank you for the help,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory
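The -log_view tables quoted in this thread already group events under "Main Stage"; to isolate just the solve being compared, a user-defined logging stage can be pushed around it. The following is a minimal sketch only, assuming a TS-based driver and a made-up stage name; it is not Xolotl's code.

    #include <petscts.h>

    /* Illustrative only: register a named stage once, then push/pop it around
       the region of interest so -log_view reports it separately from "Main Stage". */
    static PetscLogStage solveStage;

    static PetscErrorCode SetupSolveStage(void)
    {
      PetscFunctionBeginUser;
      PetscCall(PetscLogStageRegister("XolotlSolve", &solveStage));
      PetscFunctionReturn(0);
    }

    static PetscErrorCode TimedSolve(TS ts)
    {
      PetscFunctionBeginUser;
      PetscCall(PetscLogStagePush(solveStage));
      PetscCall(TSSolve(ts, NULL)); /* assumes TSSetSolution() was called earlier */
      PetscCall(PetscLogStagePop());
      PetscFunctionReturn(0);
    }

Running with -log_view as before then reports the bracketed region as its own stage alongside the Main Stage tables shown elsewhere in this thread.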
From: Fackler, P. <fac...@or...> - 2023-01-17 22:29:50
In Xolotl's feature-petsc-kokkos branch I have ported the code to use petsc's COO interface for creating the Jacobian matrix (and the Kokkos interface for interacting with Vec entries). As the attached plots show for one case, while the code for computing the RHSFunction and RHSJacobian performs similarly (or slightly better) after the port, the performance for the solve as a whole is significantly worse.

Note: This is all CPU-only (so kokkos and kokkos-kernels are built with only the serial backend). The dev version is using MatSetValuesStencil with the default implementations for Mat and Vec. The port version is using MatSetValuesCOO and is run with -dm_mat_type aijkokkos -dm_vec_type kokkos. The port/def version is using MatSetValuesCOO and is run with -dm_vec_type kokkos (using the default Mat implementation).

So, this seems to be due to a performance difference in the petsc implementations. Please advise. Is this a known issue? Or am I missing something?

Thank you for the help,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory
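For reference, the difference between the two assembly paths named in this message can be sketched in a few lines of PETSc C. This is a minimal illustration, not Xolotl's actual code: the helper names, the 1-D stencil layout, and the ADD_VALUES/INSERT_VALUES choices are assumptions.

    #include <petscmat.h>

    /* "dev" path: insert one row's couplings through the stencil interface.
       J is assumed to come from DMCreateMatrix() so the stencil-to-index
       mapping exists; the indexing is illustrative, not Xolotl's layout. */
    static PetscErrorCode InsertRowStencil(Mat J, PetscInt xi, PetscInt comp,
                                           PetscInt ncols, const MatStencil cols[],
                                           const PetscScalar vals[])
    {
      MatStencil row;

      PetscFunctionBeginUser;
      row.i = xi;   /* grid point     */
      row.c = comp; /* dof component  */
      row.j = row.k = 0;
      PetscCall(MatSetValuesStencil(J, 1, &row, ncols, cols, vals, ADD_VALUES));
      PetscFunctionReturn(0);
    }

    /* "port" path, step 1 (once): hand the full COO sparsity pattern to the matrix. */
    static PetscErrorCode SetJacobianCOOPattern(Mat J, PetscCount nnz,
                                                PetscInt rows[], PetscInt cols[])
    {
      PetscFunctionBeginUser;
      PetscCall(MatSetPreallocationCOO(J, nnz, rows, cols));
      PetscFunctionReturn(0);
    }

    /* "port" path, step 2 (every RHSJacobian call): push only the values,
       in the same order as the pattern set above. */
    static PetscErrorCode SetJacobianCOOValues(Mat J, const PetscScalar vals[])
    {
      PetscFunctionBeginUser;
      PetscCall(MatSetValuesCOO(J, vals, INSERT_VALUES));
      PetscFunctionReturn(0);
    }

The trade being discussed is that the stencil path resolves indices and inserts entries on every Jacobian evaluation, while the COO path pays for MatSetPreallocationCOO once and afterwards only streams values through MatSetValuesCOO, which is what makes it attractive for the aijkokkos/aijcusparse backends mentioned elsewhere in this thread.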
From: Fackler, P. <fac...@or...> - 2022-12-08 14:07:57
Great! Thank you! Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang <jun...@gm...> Sent: Wednesday, December 7, 2022 18:47 To: Fackler, Philip <fac...@or...> Cc: xol...@li... <xol...@li...>; pet...@mc... <pet...@mc...>; Blondel, Sophie <sbl...@ut...>; Roth, Philip <ro...@or...> Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Hi, Philip, I could reproduce the error. I need to find a way to debug it. Thanks. /home/jczhang/xolotl/test/system/SystemTestCase.cpp(317): fatal error: in "System/PSI_1": absolute value of diffNorm{0.19704848134353209} exceeds 1e-10 *** 1 failure is detected in the test module "Regression" --Junchao Zhang On Tue, Dec 6, 2022 at 10:10 AM Fackler, Philip <fac...@or...<mailto:fac...@or...>> wrote: I think it would be simpler to use the develop branch for this issue. But you can still just build the SystemTester. Then (if you changed the PSI_1 case) run: ./test/system/SystemTester -t System/PSI_1 -- -v (No need for multiple MPI ranks) Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang <jun...@gm...<mailto:jun...@gm...>> Sent: Monday, December 5, 2022 15:40 To: Fackler, Philip <fac...@or...<mailto:fac...@or...>> Cc: xol...@li...<mailto:xol...@li...> <xol...@li...<mailto:xol...@li...>>; pet...@mc...<mailto:pet...@mc...> <pet...@mc...<mailto:pet...@mc...>>; Blondel, Sophie <sbl...@ut...<mailto:sbl...@ut...>>; Roth, Philip <ro...@or...<mailto:ro...@or...>> Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. I configured with xolotl branch feature-petsc-kokkos, and typed `make` under ~/xolotl-build/. Though there were errors, a lot of *Tester were built. [ 62%] Built target xolotlViz [ 63%] Linking CXX executable TemperatureProfileHandlerTester [ 64%] Linking CXX executable TemperatureGradientHandlerTester [ 64%] Built target TemperatureProfileHandlerTester [ 64%] Built target TemperatureConstantHandlerTester [ 64%] Built target TemperatureGradientHandlerTester [ 65%] Linking CXX executable HeatEquationHandlerTester [ 65%] Built target HeatEquationHandlerTester [ 66%] Linking CXX executable FeFitFluxHandlerTester [ 66%] Linking CXX executable W111FitFluxHandlerTester [ 67%] Linking CXX executable FuelFitFluxHandlerTester [ 67%] Linking CXX executable W211FitFluxHandlerTester Which Tester should I use to run with the parameter file benchmarks/params_system_PSI_2.txt? And how many ranks should I use? Could you give an example command line? Thanks. --Junchao Zhang On Mon, Dec 5, 2022 at 2:22 PM Junchao Zhang <jun...@gm...<mailto:jun...@gm...>> wrote: Hello, Philip, Do I still need to use the feature-petsc-kokkos branch? --Junchao Zhang On Mon, Dec 5, 2022 at 11:08 AM Fackler, Philip <fac...@or...<mailto:fac...@or...>> wrote: Junchao, Thank you for working on this. If you open the parameter file for, say, the PSI_2 system test case (benchmarks/params_system_PSI_2.txt), simply add -dm_mat_type aijkokkos -dm_vec_type kokkos` to the "petscArgs=" field (or the corresponding cusparse/cuda option). 
Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang <jun...@gm...<mailto:jun...@gm...>> Sent: Thursday, December 1, 2022 17:05 To: Fackler, Philip <fac...@or...<mailto:fac...@or...>> Cc: xol...@li...<mailto:xol...@li...> <xol...@li...<mailto:xol...@li...>>; pet...@mc...<mailto:pet...@mc...> <pet...@mc...<mailto:pet...@mc...>>; Blondel, Sophie <sbl...@ut...<mailto:sbl...@ut...>>; Roth, Philip <ro...@or...<mailto:ro...@or...>> Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Hi, Philip, Sorry for the long delay. I could not get something useful from the -log_view output. Since I have already built xolotl, could you give me instructions on how to do a xolotl test to reproduce the divergence with petsc GPU backends (but fine on CPU)? Thank you. --Junchao Zhang On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip <fac...@or...<mailto:fac...@or...>> wrote: ------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------ Unknown Name on a named PC0115427 with 1 processor, by 4pf Wed Nov 16 14:36:46 2022 Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a GIT Date: 2022-10-28 14:39:41 +0000 Max Max/Min Avg Total Time (sec): 6.023e+00 1.000 6.023e+00 Objects: 1.020e+02 1.000 1.020e+02 Flops: 1.080e+09 1.000 1.080e+09 1.080e+09 Flops/sec: 1.793e+08 1.000 1.793e+08 1.793e+08 MPI Msg Count: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Msg Len (bytes): 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 6.0226e+00 100.0% 1.0799e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
  %T - percent time in this phase       %F - percent flop in this phase
  %M - percent messages in this phase   %L - percent message lengths in this phase
  %R - percent reductions in this phase
  Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
  GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
  CpuToGpu Count: total number of CPU to GPU copies per processor
  CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
  GpuToCpu Count: total number of GPU to CPU copies per processor
  GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
  GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
Event            Count     Time (sec)    Flop    --- Global ---  --- Stage ----  Total  GPU  - CpuToGpu -  - GpuToCpu -  GPU
                 Max Ratio  Max Ratio  Max Ratio  Mess  AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R  Mflop/s  Mflop/s  Count  Size  Count  Size  %F
------------------------------------------------------------------------------------------------------------------------
---------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided          3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
DMCreateMat            1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
SFSetGraph             3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
SFSetUp                3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
SFPack              4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
SFUnpack            4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
VecDot               190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00 100
VecMDot              775 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
VecNorm             1728 1.0 nan nan 1.92e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00 100
VecScale            1983 1.0 nan nan 6.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00 100
VecCopy              780 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
VecSet              4955 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
VecAXPY              190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00 100
VecAYPX              597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00 100
VecAXPBYCZ           643 1.0 nan nan 1.79e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00 100
VecWAXPY             502 1.0 nan nan 5.58e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00 100
VecMAXPY            1159 1.0 nan nan 3.68e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00 100
VecScatterBegin     4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan  -nan  2 5.14e-03  0 0.00e+00  0
VecScatterEnd       4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
VecReduceArith       380 1.0 nan nan 4.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00 100
VecReduceComm        190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
VecNormalize         965 1.0 nan nan 1.61e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00 100
TSStep                20 1.0 5.8699e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 97100  0  0  0  97100  0  0  0   184  -nan  2 5.14e-03  0 0.00e+00 54
TSFunctionEval       597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 63  1  0  0  0  63  1  0  0  0  -nan  -nan  1 3.36e-04  0 0.00e+00 100
TSJacobianEval       190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24  3  0  0  0  24  3  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00 97
MatMult             1930 1.0 nan nan 4.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1 41  0  0  0   1 41  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00 100
MatMultTranspose       1 1.0 nan nan 3.44e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00 100
MatSolve             965 1.0 nan nan 5.04e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0   1  5  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
MatSOR               965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4 31  0  0  0   4 31  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
MatLUFactorSym         1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
MatLUFactorNum       190 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1 11  0  0  0   1 11  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
MatScale             190 1.0 nan nan 3.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00 100
MatAssemblyBegin     761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
MatAssemblyEnd       761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
MatGetRowIJ            1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
MatCreateSubMats     380 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
MatGetOrdering         1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
MatZeroEntries       379 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
MatSetPreallCOO        1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
MatSetValuesCOO      190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
KSPSetUp             760 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
KSPSolve             190 1.0 5.8052e-01 1.0 9.30e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 86  0  0  0  10 86  0  0  0  1602  -nan  1 4.80e-03  0 0.00e+00 46
KSPGMRESOrthog       775 1.0 nan nan 2.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00 100
SNESSolve             71 1.0 5.7117e+00 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 0.0e+00 95 99  0  0  0  95 99  0  0  0   188  -nan  1 4.80e-03  0 0.00e+00 53
SNESSetUp              1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
SNESFunctionEval     573 1.0 nan nan 2.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 60  2  0  0  0  60  2  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00 100
SNESJacobianEval     190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24  3  0  0  0  24  3  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00 97
SNESLineSearch       190 1.0 nan nan 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00 53 10  0  0  0  53 10  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00 100
PCSetUp              570 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2 11  0  0  0   2 11  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
PCApply              965 1.0 nan nan 6.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00  8 57  0  0  0   8 57  0  0  0  -nan  -nan  1 4.80e-03  0 0.00e+00 19
KSPSolve_FS_0        965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4 31  0  0  0   4 31  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0
KSPSolve_FS_1        965 1.0 nan nan 1.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2 15  0  0  0   2 15  0  0  0  -nan  -nan  0 0.00e+00  0 0.00e+00  0

--- Event Stage 1: Unknown

------------------------------------------------------------------------------------------------------------------------
---------------------------------------

Object Type          Creations   Destructions. Reports information only for process 0.

--- Event Stage 0: Main Stage

           Container     5              5
    Distributed Mesh     2              2
           Index Set    11             11
   IS L to G Mapping     1              1
   Star Forest Graph     7              7
     Discrete System     2              2
           Weak Form     2              2
              Vector    49             49
             TSAdapt     1              1
                  TS     1              1
                DMTS     1              1
                SNES     1              1
              DMSNES     3              3
      SNESLineSearch     1              1
       Krylov Solver     4              4
     DMKSP interface     1              1
              Matrix     4              4
      Preconditioner     4              4
              Viewer     2              1

--- Event Stage 1: Unknown

========================================================================================================================
Average time to get PetscTime(): 3.14e-08
#PETSc Option Table entries:
-log_view
-log_view_gpu_times
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with 64 bit PetscInt
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: PETSC_DIR=/home/4pf/repos/petsc PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries --prefix=/home/4pf/build/petsc/cuda-no-tpls/install --with-64-bit-indices --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3 --with-kokkos-dir=/home/4pf/build/kokkos/cuda/install --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install
-----------------------------------------
Libraries compiled on 2022-11-01 21:01:08 on PC0115427
Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35
Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install
Using PETSc arch:
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -O3
-----------------------------------------
Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include -I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include -I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include
-----------------------------------------
Using C linker: mpicc
Using libraries: -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib -L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc -Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib -L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib -Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib -L/home/4pf/build/kokkos/cuda/install/lib -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels -lkokkoscontainers -lkokkoscore -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl
-----------------------------------------

Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory 
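A log like the one above is collected by passing the two entries listed in the option table (-log_view and -log_view_gpu_times) along with the backend options under discussion. One way to do that in a system-test parameter file is to append them to the existing "petscArgs=" line; this is only a minimal sketch, where <existing solver options> is a placeholder for whatever the file already contains:

petscArgs=<existing solver options> -dm_mat_type aijkokkos -dm_vec_type kokkos -log_view -log_view_gpu_times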
________________________________ From: Junchao Zhang <jun...@gm...<mailto:jun...@gm...>> Sent: Tuesday, November 15, 2022 13:03 To: Fackler, Philip <fac...@or...<mailto:fac...@or...>> Cc: xol...@li...<mailto:xol...@li...> <xol...@li...<mailto:xol...@li...>>; pet...@mc...<mailto:pet...@mc...> <pet...@mc...<mailto:pet...@mc...>>; Blondel, Sophie <sbl...@ut...<mailto:sbl...@ut...>>; Roth, Philip <ro...@or...<mailto:ro...@or...>> Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Can you paste -log_view result so I can see what functions are used? --Junchao Zhang On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip <fac...@or...<mailto:fac...@or...>> wrote: Yes, most (but not all) of our system test cases fail with the kokkos/cuda or cuda backends. All of them pass with the CPU-only kokkos backend. Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang <jun...@gm...<mailto:jun...@gm...>> Sent: Monday, November 14, 2022 19:34 To: Fackler, Philip <fac...@or...<mailto:fac...@or...>> Cc: xol...@li...<mailto:xol...@li...> <xol...@li...<mailto:xol...@li...>>; pet...@mc...<mailto:pet...@mc...> <pet...@mc...<mailto:pet...@mc...>>; Blondel, Sophie <sbl...@ut...<mailto:sbl...@ut...>>; Zhang, Junchao <jc...@mc...<mailto:jc...@mc...>>; Roth, Philip <ro...@or...<mailto:ro...@or...>> Subject: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Hi, Philip, Sorry to hear that. It seems you could run the same code on CPUs but not no GPUs (with either petsc/Kokkos backend or petsc/cuda backend, is it right? --Junchao Zhang On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users <pet...@mc...<mailto:pet...@mc...>> wrote: This is an issue I've brought up before (and discussed in-person with Richard). I wanted to bring it up again because I'm hitting the limits of what I know to do, and I need help figuring this out. The problem can be reproduced using Xolotl's "develop" branch built against a petsc build with kokkos and kokkos-kernels enabled. Then, either add the relevant kokkos options to the "petscArgs=" line in the system test parameter file(s), or just replace the system test parameter files with the ones from the "feature-petsc-kokkos" branch. See here the files that begin with "params_system_". Note that those files use the "kokkos" options, but the problem is similar using the corresponding cuda/cusparse options. I've already tried building kokkos-kernels with no TPLs and got slightly different results, but the same problem. Any help would be appreciated. Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory |
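The "corresponding cuda/cusparse options" mentioned at the end of the message above are not spelled out in the thread; a hedged sketch of that variant of the "petscArgs=" line, using PETSc's native CUDA vector type and cuSPARSE matrix type instead of the Kokkos ones (again with <existing solver options> as a placeholder for the untouched contents of the file), would look like:

petscArgs=<existing solver options> -dm_mat_type aijcusparse -dm_vec_type cuda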
From: Junchao Z. <jun...@gm...> - 2022-12-07 23:47:37
|
Hi, Philip, I could reproduce the error. I need to find a way to debug it. Thanks. /home/jczhang/xolotl/test/system/SystemTestCase.cpp(317): fatal error: in "System/PSI_1": absolute value of diffNorm{0.19704848134353209} exceeds 1e-10 *** 1 failure is detected in the test module "Regression" --Junchao Zhang On Tue, Dec 6, 2022 at 10:10 AM Fackler, Philip <fac...@or...> wrote: > I think it would be simpler to use the develop branch for this issue. But > you can still just build the SystemTester. Then (if you changed the PSI_1 > case) run: > > ./test/system/SystemTester -t System/PSI_1 -- -v > > (No need for multiple MPI ranks) > > Thanks, > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Junchao Zhang <jun...@gm...> > *Sent:* Monday, December 5, 2022 15:40 > *To:* Fackler, Philip <fac...@or...> > *Cc:* xol...@li... < > xol...@li...>; pet...@mc... < > pet...@mc...>; Blondel, Sophie <sbl...@ut...>; Roth, > Philip <ro...@or...> > *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and > Vec diverging when running on CUDA device. > > I configured with xolotl branch feature-petsc-kokkos, and typed `make` > under ~/xolotl-build/. Though there were errors, a lot of *Tester were > built. > > [ 62%] Built target xolotlViz > [ 63%] Linking CXX executable TemperatureProfileHandlerTester > [ 64%] Linking CXX executable TemperatureGradientHandlerTester > [ 64%] Built target TemperatureProfileHandlerTester > [ 64%] Built target TemperatureConstantHandlerTester > [ 64%] Built target TemperatureGradientHandlerTester > [ 65%] Linking CXX executable HeatEquationHandlerTester > [ 65%] Built target HeatEquationHandlerTester > [ 66%] Linking CXX executable FeFitFluxHandlerTester > [ 66%] Linking CXX executable W111FitFluxHandlerTester > [ 67%] Linking CXX executable FuelFitFluxHandlerTester > [ 67%] Linking CXX executable W211FitFluxHandlerTester > > Which Tester should I use to run with the parameter file > benchmarks/params_system_PSI_2.txt? And how many ranks should I use? > Could you give an example command line? > Thanks. > > --Junchao Zhang > > > On Mon, Dec 5, 2022 at 2:22 PM Junchao Zhang <jun...@gm...> > wrote: > > Hello, Philip, > Do I still need to use the feature-petsc-kokkos branch? > --Junchao Zhang > > > On Mon, Dec 5, 2022 at 11:08 AM Fackler, Philip <fac...@or...> > wrote: > > Junchao, > > Thank you for working on this. If you open the parameter file for, say, > the PSI_2 system test case (benchmarks/params_system_PSI_2.txt), simply add -dm_mat_type > aijkokkos -dm_vec_type kokkos` to the "petscArgs=" field (or the > corresponding cusparse/cuda option). > > Thanks, > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Junchao Zhang <jun...@gm...> > *Sent:* Thursday, December 1, 2022 17:05 > *To:* Fackler, Philip <fac...@or...> > *Cc:* xol...@li... < > xol...@li...>; pet...@mc... < > pet...@mc...>; Blondel, Sophie <sbl...@ut...>; Roth, > Philip <ro...@or...> > *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and > Vec diverging when running on CUDA device. > > Hi, Philip, > Sorry for the long delay. I could not get something useful from the > -log_view output. 
Since I have already built xolotl, could you give me > instructions on how to do a xolotl test to reproduce the divergence with > petsc GPU backends (but fine on CPU)? > Thank you. > --Junchao Zhang > > > On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip <fac...@or...> > wrote: > > ------------------------------------------------------------------ PETSc > Performance Summary: > ------------------------------------------------------------------ > > Unknown Name on a named PC0115427 with 1 processor, by 4pf Wed Nov 16 > 14:36:46 2022 > Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a GIT Date: > 2022-10-28 14:39:41 +0000 > > Max Max/Min Avg Total > Time (sec): 6.023e+00 1.000 6.023e+00 > Objects: 1.020e+02 1.000 1.020e+02 > Flops: 1.080e+09 1.000 1.080e+09 1.080e+09 > Flops/sec: 1.793e+08 1.000 1.793e+08 1.793e+08 > MPI Msg Count: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Msg Len (bytes): 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flops > and VecAXPY() for complex vectors of length N > --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total Count > %Total Avg %Total Count %Total > 0: Main Stage: 6.0226e+00 100.0% 1.0799e+09 100.0% 0.000e+00 > 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flop: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > AvgLen: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flop in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over > all processors) > GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU > time over all processors) > CpuToGpu Count: total number of CPU to GPU copies per processor > CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per > processor) > GpuToCpu Count: total number of GPU to CPU copies per processor > GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per > processor) > GPU %F: percent flops on GPU in this event > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop > --- Global --- --- Stage ---- Total > GPU - CpuToGpu - - GpuToCpu - GPU > > Max Ratio Max Ratio Max Ratio Mess AvgLen > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > Mflop/s Count Size Count Size %F > > > ------------------------------------------------------------------------------------------------------------------------ > --------------------------------------- > > > --- Event Stage 0: Main Stage > > BuildTwoSided 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > DMCreateMat 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFSetGraph 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFSetUp 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFPack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFUnpack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecDot 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecMDot 775 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecNorm 1728 1.0 nan nan 1.92e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecScale 1983 1.0 nan nan 6.24e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecCopy 780 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecSet 4955 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecAXPY 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecAYPX 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecAXPBYCZ 643 1.0 nan nan 1.79e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecWAXPY 502 1.0 nan nan 5.58e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecMAXPY 1159 1.0 nan nan 3.68e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecScatterBegin 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan > -nan 2 5.14e-03 0 0.00e+00 0 > > 
VecScatterEnd 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecReduceArith 380 1.0 nan nan 4.23e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecReduceComm 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecNormalize 965 1.0 nan nan 1.61e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > TSStep 20 1.0 5.8699e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 97100 0 0 0 97100 0 0 0 184 > -nan 2 5.14e-03 0 0.00e+00 54 > > TSFunctionEval 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 63 1 0 0 0 63 1 0 0 0 -nan > -nan 1 3.36e-04 0 0.00e+00 100 > > TSJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 97 > > MatMult 1930 1.0 nan nan 4.46e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 41 0 0 0 1 41 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > MatMultTranspose 1 1.0 nan nan 3.44e+05 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > MatSolve 965 1.0 nan nan 5.04e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 5 0 0 0 1 5 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatSOR 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatLUFactorSym 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatLUFactorNum 190 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 11 0 0 0 1 11 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatScale 190 1.0 nan nan 3.26e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > MatAssemblyBegin 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatAssemblyEnd 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatGetRowIJ 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatCreateSubMats 380 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatGetOrdering 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatZeroEntries 379 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatSetPreallCOO 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatSetValuesCOO 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > KSPSetUp 760 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > KSPSolve 190 1.0 5.8052e-01 1.0 9.30e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 10 86 0 0 0 10 86 0 0 0 1602 > -nan 1 4.80e-03 0 0.00e+00 46 > > KSPGMRESOrthog 775 1.0 nan nan 2.27e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 2 0 0 0 1 2 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > SNESSolve 71 1.0 5.7117e+00 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 95 99 0 0 0 95 99 0 0 0 188 > -nan 1 4.80e-03 0 0.00e+00 53 > > SNESSetUp 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SNESFunctionEval 573 1.0 nan nan 2.23e+07 1.0 0.0e+00 
0.0e+00 > 0.0e+00 60 2 0 0 0 60 2 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > SNESJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 97 > > SNESLineSearch 190 1.0 nan nan 1.05e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 53 10 0 0 0 53 10 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > PCSetUp 570 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 11 0 0 0 2 11 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > PCApply 965 1.0 nan nan 6.14e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 8 57 0 0 0 8 57 0 0 0 -nan > -nan 1 4.80e-03 0 0.00e+00 19 > > KSPSolve_FS_0 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > KSPSolve_FS_1 965 1.0 nan nan 1.66e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 15 0 0 0 2 15 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > > --- Event Stage 1: Unknown > > > ------------------------------------------------------------------------------------------------------------------------ > --------------------------------------- > > > Object Type Creations Destructions. Reports information only > for process 0. > > --- Event Stage 0: Main Stage > > Container 5 5 > Distributed Mesh 2 2 > Index Set 11 11 > IS L to G Mapping 1 1 > Star Forest Graph 7 7 > Discrete System 2 2 > Weak Form 2 2 > Vector 49 49 > TSAdapt 1 1 > TS 1 1 > DMTS 1 1 > SNES 1 1 > DMSNES 3 3 > SNESLineSearch 1 1 > Krylov Solver 4 4 > DMKSP interface 1 1 > Matrix 4 4 > Preconditioner 4 4 > Viewer 2 1 > > --- Event Stage 1: Unknown > > > ======================================================================================================================== > Average time to get PetscTime(): 3.14e-08 > #PETSc Option Table entries: > -log_view > -log_view_gpu_times > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with 64 bit PetscInt > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 sizeof(PetscInt) 8 > Configure options: PETSC_DIR=/home/4pf/repos/petsc > PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx > --with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries > --prefix=/home/4pf/build/petsc/cuda-no-tpls/install --with-64-bit-indices > --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3 > --with-kokkos-dir=/home/4pf/build/kokkos/cuda/install > --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install > > ----------------------------------------- > Libraries compiled on 2022-11-01 21:01:08 on PC0115427 > Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35 > Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install > Using PETSc arch: > ----------------------------------------- > > Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas > -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector > -fvisibility=hidden -O3 > ----------------------------------------- > > Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include > -I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include > -I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include > ----------------------------------------- > > Using C linker: mpicc > Using libraries: -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib > -L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc > -Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib > 
-L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib > -Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib > -L/home/4pf/build/kokkos/cuda/install/lib > -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64 > -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels -lkokkoscontainers > -lkokkoscore -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas > -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl > ----------------------------------------- > > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Junchao Zhang <jun...@gm...> > *Sent:* Tuesday, November 15, 2022 13:03 > *To:* Fackler, Philip <fac...@or...> > *Cc:* xol...@li... < > xol...@li...>; pet...@mc... < > pet...@mc...>; Blondel, Sophie <sbl...@ut...>; Roth, > Philip <ro...@or...> > *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and > Vec diverging when running on CUDA device. > > Can you paste -log_view result so I can see what functions are used? > > --Junchao Zhang > > > On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip <fac...@or...> > wrote: > > Yes, most (but not all) of our system test cases fail with the kokkos/cuda > or cuda backends. All of them pass with the CPU-only kokkos backend. > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Junchao Zhang <jun...@gm...> > *Sent:* Monday, November 14, 2022 19:34 > *To:* Fackler, Philip <fac...@or...> > *Cc:* xol...@li... < > xol...@li...>; pet...@mc... < > pet...@mc...>; Blondel, Sophie <sbl...@ut...>; Zhang, > Junchao <jc...@mc...>; Roth, Philip <ro...@or...> > *Subject:* [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec > diverging when running on CUDA device. > > Hi, Philip, > Sorry to hear that. It seems you could run the same code on CPUs but > not no GPUs (with either petsc/Kokkos backend or petsc/cuda backend, is it > right? > > --Junchao Zhang > > > On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users < > pet...@mc...> wrote: > > This is an issue I've brought up before (and discussed in-person with > Richard). I wanted to bring it up again because I'm hitting the limits of > what I know to do, and I need help figuring this out. > > The problem can be reproduced using Xolotl's "develop" branch built > against a petsc build with kokkos and kokkos-kernels enabled. Then, either > add the relevant kokkos options to the "petscArgs=" line in the system test > parameter file(s), or just replace the system test parameter files with the > ones from the "feature-petsc-kokkos" branch. See here the files that > begin with "params_system_". > > Note that those files use the "kokkos" options, but the problem is similar > using the corresponding cuda/cusparse options. I've already tried building > kokkos-kernels with no TPLs and got slightly different results, but the > same problem. > > Any help would be appreciated. > > Thanks, > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > > |
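Note that the reported diffNorm of roughly 0.197 is about nine orders of magnitude above the 1e-10 tolerance, so this is a genuine divergence rather than a rounding-level difference. Putting the quoted instructions together, the reproduction that produced the failure above amounts to a short sequence in the build tree; this is a sketch that assumes the ~/xolotl-build directory from the quoted messages and a make target named after the SystemTester executable:

cd ~/xolotl-build
make SystemTester
./test/system/SystemTester -t System/PSI_1 -- -v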
From: Fackler, P. <fac...@or...> - 2022-12-06 16:10:33
|
I think it would be simpler to use the develop branch for this issue. But you can still just build the SystemTester. Then (if you changed the PSI_1 case) run: ./test/system/SystemTester -t System/PSI_1 -- -v (No need for multiple MPI ranks) Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang <jun...@gm...> Sent: Monday, December 5, 2022 15:40 To: Fackler, Philip <fac...@or...> Cc: xol...@li... <xol...@li...>; pet...@mc... <pet...@mc...>; Blondel, Sophie <sbl...@ut...>; Roth, Philip <ro...@or...> Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. I configured with xolotl branch feature-petsc-kokkos, and typed `make` under ~/xolotl-build/. Though there were errors, a lot of *Tester were built. [ 62%] Built target xolotlViz [ 63%] Linking CXX executable TemperatureProfileHandlerTester [ 64%] Linking CXX executable TemperatureGradientHandlerTester [ 64%] Built target TemperatureProfileHandlerTester [ 64%] Built target TemperatureConstantHandlerTester [ 64%] Built target TemperatureGradientHandlerTester [ 65%] Linking CXX executable HeatEquationHandlerTester [ 65%] Built target HeatEquationHandlerTester [ 66%] Linking CXX executable FeFitFluxHandlerTester [ 66%] Linking CXX executable W111FitFluxHandlerTester [ 67%] Linking CXX executable FuelFitFluxHandlerTester [ 67%] Linking CXX executable W211FitFluxHandlerTester Which Tester should I use to run with the parameter file benchmarks/params_system_PSI_2.txt? And how many ranks should I use? Could you give an example command line? Thanks. --Junchao Zhang On Mon, Dec 5, 2022 at 2:22 PM Junchao Zhang <jun...@gm...<mailto:jun...@gm...>> wrote: Hello, Philip, Do I still need to use the feature-petsc-kokkos branch? --Junchao Zhang On Mon, Dec 5, 2022 at 11:08 AM Fackler, Philip <fac...@or...<mailto:fac...@or...>> wrote: Junchao, Thank you for working on this. If you open the parameter file for, say, the PSI_2 system test case (benchmarks/params_system_PSI_2.txt), simply add -dm_mat_type aijkokkos -dm_vec_type kokkos` to the "petscArgs=" field (or the corresponding cusparse/cuda option). Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang <jun...@gm...<mailto:jun...@gm...>> Sent: Thursday, December 1, 2022 17:05 To: Fackler, Philip <fac...@or...<mailto:fac...@or...>> Cc: xol...@li...<mailto:xol...@li...> <xol...@li...<mailto:xol...@li...>>; pet...@mc...<mailto:pet...@mc...> <pet...@mc...<mailto:pet...@mc...>>; Blondel, Sophie <sbl...@ut...<mailto:sbl...@ut...>>; Roth, Philip <ro...@or...<mailto:ro...@or...>> Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Hi, Philip, Sorry for the long delay. I could not get something useful from the -log_view output. Since I have already built xolotl, could you give me instructions on how to do a xolotl test to reproduce the divergence with petsc GPU backends (but fine on CPU)? Thank you. 
--Junchao Zhang On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip <fac...@or...<mailto:fac...@or...>> wrote: ------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------ Unknown Name on a named PC0115427 with 1 processor, by 4pf Wed Nov 16 14:36:46 2022 Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a GIT Date: 2022-10-28 14:39:41 +0000 Max Max/Min Avg Total Time (sec): 6.023e+00 1.000 6.023e+00 Objects: 1.020e+02 1.000 1.020e+02 Flops: 1.080e+09 1.000 1.080e+09 1.080e+09 Flops/sec: 1.793e+08 1.000 1.793e+08 1.793e+08 MPI Msg Count: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Msg Len (bytes): 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 6.0226e+00 100.0% 1.0799e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors) CpuToGpu Count: total number of CPU to GPU copies per processor CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor) GpuToCpu Count: total number of GPU to CPU copies per processor GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor) GPU %F: percent flops on GPU in this event ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F ------------------------------------------------------------------------------------------------------------------------ --------------------------------------- --- Event Stage 0: Main Stage BuildTwoSided 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 DMCreateMat 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFSetGraph 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFSetUp 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFPack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFUnpack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecDot 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecMDot 775 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecNorm 1728 1.0 nan nan 1.92e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecScale 1983 1.0 nan nan 6.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecCopy 780 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecSet 4955 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecAXPY 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecAYPX 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecAXPBYCZ 643 1.0 nan nan 1.79e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecWAXPY 502 1.0 nan nan 5.58e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecMAXPY 1159 1.0 nan nan 3.68e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecScatterBegin 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 2 5.14e-03 0 0.00e+00 0 VecScatterEnd 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecReduceArith 380 1.0 nan nan 4.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan 
-nan 0 0.00e+00 0 0.00e+00 100 VecReduceComm 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecNormalize 965 1.0 nan nan 1.61e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 TSStep 20 1.0 5.8699e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 97100 0 0 0 97100 0 0 0 184 -nan 2 5.14e-03 0 0.00e+00 54 TSFunctionEval 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 63 1 0 0 0 63 1 0 0 0 -nan -nan 1 3.36e-04 0 0.00e+00 100 TSJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 97 MatMult 1930 1.0 nan nan 4.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 41 0 0 0 1 41 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatMultTranspose 1 1.0 nan nan 3.44e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatSolve 965 1.0 nan nan 5.04e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSOR 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatLUFactorSym 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatLUFactorNum 190 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 11 0 0 0 1 11 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatScale 190 1.0 nan nan 3.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatAssemblyBegin 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatAssemblyEnd 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatGetRowIJ 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatCreateSubMats 380 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatGetOrdering 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatZeroEntries 379 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSetPreallCOO 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSetValuesCOO 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSetUp 760 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSolve 190 1.0 5.8052e-01 1.0 9.30e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 86 0 0 0 10 86 0 0 0 1602 -nan 1 4.80e-03 0 0.00e+00 46 KSPGMRESOrthog 775 1.0 nan nan 2.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 SNESSolve 71 1.0 5.7117e+00 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 0.0e+00 95 99 0 0 0 95 99 0 0 0 188 -nan 1 4.80e-03 0 0.00e+00 53 SNESSetUp 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SNESFunctionEval 573 1.0 nan nan 2.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 60 2 0 0 0 60 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 SNESJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 97 SNESLineSearch 190 1.0 nan nan 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00 53 10 0 0 0 53 10 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 PCSetUp 570 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 11 0 0 0 2 11 0 0 0 -nan -nan 0 
0.00e+00 0 0.00e+00 0 PCApply 965 1.0 nan nan 6.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00 8 57 0 0 0 8 57 0 0 0 -nan -nan 1 4.80e-03 0 0.00e+00 19 KSPSolve_FS_0 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSolve_FS_1 965 1.0 nan nan 1.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 15 0 0 0 2 15 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 --- Event Stage 1: Unknown ------------------------------------------------------------------------------------------------------------------------ --------------------------------------- Object Type Creations Destructions. Reports information only for process 0. --- Event Stage 0: Main Stage Container 5 5 Distributed Mesh 2 2 Index Set 11 11 IS L to G Mapping 1 1 Star Forest Graph 7 7 Discrete System 2 2 Weak Form 2 2 Vector 49 49 TSAdapt 1 1 TS 1 1 DMTS 1 1 SNES 1 1 DMSNES 3 3 SNESLineSearch 1 1 Krylov Solver 4 4 DMKSP interface 1 1 Matrix 4 4 Preconditioner 4 4 Viewer 2 1 --- Event Stage 1: Unknown ======================================================================================================================== Average time to get PetscTime(): 3.14e-08 #PETSc Option Table entries: -log_view -log_view_gpu_times #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with 64 bit PetscInt Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8 Configure options: PETSC_DIR=/home/4pf/repos/petsc PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries --prefix=/home/4pf/build/petsc/cuda-no-tpls/install --with-64-bit-indices --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3 --with-kokkos-dir=/home/4pf/build/kokkos/cuda/install --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install ----------------------------------------- Libraries compiled on 2022-11-01 21:01:08 on PC0115427 Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35 Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install Using PETSc arch: ----------------------------------------- Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -O3 ----------------------------------------- Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include -I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include -I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include ----------------------------------------- Using C linker: mpicc Using libraries: -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib -L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc -Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib -L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib -Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib -L/home/4pf/build/kokkos/cuda/install/lib -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels -lkokkoscontainers -lkokkoscore -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl ----------------------------------------- Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory 
________________________________ From: Junchao Zhang <jun...@gm...<mailto:jun...@gm...>> Sent: Tuesday, November 15, 2022 13:03 To: Fackler, Philip <fac...@or...<mailto:fac...@or...>> Cc: xol...@li...<mailto:xol...@li...> <xol...@li...<mailto:xol...@li...>>; pet...@mc...<mailto:pet...@mc...> <pet...@mc...<mailto:pet...@mc...>>; Blondel, Sophie <sbl...@ut...<mailto:sbl...@ut...>>; Roth, Philip <ro...@or...<mailto:ro...@or...>> Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Can you paste -log_view result so I can see what functions are used? --Junchao Zhang On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip <fac...@or...<mailto:fac...@or...>> wrote: Yes, most (but not all) of our system test cases fail with the kokkos/cuda or cuda backends. All of them pass with the CPU-only kokkos backend. Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang <jun...@gm...<mailto:jun...@gm...>> Sent: Monday, November 14, 2022 19:34 To: Fackler, Philip <fac...@or...<mailto:fac...@or...>> Cc: xol...@li...<mailto:xol...@li...> <xol...@li...<mailto:xol...@li...>>; pet...@mc...<mailto:pet...@mc...> <pet...@mc...<mailto:pet...@mc...>>; Blondel, Sophie <sbl...@ut...<mailto:sbl...@ut...>>; Zhang, Junchao <jc...@mc...<mailto:jc...@mc...>>; Roth, Philip <ro...@or...<mailto:ro...@or...>> Subject: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Hi, Philip, Sorry to hear that. It seems you could run the same code on CPUs but not no GPUs (with either petsc/Kokkos backend or petsc/cuda backend, is it right? --Junchao Zhang On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users <pet...@mc...<mailto:pet...@mc...>> wrote: This is an issue I've brought up before (and discussed in-person with Richard). I wanted to bring it up again because I'm hitting the limits of what I know to do, and I need help figuring this out. The problem can be reproduced using Xolotl's "develop" branch built against a petsc build with kokkos and kokkos-kernels enabled. Then, either add the relevant kokkos options to the "petscArgs=" line in the system test parameter file(s), or just replace the system test parameter files with the ones from the "feature-petsc-kokkos" branch. See here the files that begin with "params_system_". Note that those files use the "kokkos" options, but the problem is similar using the corresponding cuda/cusparse options. I've already tried building kokkos-kernels with no TPLs and got slightly different results, but the same problem. Any help would be appreciated. Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory |
From: Junchao Z. <jun...@gm...> - 2022-12-05 20:40:31
|
I configured with xolotl branch feature-petsc-kokkos, and typed `make` under ~/xolotl-build/. Though there were errors, a lot of *Tester were built. [ 62%] Built target xolotlViz [ 63%] Linking CXX executable TemperatureProfileHandlerTester [ 64%] Linking CXX executable TemperatureGradientHandlerTester [ 64%] Built target TemperatureProfileHandlerTester [ 64%] Built target TemperatureConstantHandlerTester [ 64%] Built target TemperatureGradientHandlerTester [ 65%] Linking CXX executable HeatEquationHandlerTester [ 65%] Built target HeatEquationHandlerTester [ 66%] Linking CXX executable FeFitFluxHandlerTester [ 66%] Linking CXX executable W111FitFluxHandlerTester [ 67%] Linking CXX executable FuelFitFluxHandlerTester [ 67%] Linking CXX executable W211FitFluxHandlerTester Which Tester should I use to run with the parameter file benchmarks/params_system_PSI_2.txt? And how many ranks should I use? Could you give an example command line? Thanks. --Junchao Zhang On Mon, Dec 5, 2022 at 2:22 PM Junchao Zhang <jun...@gm...> wrote: > Hello, Philip, > Do I still need to use the feature-petsc-kokkos branch? > --Junchao Zhang > > > On Mon, Dec 5, 2022 at 11:08 AM Fackler, Philip <fac...@or...> > wrote: > >> Junchao, >> >> Thank you for working on this. If you open the parameter file for, say, >> the PSI_2 system test case (benchmarks/params_system_PSI_2.txt), simply add -dm_mat_type >> aijkokkos -dm_vec_type kokkos` to the "petscArgs=" field (or the >> corresponding cusparse/cuda option). >> >> Thanks, >> >> >> *Philip Fackler * >> Research Software Engineer, Application Engineering Group >> Advanced Computing Systems Research Section >> Computer Science and Mathematics Division >> *Oak Ridge National Laboratory* >> ------------------------------ >> *From:* Junchao Zhang <jun...@gm...> >> *Sent:* Thursday, December 1, 2022 17:05 >> *To:* Fackler, Philip <fac...@or...> >> *Cc:* xol...@li... < >> xol...@li...>; pet...@mc... < >> pet...@mc...>; Blondel, Sophie <sbl...@ut...>; Roth, >> Philip <ro...@or...> >> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and >> Vec diverging when running on CUDA device. >> >> Hi, Philip, >> Sorry for the long delay. I could not get something useful from the >> -log_view output. Since I have already built xolotl, could you give me >> instructions on how to do a xolotl test to reproduce the divergence with >> petsc GPU backends (but fine on CPU)? >> Thank you. 
>> --Junchao Zhang >> >> >> On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip <fac...@or...> >> wrote: >> >> ------------------------------------------------------------------ PETSc >> Performance Summary: >> ------------------------------------------------------------------ >> >> Unknown Name on a named PC0115427 with 1 processor, by 4pf Wed Nov 16 >> 14:36:46 2022 >> Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a GIT Date: >> 2022-10-28 14:39:41 +0000 >> >> Max Max/Min Avg Total >> Time (sec): 6.023e+00 1.000 6.023e+00 >> Objects: 1.020e+02 1.000 1.020e+02 >> Flops: 1.080e+09 1.000 1.080e+09 1.080e+09 >> Flops/sec: 1.793e+08 1.000 1.793e+08 1.793e+08 >> MPI Msg Count: 0.000e+00 0.000 0.000e+00 0.000e+00 >> MPI Msg Len (bytes): 0.000e+00 0.000 0.000e+00 0.000e+00 >> MPI Reductions: 0.000e+00 0.000 >> >> Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N >> --> 2N flops >> and VecAXPY() for complex vectors of length N >> --> 8N flops >> >> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages >> --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total Count >> %Total Avg %Total Count %Total >> 0: Main Stage: 6.0226e+00 100.0% 1.0799e+09 100.0% 0.000e+00 >> 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on >> interpreting output. >> Phase summary info: >> Count: number of times phase was executed >> Time and Flop: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> AvgLen: average message length (bytes) >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() >> and PetscLogStagePop(). 
>> %T - percent time in this phase %F - percent flop in this >> phase >> %M - percent messages in this phase %L - percent message >> lengths in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time >> over all processors) >> GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU >> time over all processors) >> CpuToGpu Count: total number of CPU to GPU copies per processor >> CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per >> processor) >> GpuToCpu Count: total number of GPU to CPU copies per processor >> GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per >> processor) >> GPU %F: percent flops on GPU in this event >> >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) Flop >> --- Global --- --- Stage ---- Total >> GPU - CpuToGpu - - GpuToCpu - GPU >> >> Max Ratio Max Ratio Max Ratio Mess AvgLen >> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> Mflop/s Count Size Count Size %F >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> --------------------------------------- >> >> >> --- Event Stage 0: Main Stage >> >> BuildTwoSided 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> DMCreateMat 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> SFSetGraph 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> SFSetUp 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> SFPack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> SFUnpack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> VecDot 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecMDot 775 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> VecNorm 1728 1.0 nan nan 1.92e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecScale 1983 1.0 nan nan 6.24e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecCopy 780 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> VecSet 4955 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 2 0 0 0 0 2 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> VecAXPY 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecAYPX 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecAXPBYCZ 643 1.0 nan nan 1.79e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecWAXPY 502 1.0 nan nan 5.58e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecMAXPY 1159 1.0 nan nan 3.68e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecScatterBegin 4647 1.0 nan nan 
0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan >> -nan 2 5.14e-03 0 0.00e+00 0 >> >> VecScatterEnd 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> VecReduceArith 380 1.0 nan nan 4.23e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecReduceComm 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> VecNormalize 965 1.0 nan nan 1.61e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> TSStep 20 1.0 5.8699e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 97100 0 0 0 97100 0 0 0 184 >> -nan 2 5.14e-03 0 0.00e+00 54 >> >> TSFunctionEval 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 63 1 0 0 0 63 1 0 0 0 -nan >> -nan 1 3.36e-04 0 0.00e+00 100 >> >> TSJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 97 >> >> MatMult 1930 1.0 nan nan 4.46e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 1 41 0 0 0 1 41 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> MatMultTranspose 1 1.0 nan nan 3.44e+05 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> MatSolve 965 1.0 nan nan 5.04e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 1 5 0 0 0 1 5 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatSOR 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatLUFactorSym 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatLUFactorNum 190 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 1 11 0 0 0 1 11 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatScale 190 1.0 nan nan 3.26e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> MatAssemblyBegin 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatAssemblyEnd 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatGetRowIJ 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatCreateSubMats 380 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatGetOrdering 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatZeroEntries 379 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatSetPreallCOO 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatSetValuesCOO 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> KSPSetUp 760 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> KSPSolve 190 1.0 5.8052e-01 1.0 9.30e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 10 86 0 0 0 10 86 0 0 0 1602 >> -nan 1 4.80e-03 0 0.00e+00 46 >> >> KSPGMRESOrthog 775 1.0 nan nan 2.27e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 1 2 0 0 0 1 2 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> SNESSolve 71 1.0 5.7117e+00 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 95 99 0 0 0 95 99 0 0 0 188 >> -nan 
1 4.80e-03 0 0.00e+00 53 >> >> SNESSetUp 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> SNESFunctionEval 573 1.0 nan nan 2.23e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 60 2 0 0 0 60 2 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> SNESJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 97 >> >> SNESLineSearch 190 1.0 nan nan 1.05e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 53 10 0 0 0 53 10 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> PCSetUp 570 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 11 0 0 0 2 11 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> PCApply 965 1.0 nan nan 6.14e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 8 57 0 0 0 8 57 0 0 0 -nan >> -nan 1 4.80e-03 0 0.00e+00 19 >> >> KSPSolve_FS_0 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> KSPSolve_FS_1 965 1.0 nan nan 1.66e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 15 0 0 0 2 15 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> >> --- Event Stage 1: Unknown >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> --------------------------------------- >> >> >> Object Type Creations Destructions. Reports information only >> for process 0. >> >> --- Event Stage 0: Main Stage >> >> Container 5 5 >> Distributed Mesh 2 2 >> Index Set 11 11 >> IS L to G Mapping 1 1 >> Star Forest Graph 7 7 >> Discrete System 2 2 >> Weak Form 2 2 >> Vector 49 49 >> TSAdapt 1 1 >> TS 1 1 >> DMTS 1 1 >> SNES 1 1 >> DMSNES 3 3 >> SNESLineSearch 1 1 >> Krylov Solver 4 4 >> DMKSP interface 1 1 >> Matrix 4 4 >> Preconditioner 4 4 >> Viewer 2 1 >> >> --- Event Stage 1: Unknown >> >> >> ======================================================================================================================== >> Average time to get PetscTime(): 3.14e-08 >> #PETSc Option Table entries: >> -log_view >> -log_view_gpu_times >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with 64 bit PetscInt >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >> sizeof(PetscScalar) 8 sizeof(PetscInt) 8 >> Configure options: PETSC_DIR=/home/4pf/repos/petsc >> PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx >> --with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries >> --prefix=/home/4pf/build/petsc/cuda-no-tpls/install --with-64-bit-indices >> --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3 >> --with-kokkos-dir=/home/4pf/build/kokkos/cuda/install >> --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install >> >> ----------------------------------------- >> Libraries compiled on 2022-11-01 21:01:08 on PC0115427 >> Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35 >> Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install >> Using PETSc arch: >> ----------------------------------------- >> >> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector >> -fvisibility=hidden -O3 >> ----------------------------------------- >> >> Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include >> -I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include >> -I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include >> 
----------------------------------------- >> >> Using C linker: mpicc >> Using libraries: >> -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib >> -L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc >> -Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib >> -L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib >> -Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib >> -L/home/4pf/build/kokkos/cuda/install/lib >> -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64 >> -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels -lkokkoscontainers >> -lkokkoscore -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas >> -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl >> ----------------------------------------- >> >> >> >> *Philip Fackler * >> Research Software Engineer, Application Engineering Group >> Advanced Computing Systems Research Section >> Computer Science and Mathematics Division >> *Oak Ridge National Laboratory* >> ------------------------------ >> *From:* Junchao Zhang <jun...@gm...> >> *Sent:* Tuesday, November 15, 2022 13:03 >> *To:* Fackler, Philip <fac...@or...> >> *Cc:* xol...@li... < >> xol...@li...>; pet...@mc... < >> pet...@mc...>; Blondel, Sophie <sbl...@ut...>; Roth, >> Philip <ro...@or...> >> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and >> Vec diverging when running on CUDA device. >> >> Can you paste -log_view result so I can see what functions are used? >> >> --Junchao Zhang >> >> >> On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip <fac...@or...> >> wrote: >> >> Yes, most (but not all) of our system test cases fail with the >> kokkos/cuda or cuda backends. All of them pass with the CPU-only kokkos >> backend. >> >> >> *Philip Fackler * >> Research Software Engineer, Application Engineering Group >> Advanced Computing Systems Research Section >> Computer Science and Mathematics Division >> *Oak Ridge National Laboratory* >> ------------------------------ >> *From:* Junchao Zhang <jun...@gm...> >> *Sent:* Monday, November 14, 2022 19:34 >> *To:* Fackler, Philip <fac...@or...> >> *Cc:* xol...@li... < >> xol...@li...>; pet...@mc... < >> pet...@mc...>; Blondel, Sophie <sbl...@ut...>; Zhang, >> Junchao <jc...@mc...>; Roth, Philip <ro...@or...> >> *Subject:* [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec >> diverging when running on CUDA device. >> >> Hi, Philip, >> Sorry to hear that. It seems you could run the same code on CPUs but >> not no GPUs (with either petsc/Kokkos backend or petsc/cuda backend, is it >> right? >> >> --Junchao Zhang >> >> >> On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users < >> pet...@mc...> wrote: >> >> This is an issue I've brought up before (and discussed in-person with >> Richard). I wanted to bring it up again because I'm hitting the limits of >> what I know to do, and I need help figuring this out. >> >> The problem can be reproduced using Xolotl's "develop" branch built >> against a petsc build with kokkos and kokkos-kernels enabled. Then, either >> add the relevant kokkos options to the "petscArgs=" line in the system test >> parameter file(s), or just replace the system test parameter files with the >> ones from the "feature-petsc-kokkos" branch. See here the files that >> begin with "params_system_". >> >> Note that those files use the "kokkos" options, but the problem is >> similar using the corresponding cuda/cusparse options. 
I've already tried >> building kokkos-kernels with no TPLs and got slightly different results, >> but the same problem. >> >> Any help would be appreciated. >> >> Thanks, >> >> >> *Philip Fackler * >> Research Software Engineer, Application Engineering Group >> Advanced Computing Systems Research Section >> Computer Science and Mathematics Division >> *Oak Ridge National Laboratory* >> >> |
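One way to orient in the -log_view output quoted above: the Total Mflop/s column is an event's flop count divided by its maximum time (reported in Mflop/s), and GPU %F is the percentage of those flops that executed on the GPU. Checking the KSPSolve row, for example:

    9.30e+08 flop / 5.8052e-01 s  ~=  1.60e+09 flop/s  =  1602 Mflop/s   (as reported)

Its GPU %F of 46 means only about half of the linear-solve flops ran on the device in this run, which is consistent with the MatSOR, MatSolve, and MatLUFactorNum rows all reporting GPU %F = 0.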
From: Junchao Z. <jun...@gm...> - 2022-12-05 20:23:18
|
Hello, Philip, Do I still need to use the feature-petsc-kokkos branch? --Junchao Zhang On Mon, Dec 5, 2022 at 11:08 AM Fackler, Philip <fac...@or...> wrote: > Junchao, > > Thank you for working on this. If you open the parameter file for, say, > the PSI_2 system test case (benchmarks/params_system_PSI_2.txt), simply add -dm_mat_type > aijkokkos -dm_vec_type kokkos` to the "petscArgs=" field (or the > corresponding cusparse/cuda option). > > Thanks, > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Junchao Zhang <jun...@gm...> > *Sent:* Thursday, December 1, 2022 17:05 > *To:* Fackler, Philip <fac...@or...> > *Cc:* xol...@li... < > xol...@li...>; pet...@mc... < > pet...@mc...>; Blondel, Sophie <sbl...@ut...>; Roth, > Philip <ro...@or...> > *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and > Vec diverging when running on CUDA device. > > Hi, Philip, > Sorry for the long delay. I could not get something useful from the > -log_view output. Since I have already built xolotl, could you give me > instructions on how to do a xolotl test to reproduce the divergence with > petsc GPU backends (but fine on CPU)? > Thank you. > --Junchao Zhang > > > On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip <fac...@or...> > wrote: > > ------------------------------------------------------------------ PETSc > Performance Summary: > ------------------------------------------------------------------ > > Unknown Name on a named PC0115427 with 1 processor, by 4pf Wed Nov 16 > 14:36:46 2022 > Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a GIT Date: > 2022-10-28 14:39:41 +0000 > > Max Max/Min Avg Total > Time (sec): 6.023e+00 1.000 6.023e+00 > Objects: 1.020e+02 1.000 1.020e+02 > Flops: 1.080e+09 1.000 1.080e+09 1.080e+09 > Flops/sec: 1.793e+08 1.000 1.793e+08 1.793e+08 > MPI Msg Count: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Msg Len (bytes): 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flops > and VecAXPY() for complex vectors of length N > --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total Count > %Total Avg %Total Count %Total > 0: Main Stage: 6.0226e+00 100.0% 1.0799e+09 100.0% 0.000e+00 > 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flop: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > AvgLen: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flop in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over > all processors) > GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU > time over all processors) > CpuToGpu Count: total number of CPU to GPU copies per processor > CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per > processor) > GpuToCpu Count: total number of GPU to CPU copies per processor > GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per > processor) > GPU %F: percent flops on GPU in this event > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop > --- Global --- --- Stage ---- Total > GPU - CpuToGpu - - GpuToCpu - GPU > > Max Ratio Max Ratio Max Ratio Mess AvgLen > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > Mflop/s Count Size Count Size %F > > > ------------------------------------------------------------------------------------------------------------------------ > --------------------------------------- > > > --- Event Stage 0: Main Stage > > BuildTwoSided 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > DMCreateMat 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFSetGraph 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFSetUp 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFPack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFUnpack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecDot 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecMDot 775 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecNorm 1728 1.0 nan nan 1.92e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecScale 1983 1.0 nan nan 6.24e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecCopy 780 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecSet 4955 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecAXPY 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecAYPX 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecAXPBYCZ 643 1.0 nan nan 1.79e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecWAXPY 502 1.0 nan nan 5.58e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecMAXPY 1159 1.0 nan nan 3.68e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecScatterBegin 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan > -nan 2 5.14e-03 0 0.00e+00 0 > > 
VecScatterEnd 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecReduceArith 380 1.0 nan nan 4.23e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecReduceComm 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecNormalize 965 1.0 nan nan 1.61e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > TSStep 20 1.0 5.8699e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 97100 0 0 0 97100 0 0 0 184 > -nan 2 5.14e-03 0 0.00e+00 54 > > TSFunctionEval 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 63 1 0 0 0 63 1 0 0 0 -nan > -nan 1 3.36e-04 0 0.00e+00 100 > > TSJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 97 > > MatMult 1930 1.0 nan nan 4.46e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 41 0 0 0 1 41 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > MatMultTranspose 1 1.0 nan nan 3.44e+05 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > MatSolve 965 1.0 nan nan 5.04e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 5 0 0 0 1 5 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatSOR 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatLUFactorSym 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatLUFactorNum 190 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 11 0 0 0 1 11 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatScale 190 1.0 nan nan 3.26e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > MatAssemblyBegin 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatAssemblyEnd 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatGetRowIJ 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatCreateSubMats 380 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatGetOrdering 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatZeroEntries 379 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatSetPreallCOO 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatSetValuesCOO 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > KSPSetUp 760 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > KSPSolve 190 1.0 5.8052e-01 1.0 9.30e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 10 86 0 0 0 10 86 0 0 0 1602 > -nan 1 4.80e-03 0 0.00e+00 46 > > KSPGMRESOrthog 775 1.0 nan nan 2.27e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 2 0 0 0 1 2 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > SNESSolve 71 1.0 5.7117e+00 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 95 99 0 0 0 95 99 0 0 0 188 > -nan 1 4.80e-03 0 0.00e+00 53 > > SNESSetUp 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SNESFunctionEval 573 1.0 nan nan 2.23e+07 1.0 0.0e+00 
0.0e+00 > 0.0e+00 60 2 0 0 0 60 2 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > SNESJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 97 > > SNESLineSearch 190 1.0 nan nan 1.05e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 53 10 0 0 0 53 10 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > PCSetUp 570 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 11 0 0 0 2 11 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > PCApply 965 1.0 nan nan 6.14e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 8 57 0 0 0 8 57 0 0 0 -nan > -nan 1 4.80e-03 0 0.00e+00 19 > > KSPSolve_FS_0 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > KSPSolve_FS_1 965 1.0 nan nan 1.66e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 15 0 0 0 2 15 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > > --- Event Stage 1: Unknown > > > ------------------------------------------------------------------------------------------------------------------------ > --------------------------------------- > > > Object Type Creations Destructions. Reports information only > for process 0. > > --- Event Stage 0: Main Stage > > Container 5 5 > Distributed Mesh 2 2 > Index Set 11 11 > IS L to G Mapping 1 1 > Star Forest Graph 7 7 > Discrete System 2 2 > Weak Form 2 2 > Vector 49 49 > TSAdapt 1 1 > TS 1 1 > DMTS 1 1 > SNES 1 1 > DMSNES 3 3 > SNESLineSearch 1 1 > Krylov Solver 4 4 > DMKSP interface 1 1 > Matrix 4 4 > Preconditioner 4 4 > Viewer 2 1 > > --- Event Stage 1: Unknown > > > ======================================================================================================================== > Average time to get PetscTime(): 3.14e-08 > #PETSc Option Table entries: > -log_view > -log_view_gpu_times > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with 64 bit PetscInt > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 sizeof(PetscInt) 8 > Configure options: PETSC_DIR=/home/4pf/repos/petsc > PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx > --with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries > --prefix=/home/4pf/build/petsc/cuda-no-tpls/install --with-64-bit-indices > --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3 > --with-kokkos-dir=/home/4pf/build/kokkos/cuda/install > --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install > > ----------------------------------------- > Libraries compiled on 2022-11-01 21:01:08 on PC0115427 > Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35 > Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install > Using PETSc arch: > ----------------------------------------- > > Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas > -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector > -fvisibility=hidden -O3 > ----------------------------------------- > > Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include > -I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include > -I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include > ----------------------------------------- > > Using C linker: mpicc > Using libraries: -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib > -L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc > -Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib > 
-L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib > -Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib > -L/home/4pf/build/kokkos/cuda/install/lib > -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64 > -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels -lkokkoscontainers > -lkokkoscore -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas > -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl > ----------------------------------------- > > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Junchao Zhang <jun...@gm...> > *Sent:* Tuesday, November 15, 2022 13:03 > *To:* Fackler, Philip <fac...@or...> > *Cc:* xol...@li... < > xol...@li...>; pet...@mc... < > pet...@mc...>; Blondel, Sophie <sbl...@ut...>; Roth, > Philip <ro...@or...> > *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and > Vec diverging when running on CUDA device. > > Can you paste -log_view result so I can see what functions are used? > > --Junchao Zhang > > > On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip <fac...@or...> > wrote: > > Yes, most (but not all) of our system test cases fail with the kokkos/cuda > or cuda backends. All of them pass with the CPU-only kokkos backend. > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Junchao Zhang <jun...@gm...> > *Sent:* Monday, November 14, 2022 19:34 > *To:* Fackler, Philip <fac...@or...> > *Cc:* xol...@li... < > xol...@li...>; pet...@mc... < > pet...@mc...>; Blondel, Sophie <sbl...@ut...>; Zhang, > Junchao <jc...@mc...>; Roth, Philip <ro...@or...> > *Subject:* [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec > diverging when running on CUDA device. > > Hi, Philip, > Sorry to hear that. It seems you could run the same code on CPUs but > not no GPUs (with either petsc/Kokkos backend or petsc/cuda backend, is it > right? > > --Junchao Zhang > > > On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users < > pet...@mc...> wrote: > > This is an issue I've brought up before (and discussed in-person with > Richard). I wanted to bring it up again because I'm hitting the limits of > what I know to do, and I need help figuring this out. > > The problem can be reproduced using Xolotl's "develop" branch built > against a petsc build with kokkos and kokkos-kernels enabled. Then, either > add the relevant kokkos options to the "petscArgs=" line in the system test > parameter file(s), or just replace the system test parameter files with the > ones from the "feature-petsc-kokkos" branch. See here the files that > begin with "params_system_". > > Note that those files use the "kokkos" options, but the problem is similar > using the corresponding cuda/cusparse options. I've already tried building > kokkos-kernels with no TPLs and got slightly different results, but the > same problem. > > Any help would be appreciated. > > Thanks, > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > > |
From: Fackler, P. <fac...@or...> - 2022-12-05 17:08:26
|
Junchao, Thank you for working on this. If you open the parameter file for, say, the PSI_2 system test case (benchmarks/params_system_PSI_2.txt), simply add -dm_mat_type aijkokkos -dm_vec_type kokkos?` to the "petscArgs=" field (or the corresponding cusparse/cuda option). Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang <jun...@gm...> Sent: Thursday, December 1, 2022 17:05 To: Fackler, Philip <fac...@or...> Cc: xol...@li... <xol...@li...>; pet...@mc... <pet...@mc...>; Blondel, Sophie <sbl...@ut...>; Roth, Philip <ro...@or...> Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Hi, Philip, Sorry for the long delay. I could not get something useful from the -log_view output. Since I have already built xolotl, could you give me instructions on how to do a xolotl test to reproduce the divergence with petsc GPU backends (but fine on CPU)? Thank you. --Junchao Zhang On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip <fac...@or...<mailto:fac...@or...>> wrote: ------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------ Unknown Name on a named PC0115427 with 1 processor, by 4pf Wed Nov 16 14:36:46 2022 Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a GIT Date: 2022-10-28 14:39:41 +0000 Max Max/Min Avg Total Time (sec): 6.023e+00 1.000 6.023e+00 Objects: 1.020e+02 1.000 1.020e+02 Flops: 1.080e+09 1.000 1.080e+09 1.080e+09 Flops/sec: 1.793e+08 1.000 1.793e+08 1.793e+08 MPI Msg Count: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Msg Len (bytes): 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 6.0226e+00 100.0% 1.0799e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors) CpuToGpu Count: total number of CPU to GPU copies per processor CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor) GpuToCpu Count: total number of GPU to CPU copies per processor GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor) GPU %F: percent flops on GPU in this event ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F ------------------------------------------------------------------------------------------------------------------------ --------------------------------------- --- Event Stage 0: Main Stage BuildTwoSided 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 DMCreateMat 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFSetGraph 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFSetUp 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFPack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFUnpack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecDot 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecMDot 775 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecNorm 1728 1.0 nan nan 1.92e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecScale 1983 1.0 nan nan 6.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecCopy 780 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecSet 4955 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecAXPY 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecAYPX 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecAXPBYCZ 643 1.0 nan nan 1.79e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecWAXPY 502 1.0 nan nan 5.58e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecMAXPY 1159 1.0 nan nan 3.68e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecScatterBegin 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 2 5.14e-03 0 0.00e+00 0 VecScatterEnd 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecReduceArith 380 1.0 nan nan 4.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan 
-nan 0 0.00e+00 0 0.00e+00 100 VecReduceComm 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecNormalize 965 1.0 nan nan 1.61e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 TSStep 20 1.0 5.8699e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 97100 0 0 0 97100 0 0 0 184 -nan 2 5.14e-03 0 0.00e+00 54 TSFunctionEval 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 63 1 0 0 0 63 1 0 0 0 -nan -nan 1 3.36e-04 0 0.00e+00 100 TSJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 97 MatMult 1930 1.0 nan nan 4.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 41 0 0 0 1 41 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatMultTranspose 1 1.0 nan nan 3.44e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatSolve 965 1.0 nan nan 5.04e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSOR 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatLUFactorSym 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatLUFactorNum 190 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 11 0 0 0 1 11 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatScale 190 1.0 nan nan 3.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatAssemblyBegin 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatAssemblyEnd 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatGetRowIJ 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatCreateSubMats 380 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatGetOrdering 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatZeroEntries 379 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSetPreallCOO 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSetValuesCOO 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSetUp 760 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSolve 190 1.0 5.8052e-01 1.0 9.30e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 86 0 0 0 10 86 0 0 0 1602 -nan 1 4.80e-03 0 0.00e+00 46 KSPGMRESOrthog 775 1.0 nan nan 2.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 SNESSolve 71 1.0 5.7117e+00 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 0.0e+00 95 99 0 0 0 95 99 0 0 0 188 -nan 1 4.80e-03 0 0.00e+00 53 SNESSetUp 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SNESFunctionEval 573 1.0 nan nan 2.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 60 2 0 0 0 60 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 SNESJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 97 SNESLineSearch 190 1.0 nan nan 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00 53 10 0 0 0 53 10 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 PCSetUp 570 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 11 0 0 0 2 11 0 0 0 -nan -nan 0 
0.00e+00 0 0.00e+00 0 PCApply 965 1.0 nan nan 6.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00 8 57 0 0 0 8 57 0 0 0 -nan -nan 1 4.80e-03 0 0.00e+00 19 KSPSolve_FS_0 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSolve_FS_1 965 1.0 nan nan 1.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 15 0 0 0 2 15 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 --- Event Stage 1: Unknown ------------------------------------------------------------------------------------------------------------------------ --------------------------------------- Object Type Creations Destructions. Reports information only for process 0. --- Event Stage 0: Main Stage Container 5 5 Distributed Mesh 2 2 Index Set 11 11 IS L to G Mapping 1 1 Star Forest Graph 7 7 Discrete System 2 2 Weak Form 2 2 Vector 49 49 TSAdapt 1 1 TS 1 1 DMTS 1 1 SNES 1 1 DMSNES 3 3 SNESLineSearch 1 1 Krylov Solver 4 4 DMKSP interface 1 1 Matrix 4 4 Preconditioner 4 4 Viewer 2 1 --- Event Stage 1: Unknown ======================================================================================================================== Average time to get PetscTime(): 3.14e-08 #PETSc Option Table entries: -log_view -log_view_gpu_times #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with 64 bit PetscInt Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8 Configure options: PETSC_DIR=/home/4pf/repos/petsc PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries --prefix=/home/4pf/build/petsc/cuda-no-tpls/install --with-64-bit-indices --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3 --with-kokkos-dir=/home/4pf/build/kokkos/cuda/install --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install ----------------------------------------- Libraries compiled on 2022-11-01 21:01:08 on PC0115427 Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35 Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install Using PETSc arch: ----------------------------------------- Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -O3 ----------------------------------------- Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include -I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include -I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include ----------------------------------------- Using C linker: mpicc Using libraries: -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib -L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc -Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib -L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib -Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib -L/home/4pf/build/kokkos/cuda/install/lib -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels -lkokkoscontainers -lkokkoscore -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl ----------------------------------------- Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory 
________________________________ From: Junchao Zhang <jun...@gm...<mailto:jun...@gm...>> Sent: Tuesday, November 15, 2022 13:03 To: Fackler, Philip <fac...@or...<mailto:fac...@or...>> Cc: xol...@li...<mailto:xol...@li...> <xol...@li...<mailto:xol...@li...>>; pet...@mc...<mailto:pet...@mc...> <pet...@mc...<mailto:pet...@mc...>>; Blondel, Sophie <sbl...@ut...<mailto:sbl...@ut...>>; Roth, Philip <ro...@or...<mailto:ro...@or...>> Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Can you paste -log_view result so I can see what functions are used? --Junchao Zhang On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip <fac...@or...<mailto:fac...@or...>> wrote: Yes, most (but not all) of our system test cases fail with the kokkos/cuda or cuda backends. All of them pass with the CPU-only kokkos backend. Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang <jun...@gm...<mailto:jun...@gm...>> Sent: Monday, November 14, 2022 19:34 To: Fackler, Philip <fac...@or...<mailto:fac...@or...>> Cc: xol...@li...<mailto:xol...@li...> <xol...@li...<mailto:xol...@li...>>; pet...@mc...<mailto:pet...@mc...> <pet...@mc...<mailto:pet...@mc...>>; Blondel, Sophie <sbl...@ut...<mailto:sbl...@ut...>>; Zhang, Junchao <jc...@mc...<mailto:jc...@mc...>>; Roth, Philip <ro...@or...<mailto:ro...@or...>> Subject: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Hi, Philip, Sorry to hear that. It seems you could run the same code on CPUs but not no GPUs (with either petsc/Kokkos backend or petsc/cuda backend, is it right? --Junchao Zhang On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users <pet...@mc...<mailto:pet...@mc...>> wrote: This is an issue I've brought up before (and discussed in-person with Richard). I wanted to bring it up again because I'm hitting the limits of what I know to do, and I need help figuring this out. The problem can be reproduced using Xolotl's "develop" branch built against a petsc build with kokkos and kokkos-kernels enabled. Then, either add the relevant kokkos options to the "petscArgs=" line in the system test parameter file(s), or just replace the system test parameter files with the ones from the "feature-petsc-kokkos" branch. See here the files that begin with "params_system_". Note that those files use the "kokkos" options, but the problem is similar using the corresponding cuda/cusparse options. I've already tried building kokkos-kernels with no TPLs and got slightly different results, but the same problem. Any help would be appreciated. Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory |
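To make the instruction above concrete, the change amounts to appending two options to the petscArgs line of the parameter file. A minimal sketch, where <existing options> is only a placeholder for whatever the file already contains (not reproduced here):

    benchmarks/params_system_PSI_2.txt:
    petscArgs=<existing options> -dm_mat_type aijkokkos -dm_vec_type kokkos

For the CUDA/cuSPARSE backend mentioned in the thread, the corresponding pair would presumably be -dm_mat_type aijcusparse -dm_vec_type cuda.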
From: Mark A. <mf...@lb...> - 2022-12-02 12:49:44
|
Maybe Philip could narrow this down by using not GMRES/SOR solvers? Try GMRES/jacobi Try bicg/sor If one of those fixes the problem it might help or at least get Philip moving. Mark On Thu, Dec 1, 2022 at 5:06 PM Junchao Zhang <jun...@gm...> wrote: > Hi, Philip, > Sorry for the long delay. I could not get something useful from the > -log_view output. Since I have already built xolotl, could you give me > instructions on how to do a xolotl test to reproduce the divergence with > petsc GPU backends (but fine on CPU)? > Thank you. > --Junchao Zhang > > > On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip <fac...@or...> > wrote: > >> ------------------------------------------------------------------ PETSc >> Performance Summary: >> ------------------------------------------------------------------ >> >> Unknown Name on a named PC0115427 with 1 processor, by 4pf Wed Nov 16 >> 14:36:46 2022 >> Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a GIT Date: >> 2022-10-28 14:39:41 +0000 >> >> Max Max/Min Avg Total >> Time (sec): 6.023e+00 1.000 6.023e+00 >> Objects: 1.020e+02 1.000 1.020e+02 >> Flops: 1.080e+09 1.000 1.080e+09 1.080e+09 >> Flops/sec: 1.793e+08 1.000 1.793e+08 1.793e+08 >> MPI Msg Count: 0.000e+00 0.000 0.000e+00 0.000e+00 >> MPI Msg Len (bytes): 0.000e+00 0.000 0.000e+00 0.000e+00 >> MPI Reductions: 0.000e+00 0.000 >> >> Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N >> --> 2N flops >> and VecAXPY() for complex vectors of length N >> --> 8N flops >> >> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages >> --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total Count >> %Total Avg %Total Count %Total >> 0: Main Stage: 6.0226e+00 100.0% 1.0799e+09 100.0% 0.000e+00 >> 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on >> interpreting output. >> Phase summary info: >> Count: number of times phase was executed >> Time and Flop: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> AvgLen: average message length (bytes) >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() >> and PetscLogStagePop(). 
>> %T - percent time in this phase %F - percent flop in this >> phase >> %M - percent messages in this phase %L - percent message >> lengths in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time >> over all processors) >> GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU >> time over all processors) >> CpuToGpu Count: total number of CPU to GPU copies per processor >> CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per >> processor) >> GpuToCpu Count: total number of GPU to CPU copies per processor >> GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per >> processor) >> GPU %F: percent flops on GPU in this event >> >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) Flop >> --- Global --- --- Stage ---- Total >> GPU - CpuToGpu - - GpuToCpu - GPU >> >> Max Ratio Max Ratio Max Ratio Mess AvgLen >> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> Mflop/s Count Size Count Size %F >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> --------------------------------------- >> >> >> --- Event Stage 0: Main Stage >> >> BuildTwoSided 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> DMCreateMat 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> SFSetGraph 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> SFSetUp 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> SFPack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> SFUnpack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> VecDot 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecMDot 775 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> VecNorm 1728 1.0 nan nan 1.92e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecScale 1983 1.0 nan nan 6.24e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecCopy 780 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> VecSet 4955 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 2 0 0 0 0 2 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> VecAXPY 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecAYPX 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecAXPBYCZ 643 1.0 nan nan 1.79e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecWAXPY 502 1.0 nan nan 5.58e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecMAXPY 1159 1.0 nan nan 3.68e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecScatterBegin 4647 1.0 nan nan 
0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan >> -nan 2 5.14e-03 0 0.00e+00 0 >> >> VecScatterEnd 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> VecReduceArith 380 1.0 nan nan 4.23e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecReduceComm 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> VecNormalize 965 1.0 nan nan 1.61e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> TSStep 20 1.0 5.8699e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 97100 0 0 0 97100 0 0 0 184 >> -nan 2 5.14e-03 0 0.00e+00 54 >> >> TSFunctionEval 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 63 1 0 0 0 63 1 0 0 0 -nan >> -nan 1 3.36e-04 0 0.00e+00 100 >> >> TSJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 97 >> >> MatMult 1930 1.0 nan nan 4.46e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 1 41 0 0 0 1 41 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> MatMultTranspose 1 1.0 nan nan 3.44e+05 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> MatSolve 965 1.0 nan nan 5.04e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 1 5 0 0 0 1 5 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatSOR 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatLUFactorSym 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatLUFactorNum 190 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 1 11 0 0 0 1 11 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatScale 190 1.0 nan nan 3.26e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> MatAssemblyBegin 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatAssemblyEnd 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatGetRowIJ 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatCreateSubMats 380 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatGetOrdering 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatZeroEntries 379 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatSetPreallCOO 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatSetValuesCOO 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> KSPSetUp 760 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> KSPSolve 190 1.0 5.8052e-01 1.0 9.30e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 10 86 0 0 0 10 86 0 0 0 1602 >> -nan 1 4.80e-03 0 0.00e+00 46 >> >> KSPGMRESOrthog 775 1.0 nan nan 2.27e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 1 2 0 0 0 1 2 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> SNESSolve 71 1.0 5.7117e+00 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 95 99 0 0 0 95 99 0 0 0 188 >> -nan 
1 4.80e-03 0 0.00e+00 53 >> >> SNESSetUp 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> SNESFunctionEval 573 1.0 nan nan 2.23e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 60 2 0 0 0 60 2 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> SNESJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 97 >> >> SNESLineSearch 190 1.0 nan nan 1.05e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 53 10 0 0 0 53 10 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> PCSetUp 570 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 11 0 0 0 2 11 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> PCApply 965 1.0 nan nan 6.14e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 8 57 0 0 0 8 57 0 0 0 -nan >> -nan 1 4.80e-03 0 0.00e+00 19 >> >> KSPSolve_FS_0 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> KSPSolve_FS_1 965 1.0 nan nan 1.66e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 15 0 0 0 2 15 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> >> --- Event Stage 1: Unknown >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> --------------------------------------- >> >> >> Object Type Creations Destructions. Reports information only >> for process 0. >> >> --- Event Stage 0: Main Stage >> >> Container 5 5 >> Distributed Mesh 2 2 >> Index Set 11 11 >> IS L to G Mapping 1 1 >> Star Forest Graph 7 7 >> Discrete System 2 2 >> Weak Form 2 2 >> Vector 49 49 >> TSAdapt 1 1 >> TS 1 1 >> DMTS 1 1 >> SNES 1 1 >> DMSNES 3 3 >> SNESLineSearch 1 1 >> Krylov Solver 4 4 >> DMKSP interface 1 1 >> Matrix 4 4 >> Preconditioner 4 4 >> Viewer 2 1 >> >> --- Event Stage 1: Unknown >> >> >> ======================================================================================================================== >> Average time to get PetscTime(): 3.14e-08 >> #PETSc Option Table entries: >> -log_view >> -log_view_gpu_times >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with 64 bit PetscInt >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >> sizeof(PetscScalar) 8 sizeof(PetscInt) 8 >> Configure options: PETSC_DIR=/home/4pf/repos/petsc >> PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx >> --with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries >> --prefix=/home/4pf/build/petsc/cuda-no-tpls/install --with-64-bit-indices >> --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3 >> --with-kokkos-dir=/home/4pf/build/kokkos/cuda/install >> --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install >> >> ----------------------------------------- >> Libraries compiled on 2022-11-01 21:01:08 on PC0115427 >> Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35 >> Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install >> Using PETSc arch: >> ----------------------------------------- >> >> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector >> -fvisibility=hidden -O3 >> ----------------------------------------- >> >> Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include >> -I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include >> -I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include >> 
----------------------------------------- >> >> Using C linker: mpicc >> Using libraries: >> -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib >> -L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc >> -Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib >> -L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib >> -Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib >> -L/home/4pf/build/kokkos/cuda/install/lib >> -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64 >> -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels -lkokkoscontainers >> -lkokkoscore -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas >> -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl >> ----------------------------------------- >> >> >> >> *Philip Fackler * >> Research Software Engineer, Application Engineering Group >> Advanced Computing Systems Research Section >> Computer Science and Mathematics Division >> *Oak Ridge National Laboratory* >> ------------------------------ >> *From:* Junchao Zhang <jun...@gm...> >> *Sent:* Tuesday, November 15, 2022 13:03 >> *To:* Fackler, Philip <fac...@or...> >> *Cc:* xol...@li... < >> xol...@li...>; pet...@mc... < >> pet...@mc...>; Blondel, Sophie <sbl...@ut...>; Roth, >> Philip <ro...@or...> >> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and >> Vec diverging when running on CUDA device. >> >> Can you paste -log_view result so I can see what functions are used? >> >> --Junchao Zhang >> >> >> On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip <fac...@or...> >> wrote: >> >> Yes, most (but not all) of our system test cases fail with the >> kokkos/cuda or cuda backends. All of them pass with the CPU-only kokkos >> backend. >> >> >> *Philip Fackler * >> Research Software Engineer, Application Engineering Group >> Advanced Computing Systems Research Section >> Computer Science and Mathematics Division >> *Oak Ridge National Laboratory* >> ------------------------------ >> *From:* Junchao Zhang <jun...@gm...> >> *Sent:* Monday, November 14, 2022 19:34 >> *To:* Fackler, Philip <fac...@or...> >> *Cc:* xol...@li... < >> xol...@li...>; pet...@mc... < >> pet...@mc...>; Blondel, Sophie <sbl...@ut...>; Zhang, >> Junchao <jc...@mc...>; Roth, Philip <ro...@or...> >> *Subject:* [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec >> diverging when running on CUDA device. >> >> Hi, Philip, >> Sorry to hear that. It seems you could run the same code on CPUs but >> not no GPUs (with either petsc/Kokkos backend or petsc/cuda backend, is it >> right? >> >> --Junchao Zhang >> >> >> On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users < >> pet...@mc...> wrote: >> >> This is an issue I've brought up before (and discussed in-person with >> Richard). I wanted to bring it up again because I'm hitting the limits of >> what I know to do, and I need help figuring this out. >> >> The problem can be reproduced using Xolotl's "develop" branch built >> against a petsc build with kokkos and kokkos-kernels enabled. Then, either >> add the relevant kokkos options to the "petscArgs=" line in the system test >> parameter file(s), or just replace the system test parameter files with the >> ones from the "feature-petsc-kokkos" branch. See here the files that >> begin with "params_system_". >> >> Note that those files use the "kokkos" options, but the problem is >> similar using the corresponding cuda/cusparse options. 
I've already tried >> building kokkos-kernels with no TPLs and got slightly different results, >> but the same problem. >> >> Any help would be appreciated. >> >> Thanks, >> >> >> *Philip Fackler * >> Research Software Engineer, Application Engineering Group >> Advanced Computing Systems Research Section >> Computer Science and Mathematics Division >> *Oak Ridge National Laboratory* >> >> |
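Mark's suggestion maps, roughly, onto PETSc options along these lines (a sketch only; the exact names and prefixes depend on how Xolotl configures its solver, and the KSPSolve_FS_* events in the log suggest that prefixed forms such as -fieldsplit_1_pc_type may be needed for the inner solves):

    -ksp_type gmres -pc_type jacobi    (keep GMRES, replace SOR with Jacobi)
    -ksp_type bicg -pc_type sor        (keep SOR, replace GMRES with BiCG)

If the GPU-only divergence disappears with one combination but not the other, that narrows the search to the solver component that changed.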
From: Junchao Z. <jun...@gm...> - 2022-12-01 22:06:11
|
Hi, Philip, Sorry for the long delay. I could not get something useful from the -log_view output. Since I have already built xolotl, could you give me instructions on how to do a xolotl test to reproduce the divergence with petsc GPU backends (but fine on CPU)? Thank you. --Junchao Zhang On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip <fac...@or...> wrote: > ------------------------------------------------------------------ PETSc > Performance Summary: > ------------------------------------------------------------------ > > Unknown Name on a named PC0115427 with 1 processor, by 4pf Wed Nov 16 > 14:36:46 2022 > Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a GIT Date: > 2022-10-28 14:39:41 +0000 > > Max Max/Min Avg Total > Time (sec): 6.023e+00 1.000 6.023e+00 > Objects: 1.020e+02 1.000 1.020e+02 > Flops: 1.080e+09 1.000 1.080e+09 1.080e+09 > Flops/sec: 1.793e+08 1.000 1.793e+08 1.793e+08 > MPI Msg Count: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Msg Len (bytes): 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flops > and VecAXPY() for complex vectors of length N > --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total Count > %Total Avg %Total Count %Total > 0: Main Stage: 6.0226e+00 100.0% 1.0799e+09 100.0% 0.000e+00 > 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flop: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > AvgLen: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flop in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over > all processors) > GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU > time over all processors) > CpuToGpu Count: total number of CPU to GPU copies per processor > CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per > processor) > GpuToCpu Count: total number of GPU to CPU copies per processor > GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per > processor) > GPU %F: percent flops on GPU in this event > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop > --- Global --- --- Stage ---- Total > GPU - CpuToGpu - - GpuToCpu - GPU > > Max Ratio Max Ratio Max Ratio Mess AvgLen > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > Mflop/s Count Size Count Size %F > > > ------------------------------------------------------------------------------------------------------------------------ > --------------------------------------- > > > --- Event Stage 0: Main Stage > > BuildTwoSided 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > DMCreateMat 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFSetGraph 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFSetUp 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFPack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SFUnpack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecDot 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecMDot 775 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecNorm 1728 1.0 nan nan 1.92e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecScale 1983 1.0 nan nan 6.24e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecCopy 780 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecSet 4955 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecAXPY 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecAYPX 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecAXPBYCZ 643 1.0 nan nan 1.79e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecWAXPY 502 1.0 nan nan 5.58e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecMAXPY 1159 1.0 nan nan 3.68e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecScatterBegin 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan > -nan 2 5.14e-03 0 0.00e+00 0 > > 
VecScatterEnd 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecReduceArith 380 1.0 nan nan 4.23e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > VecReduceComm 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > VecNormalize 965 1.0 nan nan 1.61e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > TSStep 20 1.0 5.8699e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 97100 0 0 0 97100 0 0 0 184 > -nan 2 5.14e-03 0 0.00e+00 54 > > TSFunctionEval 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 63 1 0 0 0 63 1 0 0 0 -nan > -nan 1 3.36e-04 0 0.00e+00 100 > > TSJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 97 > > MatMult 1930 1.0 nan nan 4.46e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 41 0 0 0 1 41 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > MatMultTranspose 1 1.0 nan nan 3.44e+05 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > MatSolve 965 1.0 nan nan 5.04e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 5 0 0 0 1 5 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatSOR 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatLUFactorSym 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatLUFactorNum 190 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 11 0 0 0 1 11 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatScale 190 1.0 nan nan 3.26e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > MatAssemblyBegin 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatAssemblyEnd 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatGetRowIJ 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatCreateSubMats 380 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatGetOrdering 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatZeroEntries 379 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatSetPreallCOO 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > MatSetValuesCOO 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > KSPSetUp 760 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > KSPSolve 190 1.0 5.8052e-01 1.0 9.30e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 10 86 0 0 0 10 86 0 0 0 1602 > -nan 1 4.80e-03 0 0.00e+00 46 > > KSPGMRESOrthog 775 1.0 nan nan 2.27e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 2 0 0 0 1 2 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > SNESSolve 71 1.0 5.7117e+00 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 95 99 0 0 0 95 99 0 0 0 188 > -nan 1 4.80e-03 0 0.00e+00 53 > > SNESSetUp 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > SNESFunctionEval 573 1.0 nan nan 2.23e+07 1.0 0.0e+00 
0.0e+00 > 0.0e+00 60 2 0 0 0 60 2 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > SNESJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 97 > > SNESLineSearch 190 1.0 nan nan 1.05e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 53 10 0 0 0 53 10 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 100 > > PCSetUp 570 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 11 0 0 0 2 11 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > PCApply 965 1.0 nan nan 6.14e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 8 57 0 0 0 8 57 0 0 0 -nan > -nan 1 4.80e-03 0 0.00e+00 19 > > KSPSolve_FS_0 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > KSPSolve_FS_1 965 1.0 nan nan 1.66e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 15 0 0 0 2 15 0 0 0 -nan > -nan 0 0.00e+00 0 0.00e+00 0 > > > --- Event Stage 1: Unknown > > > ------------------------------------------------------------------------------------------------------------------------ > --------------------------------------- > > > Object Type Creations Destructions. Reports information only > for process 0. > > --- Event Stage 0: Main Stage > > Container 5 5 > Distributed Mesh 2 2 > Index Set 11 11 > IS L to G Mapping 1 1 > Star Forest Graph 7 7 > Discrete System 2 2 > Weak Form 2 2 > Vector 49 49 > TSAdapt 1 1 > TS 1 1 > DMTS 1 1 > SNES 1 1 > DMSNES 3 3 > SNESLineSearch 1 1 > Krylov Solver 4 4 > DMKSP interface 1 1 > Matrix 4 4 > Preconditioner 4 4 > Viewer 2 1 > > --- Event Stage 1: Unknown > > > ======================================================================================================================== > Average time to get PetscTime(): 3.14e-08 > #PETSc Option Table entries: > -log_view > -log_view_gpu_times > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with 64 bit PetscInt > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 sizeof(PetscInt) 8 > Configure options: PETSC_DIR=/home/4pf/repos/petsc > PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx > --with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries > --prefix=/home/4pf/build/petsc/cuda-no-tpls/install --with-64-bit-indices > --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3 > --with-kokkos-dir=/home/4pf/build/kokkos/cuda/install > --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install > > ----------------------------------------- > Libraries compiled on 2022-11-01 21:01:08 on PC0115427 > Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35 > Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install > Using PETSc arch: > ----------------------------------------- > > Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas > -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector > -fvisibility=hidden -O3 > ----------------------------------------- > > Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include > -I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include > -I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include > ----------------------------------------- > > Using C linker: mpicc > Using libraries: -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib > -L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc > -Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib > 
-L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib > -Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib > -L/home/4pf/build/kokkos/cuda/install/lib > -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64 > -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels -lkokkoscontainers > -lkokkoscore -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas > -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl > ----------------------------------------- > > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Junchao Zhang <jun...@gm...> > *Sent:* Tuesday, November 15, 2022 13:03 > *To:* Fackler, Philip <fac...@or...> > *Cc:* xol...@li... < > xol...@li...>; pet...@mc... < > pet...@mc...>; Blondel, Sophie <sbl...@ut...>; Roth, > Philip <ro...@or...> > *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and > Vec diverging when running on CUDA device. > > Can you paste -log_view result so I can see what functions are used? > > --Junchao Zhang > > > On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip <fac...@or...> > wrote: > > Yes, most (but not all) of our system test cases fail with the kokkos/cuda > or cuda backends. All of them pass with the CPU-only kokkos backend. > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Junchao Zhang <jun...@gm...> > *Sent:* Monday, November 14, 2022 19:34 > *To:* Fackler, Philip <fac...@or...> > *Cc:* xol...@li... < > xol...@li...>; pet...@mc... < > pet...@mc...>; Blondel, Sophie <sbl...@ut...>; Zhang, > Junchao <jc...@mc...>; Roth, Philip <ro...@or...> > *Subject:* [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec > diverging when running on CUDA device. > > Hi, Philip, > Sorry to hear that. It seems you could run the same code on CPUs but > not no GPUs (with either petsc/Kokkos backend or petsc/cuda backend, is it > right? > > --Junchao Zhang > > > On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users < > pet...@mc...> wrote: > > This is an issue I've brought up before (and discussed in-person with > Richard). I wanted to bring it up again because I'm hitting the limits of > what I know to do, and I need help figuring this out. > > The problem can be reproduced using Xolotl's "develop" branch built > against a petsc build with kokkos and kokkos-kernels enabled. Then, either > add the relevant kokkos options to the "petscArgs=" line in the system test > parameter file(s), or just replace the system test parameter files with the > ones from the "feature-petsc-kokkos" branch. See here the files that > begin with "params_system_". > > Note that those files use the "kokkos" options, but the problem is similar > using the corresponding cuda/cusparse options. I've already tried building > kokkos-kernels with no TPLs and got slightly different results, but the > same problem. > > Any help would be appreciated. > > Thanks, > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > > |
From: Fackler, P. <fac...@or...> - 2022-11-16 19:38:41
|
------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------ Unknown Name on a named PC0115427 with 1 processor, by 4pf Wed Nov 16 14:36:46 2022 Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a GIT Date: 2022-10-28 14:39:41 +0000 Max Max/Min Avg Total Time (sec): 6.023e+00 1.000 6.023e+00 Objects: 1.020e+02 1.000 1.020e+02 Flops: 1.080e+09 1.000 1.080e+09 1.080e+09 Flops/sec: 1.793e+08 1.000 1.793e+08 1.793e+08 MPI Msg Count: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Msg Len (bytes): 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 6.0226e+00 100.0% 1.0799e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors) CpuToGpu Count: total number of CPU to GPU copies per processor CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor) GpuToCpu Count: total number of GPU to CPU copies per processor GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor) GPU %F: percent flops on GPU in this event ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F ------------------------------------------------------------------------------------------------------------------------ --------------------------------------- --- Event Stage 0: Main Stage BuildTwoSided 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 DMCreateMat 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFSetGraph 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFSetUp 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SFPack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 
SFUnpack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecDot 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecMDot 775 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecNorm 1728 1.0 nan nan 1.92e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecScale 1983 1.0 nan nan 6.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecCopy 780 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecSet 4955 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecAXPY 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecAYPX 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecAXPBYCZ 643 1.0 nan nan 1.79e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecWAXPY 502 1.0 nan nan 5.58e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecMAXPY 1159 1.0 nan nan 3.68e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecScatterBegin 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 2 5.14e-03 0 0.00e+00 0 VecScatterEnd 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecReduceArith 380 1.0 nan nan 4.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 VecReduceComm 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 VecNormalize 965 1.0 nan nan 1.61e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 TSStep 20 1.0 5.8699e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 97100 0 0 0 97100 0 0 0 184 -nan 2 5.14e-03 0 0.00e+00 54 TSFunctionEval 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 63 1 0 0 0 63 1 0 0 0 -nan -nan 1 3.36e-04 0 0.00e+00 100 TSJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 97 MatMult 1930 1.0 nan nan 4.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 41 0 0 0 1 41 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatMultTranspose 1 1.0 nan nan 3.44e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatSolve 965 1.0 nan nan 5.04e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSOR 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatLUFactorSym 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatLUFactorNum 190 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 11 0 0 0 1 11 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatScale 190 1.0 nan nan 3.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 MatAssemblyBegin 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatAssemblyEnd 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatGetRowIJ 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 
0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatCreateSubMats 380 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatGetOrdering 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatZeroEntries 379 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSetPreallCOO 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 MatSetValuesCOO 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSetUp 760 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSolve 190 1.0 5.8052e-01 1.0 9.30e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 86 0 0 0 10 86 0 0 0 1602 -nan 1 4.80e-03 0 0.00e+00 46 KSPGMRESOrthog 775 1.0 nan nan 2.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 SNESSolve 71 1.0 5.7117e+00 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 0.0e+00 95 99 0 0 0 95 99 0 0 0 188 -nan 1 4.80e-03 0 0.00e+00 53 SNESSetUp 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 SNESFunctionEval 573 1.0 nan nan 2.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 60 2 0 0 0 60 2 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 SNESJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 97 SNESLineSearch 190 1.0 nan nan 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00 53 10 0 0 0 53 10 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100 PCSetUp 570 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 11 0 0 0 2 11 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 PCApply 965 1.0 nan nan 6.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00 8 57 0 0 0 8 57 0 0 0 -nan -nan 1 4.80e-03 0 0.00e+00 19 KSPSolve_FS_0 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 KSPSolve_FS_1 965 1.0 nan nan 1.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 15 0 0 0 2 15 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0 --- Event Stage 1: Unknown ------------------------------------------------------------------------------------------------------------------------ --------------------------------------- Object Type Creations Destructions. Reports information only for process 0. 
--- Event Stage 0: Main Stage Container 5 5 Distributed Mesh 2 2 Index Set 11 11 IS L to G Mapping 1 1 Star Forest Graph 7 7 Discrete System 2 2 Weak Form 2 2 Vector 49 49 TSAdapt 1 1 TS 1 1 DMTS 1 1 SNES 1 1 DMSNES 3 3 SNESLineSearch 1 1 Krylov Solver 4 4 DMKSP interface 1 1 Matrix 4 4 Preconditioner 4 4 Viewer 2 1 --- Event Stage 1: Unknown ======================================================================================================================== Average time to get PetscTime(): 3.14e-08 #PETSc Option Table entries: -log_view -log_view_gpu_times #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with 64 bit PetscInt Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8 Configure options: PETSC_DIR=/home/4pf/repos/petsc PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries --prefix=/home/4pf/build/petsc/cuda-no-tpls/install --with-64-bit-indices --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3 --with-kokkos-dir=/home/4pf/build/kokkos/cuda/install --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install ----------------------------------------- Libraries compiled on 2022-11-01 21:01:08 on PC0115427 Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35 Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install Using PETSc arch: ----------------------------------------- Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -O3 ----------------------------------------- Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include -I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include -I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include ----------------------------------------- Using C linker: mpicc Using libraries: -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib -L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc -Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib -L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib -Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib -L/home/4pf/build/kokkos/cuda/install/lib -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels -lkokkoscontainers -lkokkoscore -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl ----------------------------------------- Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang <jun...@gm...> Sent: Tuesday, November 15, 2022 13:03 To: Fackler, Philip <fac...@or...> Cc: xol...@li... <xol...@li...>; pet...@mc... <pet...@mc...>; Blondel, Sophie <sbl...@ut...>; Roth, Philip <ro...@or...> Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Can you paste -log_view result so I can see what functions are used? --Junchao Zhang On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip <fac...@or...<mailto:fac...@or...>> wrote: Yes, most (but not all) of our system test cases fail with the kokkos/cuda or cuda backends. 
All of them pass with the CPU-only kokkos backend. Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang <jun...@gm...<mailto:jun...@gm...>> Sent: Monday, November 14, 2022 19:34 To: Fackler, Philip <fac...@or...<mailto:fac...@or...>> Cc: xol...@li...<mailto:xol...@li...> <xol...@li...<mailto:xol...@li...>>; pet...@mc...<mailto:pet...@mc...> <pet...@mc...<mailto:pet...@mc...>>; Blondel, Sophie <sbl...@ut...<mailto:sbl...@ut...>>; Zhang, Junchao <jc...@mc...<mailto:jc...@mc...>>; Roth, Philip <ro...@or...<mailto:ro...@or...>> Subject: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Hi, Philip, Sorry to hear that. It seems you could run the same code on CPUs but not no GPUs (with either petsc/Kokkos backend or petsc/cuda backend, is it right? --Junchao Zhang On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users <pet...@mc...<mailto:pet...@mc...>> wrote: This is an issue I've brought up before (and discussed in-person with Richard). I wanted to bring it up again because I'm hitting the limits of what I know to do, and I need help figuring this out. The problem can be reproduced using Xolotl's "develop" branch built against a petsc build with kokkos and kokkos-kernels enabled. Then, either add the relevant kokkos options to the "petscArgs=" line in the system test parameter file(s), or just replace the system test parameter files with the ones from the "feature-petsc-kokkos" branch. See here the files that begin with "params_system_". Note that those files use the "kokkos" options, but the problem is similar using the corresponding cuda/cusparse options. I've already tried building kokkos-kernels with no TPLs and got slightly different results, but the same problem. Any help would be appreciated. Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory |
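The profiling summary above was produced with PETSc's logging options; its "#PETSc Option Table entries" section shows the run used -log_view and -log_view_gpu_times. A minimal sketch of how those flags could be passed through a system test parameter file follows. The "petscArgs=" line is referenced elsewhere in this thread, but the solver options it normally carries are not reproduced here, so the placeholder below is an assumption for illustration only.

    # Hypothetical excerpt from a params_system_* parameter file; only the two
    # logging flags are taken from the option table printed in the summary above.
    petscArgs=<existing PETSc solver options> -log_view -log_view_gpu_times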
From: Junchao Z. <jun...@gm...> - 2022-11-15 18:04:04
|
Can you paste -log_view result so I can see what functions are used? --Junchao Zhang On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip <fac...@or...> wrote: > Yes, most (but not all) of our system test cases fail with the kokkos/cuda > or cuda backends. All of them pass with the CPU-only kokkos backend. > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > ------------------------------ > *From:* Junchao Zhang <jun...@gm...> > *Sent:* Monday, November 14, 2022 19:34 > *To:* Fackler, Philip <fac...@or...> > *Cc:* xol...@li... < > xol...@li...>; pet...@mc... < > pet...@mc...>; Blondel, Sophie <sbl...@ut...>; Zhang, > Junchao <jc...@mc...>; Roth, Philip <ro...@or...> > *Subject:* [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec > diverging when running on CUDA device. > > Hi, Philip, > Sorry to hear that. It seems you could run the same code on CPUs but > not no GPUs (with either petsc/Kokkos backend or petsc/cuda backend, is it > right? > > --Junchao Zhang > > > On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users < > pet...@mc...> wrote: > > This is an issue I've brought up before (and discussed in-person with > Richard). I wanted to bring it up again because I'm hitting the limits of > what I know to do, and I need help figuring this out. > > The problem can be reproduced using Xolotl's "develop" branch built > against a petsc build with kokkos and kokkos-kernels enabled. Then, either > add the relevant kokkos options to the "petscArgs=" line in the system test > parameter file(s), or just replace the system test parameter files with the > ones from the "feature-petsc-kokkos" branch. See here the files that > begin with "params_system_". > > Note that those files use the "kokkos" options, but the problem is similar > using the corresponding cuda/cusparse options. I've already tried building > kokkos-kernels with no TPLs and got slightly different results, but the > same problem. > > Any help would be appreciated. > > Thanks, > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > > |
From: Fackler, P. <fac...@or...> - 2022-11-15 16:24:45
|
Yes, most (but not all) of our system test cases fail with the kokkos/cuda or cuda backends. All of them pass with the CPU-only kokkos backend. Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Junchao Zhang <jun...@gm...> Sent: Monday, November 14, 2022 19:34 To: Fackler, Philip <fac...@or...> Cc: xol...@li... <xol...@li...>; pet...@mc... <pet...@mc...>; Blondel, Sophie <sbl...@ut...>; Zhang, Junchao <jc...@mc...>; Roth, Philip <ro...@or...> Subject: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device. Hi, Philip, Sorry to hear that. It seems you could run the same code on CPUs but not no GPUs (with either petsc/Kokkos backend or petsc/cuda backend, is it right? --Junchao Zhang On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users <pet...@mc...<mailto:pet...@mc...>> wrote: This is an issue I've brought up before (and discussed in-person with Richard). I wanted to bring it up again because I'm hitting the limits of what I know to do, and I need help figuring this out. The problem can be reproduced using Xolotl's "develop" branch built against a petsc build with kokkos and kokkos-kernels enabled. Then, either add the relevant kokkos options to the "petscArgs=" line in the system test parameter file(s), or just replace the system test parameter files with the ones from the "feature-petsc-kokkos" branch. See here the files that begin with "params_system_". Note that those files use the "kokkos" options, but the problem is similar using the corresponding cuda/cusparse options. I've already tried building kokkos-kernels with no TPLs and got slightly different results, but the same problem. Any help would be appreciated. Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory |
From: Zhang, J. <jc...@mc...> - 2022-11-15 05:14:50
|
Hi, Philip, Can you give me instructions to build Xolotl to reproduce the error? -- Junchao Zhang ________________________________ From: Fackler, Philip <fac...@or...> Sent: Monday, November 14, 2022 12:24 PM To: xol...@li... <xol...@li...>; pet...@mc... <pet...@mc...> Cc: Mills, Richard Tran <rt...@an...>; Zhang, Junchao <jc...@mc...>; Blondel, Sophie <sbl...@ut...> Subject: Using multiple MPI ranks with COO interface crashes in some cases In Xolotl's "feature-petsc-kokkos" branch, I have moved our code to use the COO interface for preallocating and setting values in the Jacobian matrix. I have found that with some of our test cases, using more than one MPI rank results in a crash. Way down in the preconditioner code in petsc a Mat gets computed that has "null" for the "productsymbolic" member of its "ops". It's pretty far removed from where we compute the Jacobian entries, so I haven't been able (so far) to track it back to an error in my code. I'd appreciate some help with this from someone who is more familiar with the petsc guts so we can figure out what I'm doing wrong. (I'm assuming it's a bug in Xolotl.) Note that this is using the kokkos backend for Mat and Vec in petsc, but with a serial-only build of kokkos and kokkos-kernels. So, it's a CPU-only multiple MPI rank run. Here's a paste of the error output showing the relevant parts of the call stack: [ERROR] [0]PETSC ERROR: [ERROR] --------------------- Error Message -------------------------------------------------------------- [ERROR] [1]PETSC ERROR: [ERROR] --------------------- Error Message -------------------------------------------------------------- [ERROR] [1]PETSC ERROR: [ERROR] [0]PETSC ERROR: [ERROR] No support for this operation for this object type [ERROR] [1]PETSC ERROR: [ERROR] No support for this operation for this object type [ERROR] [0]PETSC ERROR: [ERROR] No method productsymbolic for Mat of type (null) [ERROR] No method productsymbolic for Mat of type (null) [ERROR] [0]PETSC ERROR: [ERROR] [1]PETSC ERROR: [ERROR] See https://petsc.org/release/faq/ for trouble shooting. [ERROR] See https://petsc.org/release/faq/ for trouble shooting. 
[ERROR] [0]PETSC ERROR: [ERROR] [1]PETSC ERROR: [ERROR] Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a GIT Date: 2022-10-28 14:39:41 +0000 [ERROR] Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a GIT Date: 2022-10-28 14:39:41 +0000 [ERROR] [1]PETSC ERROR: [ERROR] [0]PETSC ERROR: [ERROR] Unknown Name on a named PC0115427 by 4pf Mon Nov 14 13:22:01 2022 [ERROR] Unknown Name on a named PC0115427 by 4pf Mon Nov 14 13:22:01 2022 [ERROR] [1]PETSC ERROR: [ERROR] [0]PETSC ERROR: [ERROR] Configure options PETSC_DIR=/home/4pf/repos/petsc PETSC_ARCH=arch-kokkos-serial-debug --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cudac=0 --prefix=/home/4pf/build/petsc/serial-debug/install --with-64-bit-indices --with-shared-libraries --with-kokkos-dir=/home/4pf/build/kokkos/serial/install --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/serial/install [ERROR] Configure options PETSC_DIR=/home/4pf/repos/petsc PETSC_ARCH=arch-kokkos-serial-debug --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cudac=0 --prefix=/home/4pf/build/petsc/serial-debug/install --with-64-bit-indices --with-shared-libraries --with-kokkos-dir=/home/4pf/build/kokkos/serial/install --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/serial/install [ERROR] [1]PETSC ERROR: [ERROR] [0]PETSC ERROR: [ERROR] #1 MatProductSymbolic_MPIAIJKokkos_AB() at /home/4pf/repos/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:918 [ERROR] #1 MatProductSymbolic_MPIAIJKokkos_AB() at /home/4pf/repos/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:918 [ERROR] [1]PETSC ERROR: [ERROR] [0]PETSC ERROR: [ERROR] #2 MatProductSymbolic_MPIAIJKokkos() at /home/4pf/repos/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:1138 [ERROR] #2 MatProductSymbolic_MPIAIJKokkos() at /home/4pf/repos/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:1138 [ERROR] [1]PETSC ERROR: [ERROR] [0]PETSC ERROR: [ERROR] #3 MatProductSymbolic() at /home/4pf/repos/petsc/src/mat/interface/matproduct.c:793 [ERROR] #3 MatProductSymbolic() at /home/4pf/repos/petsc/src/mat/interface/matproduct.c:793 [ERROR] [1]PETSC ERROR: [ERROR] [0]PETSC ERROR: [ERROR] #4 MatProduct_Private() at /home/4pf/repos/petsc/src/mat/interface/matrix.c:9820 [ERROR] #4 MatProduct_Private() at /home/4pf/repos/petsc/src/mat/interface/matrix.c:9820 [ERROR] [0]PETSC ERROR: [ERROR] [1]PETSC ERROR: [ERROR] #5 MatMatMult() at /home/4pf/repos/petsc/src/mat/interface/matrix.c:9897 [ERROR] #5 MatMatMult() at /home/4pf/repos/petsc/src/mat/interface/matrix.c:9897 [ERROR] [0]PETSC ERROR: [ERROR] [1]PETSC ERROR: [ERROR] #6 PCGAMGOptProlongator_AGG() at /home/4pf/repos/petsc/src/ksp/pc/impls/gamg/agg.c:769 [ERROR] #6 PCGAMGOptProlongator_AGG() at /home/4pf/repos/petsc/src/ksp/pc/impls/gamg/agg.c:769 [ERROR] [0]PETSC ERROR: [ERROR] [1]PETSC ERROR: [ERROR] #7 PCSetUp_GAMG() at /home/4pf/repos/petsc/src/ksp/pc/impls/gamg/gamg.c:639 [ERROR] #7 PCSetUp_GAMG() at /home/4pf/repos/petsc/src/ksp/pc/impls/gamg/gamg.c:639 [ERROR] [1]PETSC ERROR: [ERROR] [0]PETSC ERROR: [ERROR] #8 PCSetUp() at /home/4pf/repos/petsc/src/ksp/pc/interface/precon.c:994 [ERROR] #8 PCSetUp() at /home/4pf/repos/petsc/src/ksp/pc/interface/precon.c:994 [ERROR] [1]PETSC ERROR: [ERROR] [0]PETSC ERROR: [ERROR] #9 KSPSetUp() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:406 [ERROR] #9 KSPSetUp() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:406 [ERROR] [1]PETSC ERROR: [ERROR] [0]PETSC ERROR: [ERROR] #10 KSPSolve_Private() at 
/home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:825 [ERROR] #10 KSPSolve_Private() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:825 [ERROR] [0]PETSC ERROR: [ERROR] [1]PETSC ERROR: [ERROR] #11 KSPSolve() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:1071 [ERROR] #11 KSPSolve() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:1071 [ERROR] [1]PETSC ERROR: [ERROR] [0]PETSC ERROR: [ERROR] #12 PCApply_FieldSplit() at /home/4pf/repos/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c:1246 [ERROR] #12 PCApply_FieldSplit() at /home/4pf/repos/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c:1246 [ERROR] [1]PETSC ERROR: [ERROR] [0]PETSC ERROR: [ERROR] #13 PCApply() at /home/4pf/repos/petsc/src/ksp/pc/interface/precon.c:441 [ERROR] #13 PCApply() at /home/4pf/repos/petsc/src/ksp/pc/interface/precon.c:441 [ERROR] [1]PETSC ERROR: [ERROR] [0]PETSC ERROR: [ERROR] #14 KSP_PCApply() at /home/4pf/repos/petsc/include/petsc/private/kspimpl.h:380 [ERROR] #14 KSP_PCApply() at /home/4pf/repos/petsc/include/petsc/private/kspimpl.h:380 [ERROR] [1]PETSC ERROR: [ERROR] [0]PETSC ERROR: [ERROR] #15 KSPFGMRESCycle() at /home/4pf/repos/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:152 [ERROR] #15 KSPFGMRESCycle() at /home/4pf/repos/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:152 [ERROR] [1]PETSC ERROR: [ERROR] [0]PETSC ERROR: [ERROR] #16 KSPSolve_FGMRES() at /home/4pf/repos/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:273 [ERROR] #16 KSPSolve_FGMRES() at /home/4pf/repos/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:273 [ERROR] [1]PETSC ERROR: [ERROR] [0]PETSC ERROR: [ERROR] #17 KSPSolve_Private() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:899 [ERROR] #17 KSPSolve_Private() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:899 [ERROR] [0]PETSC ERROR: [ERROR] [1]PETSC ERROR: [ERROR] #18 KSPSolve() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:1071 [ERROR] #18 KSPSolve() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:1071 [ERROR] [0]PETSC ERROR: [ERROR] [1]PETSC ERROR: [ERROR] #19 SNESSolve_NEWTONLS() at /home/4pf/repos/petsc/src/snes/impls/ls/ls.c:210 [ERROR] #19 SNESSolve_NEWTONLS() at /home/4pf/repos/petsc/src/snes/impls/ls/ls.c:210 [ERROR] [1]PETSC ERROR: [ERROR] [0]PETSC ERROR: [ERROR] #20 SNESSolve() at /home/4pf/repos/petsc/src/snes/interface/snes.c:4689 [ERROR] #20 SNESSolve() at /home/4pf/repos/petsc/src/snes/interface/snes.c:4689 [ERROR] [1]PETSC ERROR: [ERROR] [0]PETSC ERROR: [ERROR] #21 TSStep_ARKIMEX() at /home/4pf/repos/petsc/src/ts/impls/arkimex/arkimex.c:791 [ERROR] #21 TSStep_ARKIMEX() at /home/4pf/repos/petsc/src/ts/impls/arkimex/arkimex.c:791 [ERROR] [1]PETSC ERROR: [ERROR] [0]PETSC ERROR: [ERROR] #22 TSStep() at /home/4pf/repos/petsc/src/ts/interface/ts.c:3445 [ERROR] #22 TSStep() at /home/4pf/repos/petsc/src/ts/interface/ts.c:3445 [ERROR] [1]PETSC ERROR: [ERROR] [0]PETSC ERROR: [ERROR] #23 TSSolve() at /home/4pf/repos/petsc/src/ts/interface/ts.c:3836 [ERROR] #23 TSSolve() at /home/4pf/repos/petsc/src/ts/interface/ts.c:3836 [ERROR] PetscSolver::solve: TSSolve failed. [ERROR] PetscSolver::solve: TSSolve failed. Aborting. Aborting. Thanks for the help, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory |
From: Zhang, J. <jc...@mc...> - 2022-11-15 01:02:50
|
Hi, Philip, Sorry to hear that. It seems you could run the same code on CPUs but not on GPUs (with either the petsc/Kokkos backend or the petsc/cuda backend); is that right? -- Junchao Zhang ________________________________ From: Fackler, Philip <fac...@or...> Sent: Monday, November 14, 2022 12:13 PM To: xol...@li... <xol...@li...>; pet...@mc... <pet...@mc...> Cc: Mills, Richard Tran <rt...@an...>; Zhang, Junchao <jc...@mc...>; Blondel, Sophie <sbl...@ut...>; Roth, Philip <ro...@or...> Subject: Kokkos backend for Mat and Vec diverging when running on CUDA device. This is an issue I've brought up before (and discussed in-person with Richard). I wanted to bring it up again because I'm hitting the limits of what I know to do, and I need help figuring this out. The problem can be reproduced using Xolotl's "develop" branch built against a petsc build with kokkos and kokkos-kernels enabled. Then, either add the relevant kokkos options to the "petscArgs=" line in the system test parameter file(s), or just replace the system test parameter files with the ones from the "feature-petsc-kokkos" branch. See here<https://github.com/ORNL-Fusion/xolotl/tree/feature-petsc-kokkos/benchmarks> the files that begin with "params_system_". Note that those files use the "kokkos" options, but the problem is similar using the corresponding cuda/cusparse options. I've already tried building kokkos-kernels with no TPLs and got slightly different results, but the same problem. Any help would be appreciated. Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory |
From: Junchao Z. <jun...@gm...> - 2022-11-15 00:34:23
|
Hi, Philip, Sorry to hear that. It seems you could run the same code on CPUs but not on GPUs (with either the petsc/Kokkos backend or the petsc/cuda backend); is that right? --Junchao Zhang On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users < pet...@mc...> wrote: > This is an issue I've brought up before (and discussed in-person with > Richard). I wanted to bring it up again because I'm hitting the limits of > what I know to do, and I need help figuring this out. > > The problem can be reproduced using Xolotl's "develop" branch built > against a petsc build with kokkos and kokkos-kernels enabled. Then, either > add the relevant kokkos options to the "petscArgs=" line in the system test > parameter file(s), or just replace the system test parameter files with the > ones from the "feature-petsc-kokkos" branch. See here > <https://github.com/ORNL-Fusion/xolotl/tree/feature-petsc-kokkos/benchmarks> > the files that begin with "params_system_". > > Note that those files use the "kokkos" options, but the problem is similar > using the corresponding cuda/cusparse options. I've already tried building > kokkos-kernels with no TPLs and got slightly different results, but the > same problem. > > Any help would be appreciated. > > Thanks, > > > *Philip Fackler * > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > *Oak Ridge National Laboratory* > |
From: Barry S. <bs...@pe...> - 2022-11-14 21:05:32
|
Mat of type (null) Either the entire matrix (header) data structure has gotten corrupted or the matrix type was never set. Can you run with valgrind to see if there is any memory corruption? > On Nov 14, 2022, at 1:24 PM, Fackler, Philip via petsc-users <pet...@mc...> wrote: > > In Xolotl's "feature-petsc-kokkos" branch, I have moved our code to use the COO interface for preallocating and setting values in the Jacobian matrix. I have found that with some of our test cases, using more than one MPI rank results in a crash. Way down in the preconditioner code in petsc a Mat gets computed that has "null" for the "productsymbolic" member of its "ops". It's pretty far removed from where we compute the Jacobian entries, so I haven't been able (so far) to track it back to an error in my code. I'd appreciate some help with this from someone who is more familiar with the petsc guts so we can figure out what I'm doing wrong. (I'm assuming it's a bug in Xolotl.) > > Note that this is using the kokkos backend for Mat and Vec in petsc, but with a serial-only build of kokkos and kokkos-kernels. So, it's a CPU-only multiple MPI rank run. > > Here's a paste of the error output showing the relevant parts of the call stack: > > [ERROR] [0]PETSC ERROR: > [ERROR] --------------------- Error Message -------------------------------------------------------------- > [ERROR] [1]PETSC ERROR: > [ERROR] --------------------- Error Message -------------------------------------------------------------- > [ERROR] [1]PETSC ERROR: > [ERROR] [0]PETSC ERROR: > [ERROR] No support for this operation for this object type > [ERROR] [1]PETSC ERROR: > [ERROR] No support for this operation for this object type > [ERROR] [0]PETSC ERROR: > [ERROR] No method productsymbolic for Mat of type (null) > [ERROR] No method productsymbolic for Mat of type (null) > [ERROR] [0]PETSC ERROR: > [ERROR] [1]PETSC ERROR: > [ERROR] See https://petsc.org/release/faq/ for trouble shooting. > [ERROR] See https://petsc.org/release/faq/ for trouble shooting. 
> [ERROR] [0]PETSC ERROR: > [ERROR] [1]PETSC ERROR: > [ERROR] Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a GIT Date: 2022-10-28 14:39:41 +0000 > [ERROR] Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a GIT Date: 2022-10-28 14:39:41 +0000 > [ERROR] [1]PETSC ERROR: > [ERROR] [0]PETSC ERROR: > [ERROR] Unknown Name on a named PC0115427 by 4pf Mon Nov 14 13:22:01 2022 > [ERROR] Unknown Name on a named PC0115427 by 4pf Mon Nov 14 13:22:01 2022 > [ERROR] [1]PETSC ERROR: > [ERROR] [0]PETSC ERROR: > [ERROR] Configure options PETSC_DIR=/home/4pf/repos/petsc PETSC_ARCH=arch-kokkos-serial-debug --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cudac=0 --prefix=/home/4pf/build/petsc/serial-debug/install --with-64-bit-indices --with-shared-libraries --with-kokkos-dir=/home/4pf/build/kokkos/serial/install --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/serial/install > [ERROR] Configure options PETSC_DIR=/home/4pf/repos/petsc PETSC_ARCH=arch-kokkos-serial-debug --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cudac=0 --prefix=/home/4pf/build/petsc/serial-debug/install --with-64-bit-indices --with-shared-libraries --with-kokkos-dir=/home/4pf/build/kokkos/serial/install --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/serial/install > [ERROR] [1]PETSC ERROR: > [ERROR] [0]PETSC ERROR: > [ERROR] #1 MatProductSymbolic_MPIAIJKokkos_AB() at /home/4pf/repos/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:918 > [ERROR] #1 MatProductSymbolic_MPIAIJKokkos_AB() at /home/4pf/repos/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:918 > [ERROR] [1]PETSC ERROR: > [ERROR] [0]PETSC ERROR: > [ERROR] #2 MatProductSymbolic_MPIAIJKokkos() at /home/4pf/repos/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:1138 > [ERROR] #2 MatProductSymbolic_MPIAIJKokkos() at /home/4pf/repos/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:1138 > [ERROR] [1]PETSC ERROR: > [ERROR] [0]PETSC ERROR: > [ERROR] #3 MatProductSymbolic() at /home/4pf/repos/petsc/src/mat/interface/matproduct.c:793 > [ERROR] #3 MatProductSymbolic() at /home/4pf/repos/petsc/src/mat/interface/matproduct.c:793 > [ERROR] [1]PETSC ERROR: > [ERROR] [0]PETSC ERROR: > [ERROR] #4 MatProduct_Private() at /home/4pf/repos/petsc/src/mat/interface/matrix.c:9820 > [ERROR] #4 MatProduct_Private() at /home/4pf/repos/petsc/src/mat/interface/matrix.c:9820 > [ERROR] [0]PETSC ERROR: > [ERROR] [1]PETSC ERROR: > [ERROR] #5 MatMatMult() at /home/4pf/repos/petsc/src/mat/interface/matrix.c:9897 > [ERROR] #5 MatMatMult() at /home/4pf/repos/petsc/src/mat/interface/matrix.c:9897 > [ERROR] [0]PETSC ERROR: > [ERROR] [1]PETSC ERROR: > [ERROR] #6 PCGAMGOptProlongator_AGG() at /home/4pf/repos/petsc/src/ksp/pc/impls/gamg/agg.c:769 > [ERROR] #6 PCGAMGOptProlongator_AGG() at /home/4pf/repos/petsc/src/ksp/pc/impls/gamg/agg.c:769 > [ERROR] [0]PETSC ERROR: > [ERROR] [1]PETSC ERROR: > [ERROR] #7 PCSetUp_GAMG() at /home/4pf/repos/petsc/src/ksp/pc/impls/gamg/gamg.c:639 > [ERROR] #7 PCSetUp_GAMG() at /home/4pf/repos/petsc/src/ksp/pc/impls/gamg/gamg.c:639 > [ERROR] [1]PETSC ERROR: > [ERROR] [0]PETSC ERROR: > [ERROR] #8 PCSetUp() at /home/4pf/repos/petsc/src/ksp/pc/interface/precon.c:994 > [ERROR] #8 PCSetUp() at /home/4pf/repos/petsc/src/ksp/pc/interface/precon.c:994 > [ERROR] [1]PETSC ERROR: > [ERROR] [0]PETSC ERROR: > [ERROR] #9 KSPSetUp() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:406 > [ERROR] #9 KSPSetUp() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:406 > [ERROR] [1]PETSC 
ERROR: > [ERROR] [0]PETSC ERROR: > [ERROR] #10 KSPSolve_Private() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:825 > [ERROR] #10 KSPSolve_Private() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:825 > [ERROR] [0]PETSC ERROR: > [ERROR] [1]PETSC ERROR: > [ERROR] #11 KSPSolve() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:1071 > [ERROR] #11 KSPSolve() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:1071 > [ERROR] [1]PETSC ERROR: > [ERROR] [0]PETSC ERROR: > [ERROR] #12 PCApply_FieldSplit() at /home/4pf/repos/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c:1246 > [ERROR] #12 PCApply_FieldSplit() at /home/4pf/repos/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c:1246 > [ERROR] [1]PETSC ERROR: > [ERROR] [0]PETSC ERROR: > [ERROR] #13 PCApply() at /home/4pf/repos/petsc/src/ksp/pc/interface/precon.c:441 > [ERROR] #13 PCApply() at /home/4pf/repos/petsc/src/ksp/pc/interface/precon.c:441 > [ERROR] [1]PETSC ERROR: > [ERROR] [0]PETSC ERROR: > [ERROR] #14 KSP_PCApply() at /home/4pf/repos/petsc/include/petsc/private/kspimpl.h:380 > [ERROR] #14 KSP_PCApply() at /home/4pf/repos/petsc/include/petsc/private/kspimpl.h:380 > [ERROR] [1]PETSC ERROR: > [ERROR] [0]PETSC ERROR: > [ERROR] #15 KSPFGMRESCycle() at /home/4pf/repos/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:152 > [ERROR] #15 KSPFGMRESCycle() at /home/4pf/repos/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:152 > [ERROR] [1]PETSC ERROR: > [ERROR] [0]PETSC ERROR: > [ERROR] #16 KSPSolve_FGMRES() at /home/4pf/repos/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:273 > [ERROR] #16 KSPSolve_FGMRES() at /home/4pf/repos/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:273 > [ERROR] [1]PETSC ERROR: > [ERROR] [0]PETSC ERROR: > [ERROR] #17 KSPSolve_Private() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:899 > [ERROR] #17 KSPSolve_Private() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:899 > [ERROR] [0]PETSC ERROR: > [ERROR] [1]PETSC ERROR: > [ERROR] #18 KSPSolve() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:1071 > [ERROR] #18 KSPSolve() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:1071 > [ERROR] [0]PETSC ERROR: > [ERROR] [1]PETSC ERROR: > [ERROR] #19 SNESSolve_NEWTONLS() at /home/4pf/repos/petsc/src/snes/impls/ls/ls.c:210 > [ERROR] #19 SNESSolve_NEWTONLS() at /home/4pf/repos/petsc/src/snes/impls/ls/ls.c:210 > [ERROR] [1]PETSC ERROR: > [ERROR] [0]PETSC ERROR: > [ERROR] #20 SNESSolve() at /home/4pf/repos/petsc/src/snes/interface/snes.c:4689 > [ERROR] #20 SNESSolve() at /home/4pf/repos/petsc/src/snes/interface/snes.c:4689 > [ERROR] [1]PETSC ERROR: > [ERROR] [0]PETSC ERROR: > [ERROR] #21 TSStep_ARKIMEX() at /home/4pf/repos/petsc/src/ts/impls/arkimex/arkimex.c:791 > [ERROR] #21 TSStep_ARKIMEX() at /home/4pf/repos/petsc/src/ts/impls/arkimex/arkimex.c:791 > [ERROR] [1]PETSC ERROR: > [ERROR] [0]PETSC ERROR: > [ERROR] #22 TSStep() at /home/4pf/repos/petsc/src/ts/interface/ts.c:3445 > [ERROR] #22 TSStep() at /home/4pf/repos/petsc/src/ts/interface/ts.c:3445 > [ERROR] [1]PETSC ERROR: > [ERROR] [0]PETSC ERROR: > [ERROR] #23 TSSolve() at /home/4pf/repos/petsc/src/ts/interface/ts.c:3836 > [ERROR] #23 TSSolve() at /home/4pf/repos/petsc/src/ts/interface/ts.c:3836 > [ERROR] PetscSolver::solve: TSSolve failed. > [ERROR] PetscSolver::solve: TSSolve failed. > Aborting. > Aborting. > > > > Thanks for the help, > > Philip Fackler > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > Oak Ridge National Laboratory |
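Barry's suggestion above can be followed by launching each MPI rank under valgrind. A rough sketch of such an invocation is below; the executable name, rank count, and argument are placeholders rather than a command taken from this thread.

    # Hypothetical invocation; substitute the actual test executable and its inputs.
    mpiexec -n 2 valgrind --leak-check=full --track-origins=yes ./xolotl <parameter-file>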
From: Fackler, P. <fac...@or...> - 2022-11-14 18:29:18
|
This is an issue I've brought up before (and discussed in-person with Richard). I wanted to bring it up again because I'm hitting the limits of what I know to do, and I need help figuring this out. The problem can be reproduced using Xolotl's "develop" branch built against a petsc build with kokkos and kokkos-kernels enabled. Then, either add the relevant kokkos options to the "petscArgs=" line in the system test parameter file(s), or just replace the system test parameter files with the ones from the "feature-petsc-kokkos" branch. See here<https://github.com/ORNL-Fusion/xolotl/tree/feature-petsc-kokkos/benchmarks> the files that begin with "params_system_". Note that those files use the "kokkos" options, but the problem is similar using the corresponding cuda/cusparse options. I've already tried building kokkos-kernels with no TPLs and got slightly different results, but the same problem. Any help would be appreciated. Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory |
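For readers unfamiliar with the option names: the "kokkos" options and the corresponding "cuda/cusparse" options mentioned above typically select PETSc's GPU-capable Vec and Mat implementations for DM-managed solvers. The exact options in Xolotl's parameter files are not reproduced in this thread, so the lines below show only the usual form, as an assumption for illustration.

    # Kokkos backend for Vec and Mat
    -dm_vec_type kokkos -dm_mat_type aijkokkos
    # Native CUDA/cuSPARSE backend
    -dm_vec_type cuda -dm_mat_type aijcusparse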
From: Fackler, P. <fac...@or...> - 2022-11-14 18:28:01
|
In Xolotl's "feature-petsc-kokkos" branch, I have moved our code to use the COO interface for preallocating and setting values in the Jacobian matrix. I have found that with some of our test cases, using more than one MPI rank results in a crash. Way down in the preconditioner code in PETSc, a Mat gets computed that has "null" for the "productsymbolic" member of its "ops". It's pretty far removed from where we compute the Jacobian entries, so I haven't been able (so far) to track it back to an error in my code. I'd appreciate some help with this from someone who is more familiar with the PETSc guts so we can figure out what I'm doing wrong. (I'm assuming it's a bug in Xolotl.)

Note that this is using the Kokkos backend for Mat and Vec in PETSc, but with a serial-only build of Kokkos and Kokkos Kernels. So, it's a CPU-only, multiple-MPI-rank run.

Here's a paste of the error output showing the relevant parts of the call stack (both ranks report the identical error; the interleaved rank-1 lines and the [ERROR] logger prefixes are collapsed here for readability):

[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: No support for this operation for this object type
[0]PETSC ERROR: No method productsymbolic for Mat of type (null)
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a GIT Date: 2022-10-28 14:39:41 +0000
[0]PETSC ERROR: Unknown Name on a named PC0115427 by 4pf Mon Nov 14 13:22:01 2022
[0]PETSC ERROR: Configure options PETSC_DIR=/home/4pf/repos/petsc PETSC_ARCH=arch-kokkos-serial-debug --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cudac=0 --prefix=/home/4pf/build/petsc/serial-debug/install --with-64-bit-indices --with-shared-libraries --with-kokkos-dir=/home/4pf/build/kokkos/serial/install --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/serial/install
[0]PETSC ERROR: #1 MatProductSymbolic_MPIAIJKokkos_AB() at /home/4pf/repos/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:918
[0]PETSC ERROR: #2 MatProductSymbolic_MPIAIJKokkos() at /home/4pf/repos/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:1138
[0]PETSC ERROR: #3 MatProductSymbolic() at /home/4pf/repos/petsc/src/mat/interface/matproduct.c:793
[0]PETSC ERROR: #4 MatProduct_Private() at /home/4pf/repos/petsc/src/mat/interface/matrix.c:9820
[0]PETSC ERROR: #5 MatMatMult() at /home/4pf/repos/petsc/src/mat/interface/matrix.c:9897
[0]PETSC ERROR: #6 PCGAMGOptProlongator_AGG() at /home/4pf/repos/petsc/src/ksp/pc/impls/gamg/agg.c:769
[0]PETSC ERROR: #7 PCSetUp_GAMG() at /home/4pf/repos/petsc/src/ksp/pc/impls/gamg/gamg.c:639
[0]PETSC ERROR: #8 PCSetUp() at /home/4pf/repos/petsc/src/ksp/pc/interface/precon.c:994
[0]PETSC ERROR: #9 KSPSetUp() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:406
[0]PETSC ERROR: #10 KSPSolve_Private() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:825
[0]PETSC ERROR: #11 KSPSolve() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:1071
[0]PETSC ERROR: #12 PCApply_FieldSplit() at /home/4pf/repos/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c:1246
[0]PETSC ERROR: #13 PCApply() at /home/4pf/repos/petsc/src/ksp/pc/interface/precon.c:441
[0]PETSC ERROR: #14 KSP_PCApply() at /home/4pf/repos/petsc/include/petsc/private/kspimpl.h:380
[0]PETSC ERROR: #15 KSPFGMRESCycle() at /home/4pf/repos/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:152
[0]PETSC ERROR: #16 KSPSolve_FGMRES() at /home/4pf/repos/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c:273
[0]PETSC ERROR: #17 KSPSolve_Private() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:899
[0]PETSC ERROR: #18 KSPSolve() at /home/4pf/repos/petsc/src/ksp/ksp/interface/itfunc.c:1071
[0]PETSC ERROR: #19 SNESSolve_NEWTONLS() at /home/4pf/repos/petsc/src/snes/impls/ls/ls.c:210
[0]PETSC ERROR: #20 SNESSolve() at /home/4pf/repos/petsc/src/snes/interface/snes.c:4689
[0]PETSC ERROR: #21 TSStep_ARKIMEX() at /home/4pf/repos/petsc/src/ts/impls/arkimex/arkimex.c:791
[0]PETSC ERROR: #22 TSStep() at /home/4pf/repos/petsc/src/ts/interface/ts.c:3445
[0]PETSC ERROR: #23 TSSolve() at /home/4pf/repos/petsc/src/ts/interface/ts.c:3836
PetscSolver::solve: TSSolve failed. Aborting.

Thanks for the help,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory |
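For readers unfamiliar with the COO interface referenced above: it is the MatSetPreallocationCOO/MatSetValuesCOO pair in PETSc. A minimal sketch of that pattern follows; the matrix size, index arrays, and values are illustrative placeholders, not Xolotl's actual Jacobian code.

#include <petscmat.h>

/* Minimal sketch of PETSc's COO assembly pattern (illustrative values only):
   declare the nonzero pattern once, then (re)set values into that pattern. */
int main(int argc, char **argv)
{
  Mat         A;
  PetscInt    coo_i[] = {0, 0, 1};        /* row indices of the nonzeros    */
  PetscInt    coo_j[] = {0, 1, 1};        /* column indices of the nonzeros */
  PetscScalar coo_v[] = {2.0, -1.0, 3.0}; /* values, in the same order      */

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 2, 2));
  PetscCall(MatSetFromOptions(A));                       /* e.g. -mat_type aijkokkos */

  PetscCall(MatSetPreallocationCOO(A, 3, coo_i, coo_j)); /* pattern, set once     */
  PetscCall(MatSetValuesCOO(A, coo_v, ADD_VALUES));      /* values, set each time */

  PetscCall(MatView(A, PETSC_VIEWER_STDOUT_WORLD));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}

Note that in the trace above the failure occurs later, inside PCGAMG's MatMatMult during preconditioner setup, well after this assembly step, which is why it is hard to connect back to the Jacobian code.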
From: Fackler, P. <fac...@or...> - 2022-03-07 19:48:36
|
Daniel,

Xolotl needs the Kokkos libraries to be shared (not static). When you configure Kokkos, use "-DBUILD_SHARED_LIBS=ON" and (re-)install. That should fix the problem you're seeing.

Thanks,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory

________________________________
From: 连 via Xolotl-psi-development <xol...@li...>
Sent: Friday, March 4, 2022 03:21
To: xolotl-psi-development <xol...@li...>
Subject: [EXTERNAL] [Xolotl-psi-development] a question in make

Hi everyone! When I execute the make command, an error occurs at 32%, at the step "Linking CXX shared library libxolotlCore.so". The screenshot I got is below: [screenshot not included in the archive]

The rule at "xolotl/core/CMakeFiles/xolotlCore.dir/build.make:658" is a link command:

cd /home/zhao/pan/xolotl/xolotl-build/xolotl/core && $(CMAKE_COMMAND) -E cmake_link_script CMakeFiles/xolotlCore.dir/link.txt --verbose=$(VERBOSE)

I then found that the link line in CMakeFiles/xolotlCore.dir/link.txt already has the -fPIC option that the error message mentions as the fix. Before running make, I installed all the packages Xolotl needs, and I configured Xolotl with the command below and got the following results:

cmake -DCMAKE_BUILD_TYPE=Release -DKokkos_DIR=/home/zhao/pan/xolotl/kokkos_build -DPETSC_DIR=/home/zhao/pan/xolotl/petsc-3.16.4 -DBOOST_ROOT=/home/zhao/pan/boostlib -DCMAKE_INSTALL_PREFIX=/home/zhao/pan/xolotl/xolotl-build/install /home/zhao/pan/xolotl/xolotl-source
-- Found Git: /usr/bin/git (found version "2.25.1")
-- The CXX compiler identification is GNU 8.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found MPI_CXX: /usr/local/mpich/lib/libmpicxx.so (found version "4.0")
-- Found MPI: TRUE (found version "4.0")
-- Found PETSc: /home/zhao/pan/xolotl/petsc-3.16.4/test/lib/libpetsc.so
-- Found Boost: /home/zhao/pan/boostlib/lib/cmake/Boost-1.74.0/BoostConfig.cmake (found version "1.74.0") found components: filesystem log_setup log program_options
-- The C compiler identification is GNU 8.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Found HDF5: /home/zhao/pan/xolotl/hdf5-112/lib/libhdf5.so;/usr/lib/x86_64-linux-gnu/libz.so;/usr/lib/x86_64-linux-gnu/libdl.so;/usr/lib/x86_64-linux-gnu/libm.so (found version "1.12.1")
-- WARNING! Detected HDF5 installation does not support parallel I/O!
-- Setup plsm
--    configure
--    install
-- Enabled Kokkos devices: SERIAL
-- Could NOT find PAPI (missing: PAPI_LIBRARIES PAPI_INCLUDE_DIRS)
-- Visualization support needs explicit VTKm_DIR.
-- Found Boost: /home/zhao/pan/boostlib/lib/cmake/Boost-1.74.0/BoostConfig.cmake (found version "1.74.0") found components: unit_test_framework
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE)
-- Configuring done
-- Generating done
-- Build files have been written to: /home/zhao/pan/xolotl/xolotl-build

So, can anyone help me?

________________________________
Best regards!

Daniel Pan |
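For reference, a re-configure of Kokkos along these lines (the source and install paths are placeholders, not the actual locations on that machine) should produce shared libraries; Xolotl's -DKokkos_DIR then needs to point at the resulting install:

cmake -DBUILD_SHARED_LIBS=ON -DCMAKE_INSTALL_PREFIX=/path/to/kokkos-install /path/to/kokkos-source
make install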
From: <pan...@qq...> - 2022-03-04 08:22:12
|
Hi everyone! When I execute the make command, an error occurs at 32%, at the step "Linking CXX shared library libxolotlCore.so". The screenshot I got is like the one below (not included in this archive).

The rule at "xolotl/core/CMakeFiles/xolotlCore.dir/build.make:658" is a link command:

cd /home/zhao/pan/xolotl/xolotl-build/xolotl/core && $(CMAKE_COMMAND) -E cmake_link_script CMakeFiles/xolotlCore.dir/link.txt --verbose=$(VERBOSE)

I then found that the link line in CMakeFiles/xolotlCore.dir/link.txt already has the -fPIC option that the error message mentions as the fix. Before running make, I installed all the packages Xolotl needs, and I configured Xolotl with the command below and got the following results:

cmake -DCMAKE_BUILD_TYPE=Release -DKokkos_DIR=/home/zhao/pan/xolotl/kokkos_build -DPETSC_DIR=/home/zhao/pan/xolotl/petsc-3.16.4 -DBOOST_ROOT=/home/zhao/pan/boostlib -DCMAKE_INSTALL_PREFIX=/home/zhao/pan/xolotl/xolotl-build/install /home/zhao/pan/xolotl/xolotl-source
-- Found Git: /usr/bin/git (found version "2.25.1")
-- The CXX compiler identification is GNU 8.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found MPI_CXX: /usr/local/mpich/lib/libmpicxx.so (found version "4.0")
-- Found MPI: TRUE (found version "4.0")
-- Found PETSc: /home/zhao/pan/xolotl/petsc-3.16.4/test/lib/libpetsc.so
-- Found Boost: /home/zhao/pan/boostlib/lib/cmake/Boost-1.74.0/BoostConfig.cmake (found version "1.74.0") found components: filesystem log_setup log program_options
-- The C compiler identification is GNU 8.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Found HDF5: /home/zhao/pan/xolotl/hdf5-112/lib/libhdf5.so;/usr/lib/x86_64-linux-gnu/libz.so;/usr/lib/x86_64-linux-gnu/libdl.so;/usr/lib/x86_64-linux-gnu/libm.so (found version "1.12.1")
-- WARNING! Detected HDF5 installation does not support parallel I/O!
-- Setup plsm
--    configure
--    install
-- Enabled Kokkos devices: SERIAL
-- Could NOT find PAPI (missing: PAPI_LIBRARIES PAPI_INCLUDE_DIRS)
-- Visualization support needs explicit VTKm_DIR.
-- Found Boost: /home/zhao/pan/boostlib/lib/cmake/Boost-1.74.0/BoostConfig.cmake (found version "1.74.0") found components: unit_test_framework
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE)
-- Configuring done
-- Generating done
-- Build files have been written to: /home/zhao/pan/xolotl/xolotl-build

So, can anyone help me?

Best regards!

Daniel Pan |
From: <pan...@qq...> - 2022-03-01 15:43:56
|
subscribe xolotl

Best wishes,

Pan Deng
13307314401 |
From: Fackler, P. <fac...@or...> - 2022-02-23 14:47:29
|
Thanks Jed, Satish, and Richard for the quick and thorough responses.

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory

________________________________
From: petsc-users <pet...@mc...> on behalf of Richard Tran Mills via petsc-users <pet...@mc...>
Sent: Thursday, February 17, 2022 18:33
To: petsc-users <pet...@mc...>
Cc: Blondel, Sophie <sbl...@ut...>; Roth, Philip <ro...@or...>; xol...@li... <xol...@li...>
Subject: [EXTERNAL] Re: [petsc-users] Kokkos Interface for PETSc

Hi Philip,

Sorry to be a bit late in my reply. Jed has explained the gist of what's involved with using the Kokkos/Kokkos Kernels back-end for the PETSc solves. Depending on exactly how Xolotl creates its vectors, though, there may be a bit of work required to ensure that the command-line options specifying the matrix and GPU types get applied to the right objects, and that non-GPU types are not being hardcoded somewhere (by a call like "DMSetMatType(dm,MATAIJ)").

In addition to looking at the -log_view output, since Xolotl uses TS you can specify "-ts_view" and look at the output that describes the solver hierarchy that Xolotl sets up. If matrix types are being set correctly, you'll see things like

Mat Object: 1 MPI processes
  type: seqaijkokkos

(I note that I've also sent a related message about getting Xolotl working with Kokkos back-ends on Summit to you, Sophie, and Phil in reply to an old thread about this.)

Were you also asking about how to use Kokkos for PETSc matrix assembly, or is that a question for later?

Cheers,
Richard

On 2/15/22 09:07, Satish Balay via petsc-users wrote:

Also - perhaps the following info might be useful

Satish

----
balay@sb /home/balay/petsc (main=) $ git grep -l download-kokkos-kernels config/examples
config/examples/arch-ci-freebsd-cxx-cmplx-pkgs-dbg.py
config/examples/arch-ci-linux-cuda-double.py
config/examples/arch-ci-linux-gcc-ifc-cmplx.py
config/examples/arch-ci-linux-hip-double.py
config/examples/arch-ci-linux-pkgs-dbg-ftn-interfaces.py
config/examples/arch-ci-linux-pkgs-valgrind.py
config/examples/arch-ci-osx-cxx-pkgs-opt.py
config/examples/arch-nvhpc.py
config/examples/arch-olcf-crusher.py
config/examples/arch-olcf-spock.py
balay@sb /home/balay/petsc (main=) $ git grep -l "requires:.*kokkos_kernels"
src/ksp/ksp/tests/ex3.c
src/ksp/ksp/tests/ex43.c
src/ksp/ksp/tests/ex60.c
src/ksp/ksp/tutorials/ex7.c
src/mat/tests/ex123.c
src/mat/tests/ex132.c
src/mat/tests/ex2.c
src/mat/tests/ex250.c
src/mat/tests/ex251.c
src/mat/tests/ex252.c
src/mat/tests/ex254.c
src/mat/tests/ex5.c
src/mat/tests/ex62.c
src/mat/tutorials/ex5k.kokkos.cxx
src/snes/tests/ex13.c
src/snes/tutorials/ex13.c
src/snes/tutorials/ex3k.kokkos.cxx
src/snes/tutorials/ex56.c
src/ts/utils/dmplexlandau/tutorials/ex1.c
src/ts/utils/dmplexlandau/tutorials/ex1f90.F90
src/ts/utils/dmplexlandau/tutorials/ex2.c
src/vec/vec/tests/ex21.c
src/vec/vec/tests/ex22.c
src/vec/vec/tests/ex23.c
src/vec/vec/tests/ex28.c
src/vec/vec/tests/ex34.c
src/vec/vec/tests/ex37.c
src/vec/vec/tests/ex38.c
src/vec/vec/tests/ex4.c
src/vec/vec/tests/ex43.c
src/vec/vec/tests/ex60.c
src/vec/vec/tutorials/ex1.c
balay@sb /home/balay/petsc (main=) $

On Tue, 15 Feb 2022, Satish Balay via petsc-users wrote:

Also - best to use petsc repo - 'main' branch.

And for install on crusher - check config/examples/arch-olcf-crusher.py

Satish

On Tue, 15 Feb 2022, Jed Brown wrote:

We need to make these docs more explicit, but the short answer is: configure with --download-kokkos --download-kokkos-kernels and run almost any example with -dm_mat_type aijkokkos -dm_vec_type kokkos. If you run with -log_view, you should see that all the flops take place on the device and there are few host->device transfers. Message packing is done on the device, and it'll use GPU-aware MPI. There are a few examples of residual evaluation and matrix assembly on the device using Kokkos. You can also see libCEED examples for assembly on the device into Kokkos matrices and vectors without touching host memory.

"Fackler, Philip via petsc-users" <pet...@mc...> writes:

We're intending to transition the Xolotl interfaces with PETSc. I am hoping someone can point us to some documentation (and examples) for using PETSc's Kokkos-based interface. If this does not yet exist, then perhaps some slides (like the ones Richard Mills showed at the NE-SciDAC all-hands meeting) showing some examples could get us started.

Thanks for any help that can be provided,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory |
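To connect the suggestions in this thread, here is a minimal sketch of letting those command-line options select the backend instead of hardcoding a matrix type; the DMDA grid and its size are generic PETSc usage for illustration, not Xolotl's actual setup.

#include <petsc.h>

int main(int argc, char **argv)
{
  DM  da;
  Mat J;
  Vec x;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  /* A small 1D distributed grid; the size is illustrative. */
  PetscCall(DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, 64, 1, 1, NULL, &da));
  /* Do not hardcode DMSetMatType(da, MATAIJ); let
     -dm_mat_type / -dm_vec_type take effect at run time. */
  PetscCall(DMSetFromOptions(da));
  PetscCall(DMSetUp(da));
  PetscCall(DMCreateMatrix(da, &J));        /* picks up -dm_mat_type */
  PetscCall(DMCreateGlobalVector(da, &x));  /* picks up -dm_vec_type */

  PetscCall(MatDestroy(&J));
  PetscCall(VecDestroy(&x));
  PetscCall(DMDestroy(&da));
  PetscCall(PetscFinalize());
  return 0;
}

Such a code would then be run with, for example, -dm_mat_type aijkokkos -dm_vec_type kokkos -log_view (plus -ts_view in a TS-based code like Xolotl) against a PETSc configured with --download-kokkos --download-kokkos-kernels, as described above.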