perfmon2-devel Mailing List for perfmon2 (Page 2)
Status: Beta
Brought to you by:
seranian
From: Stephane E. <er...@go...> - 2023-06-01 19:28:51
|
Hi,

Sorry for the late reply. I had a problem with the mailing list and did not see your message earlier.

How are you programming this event? Is this via perf stat or something else? You seem to be running on Intel CascadeLakeX. It supports the event you want. However, in libpfm4, right now, it is called cpu_clk_thread_unhalted.thread_p.

In the Intel event tables (JSON) you have two versions of the event. One tries to force the event onto the fixed counter that supports it: this is cpu_clk_unhalted.thread. But they also have cpu_clk_unhalted.thread_p (_p means programmable counter). You can access both via perf stat -e.

To access the libpfm4 events with perf, you need to compile the perf tool with libpfm4 support, and then you can do:

perf stat --pfm-events .....

The mapping between Intel JSON and libpfm4 is as follows:

Intel: cpu_clk_unhalted.thread -> libpfm4: unhalted_core_cycles
Intel: cpu_clk_unhalted.thread_p -> libpfm4: cpu_clk_thread_unhalted.thread_p

In the end it does not make any difference which one you choose, because they both count the same thing. The kernel will program the event in whichever counter is available.

Hope this helps.

> I am getting the error: invalid event attribute for cpu_clk_unhalted.thread.
|
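The name mapping given in this reply can be written down as a small lookup table. This is only a sketch for illustration: the struct, the table, and the function intel_to_libpfm4() are hypothetical names, not part of libpfm4 or perf, and the table holds only the two pairs quoted in the message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type; not a libpfm4 structure. */
struct event_alias {
	const char *intel_json;	/* name used in the Intel JSON event tables */
	const char *libpfm4;	/* corresponding libpfm4 name, per the message */
};

/* Only the two pairs stated in the message above. */
static const struct event_alias cycle_aliases[] = {
	{ "cpu_clk_unhalted.thread",   "unhalted_core_cycles" },
	{ "cpu_clk_unhalted.thread_p", "cpu_clk_thread_unhalted.thread_p" },
};

/* Return the libpfm4 name for an Intel JSON name, or NULL if it is not listed. */
const char *intel_to_libpfm4(const char *intel_name)
{
	size_t i;

	for (i = 0; i < sizeof(cycle_aliases) / sizeof(cycle_aliases[0]); i++) {
		if (strcmp(cycle_aliases[i].intel_json, intel_name) == 0)
			return cycle_aliases[i].libpfm4;
	}
	return NULL;
}
```

A caller would translate the Intel name first and then hand the result to whatever interface expects libpfm4 names, for example the perf stat --pfm-events option mentioned in the message.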
From: Anthony D. <ada...@ic...> - 2023-05-31 17:09:49
|
Thanks for bringing this up, Will. We will reach out to AMD for feedback.

thanks,
Anthony

On Wed, May 31, 2023 at 10:41 AM William Cohen <wc...@re...> wrote:
>
> Hi,
>
> We have been doing some testing of libpfm on AMD Zen 4 processors. One of the issues that came up is that libpfm identified AMD Genoa processors as Zen 4, but did not identify the AMD Bergamo processors as Zen 4 processors. This appears to be due to the following code in lib/pfmlib_amd64.c:
>
> 	} else if (cfg->family == 25) { /* family 19h */
> 		if (cfg->model <= 0x0f || (cfg->model >= 0x20 && cfg->model <= 0x5f)) {
> 			rev = PFM_PMU_AMD64_FAM19H_ZEN3;
> 		} else if (cfg->model == 17) {
> 			rev = PFM_PMU_AMD64_FAM19H_ZEN4;
> 		}
> 	}
>
> The Genoa processor has the listed model number of 17, but the Bergamo has a different model number (160) and does not match as a result. I took a look at the kernel's perf code to see if there were additional model numbers listed in there. The kernel uses capability bits to determine which features to enable for Zen 4, so it does not list additional model numbers for Zen 4 processors. Looking around, there doesn't seem to be a good place enumerating the possible AMD Zen 4 processor model numbers. It would be nice to avoid having to tweak the model number matching for Zen 4 every time a new Zen 4 model comes out.
>
> -Will
|
From: William C. <wc...@re...> - 2023-05-31 14:41:51
|
Hi,

We have been doing some testing of libpfm on AMD Zen 4 processors. One of the issues that came up is that libpfm identified AMD Genoa processors as Zen 4, but did not identify the AMD Bergamo processors as Zen 4 processors. This appears to be due to the following code in lib/pfmlib_amd64.c:

	} else if (cfg->family == 25) { /* family 19h */
		if (cfg->model <= 0x0f || (cfg->model >= 0x20 && cfg->model <= 0x5f)) {
			rev = PFM_PMU_AMD64_FAM19H_ZEN3;
		} else if (cfg->model == 17) {
			rev = PFM_PMU_AMD64_FAM19H_ZEN4;
		}
	}

The Genoa processor has the listed model number of 17, but the Bergamo has a different model number (160) and does not match as a result. I took a look at the kernel's perf code to see if there were additional model numbers listed in there. The kernel uses capability bits to determine which features to enable for Zen 4, so it does not list additional model numbers for Zen 4 processors. Looking around, there doesn't seem to be a good place enumerating the possible AMD Zen 4 processor model numbers. It would be nice to avoid having to tweak the model number matching for Zen 4 every time a new Zen 4 model comes out.

-Will
|
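To make the mismatch concrete, the quoted check can be lifted into a standalone function and fed the model numbers mentioned in this thread. This is a sketch only: the enum and classify_fam19h() are hypothetical stand-ins for the PFM_PMU_AMD64_FAM19H_* values and the logic inside lib/pfmlib_amd64.c, and no fix is proposed here.

```c
/* Hypothetical stand-ins for the PFM_PMU_AMD64_FAM19H_* revisions. */
enum fam19h_rev { REV_UNKNOWN, REV_ZEN3, REV_ZEN4 };

/* Mirrors the model checks quoted in the message above. */
enum fam19h_rev classify_fam19h(unsigned family, unsigned model)
{
	if (family != 25)	/* only family 19h is handled here */
		return REV_UNKNOWN;

	if (model <= 0x0f || (model >= 0x20 && model <= 0x5f))
		return REV_ZEN3;

	if (model == 17)	/* 0x11: the Genoa model number from the message */
		return REV_ZEN4;

	/* Anything else falls through, including model 160 (0xa0, Bergamo). */
	return REV_UNKNOWN;
}
```

With these checks, classify_fam19h(25, 17) is recognized as Zen 4 while classify_fam19h(25, 160) is not, which matches the behavior reported for Genoa and Bergamo.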
From: Anson T. <Ans...@an...> - 2023-05-10 01:09:19
|
Hello,

I posted my original question here: https://github.com/wcohen/libpfm4/issues/6 (Invalid event attribute for cpu_clk_unhalted.thread · Issue #6 · wcohen/libpfm4)

> Hello, I am getting the error: invalid event attribute for cpu_clk_unhalted.thread. However, I do get counter values which are in the expected range even though an error message is printed out. How...

It works when I run the following from the command line, and it appears when I use perf list:

perf stat --e cpu_clk_unhalted.thread ls

/proc/sys/kernel/perf_event_paranoid is set to 0

I'm using Intel(R) Xeon(R) Gold 6252N CPU @ 2.30GHz
|
From: Stephane E. <er...@go...> - 2023-04-14 23:39:34
|
Will, Patch applied. Thanks. On Fri, Apr 14, 2023 at 1:33 PM William Cohen <wc...@re...> wrote: > When trying to build the new libpfm-4.13.0 as an rpm for various > architectures I found that the builds failed on some machines because the > compiler found that p could be uninitialized in gen_tracepoint_table. > Attached is a patch that fixes that specific issue. > > -Will |
From: William C. <wc...@re...> - 2023-04-14 20:33:21
|
When trying to build the new libpfm-4.13.0 as an rpm for various architectures I found that the builds failed on some machines because the compiler found that p could be uninitialized in gen_tracepoint_table. Attached is a patch that fixes that specific issue. -Will |
From: Namhyung K. <nam...@gm...> - 2023-03-30 01:13:30
|
It doesn't need to traverse the filesystem hierarchy from the root. Instead it can use relative pathname with openat() and pass it to fdopendir(). Actually it can introduce some kernel lock contentions when it's invoked from multiple CPUs at the same time.

Signed-off-by: Namhyung Kim <nam...@go...>
---
 lib/pfmlib_perf_event_pmu.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/lib/pfmlib_perf_event_pmu.c b/lib/pfmlib_perf_event_pmu.c
index 718815d..4ac9299 100644
--- a/lib/pfmlib_perf_event_pmu.c
+++ b/lib/pfmlib_perf_event_pmu.c
@@ -356,11 +356,12 @@ gen_tracepoint_table(void)
 	struct dirent *d1, *d2;
 	perf_event_t *p;
 	perf_umask_t *um;
-	char d2path[MAXPATHLEN];
+	char POTENTIALLY_UNUSED d2path[MAXPATHLEN];
 	char idpath[MAXPATHLEN];
 	char id_str[32];
 	uint64_t id;
 	int fd, err;
+	int POTENTIALLY_UNUSED dir1_fd;
 	int POTENTIALLY_UNUSED dir2_fd;
 	int reuse_event = 0;
 	int numasks;
@@ -374,7 +375,15 @@ gen_tracepoint_table(void)
 	strncat(debugfs_mnt, "/tracing/events", MAXPATHLEN-1);
 	debugfs_mnt[MAXPATHLEN-1]= '\0';
 
+#ifdef HAS_OPENAT
+	dir1_fd = open(debugfs_mnt, O_DIRECTORY);
+	if (dir1_fd < 0)
+		return;
+
+	dir1 = fdopendir(dir1_fd);
+#else
 	dir1 = opendir(debugfs_mnt);
+#endif
 	if (!dir1)
 		return;
 
@@ -387,6 +396,17 @@ gen_tracepoint_table(void)
 		if (!strcmp(d1->d_name, ".."))
 			continue;
 
+#ifdef HAS_OPENAT
+		/* fails if it cannot open */
+		dir2_fd = openat(dir1_fd, d1->d_name, O_DIRECTORY);
+		if (dir2_fd < 0)
+			continue;
+
+		/* fails if d2path is not a directory */
+		dir2 = fdopendir(dir2_fd);
+		if (!dir2)
+			continue;
+#else
 		retlen = snprintf(d2path, MAXPATHLEN, "%s/%s", debugfs_mnt, d1->d_name);
 		/* ensure generated d2path string is valid */
 		if (retlen <= 0 || MAXPATHLEN <= retlen)
@@ -398,7 +418,7 @@ gen_tracepoint_table(void)
 			continue;
 
 		dir2_fd = dirfd(dir2);
-
+#endif
 		/*
 		 * if a subdir did not fit our expected
 		 * tracepoint format, then we reuse the
@@ -440,14 +460,14 @@ gen_tracepoint_table(void)
 			retlen = snprintf(idpath, MAXPATHLEN, "%s/id", d2->d_name);
 			/* ensure generated d2path string is valid */
 			if (retlen <= 0 || MAXPATHLEN <= retlen)
-				continue;
+				continue;
 
 			fd = openat(dir2_fd, idpath, O_RDONLY);
 #else
 			retlen = snprintf(idpath, MAXPATHLEN, "%s/%s/id", d2path, d2->d_name);
 			/* ensure generated d2path string is valid */
 			if (retlen <= 0 || MAXPATHLEN <= retlen)
-				continue;
+				continue;
 
 			fd = open(idpath, O_RDONLY);
 #endif
-- 
2.40.0.348.gf938b09366-goog
|
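The pattern the patch relies on (open the top directory once, then resolve each entry relative to that descriptor with openat() instead of rebuilding an absolute path) can be shown outside libpfm4. The following is a minimal sketch under that assumption; count_subdirs() is a hypothetical helper written for this illustration and is not taken from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Count the immediate subdirectories of root that can be opened.
 * Returns -1 if root itself cannot be opened as a directory.
 */
int count_subdirs(const char *root)
{
	struct dirent *d;
	DIR *dir;
	int root_fd, sub_fd;
	int count = 0;

	root_fd = open(root, O_RDONLY | O_DIRECTORY);
	if (root_fd < 0)
		return -1;

	/* On success the DIR stream takes ownership of root_fd. */
	dir = fdopendir(root_fd);
	if (!dir) {
		close(root_fd);
		return -1;
	}

	while ((d = readdir(dir)) != NULL) {
		if (!strcmp(d->d_name, ".") || !strcmp(d->d_name, ".."))
			continue;

		/*
		 * The name is resolved relative to the already open directory,
		 * so the kernel does not walk the path from the root again.
		 * O_DIRECTORY makes this fail for anything that is not a directory.
		 */
		sub_fd = openat(dirfd(dir), d->d_name, O_RDONLY | O_DIRECTORY);
		if (sub_fd < 0)
			continue;

		count++;
		close(sub_fd);
	}

	closedir(dir);	/* also closes root_fd */
	return count;
}
```

As in the patch, a failed openat() is treated as "skip this entry" rather than as a fatal error.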
From: Namhyung K. <nam...@gm...> - 2023-03-30 01:13:28
|
I think all major Linux distros now provide the openat() functions in libc, as they are specified in POSIX.1-2008. Maybe we could add a config check to detect them later if some don't. Also remove the old code that undefined the macro unconditionally.

Signed-off-by: Namhyung Kim <nam...@go...>
---
 lib/Makefile                | 1 +
 lib/pfmlib_perf_event_pmu.c | 7 -------
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/lib/Makefile b/lib/Makefile
index 5ca71e3..aae64a1 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -33,6 +33,7 @@ include $(TOPDIR)/rules.mk
 SRCS=pfmlib_common.c
 
 ifeq ($(SYS),Linux)
+CFLAGS += -DHAS_OPENAT
 SRCS += pfmlib_perf_event_pmu.c pfmlib_perf_event.c pfmlib_perf_event_raw.c
 endif
 
diff --git a/lib/pfmlib_perf_event_pmu.c b/lib/pfmlib_perf_event_pmu.c
index 637c5b1..718815d 100644
--- a/lib/pfmlib_perf_event_pmu.c
+++ b/lib/pfmlib_perf_event_pmu.c
@@ -34,13 +34,6 @@
 #include <sys/param.h>
 #endif
 
-/*
- * looks like several distributions do not have
- * the latest libc with openat support, so disable
- * for now
- */
-#undef HAS_OPENAT
-
 #include "pfmlib_priv.h"
 #include "pfmlib_perf_event_priv.h"
 
-- 
2.40.0.348.gf938b09366-goog
|
From: Namhyung K. <nam...@gm...> - 2023-03-29 22:25:25
|
It doesn't need to traverse the filesystem hierarchy from the root. Instead it can use relative pathname with openat() and pass it to fdopendir(). Actually it can introduce some kernel lock contentions when it's invoked from multiple CPUs at the same time.

Signed-off-by: Namhyung Kim <nam...@go...>
---
 lib/pfmlib_perf_event_pmu.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/lib/pfmlib_perf_event_pmu.c b/lib/pfmlib_perf_event_pmu.c
index 718815d..4ac9299 100644
--- a/lib/pfmlib_perf_event_pmu.c
+++ b/lib/pfmlib_perf_event_pmu.c
@@ -356,11 +356,12 @@ gen_tracepoint_table(void)
 	struct dirent *d1, *d2;
 	perf_event_t *p;
 	perf_umask_t *um;
-	char d2path[MAXPATHLEN];
+	char POTENTIALLY_UNUSED d2path[MAXPATHLEN];
 	char idpath[MAXPATHLEN];
 	char id_str[32];
 	uint64_t id;
 	int fd, err;
+	int POTENTIALLY_UNUSED dir1_fd;
 	int POTENTIALLY_UNUSED dir2_fd;
 	int reuse_event = 0;
 	int numasks;
@@ -374,7 +375,15 @@ gen_tracepoint_table(void)
 	strncat(debugfs_mnt, "/tracing/events", MAXPATHLEN-1);
 	debugfs_mnt[MAXPATHLEN-1]= '\0';
 
+#ifdef HAS_OPENAT
+	dir1_fd = open(debugfs_mnt, O_DIRECTORY);
+	if (dir1_fd < 0)
+		return;
+
+	dir1 = fdopendir(dir1_fd);
+#else
 	dir1 = opendir(debugfs_mnt);
+#endif
 	if (!dir1)
 		return;
 
@@ -387,6 +396,17 @@ gen_tracepoint_table(void)
 		if (!strcmp(d1->d_name, ".."))
 			continue;
 
+#ifdef HAS_OPENAT
+		/* fails if it cannot open */
+		dir2_fd = openat(dir1_fd, d1->d_name, O_DIRECTORY);
+		if (dir2_fd < 0)
+			continue;
+
+		/* fails if d2path is not a directory */
+		dir2 = fdopendir(dir2_fd);
+		if (!dir2)
+			continue;
+#else
 		retlen = snprintf(d2path, MAXPATHLEN, "%s/%s", debugfs_mnt, d1->d_name);
 		/* ensure generated d2path string is valid */
 		if (retlen <= 0 || MAXPATHLEN <= retlen)
@@ -398,7 +418,7 @@ gen_tracepoint_table(void)
 			continue;
 
 		dir2_fd = dirfd(dir2);
-
+#endif
 		/*
 		 * if a subdir did not fit our expected
 		 * tracepoint format, then we reuse the
@@ -440,14 +460,14 @@ gen_tracepoint_table(void)
 			retlen = snprintf(idpath, MAXPATHLEN, "%s/id", d2->d_name);
 			/* ensure generated d2path string is valid */
 			if (retlen <= 0 || MAXPATHLEN <= retlen)
-				continue;
+				continue;
 
 			fd = openat(dir2_fd, idpath, O_RDONLY);
 #else
 			retlen = snprintf(idpath, MAXPATHLEN, "%s/%s/id", d2path, d2->d_name);
 			/* ensure generated d2path string is valid */
 			if (retlen <= 0 || MAXPATHLEN <= retlen)
-				continue;
+				continue;
 
 			fd = open(idpath, O_RDONLY);
 #endif
-- 
2.40.0.348.gf938b09366-goog
|
From: Namhyung K. <nam...@gm...> - 2023-03-29 22:25:21
|
I think all major Linux distros now provide the openat() functions in libc, as they are specified in POSIX.1-2008. Maybe we could add a config check to detect them later if some don't. Also remove the old code that undefined the macro unconditionally.

Signed-off-by: Namhyung Kim <nam...@go...>
---
 lib/Makefile                | 1 +
 lib/pfmlib_perf_event_pmu.c | 7 -------
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/lib/Makefile b/lib/Makefile
index 5ca71e3..aae64a1 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -33,6 +33,7 @@ include $(TOPDIR)/rules.mk
 SRCS=pfmlib_common.c
 
 ifeq ($(SYS),Linux)
+CFLAGS += -DHAS_OPENAT
 SRCS += pfmlib_perf_event_pmu.c pfmlib_perf_event.c pfmlib_perf_event_raw.c
 endif
 
diff --git a/lib/pfmlib_perf_event_pmu.c b/lib/pfmlib_perf_event_pmu.c
index 637c5b1..718815d 100644
--- a/lib/pfmlib_perf_event_pmu.c
+++ b/lib/pfmlib_perf_event_pmu.c
@@ -34,13 +34,6 @@
 #include <sys/param.h>
 #endif
 
-/*
- * looks like several distributions do not have
- * the latest libc with openat support, so disable
- * for now
- */
-#undef HAS_OPENAT
-
 #include "pfmlib_priv.h"
 #include "pfmlib_perf_event_priv.h"
 
-- 
2.40.0.348.gf938b09366-goog
|
From: Stephane E. <er...@go...> - 2023-01-23 05:52:51
|
Hi John,

Thanks a lot for submitting the patches. I will review them and send you feedback. At first glance, you first use the N1 table and then patch it to N2. Just submit a patch with the final version of the table. If the V1 table is identical to N2, then they can share the table. I don't see any V2 specific patches.

Thanks.

On Thu, Jan 19, 2023 at 12:42 PM John C. Linford <joh...@gm...> wrote:
>
> Hi Stephane,
>
> I have patches for libpfm4 that add support for the Arm Neoverse V1 (AWS Graviton3) and Arm Neoverse V2 (NVIDIA Grace). What's the best way to upstream these patches? I've posted a merge request (https://sourceforge.net/p/perfmon2/libpfm4/merge-requests/21/) or I can simply provide the patch file here, if that's preferred.
>
> Similarly, I've written patches for Neoverse CPU support in PAPI: https://bitbucket.org/icl/papi/pull-requests/424. Giuseppe Congiu would like to see the libpfm4 patches accepted before moving forward with the PAPI patches.
>
> Please let me know how I can help. Thanks!
>
> ~John Linford
|
From: John C. L. <joh...@gm...> - 2023-01-19 20:42:52
|
Hi Stephane, I have patches for libpfm4 that add support for the Arm Neoverse V1 (AWS Graviton3) and Arm Neoverse V2 (NVIDIA Grace). What's the best way to upstream these patches? I've posted a merge request ( https://sourceforge.net/p/perfmon2/libpfm4/merge-requests/21/) or I can simply provide the patch file here, if that's preferred. Similarly, I've written patches for Neoverse CPU support in PAPI: https://bitbucket.org/icl/papi/pull-requests/424. Giuseppe Congiu would like to see the libpfm4 patches accepted before moving forward with the PAPI patches. Please let me know how I can help. Thanks! ~John Linford |
From: Muhammet F. Ö. <muh...@al...> - 2022-12-21 20:36:30
|
Hello Giuseppe,

Thank you for your quick reply. However, I can already intercept syscalls with ptrace calls in my program. My main problem is that I can only capture LBR entries with perf event samples, which is not the desired method due to the nondeterministic behavior of sampling. What I want to do is read the LBR entries generated by a specific process without using a sampling-based approach. Below is the list of approaches I have tried so far:

- Until now, I developed my tool by looking at the perf_examples/branch_smpl.c example in the libpfm4 repository. I tried changing the sampling frequency in order to get a much more deterministic LBR reader, but it doesn't help much.
- I also tried the wrmsr and rdmsr tools in order to read LBR entries on demand, but those tools do not work at the process level (they return all entries instead of only the LBR entries that belong to a specific process).
- I investigated the Linux source code as well and found a function called "intel_pmu_lbr_read_64", which is used with the Performance Monitoring Unit, but I have very little knowledge about how the PMU works in general.

Again, my question is: how can I implement a function (preferably in C/C++) that returns process-specific LBR entries on demand, without any sampling? Any guidance about reading Last Branch Records and/or the Performance Monitoring Unit would be a great help.

Have a nice day all!
-Fatih

On Wed, Dec 21, 2022 at 3:04 PM Giuseppe Congiu <gc...@ic...> wrote:

> Hi Fatih,
>
> There are different ways you can do this. One could be to write your own syscall wrapper library. The wrapper can read LBRs and then fall back to the system-provided syscall. You can look at the --wrap option of the linker (man ld) as a possible implementation option for this. The linker --wrap option assumes you can build the application from sources. If that is not the case, you can still try LD_PRELOAD of the wrapper library, but this will only work if the syscalls are made directly by the application and not by a third-party dependency library (take this with a grain of salt, I might recall incorrectly). If LD_PRELOAD does not work either, another way could be to use the ptrace syscall to attach to your application process (this is what strace uses). There might even be other, better ways of achieving the same result that I am not aware of, but these should be good starting points.
>
> Best,
> Giuseppe
>
> On Wed, Dec 21, 2022 at 11:48 AM Muhammet Fatih Öztank (Student) via perfmon2-devel <per...@li...> wrote:
>
>> Hello,
>>
>> I'm working on a project which requires the Intel LBR functionality. I'm currently developing a tool to detect process-specific malicious behavior by reading Intel Last Branch Record entries after each syscall. Currently, I have managed to develop an LBR reader program using perf events and libpfm4.
>>
>> However, my program uses sampling to retrieve LBR entries, which is not the requested functionality because sampling is nondeterministic. What we need is a request-based structure where, on each syscall, we retrieve all LBR entries on the debug registers which belong to the traced process.
>>
>> What I want to ask is: is there any other way to read LBR entries in a deterministic manner (i.e. I call my function and get all LBR entries related to the process being traced)? Any help is greatly appreciated.
>>
>> Thanks in advance
>> -Fatih
|
From: Giuseppe C. <gc...@ic...> - 2022-12-21 12:04:58
|
Hi Fatih,

There are different ways you can do this. One could be to write your own syscall wrapper library. The wrapper can read LBRs and then fall back to the system-provided syscall. You can look at the --wrap option of the linker (man ld) as a possible implementation option for this. The linker --wrap option assumes you can build the application from sources. If that is not the case, you can still try LD_PRELOAD of the wrapper library, but this will only work if the syscalls are made directly by the application and not by a third-party dependency library (take this with a grain of salt, I might recall incorrectly). If LD_PRELOAD does not work either, another way could be to use the ptrace syscall to attach to your application process (this is what strace uses). There might even be other, better ways of achieving the same result that I am not aware of, but these should be good starting points.

Best,
Giuseppe

On Wed, Dec 21, 2022 at 11:48 AM Muhammet Fatih Öztank (Student) via perfmon2-devel <per...@li...> wrote:

> Hello,
>
> I'm working on a project which requires the Intel LBR functionality. I'm currently developing a tool to detect process-specific malicious behavior by reading Intel Last Branch Record entries after each syscall. Currently, I have managed to develop an LBR reader program using perf events and libpfm4.
>
> However, my program uses sampling to retrieve LBR entries, which is not the requested functionality because sampling is nondeterministic. What we need is a request-based structure where, on each syscall, we retrieve all LBR entries on the debug registers which belong to the traced process.
>
> What I want to ask is: is there any other way to read LBR entries in a deterministic manner (i.e. I call my function and get all LBR entries related to the process being traced)? Any help is greatly appreciated.
>
> Thanks in advance
> -Fatih
|
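The ptrace option mentioned in this reply can be sketched as a loop that stops a child at every syscall boundary. This is an illustration only, assuming Linux and glibc: count_syscall_stops() is a hypothetical helper, and it does not read any LBR state. It only marks the point where a tool would have to do so by some other means.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Kill and reap the child on an error path. */
static long abandon_child(pid_t pid)
{
	kill(pid, SIGKILL);
	waitpid(pid, NULL, 0);
	return -1;
}

/*
 * Run the program at path under ptrace and count syscall stops
 * (each syscall normally produces one stop on entry and one on exit).
 * Returns -1 if the child cannot be traced.
 */
long count_syscall_stops(const char *path)
{
	int status, sig = 0;
	long stops = 0;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
			_exit(126);
		raise(SIGSTOP);		/* let the parent set options first */
		execl(path, path, (char *)NULL);
		_exit(127);
	}

	if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
		return -1;

	/* Mark syscall stops with SIGTRAP|0x80 so they can be told apart. */
	if (ptrace(PTRACE_SETOPTIONS, pid, NULL,
		   (void *)(long)PTRACE_O_TRACESYSGOOD) < 0)
		return abandon_child(pid);

	for (;;) {
		if (ptrace(PTRACE_SYSCALL, pid, NULL, (void *)(long)sig) < 0)
			return abandon_child(pid);
		if (waitpid(pid, &status, 0) < 0)
			return -1;
		if (WIFEXITED(status) || WIFSIGNALED(status))
			break;

		sig = 0;
		if (WIFSTOPPED(status)) {
			int s = WSTOPSIG(status);

			if (s == (SIGTRAP | 0x80))
				stops++;	/* syscall boundary: a tool would act here */
			else if (s != SIGTRAP)
				sig = s;	/* forward other signals to the child */
		}
	}
	return stops;
}
```

Calling count_syscall_stops("/bin/true") should report a positive number of stops on a system where ptrace is permitted, and -1 where tracing is blocked.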
From: Giuseppe C. <gc...@ic...> - 2022-12-21 11:33:40
|
Hello, I found a bug when using libpfm4 in PAPI with Zen3. Libpfm4 returns to PAPI more than one default PMU. I have had a look into libpfm4 and I think the revision number of Zen4 is wrong. It is set to Zen3 instead. Attached is a fix. Best, Giuseppe Congiu |
From: Muhammet F. Ö. (S. <oz...@sa...> - 2022-12-21 10:48:15
|
Hello,

I'm working on a project which requires the Intel LBR functionality. I'm currently developing a tool to detect process-specific malicious behavior by reading Intel Last Branch Record entries after each syscall. Currently, I have managed to develop an LBR reader program using perf events and libpfm4.

However, my program uses sampling to retrieve LBR entries, which is not the requested functionality because sampling is nondeterministic. What we need is a request-based structure where, on each syscall, we retrieve all LBR entries on the debug registers which belong to the traced process.

What I want to ask is: is there any other way to read LBR entries in a deterministic manner (i.e. I call my function and get all LBR entries related to the process being traced)? Any help is greatly appreciated.

Thanks in advance
-Fatih
|
From: John C. L. <joh...@gm...> - 2022-12-20 18:02:12
|
P.S. I also have patches for NVIDIA Grace (Arm Neoverse V2) based on the V1 patches. I can ship them as soon as we have Neoverse V1 support upstream. Thanks, On Tue, Dec 20, 2022 at 10:44 AM John C. Linford < joh...@gm...> wrote: > Hello, > > I've implemented support for the Arm Neoverse V1 (e.g. AWS Graviton3) in > libpfm4. What's the best way to upstream these patches? I've posted a > merge request ( > https://sourceforge.net/p/perfmon2/libpfm4/merge-requests/21/) or I can > simply provide the patch file here, if that's preferred. > > Thanks! > |
From: John C. L. <joh...@gm...> - 2022-12-20 16:45:10
|
Hello, I've implemented support for the Arm Neoverse V1 (e.g. AWS Graviton3) in libpfm4. What's the best way to upstream these patches? I've posted a merge request (https://sourceforge.net/p/perfmon2/libpfm4/merge-requests/21/) or I can simply provide the patch file here, if that's preferred. Thanks! |
From: Romin T. <rom...@gm...> - 2022-10-24 13:59:13
|
Hi all,

First, let me thank you for this project. Awesome!

We've been playing with libpfm4 these days, but couldn't get it working on our 2 AMD ZEN3 CPUs. I pasted their specifications as given by PAPI below.

We think the problem with libpfm4 in our case is that in the function amd64_get_revision (in lib/pfmlib_amd64.c), there is a switch over the model, and only model 1 is supported while our CPUs showcase model 80 and 33.

We can propose the following patch, but we don't know if this is the right fix, or if there is a specific reason why only model 1 is supported. What do you think?

Many thanks in advance,
Romin

diff --git a/lib/pfmlib_amd64.c b/lib/pfmlib_amd64.c
index 8e07525..b60b362 100644
--- a/lib/pfmlib_amd64.c
+++ b/lib/pfmlib_amd64.c
@@ -182,11 +182,13 @@ amd64_get_revision(pfm_amd64_config_t *cfg)
 		rev = PFM_PMU_AMD64_FAM16H;
 	} else if (cfg->family == 25) { /* family 19h */
 		switch (cfg->model) {
-		case 1:
+		case 1:
+		case 33:
+		case 80:
 			rev = PFM_PMU_AMD64_FAM19H_ZEN3;
-			break;
-		default:
-			;
+			break;
+		default:
+			break;
 		}
 	}

Available components and hardware information.
--------------------------------------------------------------------------------
PAPI version             : 6.0.0.1
Operating system         : Linux 5.15.0-52-generic
Vendor string and code   : AuthenticAMD (2, 0x2)
Model string and code    : AMD Ryzen 7 5800H with Radeon Graphics (80, 0x50)
CPU revision             : 0.000000
CPUID                    : Family/Model/Stepping 25/80/0, 0x19/0x50/0x00
CPU Max MHz              : 4463
CPU Min MHz              : 400
Total cores              : 16
SMT threads per core     : 2
Cores per socket         : 8
Sockets                  : 1
Cores per NUMA region    : 16
NUMA regions             : 1
Running in a VM          : no
Number Hardware Counters : 0
Max Multiplex Counters   : 384
Fast counter read (rdpmc): no

Available components and hardware information.
--------------------------------------------------------------------------------
PAPI version             : 6.0.0.1
Operating system         : Linux 5.15.0-46-generic
Vendor string and code   : AuthenticAMD (2, 0x2)
Model string and code    : AMD Ryzen 9 5950X 16-Core Processor (33, 0x21)
CPU revision             : 0.000000
CPUID                    : Family/Model/Stepping 25/33/0, 0x19/0x21/0x00
CPU Max MHz              : 5083
CPU Min MHz              : 2200
Total cores              : 32
SMT threads per core     : 2
Cores per socket         : 16
Sockets                  : 1
Cores per NUMA region    : 32
NUMA regions             : 1
Running in a VM          : no
Number Hardware Counters : 0
Max Multiplex Counters   : 384
Fast counter read (rdpmc): yes
|
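For readers wondering where the decimal family and model values in the PAPI output come from: on x86 they are derived from the CPUID leaf 1 signature by combining base and extended fields. The sketch below follows the AMD convention, where the extended fields apply when the base family is 0xF. The struct, the function name, and the two signature values used afterwards are constructed for this illustration so that they decode to the values reported above; they were not read from the reporter's machines.

```c
#include <stdint.h>

/* Hypothetical container for the decoded values. */
struct cpu_sig {
	unsigned family;
	unsigned model;
	unsigned stepping;
};

/* Decode a CPUID leaf 1 EAX signature using the AMD convention. */
struct cpu_sig decode_cpuid_sig(uint32_t eax)
{
	struct cpu_sig s;
	unsigned base_family = (eax >> 8) & 0xf;
	unsigned base_model  = (eax >> 4) & 0xf;
	unsigned ext_model   = (eax >> 16) & 0xf;
	unsigned ext_family  = (eax >> 20) & 0xff;

	s.stepping = eax & 0xf;
	s.family = base_family;
	s.model = base_model;

	/* Extended fields are only meaningful when the base family is 0xF. */
	if (base_family == 0xf) {
		s.family += ext_family;		/* 0xF + 0xA = 25 (19h) */
		s.model |= ext_model << 4;	/* e.g. (5 << 4) | 0 = 80 */
	}
	return s;
}
```

Under these assumptions, a signature of 0x00A50F00 decodes to family 25, model 80, and 0x00A20F10 decodes to family 25, model 33, the two models that the proposed patch adds to the switch.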
From: Giuseppe C. <gc...@ic...> - 2022-08-22 09:25:33
|
Hi Will,

Sorry for the late response. 'Us', ideally, would be myself, Anthony and Heike. That way any of us can test on Power10. Concerning suggestions on CAT, I have limited knowledge of the tool. Anthony can probably provide better guidance in this case.

Best,
Giuseppe

> On 9 Aug 2022, at 16:59, will schmidt <wil...@vn...> wrote:
>
> On Tue, 2022-08-09 at 08:08 +0200, Giuseppe Congiu wrote:
>> Hi Will,
>>
>> We would need to verify your power10 presets before we add them to
>> PAPI. We have a test suite that we normally use to do that (the
>> counter analysis toolkit - CAT). Unfortunately, we don’t have access
>> to any power10 machines at the moment. Would it be possible for us to
>> run the counter analysis toolkit on one of your power10 systems? This
>> would help speeding up the integration of the presets in PAPI.
>
> Hi,
>
> Depending on who 'us' is.. :-) I can poke around internally to see
> what the status is of getting public Power10 systems available, though
> that is generally out of my sphere of influence. I can certainly run
> it and report results.
> I hacked away at a few makefile/path entries
> in the counter_analysis_toolkit subdir and got something to run...
> I'll take advice and suggestions on what to tweak in order to continue.
>
> I took the entirety of the event list entries mentioned in this patch,
> added them to the event_list.txt ala
> PAPI_REF_CYC 0
> PAPI_L1_DCM 0
> PAPI_L1_LDM 0
> ...
> and ran cat_collect as
> indicated in the README. The test does seem to hang after generating
> some results..
>
> Under -verbose it appears to be hanging on the D-Cache
> latencies portion.
>
> Branch Benchmarks: 100%
> D-Cache Latencies: 0%
> and most of the output created up to this point does have data.
>
> cat PAPI_LST_INS.branch
> 38.00
> 41.00
> 39.00
> 305.25
> 369.00
> 367.75
> 181.25
> 245.25
> 243.50
> 7
> 2.00
> 6.00
>
> Thanks
> -Will
>
>>
>> Thank you,
>> Giuseppe
>>
>>> On 4 Aug 2022, at 03:57, will schmidt <wil...@vn...>
>>> wrote:
>>>
>>> On Wed, 2022-08-03 at 21:29 +0200, Giuseppe Congiu wrote:
>>>> Hi will,
>>>>
>>>> How did you define the PAPI preset?
>>>>
>>>
>>> Meaning how did I come up with the values used? :-)
>>> Typically I start
>>> with the values for the previous processor,
>>> and depending on the changes to the current Power PMU event list,
>>> do my best to refresh any event entries with their new equivalents.
>>>
>>> Thanks
>>> -Will
>>>
>>>> -Giuseppe
>>>>
>>>>> On 3 Aug 2022, at 20:51, will schmidt <wil...@vn...>
>>>>> wrote:
>>>>>
>>>>> [PATCH] PAPI, Power10 event list mappings.
>>>>>
>>>>> Hi,
>>>>> This patch provides the PAPI event
>>>>> mappings for Power10 support.
>>>>>
>>>>> This should be safe to commit once PAPI completes
>>>>> the pull requests from libpfm4 that will include
>>>>> the prerequisite Power10 content.
>>>>>
>>>>> diff --git a/src/papi_events.csv b/src/papi_events.csv
>>>>> index 4ef647959..d1c89d30b 100644
>>>>> --- a/src/papi_events.csv
>>>>> +++ b/src/papi_events.csv
>>>>> @@ -1673,10 +1673,59 @@ PRESET,PAPI_BR_CN,DERIVED_SUB,PM_BR_CMPL,PM_BR_UNCOND
>>>>>  PRESET,PAPI_BR_NTK,DERIVED_POSTFIX,N0|N1|-,PM_BR_CMPL,PM_BR_TAKEN_CMPL
>>>>>  PRESET,PAPI_BR_UCN,NOT_DERIVED,PM_BR_UNCOND
>>>>>  PRESET,PAPI_BR_TKN,NOT_DERIVED,PM_BR_CORECT_PRED_TAKEN_CMPL
>>>>>  PRESET,PAPI_FXU_IDL,NOT_DERIVED,PM_FXU_IDLE
>>>>>  #
>>>>> +CPU,POWER10
>>>>> +CPU,power10
>>>>> +#
>>>>> +PRESET,PAPI_REF_CYC,NOT_DERIVED,PM_CYC_ALT3
>>>>> +PRESET,PAPI_L1_DCM,NOT_DERIVED,PM_LD_MISS_L1
>>>>> +PRESET,PAPI_L1_LDM,NOT_DERIVED,PM_LD_MISS_L1
>>>>> +PRESET,PAPI_L1_STM,NOT_DERIVED,PM_ST_MISS_L1
>>>>> +PRESET,PAPI_L1_DCW,DERIVED_SUB,PM_ST_FIN,PM_ST_MISS_L1
>>>>> +PRESET,PAPI_L1_DCR,NOT_DERIVED,PM_LD_HIT_L1
>>>>> +PRESET,PAPI_L1_DCA,DERIVED_ADD,PM_LD_REF_L1,PM_ST_CMPL
>>>>> +PRESET,PAPI_L2_DCM,NOT_DERIVED,PM_DATA_FROM_L2MISS
>>>>> +PRESET,PAPI_L2_LDM,NOT_DERIVED,PM_L2_LD_MISS
>>>>> +PRESET,PAPI_L2_STM,NOT_DERIVED,PM_L2_ST_MISS
>>>>> +PRESET,PAPI_L2_DCR,NOT_DERIVED,PM_DATA_FROM_L2
>>>>> +PRESET,PAPI_L2_DCW,NOT_DERIVED,PM_L2_ST
>>>>> +PRESET,PAPI_L3_DCR,NOT_DERIVED,PM_DATA_FROM_L3
>>>>> +PRESET,PAPI_L3_DCM,NOT_DERIVED,PM_DATA_FROM_L3MISS
>>>>> +PRESET,PAPI_L3_LDM,NOT_DERIVED,PM_L3_LD_MISS
>>>>> +PRESET,PAPI_L1_ICH,NOT_DERIVED,PM_INST_FROM_LMEM
>>>>> +PRESET,PAPI_L1_ICM,NOT_DERIVED,PM_L1_ICACHE_MISS
>>>>> +PRESET,PAPI_L2_ICM,NOT_DERIVED,PM_INST_FROM_L3
>>>>> +PRESET,PAPI_L2_ICH,NOT_DERIVED,PM_INST_FROM_L2
>>>>> +PRESET,PAPI_L3_ICA,NOT_DERIVED,PM_INST_FROM_L2MISS
>>>>> +PRESET,PAPI_L3_ICH,NOT_DERIVED,PM_INST_FROM_L3
>>>>> +PRESET,PAPI_L3_ICM,NOT_DERIVED,PM_INST_FROM_L3MISS
>>>>> +PRESET,PAPI_FMA_INS,NOT_DERIVED,PM_FMA_CMPL
>>>>> +#PRESET,PAPI_TOT_IIS,NOT_DERIVED,
>>>>> +PRESET,PAPI_TOT_INS,NOT_DERIVED,PM_INST_CMPL
>>>>> +PRESET,PAPI_INT_INS,NOT_DERIVED,PM_FXU_ISSUE
>>>>> +PRESET,PAPI_FP_OPS,NOT_DERIVED,PM_FLOP_CMPL
>>>>> +PRESET,PAPI_FP_INS,NOT_DERIVED,PM_FLOP_CMPL
>>>>> +PRESET,PAPI_DP_OPS,NOT_DERIVED,PM_2FLOP_CMPL
>>>>> +PRESET,PAPI_SP_OPS,NOT_DERIVED,PM_SP_FLOP_CMPL
>>>>> +PRESET,PAPI_TOT_CYC,NOT_DERIVED,PM_RUN_CYC
>>>>> +#PRESET,PAPI_HW_INT,NOT_DERIVED,PM_EXT_INT
>>>>> +PRESET,PAPI_STL_ICY,DERIVED_POSTFIX,N0|N1|-,PM_RUN_CYC,PM_1PLUS_PPC_DISP
>>>>> +PRESET,PAPI_SR_INS,NOT_DERIVED,PM_ST_FIN
>>>>> +PRESET,PAPI_LD_INS,NOT_DERIVED,PM_LD_REF_L1
>>>>> +PRESET,PAPI_LST_INS,NOT_DERIVED,PM_LSU_FIN
>>>>> +PRESET,PAPI_LST_INS,DERIVED_ADD,PM_LD_REF_L1,PM_LD_MISS_L1,PM_ST_FIN
>>>>> +PRESET,PAPI_BR_INS,NOT_DERIVED,PM_BR_FIN
>>>>> +PRESET,PAPI_BR_MSP,NOT_DERIVED,PM_BR_MPRED_CMPL
>>>>> +#PRESET,PAPI_BR_PRC,NOT_DERIVED,
>>>>> +PRESET,PAPI_BR_CN,DERIVED_SUB,PM_BR_TAKEN_CMPL,PM_BR_TKN_UNCOND_FIN
>>>>> +PRESET,PAPI_BR_NTK,NOT_DERIVED,PM_BR_MPRED_CMPL
>>>>> +PRESET,PAPI_BR_UCN,NOT_DERIVED,PM_BR_FIN
>>>>> +PRESET,PAPI_BR_TKN,NOT_DERIVED,PM_BR_TAKEN_CMPL
>>>>> +#PRESET,PAPI_FXU_IDL,NOT_DERIVED,PM_FXU_IDLE
>>>>> +#
>>>>>  CPU,ultra12
>>>>>  #
>>>>>  PRESET,PAPI_TOT_CYC,NOT_DERIVED,CYCLE_CNT
>>>>>  PRESET,PAPI_TOT_INS,NOT_DERIVED,INSTR_CNT
>>>>>  PRESET,PAPI_L1_ICM,NOT_DERIVED,DISPATCH0_IC_MISS
>>>>>
|
From: will s. <wil...@vn...> - 2022-08-09 14:59:55
|
On Tue, 2022-08-09 at 08:08 +0200, Giuseppe Congiu wrote:
> Hi Will,
>
> We would need to verify your power10 presets before we add them to
> PAPI. We have a test suite that we normally use to do that (the
> counter analysis toolkit - CAT). Unfortunately, we don’t have access
> to any power10 machines at the moment. Would it be possible for us to
> run the counter analysis toolkit on one of your power10 systems? This
> would help speeding up the integration of the presets in PAPI.

Hi,
Depending on who 'us' is.. :-)
I can poke around internally to see what the status is of getting public Power10 systems available, though that is generally out of my sphere of influence.

I can certainly run it and report results.

I hacked away at a few makefile/path entries in the counter_analysis_toolkit subdir and got something to run... I'll take advice and suggestions on what to tweak in order to continue.

I took the entirety of the event list entries mentioned in this patch, added them to the event_list.txt ala

PAPI_REF_CYC 0
PAPI_L1_DCM 0
PAPI_L1_LDM 0
...

and ran cat_collect as indicated in the README.

The test does seem to hang after generating some results.. Under -verbose it appears to be hanging on the D-Cache latencies portion.

Branch Benchmarks: 100%
D-Cache Latencies: 0%

and most of the output created up to this point does have data.

cat PAPI_LST_INS.branch
38.00 41.00 39.00 305.25 369.00 367.75 181.25 245.25 243.50 7 2.00 6.00

Thanks
-Will

>
> Thank you,
> Giuseppe
>
> > On 4 Aug 2022, at 03:57, will schmidt <wil...@vn...>
> > wrote:
> >
> > On Wed, 2022-08-03 at 21:29 +0200, Giuseppe Congiu wrote:
> > > Hi will,
> > >
> > > How did you define the PAPI preset?
> > >
> >
> > Meaning how did I come up with the values used? :-)
> > Typically I start
> > with the values for the previous processor,
> > and depending on the changes to the current Power PMU event list,
> > do my best to refresh any event entries with their new equivalents.
|
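The event_list.txt step described in the 2022-08-09 message (one preset name per line followed by `0`) could be scripted rather than typed by hand. The following is a minimal sketch, not taken from the thread or from the counter analysis toolkit: the function names `preset_names` and `event_list_text` are hypothetical, the second column is simply copied from the `PAPI_REF_CYC 0` example in the message, and skipping commented-out (`#PRESET`) rows is an assumption about what was intended.

```python
# Sketch only: helper names are hypothetical and not part of PAPI or CAT.

def preset_names(csv_lines):
    """Return active PRESET names in first-seen order.

    Accepts raw papi_events.csv rows or diff-style rows with a leading '+'.
    Commented-out rows (starting with '#') and CPU rows are skipped.
    A name that appears more than once (the patch lists PAPI_LST_INS twice)
    is emitted only once.
    """
    names = []
    for raw in csv_lines:
        line = raw.strip()
        if line.startswith("+"):
            line = line[1:]
        if not line.startswith("PRESET,"):
            continue
        name = line.split(",")[1]
        if name not in names:
            names.append(name)
    return names


def event_list_text(names, second_column="0"):
    """Format names the way the message shows them: 'NAME 0', one per line."""
    return "".join(f"{name} {second_column}\n" for name in names)


SAMPLE_ROWS = [
    "+CPU,power10",
    "+PRESET,PAPI_REF_CYC,NOT_DERIVED,PM_CYC_ALT3",
    "+PRESET,PAPI_L1_DCM,NOT_DERIVED,PM_LD_MISS_L1",
    "+#PRESET,PAPI_TOT_IIS,NOT_DERIVED,",
    "+PRESET,PAPI_LST_INS,NOT_DERIVED,PM_LSU_FIN",
    "+PRESET,PAPI_LST_INS,DERIVED_ADD,PM_LD_REF_L1,PM_LD_MISS_L1,PM_ST_FIN",
]

if __name__ == "__main__":
    print(event_list_text(preset_names(SAMPLE_ROWS)), end="")
```

The sample rows are a handful copied from the patch; feeding the whole Power10 section through the same helpers would give the full list.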
From: Giuseppe C. <gc...@ic...> - 2022-08-09 06:35:06
|
Hi Will,

We would need to verify your power10 presets before we add them to PAPI. We have a test suite that we normally use to do that (the counter analysis toolkit - CAT). Unfortunately, we don’t have access to any power10 machines at the moment. Would it be possible for us to run the counter analysis toolkit on one of your power10 systems? This would help speeding up the integration of the presets in PAPI.

Thank you,
Giuseppe

> On 4 Aug 2022, at 03:57, will schmidt <wil...@vn...> wrote:
>
> On Wed, 2022-08-03 at 21:29 +0200, Giuseppe Congiu wrote:
>> Hi will,
>>
>> How did you define the PAPI preset?
>>
>
> Meaning how did I come up with the values used? :-)
> Typically I start
> with the values for the previous processor,
> and depending on the changes to the current Power PMU event list,
> do my best to refresh any event entries with their new equivalents.
>
> Thanks
> -Will
>
>> —Giuseppe
>>
>>> On 3 Aug 2022, at 20:51, will schmidt <wil...@vn...>
>>> wrote:
>>>
>>> [PATCH] PAPI, Power10 event list mappings.
>>>
>>> Hi,
>>> This patch provides the PAPI event
>>> mappings for Power10 support.
>>>
>>> This should be safe to commit once PAPI completes
>>> the pull requests from libpfm4 that will include
>>> the prerequisite Power10 content.
|
From: Estanislao M. M. <lau...@bs...> - 2022-08-08 13:35:15
|
Dear all, I'm attaching a patch file which includes the MEM_STALL_ANYSTORE event, missing in HiSilicon's Kunpeng. Kind regards, -- Lau |
From: will s. <wil...@vn...> - 2022-08-04 01:57:48
|
On Wed, 2022-08-03 at 21:29 +0200, Giuseppe Congiu wrote:
> Hi will,
>
> How did you define the PAPI preset?
>

Meaning how did I come up with the values used? :-)
Typically I start with the values for the previous processor, and depending on the changes to the current Power PMU event list, do my best to refresh any event entries with their new equivalents.

Thanks
-Will

> —Giuseppe
>
> > On 3 Aug 2022, at 20:51, will schmidt <wil...@vn...>
> > wrote:
> >
> > [PATCH] PAPI, Power10 event list mappings.
> >
> > Hi,
> > This patch provides the PAPI event
> > mappings for Power10 support.
> >
> > This should be safe to commit once PAPI completes
> > the pull requests from libpfm4 that will include
> > the prerequisite Power10 content.
|
From: Giuseppe C. <gc...@ic...> - 2022-08-03 19:58:31
|
Hi will,

How did you define the PAPI preset?

—Giuseppe

> On 3 Aug 2022, at 20:51, will schmidt <wil...@vn...> wrote:
>
> [PATCH] PAPI, Power10 event list mappings.
>
> Hi,
> This patch provides the PAPI event
> mappings for Power10 support.
>
> This should be safe to commit once PAPI completes
> the pull requests from libpfm4 that will include
> the prerequisite Power10 content.
>
>
>
>
> diff --git a/src/papi_events.csv b/src/papi_events.csv
> index 4ef647959..d1c89d30b 100644
> --- a/src/papi_events.csv
> +++ b/src/papi_events.csv
> @@ -1673,10 +1673,59 @@ PRESET,PAPI_BR_CN,DERIVED_SUB,PM_BR_CMPL,PM_BR_UNCOND
>  PRESET,PAPI_BR_NTK,DERIVED_POSTFIX,N0|N1|-|,PM_BR_CMPL,PM_BR_TAKEN_CMPL
>  PRESET,PAPI_BR_UCN,NOT_DERIVED,PM_BR_UNCOND
>  PRESET,PAPI_BR_TKN,NOT_DERIVED,PM_BR_CORECT_PRED_TAKEN_CMPL
>  PRESET,PAPI_FXU_IDL,NOT_DERIVED,PM_FXU_IDLE
>  #
> +CPU,POWER10
> +CPU,power10
> +#
> +PRESET,PAPI_REF_CYC,NOT_DERIVED,PM_CYC_ALT3
> +PRESET,PAPI_L1_DCM,NOT_DERIVED,PM_LD_MISS_L1
> +PRESET,PAPI_L1_LDM,NOT_DERIVED,PM_LD_MISS_L1
> +PRESET,PAPI_L1_STM,NOT_DERIVED,PM_ST_MISS_L1
> +PRESET,PAPI_L1_DCW,DERIVED_SUB,PM_ST_FIN,PM_ST_MISS_L1
> +PRESET,PAPI_L1_DCR,NOT_DERIVED,PM_LD_HIT_L1
> +PRESET,PAPI_L1_DCA,DERIVED_ADD,PM_LD_REF_L1,PM_ST_CMPL
> +PRESET,PAPI_L2_DCM,NOT_DERIVED,PM_DATA_FROM_L2MISS
> +PRESET,PAPI_L2_LDM,NOT_DERIVED,PM_L2_LD_MISS
> +PRESET,PAPI_L2_STM,NOT_DERIVED,PM_L2_ST_MISS
> +PRESET,PAPI_L2_DCR,NOT_DERIVED,PM_DATA_FROM_L2
> +PRESET,PAPI_L2_DCW,NOT_DERIVED,PM_L2_ST
> +PRESET,PAPI_L3_DCR,NOT_DERIVED,PM_DATA_FROM_L3
> +PRESET,PAPI_L3_DCM,NOT_DERIVED,PM_DATA_FROM_L3MISS
> +PRESET,PAPI_L3_LDM,NOT_DERIVED,PM_L3_LD_MISS
> +PRESET,PAPI_L1_ICH,NOT_DERIVED,PM_INST_FROM_LMEM
> +PRESET,PAPI_L1_ICM,NOT_DERIVED,PM_L1_ICACHE_MISS
> +PRESET,PAPI_L2_ICM,NOT_DERIVED,PM_INST_FROM_L3
> +PRESET,PAPI_L2_ICH,NOT_DERIVED,PM_INST_FROM_L2
> +PRESET,PAPI_L3_ICA,NOT_DERIVED,PM_INST_FROM_L2MISS
> +PRESET,PAPI_L3_ICH,NOT_DERIVED,PM_INST_FROM_L3
> +PRESET,PAPI_L3_ICM,NOT_DERIVED,PM_INST_FROM_L3MISS
> +PRESET,PAPI_FMA_INS,NOT_DERIVED,PM_FMA_CMPL
> +#PRESET,PAPI_TOT_IIS,NOT_DERIVED,
> +PRESET,PAPI_TOT_INS,NOT_DERIVED,PM_INST_CMPL
> +PRESET,PAPI_INT_INS,NOT_DERIVED,PM_FXU_ISSUE
> +PRESET,PAPI_FP_OPS,NOT_DERIVED,PM_FLOP_CMPL
> +PRESET,PAPI_FP_INS,NOT_DERIVED,PM_FLOP_CMPL
> +PRESET,PAPI_DP_OPS,NOT_DERIVED,PM_2FLOP_CMPL
> +PRESET,PAPI_SP_OPS,NOT_DERIVED,PM_SP_FLOP_CMPL
> +PRESET,PAPI_TOT_CYC,NOT_DERIVED,PM_RUN_CYC
> +#PRESET,PAPI_HW_INT,NOT_DERIVED,PM_EXT_INT
> +PRESET,PAPI_STL_ICY,DERIVED_POSTFIX,N0|N1|-|,PM_RUN_CYC,PM_1PLUS_PPC_DISP
> +PRESET,PAPI_SR_INS,NOT_DERIVED,PM_ST_FIN
> +PRESET,PAPI_LD_INS,NOT_DERIVED,PM_LD_REF_L1
> +PRESET,PAPI_LST_INS,NOT_DERIVED,PM_LSU_FIN
> +PRESET,PAPI_LST_INS,DERIVED_ADD,PM_LD_REF_L1,PM_LD_MISS_L1,PM_ST_FIN
> +PRESET,PAPI_BR_INS,NOT_DERIVED,PM_BR_FIN
> +PRESET,PAPI_BR_MSP,NOT_DERIVED,PM_BR_MPRED_CMPL
> +#PRESET,PAPI_BR_PRC,NOT_DERIVED,
> +PRESET,PAPI_BR_CN,DERIVED_SUB,PM_BR_TAKEN_CMPL,PM_BR_TKN_UNCOND_FIN
> +PRESET,PAPI_BR_NTK,NOT_DERIVED,PM_BR_MPRED_CMPL
> +PRESET,PAPI_BR_UCN,NOT_DERIVED,PM_BR_FIN
> +PRESET,PAPI_BR_TKN,NOT_DERIVED,PM_BR_TAKEN_CMPL
> +#PRESET,PAPI_FXU_IDL,NOT_DERIVED,PM_FXU_IDLE
> +#
>  CPU,ultra12
>  #
>  PRESET,PAPI_TOT_CYC,NOT_DERIVED,CYCLE_CNT
>  PRESET,PAPI_TOT_INS,NOT_DERIVED,INSTR_CNT
>  PRESET,PAPI_L1_ICM,NOT_DERIVED,DISPATCH0_IC_MISS
>
|
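For readers unfamiliar with the third column of the preset rows in the patch above, here is a rough sketch of how the four derivation types that appear in it (NOT_DERIVED, DERIVED_ADD, DERIVED_SUB, DERIVED_POSTFIX) are usually understood to combine native counter values. It reflects my reading of the papi_events.csv conventions rather than PAPI's actual implementation: `evaluate_preset` is a hypothetical helper, the counter values are made up, and the postfix handling assumes `|`-separated tokens where `N<i>` names the i-th listed native event.

```python
import operator

# Sketch only: evaluate_preset is a hypothetical helper, not a PAPI function.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate_preset(derivation, counts, postfix=""):
    """Combine native event counts according to the row's derivation type.

    counts  -- native counter values, in the order the events are listed
    postfix -- the '|'-separated formula, used only for DERIVED_POSTFIX
    """
    if derivation == "NOT_DERIVED":
        return counts[0]
    if derivation == "DERIVED_ADD":
        return sum(counts)
    if derivation == "DERIVED_SUB":
        # First event minus all remaining events.
        return counts[0] - sum(counts[1:])
    if derivation == "DERIVED_POSTFIX":
        stack = []
        for token in postfix.split("|"):
            if not token:
                continue  # the trailing '|' leaves an empty token
            if token[0] == "N" and token[1:].isdigit():
                stack.append(counts[int(token[1:])])
            elif token in OPS:
                right = stack.pop()
                left = stack.pop()
                stack.append(OPS[token](left, right))
            else:
                stack.append(float(token))  # assumed: numeric constant
        return stack.pop()
    raise ValueError(f"unsupported derivation type: {derivation}")


if __name__ == "__main__":
    # PAPI_STL_ICY row: N0 = PM_RUN_CYC, N1 = PM_1PLUS_PPC_DISP (values invented)
    print(evaluate_preset("DERIVED_POSTFIX", [1000, 640], "N0|N1|-|"))
    # PAPI_L1_DCW row: PM_ST_FIN minus PM_ST_MISS_L1 (values invented)
    print(evaluate_preset("DERIVED_SUB", [500, 20]))
```

Under this reading, the `N0|N1|-|` formula on the PAPI_STL_ICY row gives the same result as a two-event DERIVED_SUB.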