From: Philippe W. <phi...@sk...> - 2023-02-08 08:37:59
|
If you are envisaging to modify valgrind, you could take some inspiration from the way callgrind can dynamically activate/de-activate tracing. See callgrind manual command line options and client requests for more details Philippe On Wed, 2023-02-08 at 17:47 +1100, Eliot Moss wrote: > On 2/8/2023 4:10 PM, SAI GOVARDHAN M C PES1UG19EC255PESU ECE Student wrote: > > Hi, > > > > We are students working on memory access analysis, using the Lackey tool in Valgrind. > > Our memory trace results in a large log file, and we need the trace from discrete points of > > execution (between 40-60%). > > Instead of logging completely, and splitting manually, is there a way we can modify the Lackey > > command to pick from a desired point in the execution? > > > > For reference, the command we use is > > $ valgrind --tool=lackey --trace-mem=yes --log-file=/path_to_log ./program > > > > We need to modify this to command to trace from 40-60% of the program > > If you know the approximate number of memory accesses, you could do something > as simple as: > > valgrind ... | tail +n XXX | head -n YYY > > to start after XXX lines of output and stop after producing YYY lines. You > could do something more sophisticated using, say, gawk, to trigger on a > particular address being accessed, e.g., as an instruction fetch. > > This will all slow things down a bit, but might accomplish your goals. > > I'm not claiming there isn't some sophisticated way to tell valgrind when > to start tracing, either. Also, nobody is stopping you from customizing > the tool yourself :-) ... a mere exercise in programming, no? > > Best wishes - EM > > > _______________________________________________ > Valgrind-users mailing list > Val...@li... > https://lists.sourceforge.net/lists/listinfo/valgrind-users |
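The callgrind mechanism Philippe refers to can be driven from the client program itself. The sketch below only illustrates that activate/de-activate style, not a drop-in replacement for the Lackey log (callgrind collects profile counts rather than a raw memory trace); the loop and heavy_work() are hypothetical placeholders for the 40-60% region, and the program would be run with something like valgrind --tool=callgrind --instr-atstart=no ./program.

/* Sketch: bracket the region of interest with callgrind client requests
 * so instrumentation only covers that window.  heavy_work() and the
 * 40%/60% bounds are hypothetical placeholders. */
#include <stdio.h>
#include <valgrind/callgrind.h>

static void heavy_work(int i) { printf("iteration %d\n", i); }

int main(void)
{
    const int total = 100;
    for (int i = 0; i < total; i++) {
        if (i == total * 40 / 100)
            CALLGRIND_START_INSTRUMENTATION;   /* begin tracing around 40% */
        heavy_work(i);
        if (i == total * 60 / 100) {
            CALLGRIND_STOP_INSTRUMENTATION;    /* stop tracing around 60% */
            CALLGRIND_DUMP_STATS;              /* flush what was collected */
        }
    }
    return 0;
}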
From: Eliot M. <mo...@cs...> - 2023-02-08 06:47:29
|
On 2/8/2023 4:10 PM, SAI GOVARDHAN M C PES1UG19EC255PESU ECE Student wrote: > Hi, > > We are students working on memory access analysis, using the Lackey tool in Valgrind. > Our memory trace results in a large log file, and we need the trace from discrete points of > execution (between 40-60%). > Instead of logging completely, and splitting manually, is there a way we can modify the Lackey > command to pick from a desired point in the execution? > > For reference, the command we use is > $ valgrind --tool=lackey --trace-mem=yes --log-file=/path_to_log ./program > > We need to modify this to command to trace from 40-60% of the program If you know the approximate number of memory accesses, you could do something as simple as: valgrind ... | tail +n XXX | head -n YYY to start after XXX lines of output and stop after producing YYY lines. You could do something more sophisticated using, say, gawk, to trigger on a particular address being accessed, e.g., as an instruction fetch. This will all slow things down a bit, but might accomplish your goals. I'm not claiming there isn't some sophisticated way to tell valgrind when to start tracing, either. Also, nobody is stopping you from customizing the tool yourself :-) ... a mere exercise in programming, no? Best wishes - EM |
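Eliot's tail/head pipeline can also be expressed as a tiny filter. This is a minimal sketch under the same assumption (you know roughly which trace line numbers correspond to 40% and 60% of the run); START and END are placeholder line numbers, and the Lackey output is assumed to arrive on stdin, e.g. by feeding the --log-file through it afterwards.

/* Sketch: keep only trace lines START..END, counted from 1.
 * Equivalent in spirit to "tail -n +START | head -n COUNT".
 * START/END are placeholders for the 40%/60% points. */
#include <stdio.h>

#define START 4000000UL
#define END   6000000UL

int main(void)
{
    char line[512];
    unsigned long n = 0;
    while (fgets(line, sizeof line, stdin)) {
        n++;
        if (n < START) continue;   /* skip the first ~40% */
        if (n > END) break;        /* stop after the ~60% point */
        fputs(line, stdout);
    }
    return 0;
}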
From: SAI G. M C P. E. S. <sai...@pe...> - 2023-02-08 05:40:47
|
Hi,

We are students working on memory access analysis, using the Lackey tool in Valgrind. Our memory trace results in a large log file, and we need the trace from discrete points of execution (between 40-60%). Instead of logging completely, and splitting manually, is there a way we can modify the Lackey command to pick from a desired point in the execution?

For reference, the command we use is

$ valgrind --tool=lackey --trace-mem=yes --log-file=/path_to_log ./program

We need to modify this command to trace from 40-60% of the program.

Regards
From: Eliot M. <mo...@cs...> - 2023-01-29 20:40:00
|
On 1/30/2023 7:08 AM, Ivica B wrote: > Can you please share the instructions on how to do it? > > On Sun, Jan 29, 2023, 9:07 PM Eliot Moss <mo...@cs... <mailto:mo...@cs...>> wrote: > > I have used lackey to get traces, which I have fed into > a cache model to detect conflicts and such. You could > also start with the lackey code and model the cache model > into the tool (which a student of mine did at one point). Lackey is one of the built-in valgrind tools. It has instructions. It produces a trace giving one memory access per line, and indicating if the access is for instruction fetch, memory read, memory write, or both read and write, with the address and size. You write a program to parse that and run your own model of whatever cache you're concerned with. Doing that part is for you to figure out. You do need to know the details of the cache you're going to model. There may be programs or libraries out there for analyzing address traces, but this would not be the list to find them. Sorry, but I'm not prepared to go through how to code a cache model ... EM |
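A minimal sketch of the post-processing Eliot describes is below. It assumes Lackey's --trace-mem output of one access per line ("I" for instruction fetch, "L"/"S"/"M" for load/store/modify, then address,size in hex) and models a direct-mapped cache; the geometry, the parsing details and the choice of a direct-mapped model are all assumptions to adapt to the cache you actually care about.

/* Sketch: read a Lackey --trace-mem log on stdin and count hits/misses
 * in a direct-mapped cache.  Cache geometry and the exact line format
 * are assumptions -- check the trace your Lackey version produces. */
#include <stdio.h>

#define LINE_BYTES 64
#define NUM_SETS   512                         /* 32 KiB, direct-mapped */

int main(void)
{
    static unsigned long long tag[NUM_SETS];
    static int valid[NUM_SETS];
    unsigned long long hits = 0, misses = 0;
    char kind, buf[256];
    unsigned long long addr;
    unsigned size;

    while (fgets(buf, sizeof buf, stdin)) {
        /* expected forms: "I  0400d7d4,8", " L 04e2b7d0,8", " S ...", " M ..." */
        if (sscanf(buf, " %c %llx,%u", &kind, &addr, &size) != 3)
            continue;                          /* skip banner/summary lines */
        unsigned long long block = addr / LINE_BYTES;
        unsigned set = (unsigned)(block % NUM_SETS);
        if (valid[set] && tag[set] == block) {
            hits++;
        } else {
            misses++;                          /* fill, evicting whatever was there */
            valid[set] = 1;
            tag[set] = block;
        }
    }
    printf("hits=%llu misses=%llu\n", hits, misses);
    return 0;
}

Recording a per-set fill timestamp alongside the tag is the natural extension for the eviction-age analysis discussed in the cache-conflict thread below.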
From: Eliot M. <mo...@cs...> - 2023-01-29 20:25:08
|
I have used lackey to get traces, which I have fed into a cache model to detect conflicts and such. You could also start with the lackey code and model the cache model into the tool (which a student of mine did at one point). Regards - Eliot Moss |
From: Ivica B <ibo...@gm...> - 2023-01-29 20:09:20
|
Can you please share the instructions on how to do it? On Sun, Jan 29, 2023, 9:07 PM Eliot Moss <mo...@cs...> wrote: > I have used lackey to get traces, which I have fed into > a cache model to detect conflicts and such. You could > also start with the lackey code and model the cache model > into the tool (which a student of mine did at one point). > > Regards - Eliot Moss > |
From: Ivica B <ibo...@gm...> - 2023-01-29 19:51:15
|
Hi Paul! I read the info you provided, but none of the programs actually support detecting cache conflicts. Performance counters can detect cache misses, similar to cachegrind, but they cannot distinguish between cache misses related to cache conflicts and other cache misses. pahole is a tool with completely different usage, and that is to detect paddings in data structures. This isn't related to cache conflicts in any way. DHAT provides useful information, by allowing you to assess which data is accessed more frequently, but you need additional data to verify that the hot data is not evicted from the cache too soon. On Sun, Jan 29, 2023 at 4:25 PM Paul Floyd <pj...@wa...> wrote: > > > > On 29-01-23 14:31, Ivica B wrote: > > Hi! > > > > I am looking for a tool that can detect cache conflicts, but I am not > > finding any. There are a few that are mostly academic, and thus not > > maintained. I think it is important for the performance analysis > > community to have a tool that to some extent can detect cache > > conflicts. Is it possible to implement support for detecting source > > code lines where cache conflicts occur? More info on cache conflicts > > below. > > [snip] > > I agree that this is an interesting topic. If anyone else has ideas I'm > all ears. > > My recommendations for this are: > > 1/ PMU/PMC (performance monitoring unit/counter) event counting tools > (perf record on Linux, pmcstat on FreeBSD, Oracle Studio collect on > Solaris, don't know for macOS). These can record events such as cache > misses with the associated callstacks. You can then use tools HotSpot > and perfgrind/kcachegrind (I hae used HotSpot but not perfgrind). > > The big advantage of this is that the PMCs are part of the hardware and > the overhead of doing this is minor. The only slight limitation is that > then number of counters is limited. > > 2/ pahole > https://github.com/acmel/dwarves > A really nice binary analysis tool. It will analyze your binary (with > debuginfo) and generate a report for all structures showing holes, > padding and cache lines. It can even generate modified source with > members reordered to improve the packing. However as this is a static > tool working only on the data structures it knows nothing about your > access patterns. > > 3/ DHAT > One of the Valgrind tools. This profiles heap memory. If the block is > less than 1k it will also generate a kind of ascii-html heat map. That > map is an aggregate, but you can usually guess which offsets get hit the > most together. > > Cachegrind doesn't really do this with the kind of accuracy that PMCs > do. It has a reduced model of the cache and has a basic branch > predictor. I don't know if or how speculative execution affects the > cache hit rate, but Valgrind doesn't do any of that. > > A+ > Paul > > > _______________________________________________ > Valgrind-users mailing list > Val...@li... > https://lists.sourceforge.net/lists/listinfo/valgrind-users |
From: John R. <jr...@bi...> - 2023-01-29 17:25:43
|
On 2023-01-29, Paul Floyd wrote: > My recommendations for this are: > > 1/ PMU/PMC (performance monitoring unit/counter) event counting tools (perf record on Linux, pmcstat on FreeBSD, Oracle Studio collect on Solaris, don't know for macOS). These can record events such as cache misses with the associated callstacks. You can then use tools HotSpot and > perfgrind/kcachegrind (I hae used HotSpot but not perfgrind). > > The big advantage of this is that the PMCs are part of the hardware and the overhead of doing this is minor. The only slight limitation is that then number of counters is limited. Another disadvantage: the hardware does not know which accesses belong to the target code versus which accesses belong to the code of valgrind itself. Even if the hardware could separate accesses on that basis, it does not know about stack frames. Allocating a stack frame shortly after CALL, and discarding it shortly before RETURN, can be significant reasons for cache misses, either immediately or in the near future. Then there are system calls, which might significantly alter cache contents. Sometimes the resulting cache misses should be included (they most certainly do affect wall clock time), but in some other cases you may wish that the operating system was ignored. If the target program uses threads, then using memory for inter-thread communication (semaphore, mutex, pipeline, etc.) becomes another factor. |
From: Paul F. <pj...@wa...> - 2023-01-29 15:24:07
|
On 29-01-23 14:31, Ivica B wrote: > Hi! > > I am looking for a tool that can detect cache conflicts, but I am not > finding any. There are a few that are mostly academic, and thus not > maintained. I think it is important for the performance analysis > community to have a tool that to some extent can detect cache > conflicts. Is it possible to implement support for detecting source > code lines where cache conflicts occur? More info on cache conflicts > below. [snip] I agree that this is an interesting topic. If anyone else has ideas I'm all ears. My recommendations for this are: 1/ PMU/PMC (performance monitoring unit/counter) event counting tools (perf record on Linux, pmcstat on FreeBSD, Oracle Studio collect on Solaris, don't know for macOS). These can record events such as cache misses with the associated callstacks. You can then use tools HotSpot and perfgrind/kcachegrind (I hae used HotSpot but not perfgrind). The big advantage of this is that the PMCs are part of the hardware and the overhead of doing this is minor. The only slight limitation is that then number of counters is limited. 2/ pahole https://github.com/acmel/dwarves A really nice binary analysis tool. It will analyze your binary (with debuginfo) and generate a report for all structures showing holes, padding and cache lines. It can even generate modified source with members reordered to improve the packing. However as this is a static tool working only on the data structures it knows nothing about your access patterns. 3/ DHAT One of the Valgrind tools. This profiles heap memory. If the block is less than 1k it will also generate a kind of ascii-html heat map. That map is an aggregate, but you can usually guess which offsets get hit the most together. Cachegrind doesn't really do this with the kind of accuracy that PMCs do. It has a reduced model of the cache and has a basic branch predictor. I don't know if or how speculative execution affects the cache hit rate, but Valgrind doesn't do any of that. A+ Paul |
From: Ivica B <ibo...@gm...> - 2023-01-29 13:31:25
|
Hi!

I am looking for a tool that can detect cache conflicts, but I am not finding any. There are a few that are mostly academic, and thus not maintained. I think it is important for the performance analysis community to have a tool that to some extent can detect cache conflicts. Is it possible to implement support for detecting source code lines where cache conflicts occur? More info on cache conflicts below.

=== What are cache conflicts? ===

Cache conflict happens when a cache line is brought up from the memory to the cache, but very soon has to be evicted to the main memory because another cache line is mapped to the same entry. The problem with detecting cache conflicts is that it is normal that one cache line gets evicted because it is replaced by another cache line. Therefore, a cache conflict is an outlier: the cache line spent very little time in the cache before it got evicted.

=== How to detect cache conflicts? ===

As I said, there are a few science papers that talk about it. And probably there are a few different approaches on how to do it. One approach is to count the amount of time a cache line has been sitting in cache before it got evicted. For each instruction that causes an eviction, we count what is the amount of time that the evicted cache line spent in the cache. Next we build a statistic. Instructions evicting mostly shortly-lived cache lines are the ones where cache conflicts are most likely to happen.

=========================

Please comment!
Ivica
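One way to make the "time in cache before eviction" idea concrete is sketched below: a small set-associative model that records, at each eviction, how long the victim line survived, and flags evictions of short-lived lines as conflict-like. The geometry, the LRU policy and the SHORT_AGE threshold are illustrative assumptions, and attributing each suspect eviction back to the evicting instruction or source line is the part that would need integration with a tracing tool such as Lackey.

/* Sketch of the eviction-age statistic described above.  Geometry,
 * replacement policy and the SHORT_AGE threshold are assumptions. */
#include <stdio.h>

#define WAYS      8
#define SETS      64
#define LINE      64
#define SHORT_AGE 32        /* "short-lived" = evicted within this many accesses */

struct way { unsigned long tag, fill, last; int valid; };
static struct way cache[SETS][WAYS];
static unsigned long now, evictions, conflict_like;

static void access_addr(unsigned long addr)
{
    unsigned long block = addr / LINE;
    unsigned s = (unsigned)(block % SETS);
    int victim = 0;
    now++;
    for (int w = 0; w < WAYS; w++) {
        if (cache[s][w].valid && cache[s][w].tag == block) {
            cache[s][w].last = now;            /* hit: refresh LRU stamp */
            return;
        }
        if (!cache[s][w].valid)
            victim = w;                        /* remember a free way */
    }
    if (cache[s][victim].valid) {              /* set full: evict the LRU way */
        for (int w = 0; w < WAYS; w++)
            if (cache[s][w].last < cache[s][victim].last)
                victim = w;
        evictions++;
        if (now - cache[s][victim].fill < SHORT_AGE)
            conflict_like++;                   /* line barely lived: suspect conflict */
    }
    cache[s][victim] = (struct way){ block, now, now, 1 };
}

int main(void)
{
    unsigned long addr;
    while (scanf("%lx", &addr) == 1)           /* one hex address per line */
        access_addr(addr);
    printf("evictions=%lu short-lived=%lu\n", evictions, conflict_like);
    return 0;
}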
From: Gordon M. <gor...@gm...> - 2023-01-16 22:05:54
|
On 2023-01-16 13:02, Gordon Messmer wrote: > Can anyone suggest why valgrind prints so many loss records for this > particular leak? Well, now I feel very silly, because these loss records are *not* 100% identical, and valgrind is actually reporting that rpmluaNew makes > 100 separate allocations. Sorry for the noise. |
From: Paul F. <pj...@wa...> - 2023-01-16 22:04:24
|
On 16-01-23 22:02, Gordon Messmer wrote: > Can anyone suggest why valgrind prints so many loss records for this > particular leak? Links for the two functions that I mentioned follow, > along with one of the loss records printed by valgrind. In my experience the most likely reason that you are getting a large number of leaks reported by Valgrind is that there is a large number of leaks. You need more stack depth to see all of the stack. Otherwise you can use gdb and put a breakpoint on malloc to confirm the allocations. A+ Paul |
From: Gordon M. <gor...@gm...> - 2023-01-16 21:02:23
|
I'm working on eliminating memory leaks in PackageKit, and I'd like to know more about whether I should suppress one of the results I'm getting. The code in question is dynamically loaded at runtime, but as far as I know, it's only loaded once and unloaded at exit.

When I exit packagekitd, after even a very short run, I get one particular stack over a hundred times in valgrind's output. If I got this stack once, then I would conclude that it was a leak I could ignore: memory allocated for global state one time. But because it's reported repeatedly, I'm not sure how to interpret the output.

The other reason that I find this very strange is that there are actually two mechanisms that should both individually guarantee that this allocation only happens once. The rpm Lua INITSTATE should only call rpmluaNew if the static variable globalLuaState is null, and libdnf calls rpmReadConfigFiles in a g_once_init_enter block.

Can anyone suggest why valgrind prints so many loss records for this particular leak? Links for the two functions that I mentioned follow, along with one of the loss records printed by valgrind.

https://github.com/rpm-software-management/rpm/blob/master/rpmio/rpmlua.c#L93
https://github.com/rpm-software-management/libdnf/blob/dnf-4-master/libdnf/dnf-context.cpp#L400

==49724== 24 bytes in 1 blocks are possibly lost in loss record 1,247 of 4,550
==49724==    at 0x484378A: malloc (vg_replace_malloc.c:392)
==49724==    by 0x484870B: realloc (vg_replace_malloc.c:1451)
==49724==    by 0x14F60600: luaM_malloc_ (lmem.c:192)
==49724==    by 0x14F6B047: UnknownInlinedFun (ltable.c:490)
==49724==    by 0x14F6B047: UnknownInlinedFun (ltable.c:478)
==49724==    by 0x14F6B047: luaH_resize (ltable.c:558)
==49724==    by 0x14F4CE34: lua_createtable (lapi.c:772)
==49724==    by 0x14F68F43: UnknownInlinedFun (loadlib.c:732)
==49724==    by 0x14F68F43: luaopen_package (loadlib.c:740)
==49724==    by 0x14F5A671: UnknownInlinedFun (ldo.c:507)
==49724==    by 0x14F5A671: luaD_precall (ldo.c:573)
==49724==    by 0x14F522D7: UnknownInlinedFun (ldo.c:608)
==49724==    by 0x14F522D7: UnknownInlinedFun (ldo.c:628)
==49724==    by 0x14F522D7: lua_callk (lapi.c:1022)
==49724==    by 0x14F5280B: luaL_requiref (lauxlib.c:976)
==49724==    by 0x14F5D6E3: luaL_openlibs (linit.c:61)
==49724==    by 0x14815163: rpmluaNew (rpmlua.c:128)
==49724==    by 0x14815340: UnknownInlinedFun (rpmlua.c:96)
==49724==    by 0x14815340: rpmluaGetGlobalState (rpmlua.c:93)
==49724==    by 0x14B83E4C: rpmReadConfigFiles (rpmrc.c:1662)
==49724==    by 0x146EA173: dnf_context_globals_init (in /usr/lib64/libdnf.so.2)
==49724==    by 0x1475B155: ??? (in /usr/lib64/libdnf.so.2)
==49724==    by 0x1475B66A: libdnf::getUserAgent(std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&) (in /usr/lib64/libdnf.so.2)
==49724==    by 0x1475BC99: libdnf::getUserAgent[abi:cxx11]() (in /usr/lib64/libdnf.so.2)
==49724==    by 0x146EC07F: ??? (in /usr/lib64/libdnf.so.2)
==49724==    by 0x4A5B0E7: g_type_create_instance (gtype.c:1931)
==49724==    by 0x4A40C1F: g_object_new_internal (gobject.c:2228)
==49724==    by 0x4A42247: g_object_new_with_properties (gobject.c:2391)
==49724==    by 0x4A42FF0: g_object_new (gobject.c:2037)
==49724==    by 0x146F2375: dnf_context_new (in /usr/lib64/libdnf.so.2)
==49724==    by 0x48616BB: pk_backend_ensure_default_dnf_context (pk-backend-dnf.c:225)
==49724==    by 0x486757D: pk_backend_initialize (pk-backend-dnf.c:289)
From: <569...@qq...> - 2022-11-23 11:52:41
|
I got the reply from the openmp tem, it said like this "The code you have sent should not cause the issue, as you are not doing any memory allocations. The allocation is coming from a data structure that GCC uses internally to keep track of task dependences. It looks like the data structure is allocated when the OpenMP implementation is initialized and it is not released before the program terminates." So the code has no issues. ------------------ Original ------------------ From: "Floyd, Paul" <pj...@wa...>; Date: Wed, Nov 23, 2022 07:49 PM To: "valgrind-users"<val...@li...>; Subject: Re: [Valgrind-users] client program compiled with pie On 17/11/2022 19:22, Mark Roberts wrote: > How do I find the loaded address of a client program that was compiled > with -pie? I.e., how to I map the current execution address - such as > 0x4021151 - to the address in the elf file - such as 0x1193? With -nopie > the two are identical. Hi Do the address space maps that you get when running with -d do what you want? A+ Paul _______________________________________________ Valgrind-users mailing list Val...@li... https://lists.sourceforge.net/lists/listinfo/valgrind-users |
From: Floyd, P. <pj...@wa...> - 2022-11-23 11:49:44
|
On 17/11/2022 19:22, Mark Roberts wrote: > How do I find the loaded address of a client program that was compiled > with -pie? I.e., how to I map the current execution address - such as > 0x4021151 - to the address in the elf file - such as 0x1193? With -nopie > the two are identical. Hi Do the address space maps that you get when running with -d do what you want? A+ Paul |
From: <569...@qq...> - 2022-11-23 00:22:40
|
Hi, The OS is CentOS 7.6 ARM CPU, kunpeng 920 and I have try it on intel 8260 (the same result) gcc 10.2.1 Valgrind-3.16.1 the omp is the default version working with gcc 10.2.1. You need to run the test serial times to get the error. ------------------ Original ------------------ From: "Floyd, Paul" <pj...@wa...>; Date: Tue, Nov 22, 2022 05:02 PM To: "valgrind-users"<val...@li...>; Subject: Re: [Valgrind-users] A weird memory leak with openmp task depend Hi You need to tell us more details Which OS? Which version of Valgrind? What CPU? Which compiler? Which version of OMP? A+ Paul _______________________________________________ Valgrind-users mailing list Val...@li... https://lists.sourceforge.net/lists/listinfo/valgrind-users |
From: Floyd, P. <pj...@wa...> - 2022-11-22 09:03:09
|
Hi You need to tell us more details Which OS? Which version of Valgrind? What CPU? Which compiler? Which version of OMP? A+ Paul |
From: <569...@qq...> - 2022-11-22 00:24:40
|
Dear All,

I have a memory leak with the following example. I don't know why, please help me.

Code:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main() {
    int a = 0;
    int b = 0;
    int j = 0;
    for (j = 0; j < 3; j++) {
        #pragma omp parallel
        #pragma omp single
        {
            #pragma omp task depend(out: b)
            {
                printf("task 1=%d\n", a);
            }
            #pragma omp task depend(in: b)
            {
                printf("task 2=%d\n", a);
            }
        }
    }
    return 0;
}

error msg:

136 bytes in 1 blocks are definitely lost in loss record 3 of 8
   by 0x4008CB: main._omp_fn.0 (test.c:19)

Best
Gang Chen
Sichuan University
From: Mark R. <ma...@cs...> - 2022-11-17 18:47:00
|
How do I find the loaded address of a client program that was compiled with -pie? I.e., how do I map the current execution address - such as 0x4021151 - to the address in the elf file - such as 0x1193? With -nopie the two are identical.

Thank you,
Mark
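Besides reading the load address out of Valgrind's -d address space maps (or /proc/<pid>/maps), the client itself can compute the mapping. The sketch below is a hedged illustration, not part of the thread: it assumes a PIE whose lowest segment vaddr is 0 (the usual case), so the ELF-file virtual address is simply the runtime address minus the load base reported by dladdr(); on older glibc you may need to link with -ldl.

/* Sketch: map a runtime address in a PIE back to its ELF-file vaddr by
 * subtracting the load base.  Assumes the PIE's first segment has
 * p_vaddr 0, which is the common case. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    Dl_info info;
    void *runtime_addr = (void *)&main;     /* any address inside the executable */
    if (dladdr(runtime_addr, &info) && info.dli_fbase) {
        unsigned long file_vaddr =
            (unsigned long)((char *)runtime_addr - (char *)info.dli_fbase);
        printf("runtime %p -> file vaddr 0x%lx in %s\n",
               runtime_addr, file_vaddr, info.dli_fname);
    }
    return 0;
}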
From: Paul F. <pj...@wa...> - 2022-11-13 17:20:18
|
Hi I've done most of the work to get the pthread stack cache turned off with glibc >= 2.34. (see https://bugs.kde.org/show_bug.cgi?id=444488). This doesn't help with this example, and it looks to me that this is a problem with libc / rtld. A+ Paul |
From: Paul F. <pj...@wa...> - 2022-11-12 12:31:27
|
> Yes, the cache disabling is quite hacky, as mentionnd in the doc: > "Valgrind disables the cache using some internal > knowledge of the glibc stack cache implementation and by > examining the debug information of the pthread > library. This technique is thus somewhat fragile and might > not work for all glibc versions. This has been successfully > tested with various glibc versions (e.g. 2.11, 2.16, 2.18) > on various platforms." > > > > As you indicate, it looks broken on the more recent glibc version you tried. > > Philippe > Indeed. Looks like this: Author: Florian Weimer <fw...@re...> 2021-05-10 10:31:41 Committer: Florian Weimer <fw...@re...> 2021-05-10 10:31:41 Parent: d017b0ab5a181dce4145f3a1b3b27e3341abd201 (elf: Introduce __tls_pre_init_tp) Child: ee07b3a7222746fafc5d5cb2163c9609b81615ef (nptl: Simplify the change_stack_perm calling convention) Branches: master, remotes/origin/arm/morello/main, remotes/origin/arm/morello/v1, remotes/origin/arm/morello/v2, remotes/origin/azanella/bz23960-dirent, remotes/origin/azanella/clang, remotes/origin/codonell/c-utf8, remotes/origin/codonell/ld-audit, remotes/origin/fw/localedef-utf8, remotes/origin/maskray/relr, remotes/origin/maskray/x86-mpx, remotes/origin/master, remotes/origin/nsz/bug23293, remotes/origin/nsz/bug23293-v5, remotes/origin/nsz/bug23293-v6, remotes/origin/release/2.34/master, remotes/origin/release/2.35/master, remotes/origin/release/2.36/master, remotes/origin/siddhesh/realpath-and-getcwd Follows: glibc-2.33.9000 Precedes: glibc-2.34 nptl: Move more stack management variables into _rtld_global Permissions of the cached stacks may have to be updated if an object is loaded that requires executable stacks, so the dynamic loader needs to know about these cached stacks. The move of in_flight_stack and stack_cache_actsize is a requirement for merging __reclaim_stacks into the fork implementation in libc. Tested-by: Carlos O'Donell <ca...@re...> Reviewed-by: Carlos O'Donell <ca...@re...> It looks like "stack_cache_actsize" in libc moved to be _dl_stack_cache_actsize in ld-linux-x86-64.so.2 A+ Paul |
From: Mark W. <ma...@kl...> - 2022-11-12 11:56:49
|
On Sat, Nov 12, 2022 at 12:46:41PM +0100, Philippe Waroquiers wrote: > On Sat, 2022-11-12 at 12:21 +0100, Paul Floyd wrote: > > So my conclusion is that there are two problems > > 1. Some cleanup code missing in __libc_freeres that is causing this leak > > (libc problem) > > 2. no-stackcache not working. This is more a Valgrind problem, but it > > does rely on twiddling libc internals, so it's not too surprising that > > it breaks. That needs work on the Valgrind side. > Yes, the cache disabling is quite hacky, as mentionnd in the doc: > "Valgrind disables the cache using some internal > knowledge of the glibc stack cache implementation and by > examining the debug information of the pthread > library. This technique is thus somewhat fragile and might > not work for all glibc versions. This has been successfully > tested with various glibc versions (e.g. 2.11, 2.16, 2.18) > on various platforms." > > As you indicate, it looks broken on the more recent glibc version you tried. This is https://bugs.kde.org/show_bug.cgi?id=444488 Use glibc.pthread.stack_cache_size tunable Since glibc 2.34 the internal/private stack_cache_maxsize variable isn't available anymore, which causes "sched WARNING: pthread stack cache cannot be disabled!" when the simhint no_nptl_pthread_stackcache is set (e.g. in helgrind/tests/tls_threads.vgtest) Cheers, Mark |
From: Philippe W. <phi...@sk...> - 2022-11-12 11:47:06
|
On Sat, 2022-11-12 at 12:21 +0100, Paul Floyd wrote: > Philiipe wrote: > > Possibly --sim-hints=no-nptl-pthread-stackcache might help (if I > > re-read the manual entry for this sim-hint). > > > As the manpage says, the pthread stackcache stuff is mainly for Helgrind. .... > > I don't see how this would affect a leak though. This sim-hint also influences memcheck behaviour related to __thread (i.e. tls) variables. Here is the extract of the doc: "When using the memcheck tool, disabling the cache ensures the memory used by glibc to handle __thread variables is directly released when a thread terminates." (at least that was likely true in 2014, when the above was written). > > I did some tests to check that __libc_freeres is being called (and it is > being called). > > So my conclusion is that there are two problems > 1. Some cleanup code missing in __libc_freeres that is causing this leak > (libc problem) > 2. no-stackcache not working. This is more a Valgrind problem, but it > does rely on twiddling libc internals, so it's not too surprising that > it breaks. That needs work on the Valgrind side. Yes, the cache disabling is quite hacky, as mentionnd in the doc: "Valgrind disables the cache using some internal knowledge of the glibc stack cache implementation and by examining the debug information of the pthread library. This technique is thus somewhat fragile and might not work for all glibc versions. This has been successfully tested with various glibc versions (e.g. 2.11, 2.16, 2.18) on various platforms." As you indicate, it looks broken on the more recent glibc version you tried. Philippe |
From: Paul F. <pj...@wa...> - 2022-11-12 11:21:33
|
On 11/12/22 01:46, John Reiser wrote:
> It's a bug (or implementation constraint) in glibc timer.
> When I run it under valgrind-3.19.0 with glibc-debuginfo and
> glibc-debugsource installed (2.35-17.fc36.x86_64):
> [Notice the annotation "LOOK HERE"]
> ==281161== Command: ./a.out
> ==281161==
> --281161:0: sched WARNING: pthread stack cache cannot be disabled!   <<<<< LOOK HERE <<<<<

And also Philippe wrote:
> Possibly --sim-hints=no-nptl-pthread-stackcache might help (if I
> re-read the manual entry for this sim-hint).

As the manpage says, the pthread stackcache stuff is mainly for Helgrind.

What the code does is use debuginfo to find the GNU libc variable that describes the size of the stack cache, and forces it to be some large value. That causes libpthread to think that the cache is full (when it is still really empty) and not use the cache. That means that every time a thread gets created a new stack will get allocated rather than allocated and recycled in the cache.

The caching causes problems with Helgrind for applications using thread local storage in sequences like

  write to TLS var on thread 2
  thread 2 exit
  thread 3 created, recycles thread 2's TLS
  read from TLS var on thread 3

Helgrind just sees unprotected reads and writes from the same address without knowing that it isn't the same variable.

This test is currently failing for me (Fedora 36 amd64):

paulf> perl tests/vg_regtest helgrind/tests/tls_threads
tls_threads: valgrind -q --sim-hints=no-nptl-pthread-stackcache ./tls_threads
*** tls_threads failed (stderr) ***

(More details here https://github.com/paulfloyd/freebsd_valgrind/issues/113 since I've looked into how to implement something similar for FreeBSD).

I don't see how this would affect a leak though.

I did some tests to check that __libc_freeres is being called (and it is being called).

So my conclusion is that there are two problems
1. Some cleanup code missing in __libc_freeres that is causing this leak (libc problem)
2. no-stackcache not working. This is more a Valgrind problem, but it does rely on twiddling libc internals, so it's not too surprising that it breaks. That needs work on the Valgrind side.

FWIW on FreeBSD (no stack cache disable or libc freeres) I also get a bunch of leaks that I need to suppress.

A+
Paul
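The four-step sequence above looks roughly like this in code. This is only a hedged illustration of the pattern, not taken from the thread: whether the second thread really reuses the first thread's stack and TLS block depends on glibc's stack cache, which is exactly what the no-nptl-pthread-stackcache sim-hint tries to defeat; build with gcc -pthread and run under Helgrind to see the effect.

/* Illustration of the TLS-recycling pattern described above.  With the
 * glibc stack cache enabled, the reader thread may be handed the same
 * TLS storage the writer thread used, and Helgrind then sees what looks
 * like an unsynchronised write/read on one address. */
#include <pthread.h>
#include <stdio.h>

static __thread int tls_counter;                /* one instance per thread */

static void *writer(void *arg)
{
    (void)arg;
    tls_counter = 42;                           /* write to TLS in "thread 2" */
    return NULL;
}

static void *reader(void *arg)
{
    (void)arg;
    printf("reader sees %d\n", tls_counter);    /* fresh TLS value, but possibly
                                                   the same address as before */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, writer, NULL);     /* "thread 2" */
    pthread_join(t, NULL);                      /* thread 2 exits */
    pthread_create(&t, NULL, reader, NULL);     /* "thread 3" may recycle its TLS */
    pthread_join(t, NULL);
    return 0;
}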
From: Domenico P. <pan...@gm...> - 2022-11-12 07:22:04
|
> [[snip horrible formatting]]

It looks good to me; probably your email client messed it up.

Thanks so much for the answer. Good job,
Domenico

On 12/11/22 01:46, John Reiser wrote:
> On 11/11/22 13:23, Domenico Panella wrote:
>> Operating System: Slackware 15.0 (Current) Kernel Version: 5.19.17
>> (64-bit) Graphics Platform: X11 Processors: 8 × Intel® Core™ i7-8565U
>> CPU @ 1.80GHz
>>
>> A small example:
>>
>> #include<stdbool.h>
> [[snip horrible formatting]]
>>
>
> It's a bug (or implementation constraint) in glibc timer.
>
> When I run it under valgrind-3.19.0 with glibc-debuginfo and glibc-debugsource
> installed (2.35-17.fc36.x86_64): [Notice the annotation "LOOK HERE"]
> =====
> $ valgrind --leak-check=full --sim-hints=no-nptl-pthread-stackcache ./a.out
> ==281161== Memcheck, a memory error detector
> ==281161== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
> ==281161== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
> ==281161== Command: ./a.out
> ==281161==
> --281161:0: sched WARNING: pthread stack cache cannot be disabled!   <<<<< LOOK HERE <<<<<
> ==281161==
> ==281161== HEAP SUMMARY:
> ==281161==     in use at exit: 272 bytes in 1 blocks
> ==281161==   total heap usage: 3 allocs, 2 frees, 512 bytes allocated
> ==281161==
> ==281161== 272 bytes in 1 blocks are possibly lost in loss record 1 of 1
> ==281161==    at 0x484A464: calloc (vg_replace_malloc.c:1328)
> ==281161==    by 0x4012E42: UnknownInlinedFun (rtld-malloc.h:44)
> ==281161==    by 0x4012E42: allocate_dtv (dl-tls.c:375)
> ==281161==    by 0x4013841: _dl_allocate_tls (dl-tls.c:634)
> ==281161==    by 0x48F5A98: allocate_stack (allocatestack.c:428)
> ==281161==    by 0x48F5A98: pthread_create@@GLIBC_2.34 (pthread_create.c:647)
> ==281161==    by 0x4900864: __timer_start_helper_thread (timer_routines.c:147)
> ==281161==    by 0x48F9E36: __pthread_once_slow (pthread_once.c:116)
> ==281161==    by 0x49002CA: timer_create@@GLIBC_2.34 (timer_create.c:70)
> ==281161==    by 0x4011E2: main (timer.c:40)
> ==281161==
> ==281161== LEAK SUMMARY:
> ==281161==    definitely lost: 0 bytes in 0 blocks
> ==281161==    indirectly lost: 0 bytes in 0 blocks
> ==281161==      possibly lost: 272 bytes in 1 blocks
> ==281161==    still reachable: 0 bytes in 0 blocks
> ==281161==         suppressed: 0 bytes in 0 blocks
> ==281161==
> ==281161== For lists of detected and suppressed errors, rerun with: -s
> ==281161== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
> =====
>
>
> _______________________________________________
> Valgrind-users mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-users