From: John R. <jr...@bi...> - 2022-09-12 15:58:53
|
> OK, but why would that break core files only with valgrind? Because when run directly, the core files work perfectly fine.

[Rhetorical] Why are there bugs?

[Practical] The operating system itself is the writer of ordinary core files, which contain process state: register values, copies of Writable pages, partial information from Read-only pages, etc. Valgrind is an in-process emulator. As far as the OS is concerned, the process is valgrind, not postgresql. The register values are those of the valgrind emulator internal code, not of the target program that valgrind is emulating. In order for the core file to look like it was generated for postgresql, valgrind must write the core file.

The spec for the layout of a core file (the C-language 'struct' that corresponds to the sequence of bytes in the file) is rife with opportunities for bugs. First, the spec is hard to find, or may refer to other documents that are hard to access. (What _exactly_ is the entire programmer-visible register state?) Then the spec is not executable (directly compilable). Often the spec or the C-language 'struct' is not updated in a timely manner when the hardware or the OS changes. In practice it is very easy for there to be a discrepancy involving the presence, order, width, or alignment of various fields, especially for condition codes, processor modes (32 or 64 bit?), and optional register files or accelerators (floating point, SIMD, vector units, etc.)

> ... attached is a simple .c file, with a trivial example (3 functions) and segfault (or abort). ...
> The core file produced without valgrind is perfectly fine:
>
> $ gdb ./a.out core
> ...
> Core was generated by `./a.out'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x0000005594350734 in f3 () at valgrind-core-test.c:6
> 6          *ptr = 'a';
> (gdb) bt
> #0  0x0000005594350734 in f3 () at valgrind-core-test.c:6
> #1  0x0000005594350750 in f2 () at valgrind-core-test.c:13
> #2  0x0000005594350768 in f1 () at valgrind-core-test.c:18
> #3  0x0000005594350780 in main () at valgrind-core-test.c:23
>
> but when run under valgrind it looks like this:
>
> $ gdb ./a.out vgcore.1395835
> ...
> Core was generated by `'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x0000000000108734 in ?? ()
> (gdb) bt
> #0  0x0000000000108734 in ?? ()
> #1  0x0000000000108780 in ?? ()
> #2  0x0000000000108644 in ?? ()
>
> However, when I do this on x86 (Fedora 34, gcc 11.3.1, valgrind 3.18.1) it works just fine and I get the same backtrace.
>
> So perhaps this is specific to (either) gcc 10.2, or aarch64 platform.

Bingo! You now have the raw material for a very good bug report: "valgrind-generated core files lose debugging info on aarch64". Please file a bug report; see https://valgrind.org/support/bug_reports.html

(Also note that several values for program counter in the two tracebacks agree in the lowest 12 bits (3 hex digits). So this may be some confusion about the placement ("relocation") of [groups of] whole pages in the address space.) |
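To make that last observation concrete with values taken from the two quoted backtraces: 0x0000005594350734 (f3 in the good backtrace) and 0x0000000000108734 (frame #0 in the vgcore backtrace) share the low 12 bits 0x734, and 0x0000005594350780 (main) matches 0x0000000000108780 (frame #1) the same way. Assuming the usual 4 KiB pages (an assumption; the page size is not stated in the thread), only those low 12 bits are fixed by the offset within a page, so identical page offsets under different upper bits are exactly what you would expect if the right code pages are being described at the wrong base addresses.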
|
From: Tomas V. <tv...@fu...> - 2022-09-12 10:08:32
|
On 9/9/22 18:20, John Reiser wrote:
> [[Aggressive snipping, but relevant details preserved.]]
>
>> No threading is used. Postgres is multi-process, and uses shared
>> memory for the shared cache (through shm_open etc.).
>
> Multi-process plus shm_open() IS THREADING! Not pthreads, but multiple
> execution contexts that read and write the same memory, which is
> subject to the same types of synchronization errors as pthreads.
> Perhaps --tool=drd and --tool=helgrind can help.
>

OK, but why would that break core files only with valgrind? Because when
run directly, the core files work perfectly fine.

> [[Another topic]]
>> Sure, but that's more of a workaround - it does not make the core file
>> useful, it provides an alternative way to get to the same result. Plus it
>> requires additional tooling/scripting, and I'd prefer keeping the
>> tooling as simple as possible.
>
> I made a specific suggestion that takes less than one hour: build a
> small test case that performs a short chain of subroutine calls, with
> the last routine generating a deliberate SIGABRT. Run the test case
> under valgrind, get a core file from valgrind, and see if gdb gives the
> correct traceback from that core file. The objective is to provide a
> strong clue about whether *every* core file generated by valgrind
> (in your environment) fails to work well with gdb. Perhaps solving the
> problem that involves your larger and more-complex case can be subsumed
> by analyzing something that is much simpler.
>
> Please perform that experiment and report the results here.
>

I did this experiment - attached is a simple .c file, with a trivial
example (3 functions) and segfault (or abort). When built like this:

$ gcc valgrind-core-test.c -O0 -g

then the core file produced without valgrind is perfectly fine:

$ gdb ./a.out core
...
Core was generated by `./a.out'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000005594350734 in f3 () at valgrind-core-test.c:6
6          *ptr = 'a';
(gdb) bt
#0  0x0000005594350734 in f3 () at valgrind-core-test.c:6
#1  0x0000005594350750 in f2 () at valgrind-core-test.c:13
#2  0x0000005594350768 in f1 () at valgrind-core-test.c:18
#3  0x0000005594350780 in main () at valgrind-core-test.c:23

but when run under valgrind it looks like this:

$ gdb ./a.out vgcore.1395835
...
Core was generated by `'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000108734 in ?? ()
(gdb) bt
#0  0x0000000000108734 in ?? ()
#1  0x0000000000108780 in ?? ()
#2  0x0000000000108644 in ?? ()

However, when I do this on x86 (Fedora 34, gcc 11.3.1, valgrind 3.18.1)
it works just fine and I get the same backtrace.

So perhaps this is specific to (either) gcc 10.2, or aarch64 platform.

regards
Tomas |
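The .c attachment itself is not preserved in this archive. A minimal sketch consistent with the quoted backtrace - three functions f1() calling f2() calling f3(), with f3() writing through a null pointer; the exact line numbers and the ptr variable are guesses, not the original file - might look like:

#include <stddef.h>

/* Hypothetical reconstruction of valgrind-core-test.c, not the original attachment. */
void f3(void)
{
    char *ptr = NULL;
    *ptr = 'a';            /* deliberate SIGSEGV, as in frame #0 of the backtrace */
}

void f2(void)
{
    f3();
}

void f1(void)
{
    f2();
}

int main(void)
{
    f1();
    return 0;
}

Built with "gcc valgrind-core-test.c -O0 -g", a crash in a chain like this produces an ordinary core file with the four named frames shown in the non-valgrind run above.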
|
From: John R. <jr...@bi...> - 2022-09-09 16:20:24
|
[[Aggressive snipping, but relevant details preserved.]]

> No threading is used. Postgres is multi-process, and uses shared memory for the shared cache (through shm_open etc.).

Multi-process plus shm_open() IS THREADING! Not pthreads, but multiple
execution contexts that read and write the same memory, which is subject
to the same types of synchronization errors as pthreads.
Perhaps --tool=drd and --tool=helgrind can help.

[[Another topic]]
> Sure, but that's more of a workaround - it does not make the core file useful, it provides an alternative way to get to the same result. Plus it requires additional tooling/scripting, and I'd prefer keeping the tooling as simple as possible.

I made a specific suggestion that takes less than one hour: build a small
test case that performs a short chain of subroutine calls, with the last
routine generating a deliberate SIGABRT. Run the test case under valgrind,
get a core file from valgrind, and see if gdb gives the correct traceback
from that core file. The objective is to provide a strong clue about
whether *every* core file generated by valgrind (in your environment)
fails to work well with gdb. Perhaps solving the problem that involves
your larger and more-complex case can be subsumed by analyzing something
that is much simpler.

Please perform that experiment and report the results here. |
|
From: Tomas V. <tv...@fu...> - 2022-09-09 14:26:28
|
On 9/9/22 04:58, John Reiser wrote:
>>> 1. Describe the environment completely.
>
> Also: Any kind of threading (pthreads, or shm_open, or
> mmap(,,,MAP_SHARED,,))
> must be mentioned explicitly. Multiple execution contexts which access
> the same address space instance are a significant complicating factor.
>
> If threading is involved, then try using "valgrind --tool=drd ..."
> or --tool=helgrind, because those tools specifically target detecting
> race conditions and other synchronization errors, much like --tool=memcheck
> [the default tool when no --tool= is mentioned] targets errors involving
> malloc() and free(), uninitialized variables, etc.
>
No threading is used. Postgres is multi-process, and uses shared memory
for the shared cache (through shm_open etc.). FWIW, as I mentioned
before, this works perfectly fine when the core is not generated by
valgrind.
>>> 4. Walk before attempting to run.
>>> Did you try a simple example? Write a half-page program with 5
>>> subroutines,
>>> each of which calls the next one, and the last one sends SIGABRT to
>>> the process.
>
>>> Does the .core file when run under valgrind give the correct
>>> traceback using gdb?
>
> Specifically: apply valgrind to the small program which causes a
> deliberate SIGABRT,
> and get a core file. Does gdb give the correct traceback for that core
> file?
> If not, then you have an ideal test case for filing a bug report against
> valgrind
> because even the simple core file is bad. If gdb does give a correct
> traceback
> for the simple core file, then you have to keep looking for the source
> of the
> problem on your larger program.
>
I'll try this once I have access to the machine early next week.
>
>>> 5. (Learn and) Use the built-in tools where possible.
>>> Run the process interactively, invoking valgrind with "--vgdb-error=0",
>>> and giving the debugger command "(gdb) continue" after establishing
>>> connectivity between vgdb and the process.
>>> See the valgrind manual, section 3.2.9 "vgdb command line options".
>>> When the SIGABRT happens, then vgdb will allow you to use all the
>>> ordinary
>>> gdb commands to get a backtrace, go up and down the stack, examine
>>> variables and other memory, run
>>> (gdb) info proc
>>> (gdb) shell cat /proc/$PID/maps
>>> to see exactly the layout of process memory, etc.
>>> There are also special commands to access valgrind functionality
>>> interactively, such as checking for memory leaks.
>>>
>>
>> I already explained why I don't want / can't use the interactive gdb.
>> I'm aware of the option, I've used it before, but in this case it's
>> not very practical.
>
> The gdb process does not *have* to be run interactively, it just takes
> more work
> and patience to run non-interactively. Run "valgrind --vgdb-error=0 ..."
> and notice the last part of the printed instructions:
>
> and then give GDB the following command
> ==215935== target remote |
> /path/to/libexec/valgrind/../../bin/vgdb --pid=215935
> ==215935== --pid is optional if only one valgrind process is running
>
> So if there is only one valgrind process, then you do not need to know
> the pid.
> Thus you can run gdb with re-directed stdin/stdout/stderr, or perhaps
> use the -x
> command-line option. This allows a static, pre-scripted list of gdb
> commands;
> it may require a few iterations to get a good debug script. (Try the
> commands
> using the trivial SIGABRT case!) Also get the full gdb manual (more
> than 800 pages)
> and look at the "thread apply all ..." and "frame apply all ..." commands.
>
Sure, but that's more of a workaround - it does not make the core file
useful, it provides an alternative way to get to the same result. Plus it
requires additional tooling/scripting, and I'd prefer keeping the
tooling as simple as possible.
Postgres is a multi-process system that runs a bunch of management
processes, and client processes (1:1 to connections). We don't know in
which one an issue might happen, so we'd have to attach a script to each
of them.
Furthermore, there's the question of performance - we run these tests on
many machines (although only some of them run them under valgrind), and
valgrind makes it fairly slow already - if this vgdb thing makes it even
slower, that'd be an issue. But I haven't measured it, so maybe it's not
as bad as I fear.
> It may be possible to perform some interactive "reconnaissance" to suggest
> good things for the script to try. Using --vgdb-error=0, put a breakpoint
> on a likely location for the error (or shortly before the error),
> and look around. In the logged traceback:
>
> TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File:
> "reorderbuffer.c", Line: 902, PID: 536049)
> (ExceptionalCondition+0x98)[0x8f5cec]
> (+0x57a574)[0x682574]
> (+0x579edc)[0x681edc]
> (ReorderBufferAddNewTupleCids+0x60)[0x6864dc]
> (SnapBuildProcessNewCid+0x94)[0x68b6a4]
>
> any of those named locations, or shortly before them, might be a good spot.
> When execution stops at any one of the breakpoints, then look around
> and see if you can find clues about "prev_first_lsn < cur_txn->first_lsn"
> even though the error has not yet occurred. Perhaps this will help
> identify location(s) that might be closer to the actual error
> when it does happen. This might suggest commands for the non-interactive
> gdb debugging script.
>
This does not work, I'm afraid. The issue is a (rare) race condition,
and we run the assert thousands of times and it's fine 99.999% of the
time. The breakpoint & interactive reconnaissance is unlikely to find
anything 99% of the time, and it can easily make the race condition go
away by changing the timing. That's kinda the interesting thing - this
is not an issue valgrind is meant to discover, it's just that it seems
to change the timing just enough to increase the probability.
regards
Tomas
|
|
From: John R. <jr...@bi...> - 2022-09-09 02:59:02
|
>> 1. Describe the environment completely.
Also: Any kind of threading (pthreads, or shm_open, or mmap(,,,MAP_SHARED,,))
must be mentioned explicitly. Multiple execution contexts which access
the same address space instance are a significant complicating factor.
If threading is involved, then try using "valgrind --tool=drd ..."
or --tool=helgrind, because those tools specifically target detecting
race conditions and other synchronization errors, much like --tool=memcheck
[the default tool when no --tool= is mentioned] targets errors involving
malloc() and free(), uninitialized variables, etc.
>> 4. Walk before attempting to run.
>> Did you try a simple example? Write a half-page program with 5 subroutines,
>> each of which calls the next one, and the last one sends SIGABRT to the process.
>> Does the .core file when run under valgrind give the correct traceback using gdb?
Specifically: apply valgrind to the small program which causes a deliberate SIGABRT,
and get a core file. Does gdb give the correct traceback for that core file?
If not, then you have an ideal test case for filing a bug report against valgrind
because even the simple core file is bad. If gdb does give a correct traceback
for the simple core file, then you have to keep looking for the source of the
problem on your larger program.
>> 5. (Learn and) Use the built-in tools where possible.
>> Run the process interactively, invoking valgrind with "--vgdb-error=0",
>> and giving the debugger command "(gdb) continue" after establishing
>> connectivity between vgdb and the process.
>> See the valgrind manual, section 3.2.9 "vgdb command line options".
>> When the SIGABRT happens, then vgdb will allow you to use all the ordinary
>> gdb commands to get a backtrace, go up and down the stack, examine
>> variables and other memory, run
>> (gdb) info proc
>> (gdb) shell cat /proc/$PID/maps
>> to see exactly the layout of process memory, etc.
>> There are also special commands to access valgrind functionality
>> interactively, such as checking for memory leaks.
>>
>
> I already explained why I don't want / can't use the interactive gdb. I'm aware of the option, I've used it before, but in this case it's not very practical.
The gdb process does not *have* to be run interactively, it just takes more work
and patience to run non-interactively. Run "valgrind --vgdb-error=0 ..."
and notice the last part of the printed instructions:
and then give GDB the following command
==215935== target remote | /path/to/libexec/valgrind/../../bin/vgdb --pid=215935
==215935== --pid is optional if only one valgrind process is running
So if there is only one valgrind process, then you do not need to know the pid.
Thus you can run gdb with re-directed stdin/stdout/stderr, or perhaps use the -x
command-line option. This allows a static, pre-scripted list of gdb commands;
it may require a few iterations to get a good debug script. (Try the commands
using the trivial SIGABRT case!) Also get the full gdb manual (more than 800 pages)
and look at the "thread apply all ..." and "frame apply all ..." commands.
It may be possible to perform some interactive "reconnaissance" to suggest
good things for the script to try. Using --vgdb-error=0, put a breakpoint
on a likely location for the error (or shortly before the error),
and look around. In the logged traceback:
TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 902, PID: 536049)
(ExceptionalCondition+0x98)[0x8f5cec]
(+0x57a574)[0x682574]
(+0x579edc)[0x681edc]
(ReorderBufferAddNewTupleCids+0x60)[0x6864dc]
(SnapBuildProcessNewCid+0x94)[0x68b6a4]
any of those named locations, or shortly before them, might be a good spot.
When execution stops at any one of the breakpoints, then look around
and see if you can find clues about "prev_first_lsn < cur_txn->first_lsn"
even though the error has not yet occurred. Perhaps this will help
identify location(s) that might be closer to the actual error
when it does happen. This might suggest commands for the non-interactive
gdb debugging script.
|
|
From: Paul F. <pj...@wa...> - 2022-09-08 21:25:20
|
On 8 Sept 2022 15:27, Shane Bishop <sha...@ou...> wrote:
> Hi,
>
> I am trying to compile Valgrind 3.19.0 on Solaris
>
> Is there an earlier release of Valgrind that is known to successfully compile on Solaris 11 that I could try building instead?

If you have Oracle support I believe that they have a version available. Otherwise I think that there is an Oracle GitHub repo with either a fork or patches.

I can probably provide more info next week.

A+
Paul |
|
From: Tomas V. <tv...@fu...> - 2022-09-08 17:33:42
|
On 9/4/22 04:16, John Reiser wrote:
>> Any ideas what I might be doing wrong? Or how do I load the core file?
>
> Why does use of valgrind cause programmers to forget general debugging
> technique?
>
> 1. Describe the environment completely.
> The report does not say which compilers and compiler versions were used,
> or if the compiler commands contained any directives about debugging
> format.
> Such information is necessary to help understand what might be happening
> with regard to debugging and tracebacks.
>
Yes, I should have included this information - I don't have access to
the machine at the moment, but I'll share the detailed info early next week.
However, it's running the current Raspberry Pi OS 64-bit version, which is
based on Debian 11. So it should have the same version of gcc etc.
> 2. Get debugging information whenever invoking a compiler.
> Traceback lines such as "(+0x57a574)[0x682574]" which lack the name
> of a symbol or file, suggest that "-g" debugging info was not requested
> for *all* compilations. Start over ("make clean; rm -rf '*.[oa]'")
> then re-compile every source file, making sure to specify "-g"
> and no variant of "-O" or "-On", except possibly "-O0".
>
This is a bit puzzling. I'm always running valgrind tests with "-O0" and
possibly with -fno-omit-frame-pointer, as that gives me the most
reliable results etc. "-g" should be enabled too (thanks to the postgres
specific --enable-debug configure switch).
> 3. Optimizing for speed comes after achieving correct execution.
> If 'inline' is used anywhere, then re-compile with the compile-time
> argument
> "-Dinline=/*empty*/" in order to #define 'inline' as a one-word comment.
> If the behavior of the program changes (any difference at all, excepting
> only slower execution), then there is a *design error* in the source code.
> Fix that first.
>
If I was optimizing for speed, I wouldn't be running with "-O0". I'm not
sure what's causing the missing symbols, but it certainly is not inline
functions - we do have a couple of those, but definitely not this high
in the stack.
The other thing is that when loading the core file into gdb, the
backtrace is entirely different (and bogus) from what was written into
the server log (which comes from "backtrace()" - maybe the missing
symbol names are due to some limitation in this).
> 4. Walk before attempting to run.
> Did you try a simple example? Write a half-page program with 5
> subroutines,
> each of which calls the next one, and the last one sends SIGABRT to the
> process.
I've inspected *thousands* of core files in the last couple years, both
as part of development and supporting all kinds of systems. And most of
the time it either works just fine or it's clear why it's not working.
Except when running under valgrind, in which case I have no idea why it
doesn't work (with the same compile options and all that).
> Does the .core file when run under valgrind give the correct traceback
> using gdb?
>
I'm not sure I understand the questions. In my initial post I showed two
backtraces - one I extracted from the .core file using gdb, and another
one that the application itself (postgres) writes into the server log
(after using backtrace() etc.).
The logged backtrace has a couple missing symbols, but seems reasonable
otherwise.
The backtrace extracted from the .core file is clearly bogus.
> 5. (Learn and) Use the built-in tools where possible.
> Run the process interactively, invoking valgrind with "--vgdb-error=0",
> and giving the debugger command "(gdb) continue" after establishing
> connectivity between vgdb and the process.
> See the valgrind manual, section 3.2.9 "vgdb command line options".
> When the SIGABRT happens, then vgdb will allow you to use all the ordinary
> gdb commands to get a backtrace, go up and down the stack, examine
> variables and other memory, run
> (gdb) info proc
> (gdb) shell cat /proc/$PID/maps
> to see exactly the layout of process memory, etc.
> There are also special commands to access valgrind functionality
> interactively, such as checking for memory leaks.
>
I already explained why I don't want / can't use the interactive gdb.
I'm aware of the option, I've used it before, but in this case it's not
very practical.
regards
Tomas
|
|
From: Shane B. <sha...@ou...> - 2022-09-08 15:01:53
|
Hi,

I am trying to compile Valgrind 3.19.0 on Solaris 11. The compilation fails.

Here are the commands I have run:

wget --no-check-certificate https://sourceware.org/pub/valgrind/valgrind-3.19.0.tar.bz2
tar xjf valgrind-3.19.0.tar.bz2
cd valgrind-3.19.0
MAKE=gmake ./configure --enable-only64bit
gmake

Once I run "gmake" then the build runs for a while, and then it fails:

gcc -m64 -O2 -g -Wall -Wmissing-prototypes -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-declarations -Wcast-align -Wcast-qual -Wwrite-strings -Wempty-body -Wformat -Wformat-signedness -Wformat-security -Wignored-qualifiers -Wmissing-parameter-type -Wlogical-op -Wimplicit-fallthrough=2 -Wold-style-declaration -finline-functions -fno-stack-protector -fno-strict-aliasing -fno-builtin -O -g -fno-omit-frame-pointer -fno-strict-aliasing -fpic -fno-builtin -fno-ipa-icf -nodefaultlibs -shared -Wl,-z,interpose,-z,initfirst -Wl,-M,../solaris/vgpreload-solaris.mapfile -m64 -Wl,-soname -Wl,vgpreload_core.so.0 -o vgpreload_core-amd64-solaris.so vgpreload_core_amd64_solaris_so-vg_preloaded.o
Undefined                       first referenced
 symbol                             in file
__xpg4                              ../solaris/vgpreload-solaris.mapfile  (symbol scope specifies local binding)
__xpg6                              ../solaris/vgpreload-solaris.mapfile  (symbol scope specifies local binding)
ld: fatal: symbol referencing errors
collect2: error: ld returned 1 exit status
gmake[3]: *** [Makefile:3469: vgpreload_core-amd64-solaris.so] Error 1
gmake[3]: Leaving directory '/root/Downloads/valgrind-3.19.0/coregrind'
gmake[2]: *** [Makefile:2475: all] Error 2
gmake[2]: Leaving directory '/root/Downloads/valgrind-3.19.0/coregrind'
gmake[1]: *** [Makefile:896: all-recursive] Error 1
gmake[1]: Leaving directory '/root/Downloads/valgrind-3.19.0'
gmake: *** [Makefile:759: all] Error 2

I have gcc version 7.3.0 and gmake version 4.2.1.

Please let me know any other details I should share to help identify why my build is failing. Is it possible I am missing some headers or libs I need to install to build successfully?

Is there an earlier release of Valgrind that is known to successfully compile on Solaris 11 that I could try building instead?

Thanks,
Shane

P.S. I apologize if my email client inserts any HTML formatting, I don't know how to prevent this. |
|
From: Tomas V. <tv...@fu...> - 2022-09-04 19:17:41
|
On 9/4/22 13:18, Philippe Waroquiers wrote:
> On Sun, 2022-09-04 at 00:14 +0200, Tomas Vondra wrote:
>>
>> Clearly, this is not an issue valgrind is meant to detect (like invalid
>> memory access, etc.) but an application issue. I've tried reproducing it
>> without valgrind, but it only ever happens with valgrind - my theory is
>> it's some sort of race condition, and valgrind changes the timing in a
>> way that makes it much more likely to hit. I need to analyze the core to
>> inspect the state more closely, etc.
>>
>> Any ideas what I might be doing wrong? Or how do I load the core file?
>
> Rather than have the core dump and analyse it, you might interactively debug
> your program under valgrind.
> E.g. you might put a breakpoint on the assert or at some interesting points
> before the assert.
>
> See https://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver
> for more info.
>

I know, and I've used vgdbserver before. But sometimes that's not very
practical, for a number of reasons:

1) Our tests run mostly unattended, possibly even on CI machines that we
don't have access to. And we don't want the machine to just sit there and
wait for someone to debug it interactively, it's better to report the
failure. Being able to inspect the core later would be helpful, though.

2) The error may be quite rare and/or hard to trigger - we regularly see
race conditions that happen 1 in 1000 runs. True, I could automate that
using a gdb script.

3) I'd bet it's not so simple in a multi-process system that forks various
processes that can trigger the issue. I'd have to attach a gdb to each of
those.

4) It's already pretty slow under valgrind, I'd bet it'll be even worse
with gdb, but maybe it's not that bad. rpi4 is very constrained, though.

5) Race conditions are often very sensitive to changes in timing. For
example, I've never seen this particular issue without valgrind. I can
easily imagine gdb changing the timing just enough for the race condition
to not happen.

regards
Tomas |
|
From: Philippe W. <phi...@sk...> - 2022-09-04 11:18:16
|
On Sun, 2022-09-04 at 00:14 +0200, Tomas Vondra wrote:
>
> Clearly, this is not an issue valgrind is meant to detect (like invalid
> memory access, etc.) but an application issue. I've tried reproducing it
> without valgrind, but it only ever happens with valgrind - my theory is
> it's some sort of race condition, and valgrind changes the timing in a
> way that makes it much more likely to hit. I need to analyze the core to
> inspect the state more closely, etc.
>
> Any ideas what I might be doing wrong? Or how do I load the core file?

Rather than have the core dump and analyse it, you might interactively
debug your program under valgrind.
E.g. you might put a breakpoint on the assert or at some interesting
points before the assert.

See https://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver
for more info.

Philippe |
|
From: John R. <jr...@bi...> - 2022-09-04 02:17:10
|
> Any ideas what I might be doing wrong? Or how do I load the core file?
Why does use of valgrind cause programmers to forget general debugging technique?
1. Describe the environment completely.
The report does not say which compilers and compiler versions were used,
or if the compiler commands contained any directives about debugging format.
Such information is necessary to help understand what might be happening
with regard to debugging and tracebacks.
2. Get debugging information whenever invoking a compiler.
Traceback lines such as "(+0x57a574)[0x682574]" which lack the name
of a symbol or file, suggest that "-g" debugging info was not requested
for *all* compilations. Start over ("make clean; rm -rf '*.[oa]'")
then re-compile every source file, making sure to specify "-g"
and no variant of "-O" or "-On", except possibly "-O0".
3. Optimizing for speed comes after achieving correct execution.
If 'inline' is used anywhere, then re-compile with the compile-time argument
"-Dinline=/*empty*/" in order to #define 'inline' as a one-word comment.
If the behavior of the program changes (any difference at all, excepting
only slower execution), then there is a *design error* in the source code.
Fix that first.
4. Walk before attempting to run.
Did you try a simple example? Write a half-page program with 5 subroutines,
each of which calls the next one, and the last one sends SIGABRT to the process.
Does the .core file when run under valgrind give the correct traceback using gdb?
5. (Learn and) Use the built-in tools where possible.
Run the process interactively, invoking valgrind with "--vgdb-error=0",
and giving the debugger command "(gdb) continue" after establishing
connectivity between vgdb and the process.
See the valgrind manual, section 3.2.9 "vgdb command line options".
When the SIGABRT happens, then vgdb will allow you to use all the ordinary
gdb commands to get a backtrace, go up and down the stack, examine
variables and other memory, run
(gdb) info proc
(gdb) shell cat /proc/$PID/maps
to see exactly the layout of process memory, etc.
There are also special commands to access valgrind functionality
interactively, such as checking for memory leaks.
|
|
From: Tomas V. <tv...@fu...> - 2022-09-03 22:38:50
|
Hi,
I'm having some issues with analyzing cores generated from valgrind. I
do get the core file, but when I try opening it in gdb it just shows
some entirely bogus information / backtrace etc.
This is a rpi4 machine, with 64-bit debian, running a local build of
valgrind 3.19.0 (built from sources, not a package).
This is how I run the program (postgres binary)
valgrind --quiet --trace-children=yes --track-origins=yes \
--read-var-info=yes --num-callers=20 --leak-check=no \
--gen-suppressions=all --error-limit=no \
--log-file=/tmp/valgrind.543917.log postgres \
-D /home/debian/postgres/contrib/test_decoding/tmp_check_iso/data \
-F -c listen_addresses= -k /tmp/pg_regress-n7HodE
I get a ~200MB core file in /tmp, which I try loading like this:
gdb src/backend/postgres /tmp/valgrind.542299.log.core.542391
but all I get is this:
Reading symbols from src/backend/postgres...
[New LWP 542391]
Cannot access memory at address 0xcc10cc00cbf0cc6
Cannot access memory at address 0xcc10cc00cbf0cbe
Core was generated by `'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00000000049d42ac in ?? ()
(gdb) bt
#0 0x00000000049d42ac in ?? ()
#1 0x0000000000400000 in dshash_dump (hash_table=0x0) at dshash.c:782
#2 0x0000000000400000 in dshash_dump (hash_table=0x49c0e44) at
dshash.c:782
#3 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt
stack?)
So the stack might be corrupt, for some reason? The first part looks
entirely bogus too, though. The file size seems about right - with 128MB
shared buffers, 200MB might be about right.
The core is triggered by an "assert" in the source, and we even log a
backtrace into the log - and that seems much more plausible:
TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File:
"reorderbuffer.c", Line: 902, PID: 536049)
(ExceptionalCondition+0x98)[0x8f5cec]
(+0x57a574)[0x682574]
(+0x579edc)[0x681edc]
(ReorderBufferAddNewTupleCids+0x60)[0x6864dc]
(SnapBuildProcessNewCid+0x94)[0x68b6a4]
(heap2_decode+0x17c)[0x671584]
(LogicalDecodingProcessRecord+0xbc)[0x670cd0]
(+0x570f88)[0x678f88]
(pg_logical_slot_get_changes+0x1c)[0x6790fc]
(ExecMakeTableFunctionResult+0x29c)[0x4a92c0]
(+0x3be638)[0x4c6638]
(+0x3a2c14)[0x4aac14]
(ExecScan+0x8c)[0x4aaca8]
(+0x3bea14)[0x4c6a14]
(+0x39ea60)[0x4a6a60]
(+0x392378)[0x49a378]
(+0x39520c)[0x49d20c]
(standard_ExecutorRun+0x214)[0x49aad8]
(ExecutorRun+0x64)[0x49a8b8]
(+0x62e2ac)[0x7362ac]
(PortalRun+0x27c)[0x735f08]
(+0x626be8)[0x72ebe8]
(PostgresMain+0x9a0)[0x733e9c]
(+0x547be8)[0x64fbe8]
(+0x547540)[0x64f540]
(+0x542d30)[0x64ad30]
(PostmasterMain+0x1460)[0x64a574]
(+0x418888)[0x520888]
Clearly, this is not an issue valgrind is meant to detect (like invalid
memory access, etc.) but an application issue. I've tried reproducing it
without valgrind, but it only ever happens with valgrind - my theory is
it's some sort of race condition, and valgrind changes the timing in a
way that makes it much more likely to hit. I need to analyze the core to
inspect the state more closely, etc.
Any ideas what I might be doing wrong? Or how do I load the core file?
thanks
Tomas
|
|
From: John R. <jr...@bi...> - 2022-09-03 11:42:31
|
> ==123254== HEAP SUMMARY:
> ==123254==     in use at exit: 0 bytes in 0 blocks
> ==123254==   total heap usage: 6 allocs, 6 frees, 2,084 bytes allocated

"2,084 bytes allocated" is the sum of all 6 arguments that were passed to malloc(), calloc() [possibly by calling malloc()], realloc() [at least the increase], etc. |
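A plausible accounting for this particular run - assuming a 64-bit build where glibc lazily allocates a 1,024-byte stdio buffer for each of stdin and stdout when they are connected to a terminal (an assumption; the buffer sizes are not shown in the output) - is:

  1,024   stdin buffer, allocated on the first fgets()
  1,024   stdout buffer, allocated on the first fputs()
      8   realloc(NULL, 1 * sizeof *arr) after "test1" is read
     16   realloc(arr, 2 * sizeof *arr) after "test2" is read
      6   malloc(6) for "test1" plus its terminating NUL
      6   malloc(6) for "test2" plus its terminating NUL
  -----
  2,084   bytes across 6 allocations

Memcheck counts the growing realloc() as a fresh 16-byte allocation plus a free of the old 8-byte block, which would also explain the 6 frees: that one, the three explicit free() calls, and the two stdio buffers released at exit.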
|
From: jian he <jia...@gm...> - 2022-09-03 07:25:42
|
hello,

$ valgrind ./a.out
==123254== Memcheck, a memory error detector
==123254== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==123254== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright
info
==123254== Command: ./a.out
==123254==
enter string (EOF) to quit): test1
enter string (EOF) to quit): test2
enter string (EOF) to quit): (all done)
test1
test2
==123254==
==123254== HEAP SUMMARY:
==123254== in use at exit: 0 bytes in 0 blocks
==123254== total heap usage: 6 allocs, 6 frees, 2,084 bytes allocated
==123254==
==123254== All heap blocks were freed -- no leaks are possible
==123254==
==123254== For lists of detected and suppressed errors, rerun with: -s
==123254== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
-----------------------------------------------------------------------------------------------------------------
simple c program source code:
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#define MAXC 1024
int main(void){
char buf[MAXC],**arr = NULL;
size_t nstr = 0; /* counter for number of strings stored */
for(;;){
size_t len; /* var to hold length of string after \n removal */
fputs("enter string (EOF) to quit): ",stdout);
if(!fgets(buf,MAXC,stdin)){
puts("(all done)\n");
break;
}
buf[len = strcspn(buf,"\r\n")] = 0;
/*always realloc using temp pointer to avoid mem-leak on realloc
failure*/
void *tmp = realloc(arr,(nstr+1) * sizeof *arr);
if(!tmp){
perror("realloc-tmp");
break;
}
arr = tmp;
if(!(arr[nstr] = malloc(len + 1))){
perror("malloc-arr[str]");
break;
}
memcpy(arr[nstr++], buf,len + 1);
}
for(size_t i = 0; i < nstr; i++){
puts(arr[i]);
free(arr[i]);
}
free(arr);
return 0;
}
---------------------------------------------
New to C, I am not sure the following:
total heap usage: 6 allocs, 6 frees, 2,084 bytes allocated
I guess 6 allocs is 3 times mallocs called plus 3 times puts function
called?
But I don't know where 2084 comes from.
--
I recommend David Deutsch's <<The Beginning of Infinity>>
Jian
|
|
From: Tom H. <to...@co...> - 2022-09-01 06:05:59
|
On 01/09/2022 01:03, Bresalier, Rob (Nokia - US/Murray Hill) wrote:
> Don't understand why strace log has exit(0) without the underscore, I know for a fact that it was with the underscore.

Because exit() and _exit() are C library functions but both call the
SYS_exit system call and that is what strace shows.

The difference is that _exit doesn't run atexit() handlers or do any
other cleanup before calling SYS_exit.

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/ |
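A minimal sketch of that difference (a hypothetical demo, not code from this thread): an atexit() handler registered by the program runs on exit() but is skipped by _exit():

#include <stdio.h>
#include <stdlib.h>   /* exit, atexit */
#include <unistd.h>   /* _exit */

static void cleanup(void)
{
    puts("atexit handler ran");   /* exit() also flushes stdio before the syscall */
}

int main(int argc, char **argv)
{
    atexit(cleanup);
    if (argc > 1) {
        _exit(0);   /* skips atexit handlers and stdio flushing, then makes the exit syscall */
    }
    exit(0);        /* runs atexit handlers, flushes and closes stdio, then makes the exit syscall */
}

Run with no arguments it prints the message; run with any argument it prints nothing, yet strace shows the same exit system call in both cases.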
|
From: Bresalier, R. (N. - US/M. Hill) <rob...@no...> - 2022-09-01 00:18:20
|
> Normally, if it is the OOM that kills a process, you should find a trace of this in the system logs.

I looked in every system log I could find, there was no indication of OOM killing it in any system log.

> I do not understand what you mean by reducing the nr of callers from 12 to 6.
> What are these callers ? Is that some threads of the process you are running
> under valgrind ?

I mean the --num-callers core option to valgrind. By default this is 12, and I didn't specify it. I tried using --num-callers=6 to reduce memory consumption. From the valgrind manual this means "Specifies the maximum number of entries shown in stack traces that identify program locations." By reducing it to 6 I was hoping to reduce valgrind memory consumption in case it really was the OOM killer, which I really doubt now.

> And just in case: are you using the last version of Valgrind ?

Yes, I used the last version of valgrind and many earlier versions.

> You might use "strace" on valgrind to see what is going on at the time
> _exit(0) is called.

I did use 'strace' and dmesg. Neither indicated it was the OOM killer. I did happen to save the strace log when the SIGKILL happened. Here is the part around the _exit(0):

read(2040, "R", 1)                      = 1
gettid()                                = 3332
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP SYS], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[], ~[ILL TRAP BUS FPE KILL SEGV STOP SYS], 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP SYS], NULL, 8) = 0
gettid()                                = 3332
write(2041, "S", 1)                     = 1
exit(0)                                 = ?
+++ killed by SIGKILL +++

Don't understand why the strace log has exit(0) without the underscore, I know for a fact that it was with the underscore.

The strace log doesn't indicate anything special happening around the _exit(0). When I removed it the SIGKILL went away.

> You might also start valgrind with some debug trace e.g. -d -d -d -d -v -v -v -v

Was not aware of this and didn't try it. Don't have time to try it now.

Regards,
Rob |
|
From: Philippe W. <phi...@sk...> - 2022-08-31 22:14:27
|
On Wed, 2022-08-31 at 17:42 +0000, Bresalier, Rob (Nokia - US/Murray Hill) wrote:
> > When running memcheck on a massive monolith embedded executable
> > (237MB stripped, 1.8GiB unstripped), after I stop the executable under
> > valgrind I see the "HEAP SUMMARY" but then valgrind dies before any leak
> > reports are printed. The parent process sees that the return status of
> > memcheck is that it was SIGKILLed (status returned in waitpid call is '9').
>
> We found that removing a call to _exit(0) made it so that valgrind is no longer SIGKILLED.
>
> Any ideas why using _exit(0) may get rid of valgrind getting SIGKILLed?
>
> Previously exit(0) was called, without the leading underscore, but changed it to _exit(0) to really make sure no memory was being deallocated. This worked well on a different process, so we carried it over to this one, that is why we did it.
>
> Even with exit(0) (no underscore), in this process there is not much deallocation going on in exit handlers, so have lots of doubts that valgrind/memcheck was using too much memory and invoking the OOM killer.
>
> Using strace and dmesg while we had _exit(0) in use didn't show that OOM killer was SIGKILLing valgrind.
>
> I also tried reducing number of callers from 12 to 6 when using _exit(0), still got the SIGKILL.
>
> Also tried using a system that had an additional 4GByte of memory, and also got the SIGKILL there.
>
> So I have many doubts that Valgrind was getting SIGKILLed due to too much memory usage.
>
> Don't know why removing _exit(0) got rid of the SIGKILL. Was wondering if anyone had any ideas?

Normally, if it is the OOM that kills a process, you should find a trace of this in the system logs.

I do not understand what you mean by reducing the nr of callers from 12 to 6.
What are these callers ? Is that some threads of the process you are running
under valgrind ?

And just in case: are you using the last version of Valgrind ?

You might use "strace" on valgrind to see what is going on at the time
_exit(0) is called.

You might also start valgrind with some debug trace e.g. -d -d -d -d -v -v -v -v

Philippe |
|
From: Bresalier, R. (N. - US/M. Hill) <rob...@no...> - 2022-08-31 19:16:21
|
> When running memcheck on a massive monolith embedded executable
> (237MB stripped, 1.8GiB unstripped), after I stop the executable under
> valgrind I see the "HEAP SUMMARY" but then valgrind dies before any leak
> reports are printed. The parent process sees that the return status of
> memcheck is that it was SIGKILLed (status returned in waitpid call is '9').

We found that removing a call to _exit(0) made it so that valgrind is no longer SIGKILLED.

Any ideas why using _exit(0) may get rid of valgrind getting SIGKILLed?

Previously exit(0) was called, without the leading underscore, but we changed it to _exit(0) to really make sure no memory was being deallocated. This worked well on a different process, so we carried it over to this one, that is why we did it.

Even with exit(0) (no underscore), in this process there is not much deallocation going on in exit handlers, so I have lots of doubts that valgrind/memcheck was using too much memory and invoking the OOM killer.

Using strace and dmesg while we had _exit(0) in use didn't show that the OOM killer was SIGKILLing valgrind.

I also tried reducing the number of callers from 12 to 6 when using _exit(0), and still got the SIGKILL.

Also tried using a system that had an additional 4GByte of memory, and also got the SIGKILL there.

So I have many doubts that Valgrind was getting SIGKILLed due to too much memory usage.

Don't know why removing _exit(0) got rid of the SIGKILL. Was wondering if anyone had any ideas? |
|
From: Philippe W. <phi...@sk...> - 2022-08-06 08:35:01
|
> > > Is there anything that can be done with memcheck to make it consume less memory?
> > No.

In fact, Yes :). Or more precisely, yes, memory can be somewhat reduced :).
See my other mail.

Philippe |
|
From: Philippe W. <phi...@sk...> - 2022-08-06 08:32:56
|
On Fri, 2022-08-05 at 15:34 +0000, Bresalier, Rob (Nokia - US/Murray Hill) wrote:
> > If finding memory leaks is the only goal (for instance, if you are satisfied that
> > memcheck has found all the overrun blocks, uninitialized reads, etc.) then
> > https://github.com/KDE/heaptrack is the best tool.
>
> Thanks! I didn't know about heaptrack. I will definitely look into that. Does heaptrack
> also show the 'still reachable' types of leaks that memcheck does?
>
> Any chance that the 'massif' tool would survive the OOM killer? This may be easier for
> me to get going as I already have valgrind built.
>
> Is there anything that can be done with memcheck to make it consume less memory?

You might be interested in looking at the slides of the FOSDEM presentation
'Tuning Valgrind for your workload':
https://archive.fosdem.org/2015/schedule/event/valgrind_tuning/attachments/slides/743/export/events/attachments/valgrind_tuning/slides/743/tuning_V_for_your_workload.pdf

There are several things you can do to reduce memcheck memory usage.

Note also that you can run a leak search while your program runs, either via
memcheck client requests or from the shell, using vgdb.

Philippe |
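As a concrete illustration of the client-request route - a sketch assuming the valgrind development headers are installed so that <valgrind/memcheck.h> can be included, with a made-up helper function for the example - a program can request an interim leak search at a point of its own choosing:

#include <stdlib.h>
#include <valgrind/memcheck.h>   /* client-request macros; they expand to no-ops when not run under valgrind */

static void *allocate_and_forget(size_t n)
{
    return malloc(n);            /* deliberately never freed */
}

int main(void)
{
    allocate_and_forget(100);

    /* Run a full leak search now, instead of waiting for process exit. */
    VALGRIND_DO_LEAK_CHECK;

    allocate_and_forget(200);

    /* Report only the leaks added since the previous search. */
    VALGRIND_DO_ADDED_LEAK_CHECK;

    return 0;
}

The same kind of search can also be triggered from outside the process with a vgdb monitor command (e.g. "vgdb leak_check full reachable any", as described in the memcheck manual), which needs no source changes.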
|
From: Julian S. <jse...@gm...> - 2022-08-06 06:43:36
|
> Is there anything that can be done with memcheck to make it consume less memory?

First of all, figure out whether memcheck got sigkilled because the machine
ran out of space, or because you hit some shell limit/ulimit.

In the former case, you can then try adding swap space to the machine.
In the latter case you'll need to mess with the shell's ulimit settings.
You could also try reducing the (data) size of the workload.

Massif and Memcheck are different tools and do largely different things.
Whether or not you can use one or the other depends a lot on the specifics
of what problem you're trying to solve.

J |
|
From: Eliot M. <mo...@cs...> - 2022-08-06 02:05:51
|
On 8/5/2022 8:47 PM, G N Srinivasa Prasanna wrote:
> Thanks for this information.
>
> We are doing a memory system simulation, and need the address stream. At this point of time, we
> don't care if we need a Terabyte even, we can delete the files later.
>
> Is there anything we can use from Valgrind?

The lackey tool does just that - output a trace of memory references.

-- Eliot Moss |
|
From: G N S. P. <gns...@ii...> - 2022-08-06 01:49:20
|
Thanks, will check it out.

Best

________________________________
From: Eliot Moss <mo...@cs...>
Sent: 06 August 2022 07:10
To: G N Srinivasa Prasanna <gns...@ii...>; John Reiser <jr...@bi...>; val...@li... <val...@li...>
Subject: Re: [Valgrind-users] Valgrind trace Memory Addresses while running?

On 8/5/2022 8:47 PM, G N Srinivasa Prasanna wrote:
> Thanks for this information.
>
> We are doing a memory system simulation, and need the address stream. At this point of time, we
> don't care if we need a Terabyte even, we can delete the files later.
>
> Is there anything we can use from Valgrind?

The lackey tool does just that - output a trace of memory references.

-- Eliot Moss |
|
From: G N S. P. <gns...@ii...> - 2022-08-06 00:47:29
|
Thanks for this information.

We are doing a memory system simulation, and need the address stream. At this point of time, we don't care if we need a Terabyte even, we can delete the files later.

Is there anything we can use from Valgrind?

Best

________________________________
From: John Reiser <jr...@bi...>
Sent: 06 August 2022 01:18
To: val...@li... <val...@li...>
Subject: Re: [Valgrind-users] Valgrind trace Memory Addresses while running?

>> if we can get a list of all the physical addresses the program used, in the order the program accessed them, and whether read/write.

> For any real world application the size of the log would be overwhelmingly huge ... (unless you only want unique addresses).

Of course this is the purpose of data compression (such as gzip, etc). You get some/much/most of the benefit of restricting to unique addresses while still capturing the entire stream of references.

But as Paul noted, valgrind works in virtual addresses. Getting all the actual physical addresses is close to impossible. If you are working in an embedded device environment and care only about a small handful of memory-mapped device registers, then you can (must) process the mapping yourself. |
|
From: Bresalier, R. (N. - US/M. Hill) <rob...@no...> - 2022-08-05 23:37:06
|
I tried 'massif' on a simple program shown below where there are "definitely lost" leaks.
massif doesn't seem to find "definitely lost" leaks, is this correct?
I've tried with both 3.19.0 and 3.15.0 versions of valgrind/massif, same result, "definitely lost" leaks are not found.
I launch massif via:
valgrind --tool=massif --sigill-diagnostics=no --error-limit=no --massif-out-file=definitely.%p.massif definitely.elf
When I use memcheck it does find these definite leaks as below:
==29917== 60 bytes in 3 blocks are definitely lost in loss record 1 of 1
==29917== at 0x402F67C: malloc (vg_replace_malloc.c:381)
==29917== by 0x80491D1: f2() (definitely.cpp:11)
==29917== by 0x804920F: f1() (definitely.cpp:17)
==29917== by 0x8049262: main (definitely.cpp:25)
But massif doesn't find them at all? Is this correct?
When I use massif on a program with "still reachable" it does find the still reachable, but it isn't finding definite leaks.
Shouldn't massif also find definite leaks?
The C code for "definitely.elf" is below:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
void*
f2()
{
return malloc(20);
}
void
f1()
{
f2();
}
int
main()
{
for (int i = 1; i <= 3; i++)
{
f1();
}
return 0;
}
Thanks,
Rob
|