|
From: Oliver G. <ol...@gm...> - 2008-03-05 14:23:01
Attachments:
vg-massif-children.c
|
Hello, I have some problems with using Massif 3.3.0 on a process which does some fork() calls. The process doesn't actually call exec() after forking - the new long-running child basically does an endless loop with a sleep inside. Problem is, when running this under Massif, the massif.out.<pid> files seem to be corrupted: when running ms_print on them, I usually get an error like this: Line 1136: expected "snapshot" line, got: pty The offending part in the file looks like this: ... #----------- snapshot=52 #----------- time=38429472 mem_heap_B=441711 mem_heap_extra_B=34113 mem_stacks_B=0 heap_tree=empty pty #----------- snapshot=53 #----------- ... (line 1136 indeed is the "pty" line). I'm not really interested in the child's memory usage, but this error also prevents to analyze the parent. So I used --trace-children=no and --child-silent-after-fork=yes , but that didn't help (ms_print still fails). I'm somewhat out of ideas... Do you see what is going wrong here? So for testing this, I made a short example app (it's attached) and let it run under Massif. Strangely, it now seems to _only_ notice the 3kb allocated in a child, and it ignores the 60MB allocated in the parent process! Only if I don't fork(), the big allocation is correctly shown. Massif is started with > valgrind --tool=massif --child-silent-after-fork=yes ./vg-massif-children Here's what ms_print gives when fork() is used: > Number of snapshots: 4 > Detailed snapshots: [2 (peak)] > > -------------------------------------------------------------------------------- > n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) > -------------------------------------------------------------------------------- > 0 0 0 0 0 0 > 1 110,003 3,008 3,000 8 0 > 2 116,675 3,008 3,000 8 0 > 99.73% (3,000B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc. > ->99.73% (3,000B) 0x80485B5: doFork (vg-massif-children.c:27) > ->99.73% (3,000B) 0x80486B6: main (vg-massif-children.c:65) > > -------------------------------------------------------------------------------- > n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) > -------------------------------------------------------------------------------- > 3 116,675 0 0 0 0 Do you have an idea what could cause this sub-problem (ie. that malloc in main process is ignored if there's a fork())? Thanks, Oliver Gerlich |
|
From: Nicholas N. <nj...@cs...> - 2008-03-05 22:23:15
Attachments:
diff
|
On Wed, 5 Mar 2008, Oliver Gerlich wrote: > I have some problems with using Massif 3.3.0 on a process which does some > fork() calls. The process doesn't actually call exec() after forking - the > new long-running child basically does an endless loop with a sleep inside. > Problem is, when running this under Massif, the massif.out.<pid> files seem > to be corrupted: Turns out this is an easy fix, fortunately. Here's the problem: Massif's output, by default, goes to a file called "massif.out.<pid>", where <pid> is its pid. Currently, this file name is determined when Massif starts up. However, when you fork, the child ends up with a different pid, but it (erroneously) inherits the already-determined file name, which uses the parent's pid. As a result, both the parent and the child write to the same file at termination. This can cause file corruption (which you saw) or one process to overwrite the data of the other process (which you also saw). The fix is to determine the output file name at termination, when the file is written, instead of start-up. Then you end up with two massif.out.<pid> files, one for the parent and one for the child. Can you try the attached patch and let me know if it fixes the problem? (And thanks for giving such a detailed and well-explained bug report!) Nick ps: Valgrind's main logging mechanism suffers from the same problem, as do Cachegrind and Callgrind. The fix is equally simple for Cachegrind and Callgrind. For the core it's more difficult, because the output doesn't happen all in one hit at the end, rather it happens gradually. I guess when a fork happens, if we're logging to file and a %p specifier is present in the --log-file option, then a new logging file should be opened. Sigh. |
|
From: Nicholas N. <nj...@cs...> - 2008-03-07 20:04:55
|
On Fri, 7 Mar 2008, Josef Weidendorfer wrote: > On Wednesday 05 March 2008, Nicholas Nethercote wrote: >> ps: Valgrind's main logging mechanism suffers from the same problem, as do >> Cachegrind and Callgrind. The fix is equally simple for Cachegrind and >> Callgrind. > > Yes. > There still is an issue here: you get all the counters happening before > the fork duplicated in the output of the parent and the child. I am not > sure this is the right thing. > It probably would be better to clear the counters in the child directly > after the fork. Maybe -- there's two clear possibilities that make sense. It's unclear to me which is correct. Clearing the counters is more difficult, because it requires the tool to notice it's been forked. >> For the core it's more difficult, because the output doesn't happen all in >> one hit at the end, rather it happens gradually. I guess when a fork >> happens, if we're logging to file and a %p specifier is present in the >> --log-file option, then a new logging file should be opened. Sigh. > > It is even more difficult if the child sometimes afterwards does an exec, > and VG does follow the new executable. Then, the new log file started after > the exec will overwrite the file already written to between fork and exec. Is that a problem? It might be a little inefficient (although the new flag --child-silent-after-fork can help), but I don't see how it would cause a correctness problem? > Hmm... does the log file mechanism concatenate existing files? No. Nick |
|
From: Josef W. <Jos...@gm...> - 2008-03-09 18:42:10
|
On Friday 07 March 2008, Nicholas Nethercote wrote: > On Fri, 7 Mar 2008, Josef Weidendorfer wrote: > > > On Wednesday 05 March 2008, Nicholas Nethercote wrote: > >> ps: Valgrind's main logging mechanism suffers from the same problem, as do > >> Cachegrind and Callgrind. The fix is equally simple for Cachegrind and > >> Callgrind. > > > > Yes. > > There still is an issue here: you get all the counters happening before > > the fork duplicated in the output of the parent and the child. I am not > > sure this is the right thing. > > It probably would be better to clear the counters in the child directly > > after the fork. > > Maybe -- there's two clear possibilities that make sense. It's unclear to > me which is correct. Clearing the counters is more difficult, because it > requires the tool to notice it's been forked. OK, this depends very much on the tool. For profiling tools, clearing the counters is better option. As the output file name contains the child PID, the user would not expect that there is data of the parent included, too. > >> For the core it's more difficult, because the output doesn't happen all in > >> one hit at the end, rather it happens gradually. I guess when a fork > >> happens, if we're logging to file and a %p specifier is present in the > >> --log-file option, then a new logging file should be opened. Sigh. > > > > It is even more difficult if the child sometimes afterwards does an exec, > > and VG does follow the new executable. Then, the new log file started after > > the exec will overwrite the file already written to between fork and exec. > > Is that a problem? It might be a little inefficient (although the new flag > --child-silent-after-fork can help), but I don't see how it would cause a > correctness problem? Lost lines in the log file for sure can be a problem? As the child after fork has the same PID as the new Valgrind process after exec in this child, the log file name will be the same. Or am I missing something here? Josef > > > Hmm... does the log file mechanism concatenate existing files? > > No. > > Nick > > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Microsoft > Defy all challenges. Microsoft(R) Visual Studio 2008. > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > _______________________________________________ > Valgrind-users mailing list > Val...@li... > https://lists.sourceforge.net/lists/listinfo/valgrind-users > |
|
From: Nicholas N. <nj...@cs...> - 2008-03-10 20:23:52
|
On Sun, 9 Mar 2008, Josef Weidendorfer wrote: >>> It is even more difficult if the child sometimes afterwards does an exec, >>> and VG does follow the new executable. Then, the new log file started after >>> the exec will overwrite the file already written to between fork and exec. >> >> Is that a problem? It might be a little inefficient (although the new flag >> --child-silent-after-fork can help), but I don't see how it would cause a >> correctness problem? > > Lost lines in the log file for sure can be a problem? As the child after > fork has the same PID as the new Valgrind process after exec in this child, > the log file name will be the same. Or am I missing something here? I see. I guess if a fork follows an exec, in practice it will be almost immediate so anything lost would be unimportant? But in general, yes, there's the potential for lost information. Nick |