Hi Henry,
I've begun valgrind'ing ncap.
There's no shortage of memory leaks and errors.
I would like your help getting these cleaned up in the
next few weeks. That means running valgrind and
looking for memory problems.
I think bug squashing with valgrind is kind of fun.
It's when valgrind stops pointing out problems yet
there are still bugs that I break out in a cold sweat.
Anyway, TODO ncap68 is now the purify ncap item.
The first memory error, TODO ncap69, should be tackled
first.
Thanks,
Charlie
Hi Charlie,
Just getting into valgrind.
What valgrind command line did you use to find TODO ncap69?
Regards Henry
I used TODO ncap68, i.e., I ran the whole ncap.in script through
valgrind.
Charlie
Hi Charlie,
I ran valgrind as follows.
valgrind --leak-check=yes --show-reachable=yes --tool=memcheck ncap -O -S ncap.in in.nc foo.nc
But I cannot see in the output where you get a message about "var_ycc[idx]->undefined" being uninitialized.
Are there some #defines I should be putting in the code?
Can you also please send me your suppressions file?
Many Thanks
Henry
Continuing with the previous post, if you look at the error,
line 570 of ncap.c is:
  if(var_ycc[idx]->undefined){
    var_ycc[idx]=nco_var_free(var_ycc[idx]);
    continue;
  }
Hence the problem mentioned in my original post is still flagged
by valgrind as being a memory error.
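Memcheck's "Conditional jump or move depends on uninitialised value(s)" at ncap.c:570 means the branch on `var_ycc[idx]->undefined` read a flag that was never written on some code path. A minimal sketch of the usual remedy, assuming a hypothetical `demo_var_sct` in place of NCO's real variable structure: give the flag a defined value when the variable is allocated (here with calloc() plus an explicit assignment), so every later test of it is well defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an NCO variable structure (not NCO's real var_sct) */
typedef struct {
  char *nm;       /* variable name, owned by the struct */
  bool undefined; /* flag tested before the variable is used */
} demo_var_sct;

/* Allocate a variable with every field in a defined state.
   calloc() zero-fills the struct; the explicit assignments document intent
   and stay correct if the allocation is later switched to malloc(). */
demo_var_sct *demo_var_new(void)
{
  demo_var_sct *var = calloc(1, sizeof *var);
  if (var == NULL) return NULL;
  var->nm = NULL;
  var->undefined = false;
  return var;
}

/* Free a variable and return NULL, so callers can write var = demo_var_free(var) */
demo_var_sct *demo_var_free(demo_var_sct *var)
{
  if (var != NULL) {
    free(var->nm);
    free(var);
  }
  return NULL;
}

/* Free and NULL every variable whose undefined flag is set, in the spirit of
   the loop around ncap.c line 570; return how many defined variables remain */
size_t demo_prune_undefined(demo_var_sct **var_lst, size_t var_nbr)
{
  size_t kept = 0;
  assert(var_lst != NULL || var_nbr == 0);
  for (size_t idx = 0; idx < var_nbr; idx++) {
    if (var_lst[idx] == NULL) continue;
    if (var_lst[idx]->undefined) { /* flag was initialized, so Memcheck has nothing to report */
      var_lst[idx] = demo_var_free(var_lst[idx]);
      continue;
    }
    kept++;
  }
  return kept;
}
```

Whether ncap's real allocation path leaves `undefined` unset is not shown in this thread; the sketch only illustrates the pattern Memcheck is complaining about.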
There are no #defines required by valgrind.
As for my suppressions file, get an account on the ESMF by following the instructions at
http://www.ess.uci.edu/esmf/accounts.html
I believe I've asked you to do this in the past so you would have
access to a large, fast machine to aid in development/benchmarking.
Once you obtain your account, my suppressions file is in
~zender/c++/valgrind.txt
I received your ncap roadmap and am still evaluating it and trying
to decide what the priorities are. I should have comments back to you within a week.
Thanks,
Charlie
Hi Henry,
Apparently I have been unsubscribed from this forum at least
since you posted this. This has happened to me before.
Sorry for the late response. Next time, if I don't respond within a few days, please ping me via e-mail. I'm subscribed again and catching up.
Following is the output from
valgrind --leak-check=yes --show-reachable=yes --suppressions=${HOME}/c++/valgrind.txt --tool=memcheck ncap -O -S ${HOME}/nco/data/ncap.in -p ${HOME}/nco/data -l /tmp in.nc ~/foo.nc > ~/foo 2>&1
using
zender@ashes:~/nco/bld$ valgrind --version
valgrind-3.0.0.SVN
which I compiled from source:
==13796== Memcheck, a memory error detector.
==13796== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
==13796== Using LibVEX rev 1203, a library for dynamic binary translation.
==13796== Copyright (C) 2004-2005, and GNU GPL'd, by OpenWorks LLP.
==13796== Using valgrind-3.0.0.SVN, a dynamic binary instrumentation framework.
==13796== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==13796== For more details, rerun with: -v
==13796==
==13796== Conditional jump or move depends on uninitialised value(s)
==13796== at 0x804B1A2: main (ncap.c:570)
ncap: WARNING Replacing missing value data in variable val_half_half
==13796==
==13796== ERROR SUMMARY: 9 errors from 1 contexts (suppressed: 18 from 1)
==13796== malloc/free: in use at exit: 51252 bytes in 689 blocks.
==13796== malloc/free: 9518 allocs, 8829 frees, 495695 bytes allocated.
==13796== For counts of detected errors, rerun with: -v
==13796== searching for pointers to 689 not-freed blocks.
==13796== checked 145568 bytes.
==13796==
==13796==
==13796== 48 bytes in 4 blocks are still reachable in loss record 1 of 11
==13796== at 0x1B8FF649: malloc (vg_replace_malloc.c:220)
==13796== by 0x1B9C7076: (within /lib/tls/i686/cmov/libc-2.3.2.so)
==13796== by 0x1B9C8947: (within /lib/tls/i686/cmov/libc-2.3.2.so)
==13796== by 0x1B9C71FF: (within /lib/tls/i686/cmov/libc-2.3.2.so)
==13796== by 0x1B9C7D55: (within /lib/tls/i686/cmov/libc-2.3.2.so)
==13796== by 0x1B9C65DD: localtime (in /lib/tls/i686/cmov/libc-2.3.2.so)
==13796== by 0x1B9C64AC: ctime (in /lib/tls/i686/cmov/libc-2.3.2.so)
==13796== by 0x804A349: main (ncap.c:268)
==13796==
==13796==
==13796== 232 bytes in 1 blocks are still reachable in loss record 2 of 11
==13796== at 0x1B8FF649: malloc (vg_replace_malloc.c:220)
==13796== by 0x805F1A0: nco_malloc (nco_mmr.c:85)
==13796== by 0x8066775: nco_var_lst_mk (nco_var_lst.c:86)
==13796== by 0x804B299: main (ncap.c:587)
==13796==
==13796==
==13796== 248 bytes in 31 blocks are still reachable in loss record 3 of 11
==13796== at 0x1B8FF649: malloc (vg_replace_malloc.c:220)
==13796== by 0x1B9ABAFE: strdup (in /lib/tls/i686/cmov/libc-2.3.2.so)
==13796== by 0x1B9C71DE: (within /lib/tls/i686/cmov/libc-2.3.2.so)
==13796== by 0x1B9C7D55: (within /lib/tls/i686/cmov/libc-2.3.2.so)
==13796== by 0x1B9C65DD: localtime (in /lib/tls/i686/cmov/libc-2.3.2.so)
==13796== by 0x1B9C64AC: ctime (in /lib/tls/i686/cmov/libc-2.3.2.so)
==13796== by 0x804A349: main (ncap.c:268)
==13796==
==13796==
==13796== 704 bytes in 2 blocks are still reachable in loss record 4 of 11
==13796== at 0x1B8FF649: malloc (vg_replace_malloc.c:220)
==13796== by 0x1B996338: (within /lib/tls/i686/cmov/libc-2.3.2.so)
==13796== by 0x1B998879: fopen64 (in /lib/tls/i686/cmov/libc-2.3.2.so)
==13796== by 0x804B084: main (ncap.c:546)
==13796==
==13796==
==13796== 776 bytes in 67 blocks are indirectly lost in loss record 5 of 11
==13796== at 0x1B8FF649: malloc (vg_replace_malloc.c:220)
==13796== by 0x805F1A0: nco_malloc (nco_mmr.c:85)
==13796== by 0x806D231: nco_var_dpl (nco_var_utl.c:559)
==13796== by 0x8055D4F: nco_var_cnf_dmn (nco_cnf_dmn.c:170)
==13796== by 0x80563B9: ncap_var_cnf_dmn (nco_cnf_dmn.c:327)
==13796== by 0x804CB85: ncap_var_var_sub (ncap_utl.c:481)
==13796== by 0x804FBDE: yyparse (ncap_yacc.y:477)
==13796== by 0x804B0EF: main (ncap.c:557)
==13796==
==13796==
==13796== 976 bytes in 1 blocks are still reachable in loss record 6 of 11
==13796== at 0x1B8FF649: malloc (vg_replace_malloc.c:220)
==13796== by 0x1B9C83AA: (within /lib/tls/i686/cmov/libc-2.3.2.so)
==13796== by 0x1B9C71FF: (within /lib/tls/i686/cmov/libc-2.3.2.so)
==13796== by 0x1B9C7D55: (within /lib/tls/i686/cmov/libc-2.3.2.so)
==13796== by 0x1B9C65DD: localtime (in /lib/tls/i686/cmov/libc-2.3.2.so)
==13796== by 0x1B9C64AC: ctime (in /lib/tls/i686/cmov/libc-2.3.2.so)
==13796== by 0x804A349: main (ncap.c:268)
==13796==
==13796==
==13796== 2070 bytes in 215 blocks are definitely lost in loss record 7 of 11
==13796== at 0x1B8FF649: malloc (vg_replace_malloc.c:220)
==13796== by 0x1B9ABAFE: strdup (in /lib/tls/i686/cmov/libc-2.3.2.so)
==13796== by 0x8051E9D: yylex (ncap_lex.l:564)
==13796== by 0x804E77B: yyparse (ncap_yacc.c:1257)
==13796== by 0x804B0EF: main (ncap.c:557)
==13796==
==13796==
==13796== 2516 bytes in 227 blocks are indirectly lost in loss record 8 of 11
==13796== at 0x1B8FF649: malloc (vg_replace_malloc.c:220)
==13796== by 0x1B9ABAFE: strdup (in /lib/tls/i686/cmov/libc-2.3.2.so)
==13796== by 0x806D144: nco_var_dpl (nco_var_utl.c:543)
==13796== by 0x8055D4F: nco_var_cnf_dmn (nco_cnf_dmn.c:170)
==13796== by 0x80563B9: ncap_var_cnf_dmn (nco_cnf_dmn.c:327)
==13796== by 0x804CB85: ncap_var_var_sub (ncap_utl.c:481)
==13796== by 0x804FBDE: yyparse (ncap_yacc.y:477)
==13796== by 0x804B0EF: main (ncap.c:557)
==13796==
==13796==
==13796== 14150 (4886 direct, 9264 indirect) bytes in 123 blocks are definitely lost in loss record 9 of 11
==13796== at 0x1B8FF649: malloc (vg_replace_malloc.c:220)
==13796== by 0x805F1A0: nco_malloc (nco_mmr.c:85)
==13796== by 0x8051B28: yylex (ncap_lex.l:452)
==13796== by 0x804E77B: yyparse (ncap_yacc.c:1257)
==13796== by 0x804B0EF: main (ncap.c:557)
==13796==
==13796==
==13796== 5972 bytes in 14 blocks are indirectly lost in loss record 10 of 11
==13796== at 0x1B8FF649: malloc (vg_replace_malloc.c:220)
==13796== by 0x805F2C4: nco_malloc_dbg (nco_mmr.c:149)
==13796== by 0x806D174: nco_var_dpl (nco_var_utl.c:547)
==13796== by 0x8055D4F: nco_var_cnf_dmn (nco_cnf_dmn.c:170)
==13796== by 0x80563B9: ncap_var_cnf_dmn (nco_cnf_dmn.c:327)
==13796== by 0x804CB85: ncap_var_var_sub (ncap_utl.c:481)
==13796== by 0x804FBDE: yyparse (ncap_yacc.y:477)
==13796== by 0x804B0EF: main (ncap.c:557)
==13796==
==13796==
==13796== 32824 bytes in 4 blocks are still reachable in loss record 11 of 11
==13796== at 0x1B8FF649: malloc (vg_replace_malloc.c:220)
==13796== by 0x805361D: yyalloc (lex.yy.c:3016)
==13796== by 0x8050C67: yylex (lex.yy.c:1095)
==13796== by 0x804E77B: yyparse (ncap_yacc.c:1257)
==13796== by 0x804B0EF: main (ncap.c:557)
==13796==
==13796== LEAK SUMMARY:
==13796== definitely lost: 6956 bytes in 338 blocks.
==13796== indirectly lost: 9264 bytes in 308 blocks.
==13796== possibly lost: 0 bytes in 0 blocks.
==13796== still reachable: 35032 bytes in 43 blocks.
==13796== suppressed: 0 bytes in 0 blocks.
Hi Henry,
A few days ago you committed changes to ncap_lex.l, and possibly
other parts of ncap as well. Soon thereafter, ncap began segfaulting every time I ran it. Could you please check whether
the segfaults are due to your commit, and, if so, back out the
patch that causes the segfaults? Please post a message to the
developers' group when you commit anything that changes the
result of a regression test (and this does).
If the recent ncap segfaults were caused by something else,
let me know and I'll take a look.
Thanks!
Charlie
Hi Charlie,
I will tackle the problem as soon as I get a login on the ESMF machine. I tried last night to speak to the right person, but our wires got crossed; I will try again later this evening.
On my machine ncap.in runs fine, but I haven't got the juice to run the main benchmarks.
FYI: on Monday I will be signing off on the shopping-cart software I've done for Colin Narbeth. The job has been a f**king nightmare from beginning to end, so I will have more time and be less stressed.
Speak to you soon
Henry
Hi Henry,
If you follow the ESMF account application instructions at
http://www.ess.uci.edu/esmf/accounts.html
there should be no need for phone calls.
I have a project that sounds like your Colin Narbeth job.
I wish such projects came with ejection seats.
The ncap segfault will be difficult to isolate on the ESMF, I think.
Best bet is an x86 running valgrind.
Even if it does not segfault, it will flag the memory error.
My guess is a double-free() at the end of the code in the
cleanup section, since the output file looks fine.
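A double free() in a cleanup section is the kind of bug the `ptr=nco_var_free(ptr)` idiom quoted earlier from ncap.c guards against: because the free routine returns NULL and the caller stores that result, a second cleanup pass calls free(NULL), which the C standard defines as a no-op. A minimal sketch, with a hypothetical `demo_buf_free()` standing in for nco_var_free() and a counter added only so the behavior can be checked:

```c
#include <assert.h>
#include <stdlib.h>

/* Number of non-NULL blocks actually released (demo bookkeeping only) */
static int demo_free_cnt = 0;

/* Hypothetical stand-in for nco_var_free(): release the block and return NULL */
void *demo_buf_free(void *ptr)
{
  if (ptr != NULL) demo_free_cnt++;
  free(ptr); /* free(NULL) is a no-op per the C standard */
  return NULL;
}

/* Cleanup pass over a pointer list. Storing the return value NULLs each slot,
   so running the pass twice (e.g. from two cleanup sections) cannot double-free. */
void demo_cleanup(void **lst, size_t nbr)
{
  assert(lst != NULL || nbr == 0);
  for (size_t idx = 0; idx < nbr; idx++)
    lst[idx] = demo_buf_free(lst[idx]);
}
```

The idiom only protects the pointer that receives the NULL. Any other copy of the same address still dangles, and Memcheck reports an invalid free or invalid read when that copy is used, which is why valgrind should pinpoint the problem even when the program does not segfault.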
Best,
Charlie
Hi Charlie,
Have plugged a few more leaks -- fixed TODO ncap69.
Am running valgrind as follows:
valgrind --leak-check=yes --show-reachable=yes --tool=memcheck ncap -O -S ncap.in in.nc foo.nc >& vg_out
There are still 3 outstanding leaks with yylex. I'm not sure how to tackle these, as they are leaks in the automatically generated code. I don't want to add calls to yy_flush_buffer() or other functions, as this could create compatibility problems on some platforms.
There are also still 3 leaks associated with nco_var_cnf_dmn().
Regards Henry
Hi Henry,
Yes, that is basically how I would use valgrind on ncap.
I should have mentioned: Don't worry about memory
allocated by automatically generated code like bison and flex.
There's nothing much we can do about that code, so it does
not count as an NCO memory leak in my book.
Leaks associated with nco_var_cnf_dmn() are our responsibility.
Let me know if you think it's something in the base code, or
something ncap-specific. I'll be happy to look at it after you've
taken a crack at plugging the leaks.
Thanks,
Charlie
Hi Charlie,
Have been (val)grinding away. With ncap.in we now have 8 outstanding leaks:
3 lex/yacc leaks, connected with automatically generated code
1 ncap_var_stretch() leak
4 ncap_var_cnf_dmn() leaks
What's happening with ncap_var_cnf_dmn() is weird. When the two variables are made to conform, there is NO memory leak.
When the function is called on variables that already conform, we get leakage. Try running the following:
valgrind --leak-check=yes --show-reachable=yes --tool=memcheck ncap -v -O -s "t=three_dmn_var_dbl+three_dmn_var_int" in.nc foo.nc >& vg_lst
Regards Henry
Good progress so far. I'll take a look at your example.
It always helps to have a specific example like this.
Charlie
Hi Henry,
I took a crack or two at the leak in ncap_var_cnf_dmn().
I've committed one significant change, which still leaves
the original problem, but may be an improvement.
I didn't want to change more until I hear what you think.
First:
Conditionally free both $1 and $3 in var,var addition in ncap_yacc.y
The reason for this is that clearly at least one of them
is sometimes not being free()d.
This parallels the reasoning in nco_cnf_dmn.c lines 329-338.
For some reason, that patch does not work, but it's clear to
me that there's a problem in the memory management there.
Please look at these changes and see if they make sense to you.
If the second change does, then clearly it should be applied to
the other variable-involved operations in ncap_yacc.y.
Otherwise it should be ripped out.
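One reading of "conditionally free both $1 and $3" is: after the action computes $$, free each operand unless it is itself the result. A minimal sketch under stated assumptions: `demo_var_sct`, `demo_var_var_add()` and `demo_add_action()` are hypothetical names, the addition is assumed to store its result in the second operand and return it, and the operands are assumed to already conform. This is not the actual ncap_yacc.y action code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal variable: a heap-allocated array of doubles */
typedef struct {
  size_t sz;
  double *val;
} demo_var_sct;

/* Allocate a variable of sz elements, each set to fill; NULL on failure */
demo_var_sct *demo_var_new(size_t sz, double fill)
{
  demo_var_sct *var = malloc(sizeof *var);
  if (var == NULL) return NULL;
  var->val = malloc(sz * sizeof *var->val);
  if (var->val == NULL) {
    free(var);
    return NULL;
  }
  var->sz = sz;
  for (size_t idx = 0; idx < sz; idx++) var->val[idx] = fill;
  return var;
}

/* Free a variable and return NULL */
demo_var_sct *demo_var_free(demo_var_sct *var)
{
  if (var != NULL) {
    free(var->val);
    free(var);
  }
  return NULL;
}

/* Assumed convention: add var_1 into var_2 in place and return var_2 */
demo_var_sct *demo_var_var_add(demo_var_sct *var_1, demo_var_sct *var_2)
{
  assert(var_1->sz == var_2->sz); /* operands must already conform */
  for (size_t idx = 0; idx < var_2->sz; idx++) var_2->val[idx] += var_1->val[idx];
  return var_2;
}

/* Body of a "$$ = $1 + $3" action: arg_1 plays $1, arg_3 plays $3.
   Compute the result, then free each operand unless it is the result itself. */
demo_var_sct *demo_add_action(demo_var_sct *arg_1, demo_var_sct *arg_3)
{
  demo_var_sct *rsl = demo_var_var_add(arg_1, arg_3);
  if (arg_1 != rsl) demo_var_free(arg_1);
  if (arg_3 != rsl) demo_var_free(arg_3);
  return rsl;
}
```

The pointer comparisons also keep the action safe when both operands happen to be the same pointer: the result then aliases both, so neither is freed.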
Finally, note that, contrary to your previous statement, I always
see memory leaks with addition et al. regardless of whether the
variables already conform, so a simpler test case is
valgrind --leak-check=yes --show-reachable=yes --suppressions=${HOME}/c++/valgrind.txt --tool=memcheck ncap -O -v -s "foo=one+one" -p ${HOME}/nco/data -l /tmp in.nc ~/foo.nc > ~/foo 2>&1
Thanks,
Charlie
Hi Henry,
I have managed to convince myself that the large
memory leak in ncap is due to dangling pointers to the input
variables to ncap_var_cnf_dmn(). You can see in lines 342-343
of nco_cnf_dmn.c where I attempted to free() these pointers.
if(*var_1 != var_1_org) var_1_org=nco_var_free(var_1_org);
if(*var_2 != var_2_org) var_2_org=nco_var_free(var_2_org);
However, doing so causes much breakage and I'm not sure why.
Am hoping you can investigate further.
Perhaps there's another part of ncap which I'm not aware of
that depends on the contents of the input variables not being
freed in that routine. That would be an error that valgrind
could point you to, i.e., which code tries to use the freed
memory when you uncomment those lines.
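The two lines quoted above encode an ownership rule: if the conformance step hands back a new, conformed copy, the original it replaced must be released by someone; if it hands back the same pointer, nothing may be freed. A minimal sketch of that rule, assuming hypothetical types and helpers (`demo_var_sct`, `demo_cnf_dmn()`, `demo_var_cnf_dmn()`) and a drastically simplified notion of conformance (a scalar is replicated to the length of the other operand). It is not NCO's actual nco_var_cnf_dmn() algorithm; a live-object counter is included only so the bookkeeping can be checked.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal variable: a heap-allocated array of doubles */
typedef struct {
  size_t sz;
  double *val;
} demo_var_sct;

static int demo_live_cnt = 0; /* variables currently allocated (demo bookkeeping only) */

demo_var_sct *demo_var_new(size_t sz, double fill)
{
  demo_var_sct *var = malloc(sizeof *var);
  if (var == NULL) return NULL;
  var->val = malloc(sz * sizeof *var->val);
  if (var->val == NULL) {
    free(var);
    return NULL;
  }
  var->sz = sz;
  for (size_t idx = 0; idx < sz; idx++) var->val[idx] = fill;
  demo_live_cnt++;
  return var;
}

demo_var_sct *demo_var_free(demo_var_sct *var)
{
  if (var != NULL) {
    free(var->val);
    free(var);
    demo_live_cnt--;
  }
  return NULL;
}

/* Simplified conformance: return var itself if it already has sz_tgt elements,
   otherwise return a newly allocated broadened copy of a scalar */
static demo_var_sct *demo_cnf_dmn(demo_var_sct *var, size_t sz_tgt)
{
  demo_var_sct *cpy;
  if (var->sz == sz_tgt) return var; /* already conforms: no new memory */
  assert(var->sz == 1);              /* this sketch only broadens scalars */
  cpy = demo_var_new(sz_tgt, var->val[0]);
  if (cpy == NULL) abort();          /* sketch: give up on allocation failure */
  return cpy;
}

/* Wrapper in the spirit of ncap_var_cnf_dmn(): make *var_1 and *var_2 conform,
   then free any original that was replaced. Safe only if no other code still
   holds a pointer to the original. */
void demo_var_cnf_dmn(demo_var_sct **var_1, demo_var_sct **var_2)
{
  demo_var_sct *var_1_org = *var_1;
  demo_var_sct *var_2_org = *var_2;
  size_t sz_tgt = var_1_org->sz > var_2_org->sz ? var_1_org->sz : var_2_org->sz;

  *var_1 = demo_cnf_dmn(var_1_org, sz_tgt);
  *var_2 = demo_cnf_dmn(var_2_org, sz_tgt);

  /* Same test as the quoted lines: free an original only if it was replaced */
  if (*var_1 != var_1_org) var_1_org = demo_var_free(var_1_org);
  if (*var_2 != var_2_org) var_2_org = demo_var_free(var_2_org);
}
```

If some other part of the parser still keeps its own pointer to var_1_org or var_2_org, those free() calls leave it dangling, which would be consistent with the breakage described above; running valgrind with the lines uncommented should report an invalid read at the first place such a stale pointer is used.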
My earlier modification to ncap_yacc.y lines 478-479 is still
in, as I think it is also necessary, in principle. But you might
want to back it out while tracking down the ncap_var_cnf_dmn()
problem. Let me know if you get to another sticking point and
I can dig deeper.
Thanks,
Charlie
Hi Charlie,
I'm currently taking a long, hard look at
nco_var_cnf_dmn() and will let you know when I get
some positive results.
Didn't you get any similar problems with it when you leak-proofed the other operators?
Regards Henry
> Didn't you get any similar problems with it when you leak-proofed the other operators?
No. There is no "leak" per se in nco_var_cnf_dmn().
It is used extensively by ncwa and ncbo.
However, those programs know to clean up after themselves.
The problem, if any, is in the way ncap_var_cnf_dmn()
wraps nco_var_cnf_dmn(), and loses track of the initial variables.
If I could get those free() statements in ncap_var_cnf_dmn()
to work, then I think the problem would be solved.
The fact that they don't work leads me to suspect that ncap
has other dependencies on the entering variables which I've
forgotten or never knew. Feel free to give me a call now.
Understanding the nco_var_cnf_dmn() algorithm, and thus
how to wrap it, is non-trivial...
Charlie
Hi Charlie,
Committed the ncap code.
I've almost finished the memory tidy-up, but I'm
still left with a problem in ncap_var_stretch();
cf.
valgrind --leak-check=yes --show-reachable=yes --tool=memcheck ncap -O -v -s "a7[time,lat,lon]=3.14d" in.nc foo.nc >& vg_cmd2
Regards Henry
Hi Henry,
I just committed the patch to fix the above-mentioned leak in
ncap_var_stretch(). Good news! This means that all known fixable ncap
leaks are fixed! All the operators are leak-free
except for a small leak in ncatted which is of no consequence.
Charlie