ncap memory clean up

Developers
2005-05-30
2013-10-17
  • Charlie Zender

    Charlie Zender - 2005-05-30

    Hi Henry,

    I've begun valgrind'ing ncap.
    There's no shortage of memory leaks and errors.
    I would like your help getting these cleaned up in the
    next few weeks. That means running valgrind and
    looking for memory problems.
    I think bug squashing with valgrind is kind of fun.
    It's when valgrind stops pointing out problems yet
    there are still bugs that I break out in a cold sweat.

    Anyway, TODO ncap68 is now the purify ncap item.
    The first memory error, ncap69 should be tackled
    first.

    Thanks,
    Charlie

     
    • Nobody/Anonymous

      Hi Charlie,
      Just getting into valgrind.
      What valgrind command line did you use to find TODO ncap69?

      Regards Henry

       
      • Charlie Zender

        Charlie Zender - 2005-06-02

        I used TODO ncap68, i.e., I ran the whole ncap.in script through
        valgrind.

        Charlie

         
    • Nobody/Anonymous

      Hi Charlie,
      I ran valgrind as follows.
      valgrind --leak-check=yes --show-reachable=yes --tool=memcheck ncap -O -S ncap.in in.nc foo.nc
      But I cannot see in the output where you get a message about "var_ycc[idx]->undefined" being uninitialized.

      Are there some #defines I should be putting in the code?
      Can you also please send me your suppressions file?

      Many Thanks

      Henry

       
      • Charlie Zender

        Charlie Zender - 2005-06-14

        Continuing with the previous post, if you look at the error,
        line 570 of ncap.c is:

          if(var_ycc[idx]->undefined){
            var_ycc[idx]=nco_var_free(var_ycc[idx]);
            continue;

        Hence the problem mentioned in my original post is still flagged
        by valgrind as being a memory error.
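
As an illustration of this class of error, here is a minimal sketch, not NCO's actual code. The type toy_var_sct and the toy_var_* functions are hypothetical stand-ins. It assumes the flag is read before anything has written to it, which is what memcheck's "Conditional jump or move depends on uninitialised value(s)" message means, and shows one way to avoid that by giving every member a known value at allocation time.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an NCO variable structure; only the
   flag discussed in the thread is modeled here. */
typedef struct {
  int undefined; /* nonzero when the variable has no definition yet */
  double *val;   /* payload, unused in this sketch */
} toy_var_sct;

/* malloc() leaves the members indeterminate. Branching on
   var->undefined before writing to it is what memcheck reports,
   so set every member before returning. */
toy_var_sct *toy_var_alloc(void)
{
  toy_var_sct *var = malloc(sizeof *var);
  if (var == NULL) return NULL;
  var->undefined = 0;
  var->val = NULL;
  return var;
}

/* Free a variable and return NULL so the caller can overwrite its
   own pointer, mirroring the var=nco_var_free(var) call pattern. */
toy_var_sct *toy_var_free(toy_var_sct *var)
{
  if (var != NULL) {
    free(var->val);
    free(var);
  }
  return NULL;
}

/* The branch condition is now safe to evaluate. */
int toy_var_is_undefined(const toy_var_sct *var)
{
  assert(var != NULL);
  return var->undefined != 0;
}
```

Using calloc() instead of malloc() would also silence the warning, at the cost of hiding members that were never deliberately set.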

        There are no #defines required by valgrind.

        As for my suppressions file, get an account on the ESMF by following the instructions at

        http://www.ess.uci.edu/esmf/accounts.html

        I believe I've asked you to do this in the past so you would have
        access to a large, fast machine to aid in development/benchmarking.
        Once you obtain your account, my suppressions file is in

        ~zender/c++/valgrind.txt

        I received your ncap roadmap and am still evaluating it and trying
        to decide what the priorities are. I should have comments back to you within a week.

        Thanks,
        Charlie

         
    • Charlie Zender

      Charlie Zender - 2005-06-14

      Hi Henry,

      Apparently I have been un-subscribed from this forum at least
      since you posted this. This has happened to me before.
      Sorry for the late response. Next time I don't respond within a few days, please ping me via e-mail. I'm subscribed again and catching up.

      Following is the output from

      valgrind --leak-check=yes --show-reachable=yes --suppressions=${HOME}/c++/valgrind.txt --tool=memcheck ncap -O -S ${HOME}/nco/data/ncap.in -p ${HOME}/nco/data -l /tmp in.nc ~/foo.nc > ~/foo 2>&1

      using

      zender@ashes:~/nco/bld$ valgrind --version
      valgrind-3.0.0.SVN

      which I compiled from source:

      ==13796== Memcheck, a memory error detector.
      ==13796== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
      ==13796== Using LibVEX rev 1203, a library for dynamic binary translation.
      ==13796== Copyright (C) 2004-2005, and GNU GPL'd, by OpenWorks LLP.
      ==13796== Using valgrind-3.0.0.SVN, a dynamic binary instrumentation framework.
      ==13796== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
      ==13796== For more details, rerun with: -v
      ==13796==
      ==13796== Conditional jump or move depends on uninitialised value(s)
      ==13796==    at 0x804B1A2: main (ncap.c:570)
      ncap: WARNING Replacing missing value data in variable val_half_half
      ==13796==
      ==13796== ERROR SUMMARY: 9 errors from 1 contexts (suppressed: 18 from 1)
      ==13796== malloc/free: in use at exit: 51252 bytes in 689 blocks.
      ==13796== malloc/free: 9518 allocs, 8829 frees, 495695 bytes allocated.
      ==13796== For counts of detected errors, rerun with: -v
      ==13796== searching for pointers to 689 not-freed blocks.
      ==13796== checked 145568 bytes.
      ==13796==
      ==13796==
      ==13796== 48 bytes in 4 blocks are still reachable in loss record 1 of 11
      ==13796==    at 0x1B8FF649: malloc (vg_replace_malloc.c:220)
      ==13796==    by 0x1B9C7076: (within /lib/tls/i686/cmov/libc-2.3.2.so)
      ==13796==    by 0x1B9C8947: (within /lib/tls/i686/cmov/libc-2.3.2.so)
      ==13796==    by 0x1B9C71FF: (within /lib/tls/i686/cmov/libc-2.3.2.so)
      ==13796==    by 0x1B9C7D55: (within /lib/tls/i686/cmov/libc-2.3.2.so)
      ==13796==    by 0x1B9C65DD: localtime (in /lib/tls/i686/cmov/libc-2.3.2.so)
      ==13796==    by 0x1B9C64AC: ctime (in /lib/tls/i686/cmov/libc-2.3.2.so)
      ==13796==    by 0x804A349: main (ncap.c:268)
      ==13796==
      ==13796==
      ==13796== 232 bytes in 1 blocks are still reachable in loss record 2 of 11
      ==13796==    at 0x1B8FF649: malloc (vg_replace_malloc.c:220)
      ==13796==    by 0x805F1A0: nco_malloc (nco_mmr.c:85)
      ==13796==    by 0x8066775: nco_var_lst_mk (nco_var_lst.c:86)
      ==13796==    by 0x804B299: main (ncap.c:587)
      ==13796==
      ==13796==
      ==13796== 248 bytes in 31 blocks are still reachable in loss record 3 of 11
      ==13796==    at 0x1B8FF649: malloc (vg_replace_malloc.c:220)
      ==13796==    by 0x1B9ABAFE: strdup (in /lib/tls/i686/cmov/libc-2.3.2.so)
      ==13796==    by 0x1B9C71DE: (within /lib/tls/i686/cmov/libc-2.3.2.so)
      ==13796==    by 0x1B9C7D55: (within /lib/tls/i686/cmov/libc-2.3.2.so)
      ==13796==    by 0x1B9C65DD: localtime (in /lib/tls/i686/cmov/libc-2.3.2.so)
      ==13796==    by 0x1B9C64AC: ctime (in /lib/tls/i686/cmov/libc-2.3.2.so)
      ==13796==    by 0x804A349: main (ncap.c:268)
      ==13796==
      ==13796==
      ==13796== 704 bytes in 2 blocks are still reachable in loss record 4 of 11
      ==13796==    at 0x1B8FF649: malloc (vg_replace_malloc.c:220)
      ==13796==    by 0x1B996338: (within /lib/tls/i686/cmov/libc-2.3.2.so)
      ==13796==    by 0x1B998879: fopen64 (in /lib/tls/i686/cmov/libc-2.3.2.so)
      ==13796==    by 0x804B084: main (ncap.c:546)
      ==13796==
      ==13796==
      ==13796== 776 bytes in 67 blocks are indirectly lost in loss record 5 of 11
      ==13796==    at 0x1B8FF649: malloc (vg_replace_malloc.c:220)
      ==13796==    by 0x805F1A0: nco_malloc (nco_mmr.c:85)
      ==13796==    by 0x806D231: nco_var_dpl (nco_var_utl.c:559)
      ==13796==    by 0x8055D4F: nco_var_cnf_dmn (nco_cnf_dmn.c:170)
      ==13796==    by 0x80563B9: ncap_var_cnf_dmn (nco_cnf_dmn.c:327)
      ==13796==    by 0x804CB85: ncap_var_var_sub (ncap_utl.c:481)
      ==13796==    by 0x804FBDE: yyparse (ncap_yacc.y:477)
      ==13796==    by 0x804B0EF: main (ncap.c:557)
      ==13796==
      ==13796==
      ==13796== 976 bytes in 1 blocks are still reachable in loss record 6 of 11
      ==13796==    at 0x1B8FF649: malloc (vg_replace_malloc.c:220)
      ==13796==    by 0x1B9C83AA: (within /lib/tls/i686/cmov/libc-2.3.2.so)
      ==13796==    by 0x1B9C71FF: (within /lib/tls/i686/cmov/libc-2.3.2.so)
      ==13796==    by 0x1B9C7D55: (within /lib/tls/i686/cmov/libc-2.3.2.so)
      ==13796==    by 0x1B9C65DD: localtime (in /lib/tls/i686/cmov/libc-2.3.2.so)
      ==13796==    by 0x1B9C64AC: ctime (in /lib/tls/i686/cmov/libc-2.3.2.so)
      ==13796==    by 0x804A349: main (ncap.c:268)
      ==13796==
      ==13796==
      ==13796== 2070 bytes in 215 blocks are definitely lost in loss record 7 of 11
      ==13796==    at 0x1B8FF649: malloc (vg_replace_malloc.c:220)
      ==13796==    by 0x1B9ABAFE: strdup (in /lib/tls/i686/cmov/libc-2.3.2.so)
      ==13796==    by 0x8051E9D: yylex (ncap_lex.l:564)
      ==13796==    by 0x804E77B: yyparse (ncap_yacc.c:1257)
      ==13796==    by 0x804B0EF: main (ncap.c:557)
      ==13796==
      ==13796==
      ==13796== 2516 bytes in 227 blocks are indirectly lost in loss record 8 of 11
      ==13796==    at 0x1B8FF649: malloc (vg_replace_malloc.c:220)
      ==13796==    by 0x1B9ABAFE: strdup (in /lib/tls/i686/cmov/libc-2.3.2.so)
      ==13796==    by 0x806D144: nco_var_dpl (nco_var_utl.c:543)
      ==13796==    by 0x8055D4F: nco_var_cnf_dmn (nco_cnf_dmn.c:170)
      ==13796==    by 0x80563B9: ncap_var_cnf_dmn (nco_cnf_dmn.c:327)
      ==13796==    by 0x804CB85: ncap_var_var_sub (ncap_utl.c:481)
      ==13796==    by 0x804FBDE: yyparse (ncap_yacc.y:477)
      ==13796==    by 0x804B0EF: main (ncap.c:557)
      ==13796==
      ==13796==
      ==13796== 14150 (4886 direct, 9264 indirect) bytes in 123 blocks are definitely lost in loss record 9 of 11
      ==13796==    at 0x1B8FF649: malloc (vg_replace_malloc.c:220)
      ==13796==    by 0x805F1A0: nco_malloc (nco_mmr.c:85)
      ==13796==    by 0x8051B28: yylex (ncap_lex.l:452)
      ==13796==    by 0x804E77B: yyparse (ncap_yacc.c:1257)
      ==13796==    by 0x804B0EF: main (ncap.c:557)
      ==13796==
      ==13796==
      ==13796== 5972 bytes in 14 blocks are indirectly lost in loss record 10 of 11
      ==13796==    at 0x1B8FF649: malloc (vg_replace_malloc.c:220)
      ==13796==    by 0x805F2C4: nco_malloc_dbg (nco_mmr.c:149)
      ==13796==    by 0x806D174: nco_var_dpl (nco_var_utl.c:547)
      ==13796==    by 0x8055D4F: nco_var_cnf_dmn (nco_cnf_dmn.c:170)
      ==13796==    by 0x80563B9: ncap_var_cnf_dmn (nco_cnf_dmn.c:327)
      ==13796==    by 0x804CB85: ncap_var_var_sub (ncap_utl.c:481)
      ==13796==    by 0x804FBDE: yyparse (ncap_yacc.y:477)
      ==13796==    by 0x804B0EF: main (ncap.c:557)
      ==13796==
      ==13796==
      ==13796== 32824 bytes in 4 blocks are still reachable in loss record 11 of 11
      ==13796==    at 0x1B8FF649: malloc (vg_replace_malloc.c:220)
      ==13796==    by 0x805361D: yyalloc (lex.yy.c:3016)
      ==13796==    by 0x8050C67: yylex (lex.yy.c:1095)
      ==13796==    by 0x804E77B: yyparse (ncap_yacc.c:1257)
      ==13796==    by 0x804B0EF: main (ncap.c:557)
      ==13796==
      ==13796== LEAK SUMMARY:
      ==13796==    definitely lost: 6956 bytes in 338 blocks.
      ==13796==    indirectly lost: 9264 bytes in 308 blocks.
      ==13796==      possibly lost: 0 bytes in 0 blocks.
      ==13796==    still reachable: 35032 bytes in 43 blocks.
      ==13796==         suppressed: 0 bytes in 0 blocks.

       
    • Charlie Zender

      Charlie Zender - 2005-06-16

      Hi Henry,

      A few days ago you committed changes to ncap_lex.l, and possibly
      other parts of ncap as well. Soon thereafter, ncap began seg-faulting every time I run it. Could you please check whether
      the segfaults are due to your commit, and, if so, back out the
      patch that causes the segfaults? Please post a message to the
      developer's group when you commit anything that changes the
      result of a regression test (and this does).
      If the recent ncap segfaults were caused by something else,
      let me know and I'll take a look.

      Thanks!
      Charlie

       
    • Nobody/Anonymous

      Hi Charlie,
      Will tackle the problem as soon as I get a login on the ESMF machine. I tried last night to speak to the right person, but our wires got crossed; will try again later this evening.
      On my machine ncap.in runs fine, but I haven't got the juice to run the main benchmarks.
      FYI: on Monday I will be signing off on the shopping cart software I've done for Colin Narbeth. The job's been a f**king nightmare from beginning to end. So I will have more time and be less stressed.

      Speak to you soon

      Henry

       
      • Charlie Zender

        Charlie Zender - 2005-06-17

        Hi Henry,

        If you follow the ESMF account application instructions at

        http://www.ess.uci.edu/esmf/accounts.html

        there should be no need for phone calls.

        I have a project that sounds like Colin Narbeth.
        I wish that such projects came with eject seats.

        The ncap segfault will be difficult to isolate on the ESMF, I think.
        Best bet is an x86 running valgrind.
        Even if it does not segfault, it will flag the memory error.
        My guess is a double-free() at the end of the code in the
        cleanup section, since the output file looks fine.
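
A minimal sketch of how a cleanup section can be made tolerant of being reached twice, assuming the free-and-reassign idiom that the var=nco_var_free(var) calls in this thread suggest. The helpers toy_free() and toy_cleanup() are hypothetical and are not NCO functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: release memory and hand back NULL so the
   caller's pointer cannot be freed a second time by accident. */
void *toy_free(void *ptr)
{
  free(ptr); /* free(NULL) is a no-op in standard C */
  return NULL;
}

/* Cleanup pass that is safe even if an earlier code path already
   released some of the buffers. Returns the number of buffers
   that were still allocated when this call started. */
int toy_cleanup(char **bufs, int nbr)
{
  int idx;
  int freed = 0;
  assert(bufs != NULL);
  for (idx = 0; idx < nbr; idx++) {
    if (bufs[idx] != NULL) freed++;
    bufs[idx] = toy_free(bufs[idx]);
  }
  return freed;
}
```

The idiom only protects the pointer that is reassigned. If a second pointer aliases the same block, memcheck will still report an invalid free() along with the stack of the first free, which is usually enough to locate the culprit.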

        Best,
        Charlie

         
    • Nobody/Anonymous

      Hi Charlie,
      Have plugged a few more leaks -- fixed TODO ncap69.
      Am running valgrind as follows:
      valgrind  --leak-check=yes --show-reachable=yes --tool=memcheck ncap -O -S ncap.in in.nc foo.nc >& vg_out

      Still outstanding: 3 leaks with yylex. Not sure how to tackle these as they are leaks with the automatically generated code. I don't want to add calls to yy_flush_buffer() or other functions as this could create compatibility problems on some of the platforms.

      Still have 3 leaks associated with nco_var_cnf_dmn().

      Regards Henry

       
      • Charlie Zender

        Charlie Zender - 2005-06-22

        Hi Henry,

        Yes, that is basically how I would use valgrind on ncap.
        I should have mentioned: Don't worry about memory
        allocated by automatically generated code like bison and flex.
        There's nothing much we can do about that code, so it does
        not count as an NCO memory leak in my book.

        Leaks associated with nco_var_cnf_dmn() are our responsibility.
        Let me know if you think it's something in the base code, or
        something ncap-specific. I'll be happy to look at it after you've
        given a crack at trying to plug the leaks.

        Thanks,
        Charlie

         
    • Nobody/Anonymous

      Hi Charlie,
      Have been (val)grinding away. With ncap.in we now have 8 outstanding leaks:

      3 lex/yacc leaks -- connected with automatically generated code.
      1 ncap_var_stretch() leak
      4 ncap_var_cnf_dmn() leaks

      What's happening with ncap_var_cnf_dmn() is weird. When the two variables are made to conform there is NO memory leak.
      When the function is called and the variables already conform, we get leakage. Try running the following.

      valgrind --leak-check=yes --show-reachable=yes --tool=memcheck ncap -v -O -s "t=three_dmn_var_dbl+three_dmn_var_int" in.nc foo.nc >& vg_lst

      Regards Henry

       
      • Charlie Zender

        Charlie Zender - 2005-06-29

        Good progress so far. I'll take a look at your example.
        It always helps to have a specific example like this.

        Charlie

         
    • Nobody/Anonymous

      Hi Henry,

      I took a crack or two at the leak in ncap_var_cnf_dmn().
      I've committed one significant change, which still leaves
      the original problem, but may be an improvement.
      I didn't want to change more until I hear what you think.

      First:
      Conditionally free both $1 and $3 in var,var addition in ncap_yacc.y

      The reason for this is that clearly at least one of them
      is sometimes not being free()d.
      This parallels the reasoning in nco_cnf_dmn.c lines 329-338.
      For some reason, that patch does not work, but it's clear to
      me that there's a problem in the memory management there.

      Please look at this change and see if it makes sense to you.
      If the second change does, then clearly it should be applied to
      the other variable-involved operations in ncap_yacc.y.
      Otherwise it should be ripped out.

      Finally, note that unlike your previous statement, I always
      see memory leaks with addition et al. regardless of whether the
      variables already conform, so that a simpler test case is

      valgrind --leak-check=yes --show-reachable=yes
      --suppressions=${HOME}/c++/valgrind.txt --tool=memcheck ncap -O -v -s
      "foo=one+one" -p ${HOME}/nco/data -l /tmp in.nc ~/foo.nc > ~/foo 2>&1

      Thanks,
      Charlie

       
      • Charlie Zender

        Charlie Zender - 2005-06-29

        Hi Henry,

        I have managed to convince myself that the cause of the large
        memory leak in ncap is due to dangling pointers to the input
        variables to ncap_var_cnf_dmn(). You can see in lines 342-343
        of nco_cnf_dmn.c where I attempted to free() these pointers.

          if(*var_1 != var_1_org) var_1_org=nco_var_free(var_1_org);
          if(*var_2 != var_2_org) var_2_org=nco_var_free(var_2_org);

        However, doing so causes much breakage and I'm not sure why.
        Am hoping you can investigate further.

        Perhaps there's another part of ncap which I'm not aware of
        that depends on the contents of the input variables not being
        free'd in that routine. That would be an error that valgrind
        could point you to, i.e., which code tries to use the free'd
        memory when you uncomment those lines.
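
The ownership pattern under discussion can be sketched as follows. This is a toy model under stated assumptions, not the NCO implementation: toy_var, toy_var_new(), toy_conform() and toy_conform_pair() are hypothetical names, and the "conforming" step is reduced to matching an integer rank. The wrapper remembers the incoming pointers and frees an original only when the conforming step replaced it with a new object.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical variable: only a rank, enough to model "conforming". */
typedef struct {
  int rank;
} toy_var;

toy_var *toy_var_new(int rank)
{
  toy_var *var = malloc(sizeof *var);
  if (var == NULL) abort(); /* keep the sketch simple */
  var->rank = rank;
  return var;
}

/* Stand-in for the conforming routine: returns the input itself
   when it already has the requested rank, otherwise a new object. */
toy_var *toy_conform(toy_var *var, int rank)
{
  if (var->rank == rank) return var;
  return toy_var_new(rank);
}

/* Wrapper: bring both variables to the larger rank, then free each
   original that was replaced. Returns how many originals were freed. */
int toy_conform_pair(toy_var **var_1, toy_var **var_2)
{
  toy_var *var_1_org;
  toy_var *var_2_org;
  int rank;
  int freed = 0;
  assert(var_1 != NULL && *var_1 != NULL);
  assert(var_2 != NULL && *var_2 != NULL);
  var_1_org = *var_1;
  var_2_org = *var_2;
  rank = (var_1_org->rank > var_2_org->rank) ? var_1_org->rank : var_2_org->rank;
  *var_1 = toy_conform(var_1_org, rank);
  *var_2 = toy_conform(var_2_org, rank);
  if (*var_1 != var_1_org) { free(var_1_org); freed++; }
  if (*var_2 != var_2_org) { free(var_2_org); freed++; }
  return freed;
}
```

This is only safe when the wrapper is the sole owner of the originals. If the caller keeps its own copy of an incoming pointer and later reads or frees it, the frees inside the wrapper turn a leak into a use-after-free or double free. That is one possible explanation for the breakage described above, not a confirmed diagnosis.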

        My earlier modification to ncap_yacc.y lines 478-479 is still
        in, as I think it is also necessary, in principle. But you might
        want to back it out while tracking down the ncap_var_cnf_dmn()
        problem. Let me know if you get to another sticking point and
        I can dig deeper.

        Thanks,
        Charlie

         
    • Nobody/Anonymous

      Hi Charlie,
      Currently taking a long hard look at
      nco_var_cnf_dmn(). Will let you know when I get
      some positive results

      Didn't you get any similar problems with it when you leakproofed the other operators?

      Regards Henry

       
      • Charlie Zender

        Charlie Zender - 2005-06-30

        > Didn't you get any similar problems with it when you leakproofed
        > the other operators?

        No. There is no "leak" per se in nco_var_cnf_dmn().
        It is used extensively by ncwa and ncbo.
        However, those programs know to clean up after themselves.

        The problem, if any, is in the way ncap_var_cnf_dmn()
        wraps nco_var_cnf_dmn(), and loses track of the initial variables.
        If I could get those free() statements in ncap_var_cnf_dmn()
        to work, then I think the problem would be solved.
        The fact that they don't work leads me to suspect that ncap
        has other dependencies on the entering variables which I've
        forgotten or never knew. Feel free to give me a call now.
        Understanding the nco_var_cnf_dmn() algorithm, and thus
        how to wrap it, is non-trivial...

        Charlie

         
    • Nobody/Anonymous

      Hi Charlie,
      Committed ncap code.
      Almost finished the memory tidy-up.
      Still left with a problem in ncap_var_stretch(),
      cf.
      valgrind --leak-check=yes --show-reachable=yes --tool=memcheck ncap -O -v -s "a7[time,lat,lon]=3.14d"  in.nc foo.nc >& vg_cmd2

      Regards Henry

       
    • Charlie Zender

      Charlie Zender - 2005-07-08

      Hi Henry,

      I just committed the patch to fix the above-mentioned leak in
      ncap_var_stretch(). Good news! This means that all known ncap
      leaks that are fixable, are fixed! All the operators are leak free
      except for a small leak in ncatted which is of no consequence.

      Charlie

       
