From: David B. <dav...@gm...> - 2013-01-07 11:11:12
On Mon, Jan 7, 2013 at 2:01 AM, John Reiser <jr...@bi...> wrote:
> On 01/06/2013 01:24 PM, David Bar wrote:
> > That's true for memcheck if interested in memory corruption issues.
>
> >
>
> > What about massif? It seems that if we only want memory measurements,
> wouldn't it be enough to just handle malloc/free/etc.?
>
> If memory leaks matter then it is not obvious that just handling
> "malloc/free/etc" suffices. If all you want is
> {max,min,integral}(malloc - free)
> and massif doesn't perform as you like, then talk directly to massif.
>
I'm not using Massif as a memory leak detection tool (well, sometimes, but
usually not).
I want all the functionality Massif gives, to get a detailed memory analysis,
not just the overall consumption.
However, I fail to see why translation and JITing are required to do what
Massif does.
I work in a large enterprise, with huge programs that load hundreds of DLLs.
I can tell you that there are currently two projects going on in my
enterprise which mimic Massif's functionality using only malloc/free hooks.
Yes, it's idiotic, especially with two such projects - I'm not calling the
shots here...
Valgrind is just deemed too slow and annoying to run: we have to wait
several minutes until the program starts to run, and then it works
sluggishly, which is also problematic for daemons that are expected to
respond to requests within a reasonable time limit.
I see the development of such other tools as a waste of time. Massif
already does a great job, and already has great infrastructure in place for
getting called on hooks, getting the backtrace along with debug symbols,
etc.
>
> > What if I want to run memcheck just to find memory leaks? Doesn't seem
> to me that I need actual instrumentation here of all code, no?
>
> Show me the code, or at least explain in detail why this should be true.
> I don't believe the claim. In particular, I have programs which
> use stem+leaf storage for collections of pointers, and these programs
> require _more_ than 100% "observation" just to find memory leaks.
>
> > Even for the complete memcheck functionality, some use cases may live
> well enough with just instrumenting a part of the program.
>
> Extremely unlikely. Sabotaging only log(n) of the memory references can
> totally invalidate nearly every program ever written.
>
> >
> > Say I have a huge program, which contain tons of DLLs, most of them not
> under my responsibility and/or I can't fix the bugs in them. If I just want
> to memcheck my DLL, and am willing to live with the assumption that memory
> my DLL allocates is only read/written by my DLL, it seems reasonable to only
> > translate and handle my DLL, no?
>
> If you are willing to assume that some DLL is an island unto itself,
> then why not test it that way? Also, unless the DLL has been through
> an actual theorem prover or an extensive battery of tests then I
> don't believe the claim "my DLL is an island."
>
>
I'm not saying the theoretical DLL in question is "an island". It is being
used by the application, and it uses various I/Ss from other DLLs.
I sometimes write new code in 1 or 2 DLLs that run within huge programs,
along with hundreds of other DLLs, which were already tested in the past.
It's not trivial to write a test program that exercises my new code in the
specific DLL and properly initializes all the other DLLs so that my DLL can
use them.
If all I want to check is that my code behaves well with the memory it
allocates and frees, it seems reasonable to me to tell Valgrind to skip all
the hard work on the other parts of the program. Yes, I understand this will
not be accurate, but it may be good enough. For a large application it may
take minutes for initialization to finish under Valgrind, which is
frustrating.
Perhaps memcheck isn't a good example. But I believe that Massif is, and
perhaps also Cachegrind. I'm not familiar enough with the other Valgrind
tools to say whether it applies to them as well.
Again, I know that Cachegrind would give inaccurate results if it only sees
some of the instructions executed, but if I just want to check a new fancy
algorithm in my specific DLL, that may be good enough.
You wanted a specific example - here's one: say I have implemented a hash
table which is supposed to be cache-efficient. All the memory backing the
table is allocated within the examined DLL, and when my API is called I
don't give back pointers to the actual data, only copies. The same goes for
stores into the table - I copy the data into memory my DLL allocates, and
put pointers to the copies in the hash table. Assume I also provide other
functionality, such as iteration over the table, automatic expiration of
entries, etc.
I may not know the exact usage pattern of my hash table in the application.
I could go and gather stats and write a simulator, but why bother?
If I could just fire up Cachegrind on the application, and it ran fast, then
I could quickly compare the cache behavior of my new implementation against
the old implementation of the same functionality.
Perhaps it would be good enough if Valgrind had a cache in which it saved
code after translation/JITing, and reused it when the checksum/timestamp of
the DLL hasn't changed. This would mean that only the first run would be
slow, and later runs would at least have a fast initialization time.
Was caching ever considered?