From: Bart V. A. <bar...@gm...> - 2006-08-22 16:18:47
Hello Julian,
Regarding interception of malloc() and free(): the only reason drd
intercepts those is for recording the call stack at the time malloc() is
called, such that this call stack can be included when reporting a data
race. I don't think this causes much overhead.
Before I can say something about the memory use of drd, I have to
explain its algorithm. In short, the data race detection algorithm works as
follows:
- A process being analyzed by drd consists of a number of threads.
- Each thread consists of a sequence of actions. The actions relevant to drd
are: memory load, memory store, and synchronization actions (lock mutex,
unlock mutex, create thread, join thread, ...).
- The sequence of loads and stores performed by a single thread between two
successive synchronization actions is called a segment.
- The drd tool records the order in which segments are executed. Within a
thread, this order is represented by a single integer number. The order over
threads is represented by something called a "vector clock". This is a
standard way of representing the partial order relationship between actions
performed by different threads.
- For each segment, a three-level bitmap records which memory locations
have been read from or written to (at the lowest level, two bits are needed
per byte: one bit representing read access, one representing write access).
- Actions within a thread are always ordered; actions performed by different
threads are only considered ordered when an order has been enforced by
synchronization actions.
- A data race is defined as two threads accessing the same memory location,
where at least one of the two accesses is a write, and the order between the
two accesses is not enforced by a synchronization action.
- Segments that can no longer be involved in a data race are freed
(VG_(free)()).
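
To make the steps above concrete, here is a minimal sketch in Python. It is
not drd's code: the names Segment, happens_before and races are hypothetical,
vector clocks are modelled as plain dicts, and Python sets stand in for the
per-segment bitmaps. It only illustrates the rule that two segments from
different threads race when their vector clocks are unordered and they touch
a common address with at least one write.

```python
from dataclasses import dataclass, field


def happens_before(a, b):
    """True if vector clock a is strictly ordered before vector clock b."""
    threads = set(a) | set(b)
    no_later = all(a.get(t, 0) <= b.get(t, 0) for t in threads)
    earlier = any(a.get(t, 0) < b.get(t, 0) for t in threads)
    return no_later and earlier


@dataclass
class Segment:
    thread: int                                # id of the owning thread
    clock: dict                                # vector clock: thread id -> counter
    reads: set = field(default_factory=set)    # addresses loaded in this segment
    writes: set = field(default_factory=set)   # addresses stored in this segment


def races(s1, s2):
    """Return the addresses on which two segments conflict (empty if none)."""
    if s1.thread == s2.thread:
        return set()                           # same thread: always ordered
    if happens_before(s1.clock, s2.clock) or happens_before(s2.clock, s1.clock):
        return set()                           # ordered by synchronization
    return (s1.writes & (s2.reads | s2.writes)) | (s2.writes & s1.reads)


# Thread 1 writes 0x1000; thread 2 reads it without any synchronization.
a = Segment(thread=1, clock={1: 1}, writes={0x1000})
b = Segment(thread=2, clock={2: 1}, reads={0x1000})
# Thread 2 reads again after joining thread 1, so its clock includes a's.
c = Segment(thread=2, clock={1: 1, 2: 2}, reads={0x1000})

print(races(a, b))   # unordered segments, conflicting access
print(races(a, c))   # ordered by the join, no conflict reported
```

In this sketch a segment could be discarded once every other thread's current
clock is ordered after it, since it can then no longer appear in an unordered
pair.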
This means that the memory consumption of the drd tool is proportional to:
- the number of threads running simultaneously.
- the number of segments allocated within a thread.
- the amount of unused memory allocated within the bitmap allocated for a
segment. When e.g. iterating over a byte-array, and only reading every
1024th byte, only 0.1% of the memory allocated for the bitmap will be used
-- very inefficient.
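
The third point can be checked with a little arithmetic. The sketch below
assumes, purely for illustration, that one lowest-level bitmap block covers
1024 consecutive bytes; the real block size in drd may differ, and the names
LEAF_BYTES, bitmap_utilization and leaf_storage_bytes are hypothetical.

```python
LEAF_BYTES = 1024         # assumption: addresses covered by one lowest-level block
BITS_PER_TRACKED_BYTE = 2  # one read bit and one write bit per byte


def bitmap_utilization(addresses, leaf_bytes=LEAF_BYTES):
    """Fraction of the allocated per-byte slots that were actually accessed."""
    touched = set(addresses)
    leaves = {addr // leaf_bytes for addr in touched}
    return len(touched) / (len(leaves) * leaf_bytes)


def leaf_storage_bytes(addresses, leaf_bytes=LEAF_BYTES):
    """Bytes of lowest-level bitmap storage allocated for these accesses."""
    leaves = {addr // leaf_bytes for addr in addresses}
    return len(leaves) * leaf_bytes * BITS_PER_TRACKED_BYTE // 8


dense = bitmap_utilization(range(0, 1 << 20))          # every byte of 1 MB
sparse = bitmap_utilization(range(0, 1 << 20, 1024))   # every 1024th byte

print(f"dense:  {dense:.4%}")
print(f"sparse: {sparse:.4%}")
```

Under this assumption the strided scan allocates one block per accessed byte,
so about 1/1024 (roughly 0.1%) of the allocated slots carry information.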
It's not yet clear to me which of the above three reasons applies to
konqueror.
It must be possible, however, to run software like konqueror under drd, since
it is possible with DIOTA. DIOTA uses the same approach as drd. One of the
differences between DIOTA and drd that I know of is that DIOTA uses a
9-level bitmap while drd only uses a 3-level bitmap.
I hope the above explanation is comprehensible.
On 8/22/06, Julian Seward <js...@ac...> wrote:
>
>
> > So now, I can start konqueror and it runs for ~ 60 seconds
> > (doing fontconfig crap) but I had to control-C it before the konq
> > window appeared, due to memory use exceeding 450MB.
>
> Update: I can get the konq main window, but it exhausts my 2G swap
> partition before it can render the first page. At the point the process
> died its total size was using about 2600M. Watching it with top,
> there were many places where it seemed to increase in size at a
> rate of almost 20MB/sec, and I believe it averaged 10MB/sec overall.
>
> One thing I observe is that you're intercepting malloc/free. Is that
> necessary? Does that make the algorithm work better somehow?
>
> J
>
--
Met vriendelijke groeten,
Bart Van Assche.