From: Nicholas N. <n.n...@gm...> - 2009-04-15 22:28:16
On Thu, Apr 16, 2009 at 5:03 AM, Dominic Account
<zer...@go...> wrote:
> Dear Nicholas
>
> I hope it is fine to bug you directly with my question about CacheGrind:

It's better to contact the lists, as other people may be able to
answer your questions better and/or faster than me.

> First of all thank you for writing CacheGrind. It is a very good
> starting point for me!
> I am currently trying to extend it to support multi-core cache
> simulation (MESI protocol, 1:1 thread/L1 cache mapping).
>
> However, there is one thing which I find puzzling:
>
> From what I understand CacheGrind tries to combine
> "read/write/instruction-events"
> in order to improve performance ("addEvent_Dw" e.g. merges writes with
> preceding reads Dr+Dw=Dm)
>
> The instrumentation in "flushEvents" - however - turns all Dm-events
> into Dr-instrumentation.
>
> I assume this hides all writes which get merged in "addEvent_Dw" and
> all writes that happen
> in Dm-events constructed in "cg_instrument".
>
> Thus the cache-statistics are partially wrong !? The number of writes
> should be too low.
>
> I stumbled upon this when I included memory bus event annotations in
> "InstrInfo".
> "log_0I_1Dw_cache_access" and "log_1I_1Dw_cache_access" never ever
> reported locked
> writes (locked = the Intel instruction prefix for cache exclusive
> reads) but "log_0I_1Dr_cache_access"
> would happily report locked reads. For my CMP-cache simulation I must not lose
> those writes...
>
> I temporarily disabled merging in "addEvent_Dw" and immediately saw
> locked writes!
>
> My current assumption is that CacheGrind was not designed to be really
> accurate - or would you consider it a bug?

It's been several years since I wrote that code, and my memory of the
cache simulation stuff is hazy. What I remember is that a "modify"
event is, for some reason, equivalent to a "read" event in terms of
what the cache has to do. So converting "modify" events into "read"
events is reasonable, i.e. it's a deliberate decision.
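
For readers following along, the merge-then-flush behaviour under
discussion can be sketched roughly as below. This is not the Cachegrind
source: the types and the functions add_read, add_write and flush are
hypothetical names made up for illustration, and the split_modify flag
is an assumed knob showing what a multi-core extension might want
instead of folding every Dm into a read.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event kinds, named after the Dr/Dw/Dm labels in the thread. */
typedef enum { EV_DR, EV_DW, EV_DM } EventKind;

typedef struct {
    EventKind kind;
    uintptr_t addr;
    size_t    size;
} Event;

#define MAX_EVENTS 16

typedef struct {
    Event  ev[MAX_EVENTS];
    size_t n;
} EventBuf;

/* Queue a data read (silently dropped if the buffer is full). */
static void add_read(EventBuf *b, uintptr_t addr, size_t size) {
    if (b->n < MAX_EVENTS)
        b->ev[b->n++] = (Event){ EV_DR, addr, size };
}

/* Queue a data write; a write that directly follows a read of the
 * same address and size is merged into a single "modify" event. */
static void add_write(EventBuf *b, uintptr_t addr, size_t size) {
    if (b->n > 0) {
        Event *last = &b->ev[b->n - 1];
        if (last->kind == EV_DR && last->addr == addr && last->size == size) {
            last->kind = EV_DM;   /* Dr + Dw = Dm */
            return;
        }
    }
    if (b->n < MAX_EVENTS)
        b->ev[b->n++] = (Event){ EV_DW, addr, size };
}

typedef struct { unsigned reads, writes; } Counts;

/* Drain the buffer. With split_modify == false a Dm is counted as a
 * read only (the behaviour described in the thread); with true it is
 * counted as one read plus one write. */
static Counts flush(EventBuf *b, bool split_modify) {
    Counts c = { 0, 0 };
    for (size_t i = 0; i < b->n; i++) {
        switch (b->ev[i].kind) {
        case EV_DR: c.reads++;  break;
        case EV_DW: c.writes++; break;
        case EV_DM:
            c.reads++;
            if (split_modify) c.writes++;
            break;
        }
    }
    b->n = 0;
    return c;
}
```

With split_modify off, a read followed by a write to the same location
shows up as one read and no write, which matches the missing locked
writes reported above.
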
I can't remember now the hardware details of why this is so; Josef
might have a suggestion. Whether it is true for multiprocessor
machines is another question.
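
One plausible reading of the single-cache case, offered as a sketch
rather than as the actual reasoning behind the code: in a
write-allocate cache, a write that immediately follows a read of the
same line always hits, so it cannot change the miss count. The toy
direct-mapped cache below illustrates this. The line size, number of
sets, and the names Cache, cache_access and misses_for_modify are all
assumptions for illustration, and accesses that straddle two lines are
ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_BITS 6     /* 64-byte lines (assumed) */
#define NUM_SETS  256   /* direct-mapped, 256 lines (assumed) */

typedef struct {
    bool      valid[NUM_SETS];
    uintptr_t tag[NUM_SETS];   /* stores the full line number */
    unsigned  misses;
} Cache;

/* One access. Reads and writes are treated identically because the
 * cache is write-allocate. Returns true on a miss. */
static bool cache_access(Cache *c, uintptr_t addr) {
    uintptr_t line = addr >> LINE_BITS;
    unsigned  set  = (unsigned)(line % NUM_SETS);
    if (c->valid[set] && c->tag[set] == line)
        return false;          /* hit */
    c->valid[set] = true;      /* miss: install the line */
    c->tag[set]   = line;
    c->misses++;
    return true;
}

/* Simulate a modify as a read followed by a write to the same
 * address, and report how many misses the pair caused. */
static unsigned misses_for_modify(Cache *c, uintptr_t addr) {
    unsigned before = c->misses;
    cache_access(c, addr);     /* the read may miss */
    cache_access(c, addr);     /* the write finds the line resident */
    return c->misses - before;
}
```

The pair never costs more than the read alone would. With several
caches kept coherent by MESI this no longer holds: a write to a line
held in the Shared state hits locally but still needs a bus
transaction to invalidate the other copies, so the write has to be
tracked separately.
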
Hope this helps.
Nick