From: Josef W. <Jos...@gm...> - 2015-04-10 13:02:40
On 10.04.2015 at 10:44, Alex wrote:
> Can someone provide a quick explanation what are the characteristics
> of VG simulated CPU (cache, cores, core speed, threads)?

Cachegrind/Callgrind simulate one 2-level cache hierarchy with separate
L1 data and L1 instruction caches and a unified L2. L1 and L2 are
inclusive (not strictly inclusive), with write-allocate and LRU
replacement. Cache parameters (associativity/sizes) are taken by default
from the CPU you run VG on. For newer Intel CPUs with an L3, the real L3
parameters are used for the L2 in the cache model.

As events, you get the number of instructions executed (= fetched from
L1), data reads and writes from/to L1, and L1D/L1I and L2 misses.

Valgrind serializes threads with a timeslice of 100000 superblocks
(which may be around a million guest instructions), and all threads use
the same cache hierarchy. So it compares to a single core with
corresponding timeslicing.

Any time estimation has to be derived from the collected event counters,
and you can use your own assumed core speed here. This allows assuming a
1 clock/instruction core plus some penalties for L1/L2 misses.

The counters do not provide information about whether an evicted cache
line was dirty or clean; thus the cache model does not even distinguish
between write-through and write-back L1 or L2, as it does not matter.
However, in Callgrind you can optionally switch on tracking of
dirty/clean lines in L2 (--simulate-wb=yes), resulting in more event
types. Further, "--simulate-hwpref=yes" enables simulation of a
page-based L2 stream hardware prefetcher (which turns every prefetched
line into an L2 hit).

> I need benchmark client code for different hardware with my VG tool
> plugin.

What does your tool provide?

Josef

>
> Alex.
>
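To illustrate the replacement policy described above, here is a minimal
Python sketch of a set-associative, write-allocate, LRU cache. This is
my own toy illustration of the general technique, not Cachegrind's
actual code; all names and parameters are made up for the example:

```python
# Toy set-associative, write-allocate, LRU cache -- an illustration of
# the kind of model the simulator uses, not Cachegrind's implementation.
from collections import OrderedDict

class Cache:
    def __init__(self, size, assoc, line_size):
        self.line_size = line_size
        self.assoc = assoc
        self.num_sets = size // (assoc * line_size)
        # one LRU-ordered dict of tags per set
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def access(self, addr):
        """Reads and writes are treated alike (write-allocate)."""
        line = addr // self.line_size
        s = self.sets[line % self.num_sets]
        tag = line // self.num_sets
        if tag in s:
            s.move_to_end(tag)      # hit: refresh LRU position
            return True
        self.misses += 1            # miss: allocate the line
        if len(s) >= self.assoc:
            s.popitem(last=False)   # evict the least recently used tag
        s[tag] = True
        return False

# e.g. a 32 KiB, 8-way cache with 64-byte lines
l1 = Cache(32 * 1024, 8, 64)
for a in (0, 8, 64, 0):             # addresses 0 and 8 share one line
    l1.access(a)
print(l1.misses)                    # -> 2 (miss, hit, miss, hit)
```

The real simulator of course tracks separate L1D/L1I/L2 instances and
feeds misses from L1 into L2, but the per-level lookup is this shape.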
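On the time-estimation point: a derived cycle estimate from the event
counters could look like the sketch below. The penalty values (10
cycles per L1 miss, 100 per L2 miss) and the 2 GHz clock are example
assumptions you would tune yourself, not values the simulator provides:

```python
# Hedged sketch: derive a cycle/time estimate from simulator event
# counters, assuming 1 clock per instruction plus fixed miss penalties.
# Penalties and clock rate below are illustrative assumptions only.

def estimated_cycles(ir, l1_misses, l2_misses,
                     l1_penalty=10, l2_penalty=100):
    # 1 cycle per executed instruction + assumed per-miss penalties
    return ir + l1_penalty * l1_misses + l2_penalty * l2_misses

# e.g. counters collected from a run:
ir, l1m, l2m = 1_000_000, 20_000, 1_000
cycles = estimated_cycles(ir, l1m, l2m)
seconds = cycles / 2.0e9            # with an assumed 2 GHz core
print(cycles, seconds)              # -> 1300000 0.00065
```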