From: Josef W. <Jos...@gm...> - 2015-04-10 13:02:40
On 10.04.2015 at 10:44, Alex wrote:
> Can someone provide a quick explanation what are the characteristics
> of VG simulated CPU (cache, cores, core speed, threads)?

Cachegrind/Callgrind simulate one 2-level cache hierarchy with separate
L1 data and L1 instruction caches and a unified L2. L1 and L2 are
inclusive (not strictly inclusive), with write-allocate and LRU
replacement. Cache parameters (associativity/sizes) are taken by default
from the CPU you run VG on. For newer Intel CPUs with an L3, the real L3
parameters are used for the L2 in the cache model.

As events, you get the number of instructions executed (= fetched from
L1), data reads and writes from/to L1, and L1D/L1I and L2 misses.

Valgrind serializes threads with a timeslice of 100000 superblocks
(which may be around a million guest instructions), and all threads use
the same cache hierarchy. So it compares to a single core with
corresponding timeslicing.

Any time estimation has to be derived from the collected event counters,
and you can use your own assumed core speed here. This allows assuming a
1 clock/instruction core plus some penalties for L1/L2 misses.

The counters do not provide information about whether an evicted cache
line was dirty or clean; thus the cache model does not even distinguish
between write-through and write-back L1 or L2, as it does not matter.
However, in Callgrind you can optionally switch on tracking of
dirty/clean lines in L2 (--simulate-wb=yes), resulting in more event
types. Further, "--simulate-hwpref=yes" enables simulation of a
page-based L2 stream hardware prefetcher (which turns every prefetched
line into an L2 hit).

> I need benchmark client code for different hardware with my VG tool
> plugin.

What does your tool provide?

Josef

>
> Alex.
>
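To illustrate the replacement policy described above, here is a minimal
Python sketch of a set-associative, write-allocate, LRU cache. This is
my own toy illustration of the general technique, not Cachegrind's
actual code; all names and parameters are made up for the example:

```python
# Toy set-associative, write-allocate, LRU cache -- an illustration of
# the kind of model the simulator uses, not Cachegrind's implementation.
from collections import OrderedDict

class Cache:
    def __init__(self, size, assoc, line_size):
        self.line_size = line_size
        self.assoc = assoc
        self.num_sets = size // (assoc * line_size)
        # one LRU-ordered dict of tags per set
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def access(self, addr):
        """Reads and writes are treated alike (write-allocate)."""
        line = addr // self.line_size
        s = self.sets[line % self.num_sets]
        tag = line // self.num_sets
        if tag in s:
            s.move_to_end(tag)      # hit: refresh LRU position
            return True
        self.misses += 1            # miss: allocate the line
        if len(s) >= self.assoc:
            s.popitem(last=False)   # evict the least recently used tag
        s[tag] = True
        return False

# e.g. a 32 KiB, 8-way cache with 64-byte lines
l1 = Cache(32 * 1024, 8, 64)
for a in (0, 8, 64, 0):             # addresses 0 and 8 share one line
    l1.access(a)
print(l1.misses)                    # -> 2 (miss, hit, miss, hit)
```

The real simulator of course tracks separate L1D/L1I/L2 instances and
feeds misses from L1 into L2, but the per-level lookup is this shape.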
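On the time-estimation point: a derived cycle estimate from the event
counters could look like the sketch below. The penalty values (10
cycles per L1 miss, 100 per L2 miss) and the 2 GHz clock are example
assumptions you would tune yourself, not values the simulator provides:

```python
# Hedged sketch: derive a cycle/time estimate from simulator event
# counters, assuming 1 clock per instruction plus fixed miss penalties.
# Penalties and clock rate below are illustrative assumptions only.

def estimated_cycles(ir, l1_misses, l2_misses,
                     l1_penalty=10, l2_penalty=100):
    # 1 cycle per executed instruction + assumed per-miss penalties
    return ir + l1_penalty * l1_misses + l2_penalty * l2_misses

# e.g. counters collected from a run:
ir, l1m, l2m = 1_000_000, 20_000, 1_000
cycles = estimated_cycles(ir, l1m, l2m)
seconds = cycles / 2.0e9            # with an assumed 2 GHz core
print(cycles, seconds)              # -> 1300000 0.00065
```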