From: Jason P. <met...@gm...> - 2012-12-07 20:14:37
Josef,

The difference with ferret is in the D1 misses. Even with a native input set, there are almost no LL misses on this machine. I'm not sure whether ferret's workload runs purely in user space or also triggers work in the kernel.

As far as prefetching goes, I am under the impression from the Valgrind documentation that Cachegrind does not include a prefetch algorithm.

A larger issue I'm having is that even a benchmark as simple as blackscholes shows only 1523 I1 misses in the cache simulation, while Perf reports an enormous 54,259,977 I1 load misses when it is run on real hardware. This is another question I would like to pose.

Thanks,
Jason

On Fri, Dec 7, 2012 at 3:06 PM, Josef Weidendorfer <Jos...@gm...> wrote:

> On 07.12.2012 20:46, Jason Palaszewski wrote:
>
>> Hi, I'm trying to compare the number of cache misses (D1, I1, and LL)
>> between what Perf gives me (on the hardware itself) vs. what
>> Cachegrind thinks the number of misses should be. The server machine
>> has two Sandy Bridge Intel Xeon E5-2430 CPUs, and the PARSEC 3.0
>> suite (compiled in gcc-serial format, single threaded) is being run
>> through Cachegrind to obtain the number of D1, I1, and LL misses.
>> These are compared against the real misses on the hardware, obtained
>> by running the same benchmark binaries through Perf and counting D1
>> load and store misses as well as I1 misses. The ratio of Perf misses
>> to Cachegrind misses is roughly 1-2x. However, some benchmarks like
>> ferret have a much higher number of misses on Perf than on Cachegrind.
>
> Is this LL misses, or L1 misses? For L1 misses, you may observe many
> more misses because real caches are asynchronous, i.e. consecutive loads
> to the same line will give as many misses as loads, while in Cachegrind
> after the first miss all others will be hits.
>
> Hm. Is ferret pure user-space, or does it trigger work in the kernel?
> It may be that the kernel side evicts a lot of data from the cache,
> and this becomes visible via user-level cache misses.
>
> Another possibility is that hardware prefetching is too clever, and
> evicts lines to enable prefetching of data which is not actually used.
>
> Josef
>
>> Has anyone else done analysis on Perf results vs. Cachegrind
>> simulated results by running benchmarks on both of these?
>> The machines are also running RedHat 5. Thanks for any information.
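
For readers following the thread, Josef's point about asynchronous caches can be illustrated with a toy model. This is only a sketch under stated assumptions: the function `count_misses`, the 64-byte line size, the one-access-per-cycle timing, and the 4-cycle fill latency are all hypothetical choices made for illustration. It is not how Cachegrind or the Sandy Bridge miss counters are actually implemented.

```python
LINE_SIZE = 64  # bytes per cache line (assumed for illustration)


def count_misses(addresses, fill_latency=0):
    """Count misses for one access per cycle against an unbounded toy cache.

    fill_latency=0 models a synchronous simulator: the line is usable
    right after the first miss, so later accesses to it are hits.
    fill_latency>0 models a cache where the line arrives later, and every
    access issued before it arrives is also counted as a miss.
    """
    ready_at = {}  # line number -> cycle at which the line becomes usable
    misses = 0
    for cycle, addr in enumerate(addresses):
        line = addr // LINE_SIZE
        if line not in ready_at:
            # First touch of this line: always a miss, start the fill.
            misses += 1
            ready_at[line] = cycle + fill_latency
        elif cycle < ready_at[line]:
            # Fill still in flight: counted as another miss in this model.
            misses += 1
    return misses


# Eight consecutive 8-byte loads that all fall in the same 64-byte line.
loads = [0x1000 + 8 * i for i in range(8)]

sync_misses = count_misses(loads)                    # simulator-style counting
async_misses = count_misses(loads, fill_latency=4)   # hypothetical 4-cycle fill
print(sync_misses, async_misses)
```

In this model the same eight loads produce one miss with synchronous counting and several with a non-zero fill latency, which is the kind of gap Josef describes for L1. It does not account for kernel activity or hardware prefetching, the other two possibilities raised above.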