From: Per F. <fra...@cs...> - 2004-06-04 12:04:12

Hi,

I'm wondering exactly what it means that the SSE instructions are implemented. I tried running a small program written to benefit from prefetching (and it *does* run faster with prefetching) but got no changes in the cache statistics from cachegrind.

Thanks in advance for any reply

/Per

From: Tom H. <th...@cy...> - 2004-06-04 12:51:30
|
In message <Pine.GSO.4.33.0406041359400.25947-100000@ygg>
Per Fransson <fra...@cs...> wrote:
> I'm wondering exactly what it means that the SSE instructions are
> implemented. I tried running a small program written to benefit from
> prefetching (and it *does* run faster with prefetching) but got no changes
> in the cache statistics from cachegrind.
As far as I know all SSE and SSE2 instructions are implemented with
one single exception. The prefetch instructions are, however, simply
ignored, because they have no effect on a program other than possibly
speeding it up, so it will still execute correctly without them.
As you point out, though (and as somebody else pointed out for the
first time the other day), that does mean that cachegrind can produce
misleading results, so we may have to start recording the prefetches
so that cachegrind can process them.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
From: Josef W. <Jos...@gm...> - 2004-06-04 13:19:43
On Friday 04 June 2004 14:04, Per Fransson wrote:
> Hi,
>
> I'm wondering exactly what it means that the SSE instructions are
> implemented. I tried running a small program written to benefit from
> prefetching (and it *does* run faster with prefetching) but got no changes
> in the cache statistics from cachegrind.

What do you expect? Prefetching does not change much about absolute miss
numbers (they could even go up). The effect of prefetching is overlapping
activity and reduced latency, but these can't be reflected by the event
counts Cachegrind produces. And as Cachegrind does not simulate timing
(which would be very complex), it's difficult to get out any metric that
would reflect prefetch effects.

Still, of course it is possible to simulate the effect of the SSE prefetch
instruction. Either simply ignore L2 misses caused by prefetches, or
introduce an event type "L2 Miss because of Prefetch" and, in any cycle
estimation formula (like the one in KCachegrind), give this event type a
coefficient of 0, i.e. assume every prefetch fully overlaps with
computation. For Calltree, I once implemented a primitive hardware prefetch
algorithm (stream detection) and ignored any L2 misses generated by the
algorithm.

The problem with simulating prefetching is the unknown and changing
properties/algorithms implemented in the hardware, even for software
prefetching (i.e. the SSE prefetch instruction). Recently, in a small
benchmark, I found that on the P4 any software prefetch is ignored when it
would lead to a TLB miss (which makes sense, as such a prefetch would
pollute the TLB). But on the Pentium M (Banias), these prefetches are done
even if they generate TLB misses. This makes any use of software
prefetching very fragile and difficult in practice.

Josef