From: <sv...@va...> - 2005-11-13 17:57:38
Author: njn
Date: 2005-11-13 17:57:32 +0000 (Sun, 13 Nov 2005)
New Revision: 5117
Log:
Inline cachesim_*_doref(). This gains about 5--10% in speed.
Modified:
trunk/cachegrind/cg_sim.c
Modified: trunk/cachegrind/cg_sim.c
===================================================================
--- trunk/cachegrind/cg_sim.c 2005-11-13 16:52:56 UTC (rev 5116)
+++ trunk/cachegrind/cg_sim.c 2005-11-13 17:57:32 UTC (rev 5117)
@@ -108,7 +108,10 @@
    cachesim_initcache(config, &L);                                         \
 }                                                                          \
                                                                            \
-static /* __inline__ */                                                    \
+/* This attribute forces GCC to inline this function, even though it's */  \
+/* bigger than its usual limit. Inlining gains around 5--10% speedup. */   \
+__attribute__((always_inline))                                             \
+static __inline__                                                          \
 void cachesim_##L##_doref(Addr a, UChar size, ULong* m1, ULong *m2)        \
 {                                                                          \
    register UInt set1 = ( a >> L.line_size_bits) & (L.sets_min_1);         \
From: Josef W. <Jos...@gm...> - 2005-11-13 21:59:23
On Sunday 13 November 2005 18:57, sv...@va... wrote:
> Author: njn
> Date: 2005-11-13 17:57:32 +0000 (Sun, 13 Nov 2005)
> New Revision: 5117
>
> Log:
> Inline cachesim_*_doref(). This gains about 5--10% in speed.

If you can force inlining, you should be able to get rid of the macro
altogether by moving L to a parameter, without any negative impact (L is
a constant at the inlined call sites). This would make the code far
easier to read.

I tried it in Callgrind, but my experiments made performance worse.
No idea why.

Josef
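A minimal sketch of the two styles discussed above, for illustration only. The `cache_t` type, the `I1`/`D1` values, and every function name here are hypothetical and much simpler than the real cg_sim.c code. The first half generates one function per cache with a macro, as cg_sim.c does; the second half passes the cache as a parameter to a single force-inlined function. Whether the two versions end up equally fast depends on the compiler and has to be measured.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified cache descriptor (not the cg_sim.c layout). */
typedef struct {
    unsigned line_size_bits;  /* log2 of the line size in bytes */
    uint32_t sets_min_1;      /* number of sets minus one (power of two) */
} cache_t;

static const cache_t I1 = { 6, 255 };  /* 64-byte lines, 256 sets */
static const cache_t D1 = { 5, 127 };  /* 32-byte lines, 128 sets */

/* Macro style: L is pasted into the function name and body, so each
 * expansion refers to one fixed cache. */
#define DEFINE_SET_INDEX(L)                                         \
    static inline uint32_t set_index_##L(uintptr_t a)               \
    {                                                               \
        return (uint32_t)(a >> L.line_size_bits) & L.sets_min_1;    \
    }

DEFINE_SET_INDEX(I1)
DEFINE_SET_INDEX(D1)

/* Parameter style: one ordinary function taking the cache as an argument.
 * always_inline is a GCC extension, so fall back to plain inline elsewhere. */
#if defined(__GNUC__)
#define FORCE_INLINE __attribute__((always_inline)) static inline
#else
#define FORCE_INLINE static inline
#endif

FORCE_INLINE uint32_t set_index(const cache_t *c, uintptr_t a)
{
    return (uint32_t)(a >> c->line_size_bits) & c->sets_min_1;
}

/* Each wrapper passes a constant cache, so after inlining the compiler
 * can, in principle, fold the field loads into constants. */
uint32_t set_index_param_I1(uintptr_t a) { return set_index(&I1, a); }
uint32_t set_index_param_D1(uintptr_t a) { return set_index(&D1, a); }

/* Checks that both styles compute the same set index. */
void self_check(void)
{
    for (uintptr_t a = 0; a < 100000; a += 37) {
        assert(set_index_I1(a) == set_index_param_I1(a));
        assert(set_index_D1(a) == set_index_param_D1(a));
    }
}
```

Calling `self_check()` from a small `main` confirms that the two styles agree. Comparing their speed still needs a benchmark built with the compiler and flags actually in use.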
From: Dirk M. <dm...@gm...> - 2005-11-14 13:48:47
On Sunday 13 November 2005 22:57, Josef Weidendorfer wrote:
> I tried it in Callgrind, but my experiments made performance worse.
> No idea why.

You two are probably running different versions of GCC. The inlining
heuristics have been heavily tuned in recent versions of GCC.

Dirk