From: John R. <jr...@bi...> - 2012-06-02 15:21:41
On 06/02/2012 07:11 AM, Florian Krohm wrote:
> On 06/02/2012 03:40 AM, Philippe Waroquiers wrote:
>> On Sat, 2012-06-02 at 00:03 -0400, Florian Krohm wrote:
>>> From 40 bytes to 32 bytes on LP64. From 20 to 16 bytes on ILP32.
>>> Same procedure as in previous patch for Qop. This time putting the Triop
>>> bits into a separate structure IRTriop.
>>>
>>> Here are some numbers showing the memory allocated by VEX in bytes for
>>> amd64/ppc64/s390x. The net is: it's always a win and the savings are
>>> pretty consistent 4-5% across platforms.
>> That is nice.
>> What is the related performance improvement ?
>
> I did not measure. The perf bucket is not very good measuring the effect
> of these micro improvements -- unfortunately.

While the reduction in size is welcome (especially to "nice" sizes
such as 32 and 16 bytes), the cost of the additional indirection
could be large due to memory latency, fragmentation of additional
allocation, cache pressure or direct cache misses. Such effects would
be visible in the wall-clock time that is reported by the bash shell
"time" function (or similar); "perf" is not needed.

So, what does "time" say about the before+after wall-clock latency?
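
If running "time" by hand is tedious, a before/after comparison along
these lines could be scripted. This is only a rough sketch in Python, not
anything taken from the patch or the Valgrind tree: the function names
(wall_clock_seconds, compare) are made up for illustration, the two
commands are placeholders to be replaced by the unpatched and patched
builds running the same workload, and taking the median of a few runs is
just one arbitrary way to damp noise.

```python
import statistics
import subprocess
import sys
import time


def wall_clock_seconds(cmd, runs=5):
    """Run cmd `runs` times; return the wall-clock duration of each run."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the child's output so terminal I/O does not skew timing.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return samples


def compare(before_cmd, after_cmd, runs=5):
    """Return (median_before, median_after, relative_change)."""
    before = statistics.median(wall_clock_seconds(before_cmd, runs))
    after = statistics.median(wall_clock_seconds(after_cmd, runs))
    return before, after, (after - before) / before


if __name__ == "__main__":
    # Placeholder commands (hypothetical): substitute the unpatched and
    # patched builds, each running the same workload.
    before_cmd = [sys.executable, "-c", "sum(range(10**6))"]
    after_cmd = [sys.executable, "-c", "sum(range(10**6))"]
    b, a, rel = compare(before_cmd, after_cmd)
    print(f"before {b:.3f}s  after {a:.3f}s  change {rel:+.1%}")
```

A negative "change" would mean the patched build finished sooner. With
differences of a few percent, more runs on an otherwise idle machine are
probably needed before the number means much.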