From: Julian S. <js...@ac...> - 2015-01-19 11:54:29
On 19/01/15 12:17, Florian Krohm wrote:
>> As of revs 14874/3070, the branch now runs perf/bz2 in 10% less time
>> and perf/tinycc in 12% less time, compared to trunk. At least on
>> Intel Haswell.
>
> Nice! How much of that is due to inlining LibVEX_Alloc (and possibly
> other generic tweaks)?

It's very workload dependent. Because the JIT now generates somewhat
more code than before, the generic tweaks -- which are only inlining of
LibVEX_Alloc and hregAMD64_BLAH() -- are necessary to avoid a loss of
performance for test cases that generate a lot of code but do little
work, most notably perf/bigcode2.

For test cases which do a lot of computation but don't have much code
(bz2, fbench, ffbench, tinycc) I suspect the generic tweaks have little
effect, because the JIT time is small and the tweaks only improve the
performance of the JIT. So most of the improvements you see there are
due to improvements in the generated code.

Currently it does 64- and 32-bit shadow loads in-line. My plan is to
extend that to 16- and 8-bit loads too, and then fill in the arm32 bits
to see how well it works on arm32.

J
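
The argument above, that tweaks which only speed up the JIT matter for code-heavy tests but barely show up in compute-heavy ones, is an instance of Amdahl's law. A minimal sketch of that arithmetic follows. The function name, the JIT time fractions, and the 1.2x JIT speedup are all hypothetical values chosen for illustration; none of them are measurements from the branch or from trunk.

```python
def total_speedup(jit_fraction, jit_speedup):
    """Amdahl's-law estimate of the overall speedup when only the JIT
    portion of the run time gets faster.

    jit_fraction: share of total run time spent in the JIT (0.0 to 1.0)
    jit_speedup:  factor by which the JIT itself becomes faster
    """
    return 1.0 / ((1.0 - jit_fraction) + jit_fraction / jit_speedup)


if __name__ == "__main__":
    # Hypothetical workloads: the fractions below are assumptions,
    # not profiled numbers for perf/bigcode2 or perf/bz2.
    workloads = [
        ("lots of code, little work", 0.60),
        ("little code, lots of work", 0.02),
    ]
    for name, jit_fraction in workloads:
        s = total_speedup(jit_fraction, jit_speedup=1.2)
        print(f"{name}: overall speedup {s:.3f}x")
```

With a small JIT fraction the overall speedup stays close to 1.0 no matter how much faster the JIT gets, which is why gains on the compute-heavy tests would have to come from the generated code instead.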