From: Matthias S. <zz...@ge...> - 2015-06-17 19:43:46
On 05.06.2015 15:33, sv...@va... wrote:
> Author: sewardj
> Date: Fri Jun 5 14:33:46 2015
> New Revision: 15317
>
> Log:
> arm32-linux only: add handwritten assembly helpers for
> MC_(helperc_LOADV32le), MC_(helperc_LOADV16le) and
> MC_(helperc_LOADV8). This improves performance by around 5% to 7% in
> the best case, for run-of-the-mill integer code.
>

Hi!

I tried this for x86. I am not yet 100% sure whether the code is now as
fast as it can be. Open questions:

* Alignment of the function and of the jump targets: is it necessary at all?
* Should the byte result from the secmap be stored in %dl, or extended
  to 32 bits and kept in %edx?
* Should the asm functions be defined in a .s file? Then the debug info
  could also point to the file and line number, but the ifdef-ing would
  be harder.

I tried to compare:

1. valgrind-trunk compiled with gcc-4.8.4
2. valgrind-trunk compiled with gcc-4.9.2
3. my modified valgrind-trunk compiled with gcc-4.9.2

The results are mixed. Code from gcc-4.8.4 was faster than code from
gcc-4.9.2, and the hand-optimized code was sometimes faster than
gcc-4.8.4. Maybe I should also compile the hand-optimized version with
gcc-4.8.4.

Which test (from the perf suite or something else) should I use? I ran
the perf suite hosted in callgrind. The numbers for the tinycc test are
attached. In this case helperc_LOADV32le seems to be more than 13%
faster than the gcc-4.9.2 version, and it might be the same speed as the
gcc-4.8.4 version, but I don't know exactly how to interpret the
numbers.

Would "perf stat" applied to a normal vg-in-place run give better
results? At least it runs a lot faster.

Regards
Matthias