From: Julian S. <js...@ac...> - 2008-03-12 11:59:42
Nuno

Sorry to be slow replying.  Slowness is because I don't have any good
alternative suggestions.

> > As the Valgrind paper says, these are fairly heavyweight optimisations
> > for a binary translation system.  You could try to do some more
> > compiler-style optimisations, but I think the scope for improvement
> > there is not so great.  But I could be wrong -- Julian, what do you
> > think?
>
> Uhm, it seems that most easy optimizations are already implemented, as I
> would expect.  Anyway I'll take a closer look (maybe next week) at what
> is implemented to see if we have any chance to improve anything at all.
> Inter-block optimizations (e.g. inlining, GCSE, ...) would be really
> cool, but I'm not sure how well Vex is suited for that.

I agree.  Earlier in the life of the project a lot of effort was put into
doing good IR-level optimisation, and then before 3.3.0 there was another
round of iropt and code-generator tuning.  So most of the easy, and even
the not-so-easy, stuff is already done.

From a general usability perspective, when a Valgrind tool runs too slowly
to be usable, that is almost always a problem with the tool, not with the
core code-generation mechanisms.  So speeding up the core doesn't improve
usability much.  Hacking on the tools usually makes a much bigger
difference.

It depends what you want to do.  If you want to do some hacking on
classical compiler transformations then perhaps there are not so many
interesting opportunities left in Valgrind.

> > A very interesting project you could try would be to implement
> > chaining for Vex -- the Valgrind paper (above) talks about this.
> > (Section 2.3.6 of http://www.valgrind.org/docs/phd2004.pdf discusses
> > the old implementation of chaining that was in pre-Vex Valgrind -- you
> > can see the actual implementation in Valgrind 2.4.1 at
> > http://www.valgrind.org/downloads/old.html.)
> Sounds interesting, but maybe it requires too much knowledge of Valgrind
> internals.  I'll take a look into this as well.

Another thing you could chase is enhancing the superblock formation.
Currently Vex follows unconditional branches and calls when forming
superblocks, but stops at indirect and conditional branches.  I
experimented with the usual simple heuristic for conditional branches --
assume backwards branches are taken and forwards branches are not taken --
and extended Vex to follow conditional branches on that basis.  It often
made performance worse, though: although the translations might be a bit
faster, they are also a lot bigger (more I1 misses), and the JIT of course
runs more slowly too.  Not worth the hassle, I reckon.

> > this in section 5.4).  A more interesting thing is to speed up any of
> > the real existing tools, especially Memcheck, since that's the most
> > widely used.
>
> Yes, but I'm not sure I can justify such a project about dynamic program
> analysis in a virtual execution class.  The class is targeted at dynamic
> binary translation, binary interpretation, intra-block optimizations,
> virtualization, and so on.

Yes, I see that.  Nevertheless, if it does happen that working on a tool
is OK, it might be worth looking at Callgrind.  It's a great tool (I have
used it quite a lot in '08) but a little slow, and from some initial
profiles it looks like there are some possibilities for speedup in there.
Also, porting the branch-mispredict profiling from Cachegrind into
Callgrind would be a cool thing to do.

J
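(Editorial aside: the "backwards taken, forwards not taken" heuristic
mentioned in the superblock discussion above can be sketched roughly as
follows.  All names here are illustrative, not Vex's actual API.)

```c
#include <stdbool.h>
#include <stdint.h>

/* Classic static branch-prediction heuristic: a conditional branch
   whose target lies below the branch instruction (a backwards branch,
   typically a loop back-edge) is predicted taken; a forwards branch is
   predicted not taken.  Hypothetical helper, for illustration only. */
bool predict_taken(uint64_t branch_addr, uint64_t target_addr)
{
   return target_addr < branch_addr;  /* backwards => predicted taken */
}

/* When forming a superblock, continue tracing along the predicted
   direction: at the branch target if predicted taken, otherwise at the
   fall-through address.  Again, not Vex's real interface. */
uint64_t next_trace_addr(uint64_t branch_addr,
                         uint64_t target_addr,
                         uint64_t fallthrough_addr)
{
   return predict_taken(branch_addr, target_addr)
             ? target_addr
             : fallthrough_addr;
}
```

As the email notes, following predicted directions lengthens traces but
also inflates translation size, which is why it was judged not worth it.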