From: Evan G. <th...@gm...> - 2008-08-28 15:44:05
On Aug 28, 2008, at 7:32 AM, Ivan Jager wrote:

> On Wed, 27 Aug 2008, Evan Geller wrote:
>> My port is nearly done, minus floating point and VFP and neon and some other
>> stuff, but it runs plenty of stuff just fine. Yes memcheck works fine too.
>> Please don't start a port now :) It'd be a shame to have two separate ARM
>> ports.
>>
>> And for the record it's a huge amount of work. Ivan's code isn't working
>> either, since the flags aren't being set properly, and
>
> Oh, you didn't tell me about this. In which cases is it broken?

I'm not even sure any more. It took me a good week or so to get the flags right, they were a total pain haha. I don't think it was your code that was broken; I think it was CAB's code that you converted into IR (some of the stuff you commented "this doesn't look right..."). Plus, some of the values being passed to the helper itself (carry out, etc.) weren't being set right.

>> removing the flag thunks slows things down.
>
> Oh :(, how much is it slowing things down? I would have expected it
> to be faster than what was already there, given that VEX could
> optimize it. I could change it so that VEX would optimize away the
> flags that aren't being used.
>
> My reasoning for not using thunks was: 1. ARM instructions have an S
> bit which indicates whether or not you want them to set the
> condition codes, and gcc only sets this bit when they will be used.
> and 2. Many ARM instructions only set some of the CCs, meaning that
> the thunk method would need to force the previous thunk again for
> those instructions.
>
> But, perhaps GCC's optimizer is so much better than VEX's that it's
> still faster to make the C call and have an extra branch based on
> CC_OP before getting to basically the same code.

Well, it's not so much that the code itself runs slower, it's that VEX is forced to traverse a massive IR tree. When Memcheck instruments a binary, it tries to trace everything through each IROp, so the code gets inflated significantly, and Memcheck also has to trace through that massive IR tree. Memcheck is better at tracking through helpers, since it can see the flag setting as one discrete operation.

It was definitely a good move to remove the thunk itself, since x86 tends to be [flag set] [flag set] [flag set] [check flags], whereas ARM is [flag set] [check flag] [check flag]. But setting the flags in IR makes for a lot of duplicate code in the translation cache and a lot of stuff to traverse.

Sorry about that first email not being too well thought out. I was about to miss my shuttle home when I saw Shachen's email, so I sorta banged it out.

>> Currently the flags are being calculated in IR but I plan to make that a
>> helper. The tree is pretty dirty right now, I might just post a tarball
>> somewhere so people can play with it. I haven't really talked to anyone
>> from the valgrind project about actually integrating it in. The code is
>> just really hashy right now, but it works. I'll try and have it up by
>> next friday :-D
>
> Looking forward to seeing that, so I can do some testing. :)
>
> Ivan
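
For anyone following the thunk-versus-helper discussion, here is a rough C sketch of the two ideas. It is only an illustration under my own assumptions: every name in it (`calc_nzcv`, `FlagThunk`, `CC_OP_ADD`, `thunk_read`, `cond_eq`) is hypothetical and none of it is taken from VEX or from either ARM port. The helper computes N, Z, C and V for a flag-setting ADD or SUB in a single call, which is the "one discrete operation" view; the thunk just records the operation and operands so the same helper can be run later, when a condition is actually checked.

```c
#include <assert.h>
#include <stdint.h>

/* All names here are hypothetical; none are taken from VEX. */
#define FLAG_N 8u
#define FLAG_Z 4u
#define FLAG_C 2u
#define FLAG_V 1u

typedef enum { CC_OP_ADD, CC_OP_SUB } CcOp;

/* Helper: one opaque call that yields NZCV for a flag-setting ADD or SUB. */
uint32_t calc_nzcv(CcOp op, uint32_t a, uint32_t b)
{
    uint32_t res, c, v;
    switch (op) {
    case CC_OP_ADD:
        res = a + b;
        c = res < a;                                 /* unsigned carry out */
        v = ((~(a ^ b) & (a ^ res)) >> 31) & 1u;     /* signed overflow */
        break;
    case CC_OP_SUB:
        res = a - b;
        c = a >= b;                                  /* ARM: C means "no borrow" */
        v = (((a ^ b) & (a ^ res)) >> 31) & 1u;      /* signed overflow */
        break;
    default:
        assert(0 && "unknown CcOp");
        return 0;
    }
    return ((res >> 31) ? FLAG_N : 0u)
         | ((res == 0)  ? FLAG_Z : 0u)
         | (c ? FLAG_C : 0u)
         | (v ? FLAG_V : 0u);
}

/* Lazy thunk: a flag-setting instruction only records what it did. */
typedef struct {
    CcOp     op;
    uint32_t dep1;
    uint32_t dep2;
} FlagThunk;

/* The flags are materialised only when something reads them. */
uint32_t thunk_read(const FlagThunk *t)
{
    return calc_nzcv(t->op, t->dep1, t->dep2);
}

/* Condition check for EQ: true when Z is set. */
int cond_eq(uint32_t nzcv)
{
    return (nzcv & FLAG_Z) != 0;
}
```

With the pattern [flag set] [check flag] [check flag], calling `calc_nzcv` once at the flag-setting instruction and storing the result means each later check is just a bit test like `cond_eq`. With a thunk, every check goes through `thunk_read` again unless the result is cached.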