|
From: Nicholas N. <nj...@ca...> - 2003-11-27 12:02:41
|
Hi,

I've given REP prefix handling in vg_to_ucode.c a thorough overhaul.
There was a whole lot of code duplication in there, eg. the
codegen_REPE_SCAS() contained all the code in codegen_SCAS(), (ditto for
all the other ones), and various other stupidities. All up I cut about
220 lines of code, and generalised it so that adding missing REP prefixed
instructions in the future will be a whole lot easier. I also clarified
the difference between REP and REPE a bit (briefly, the same code is used
for REP and REPE, but REPE only applies to scas & cmps, whereas REP
applies to lods, stos, ins, outs and movs).

I have a couple of questions about this before I commit it in the HEAD,
though.

1. What is the point of "rep lods"? AFAICT, it loads multiple words from
the address pointed to by %esi into %eax. I assume Intel just let you
REP-prefix it for consistency with the other string ops? Valgrind
doesn't handle it currently, I don't imagine anyone ever used it.

2. Are REP prefixes widely used? Because our current implementation of
them is pretty sucky. In particular, fetching the D-flag via a C call
every time around the loop must be hurting us badly (the D-flag can never
change in the middle of a REP-loop -- that requires a CLD/STD, right?)
I could quite easily pull the C call out the front so it's only done once
per REP.

3. I've seen some other opportunities for code factoring in
vg_to_ucode.c. In particular there are loads of jumps like this:

    uInstr1(cb, JMP, 0, Literal, 0);
    uLiteral(cb, d32);
    uCond(cb, CondAlways);

for which a function could be factored out.

This would cut code, which is good. But there'll be lots of small
changes, which would be bad for anyone who has fiddled with vg_to_ucode.c
in their workspace. Would committing this annoy anyone? I can hold off
if so.

N
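The factoring proposed in question 3 could look roughly like the sketch below. The helper name jmp_lit is hypothetical, and the UCodeBlock type, the constants and the three u* functions are simplified stand-ins written only so the sketch compiles on its own; the real definitions live in the Valgrind sources and will differ.

```c
#include <stdint.h>

/* Stand-ins only: NOT Valgrind's real types, values or signatures. */
typedef struct {
   int      n_instrs;      /* how many instructions were emitted */
   uint32_t last_literal;  /* literal attached to the last one   */
   int      last_cond;     /* condition attached to the last one */
} UCodeBlock;

enum { JMP = 1 };
enum { Literal = 1 };
enum { CondAlways = 16 };

static void uInstr1(UCodeBlock *cb, int opcode, int size, int tag, uint32_t val)
{
   (void)opcode; (void)size; (void)tag; (void)val;
   cb->n_instrs++;
}

static void uLiteral(UCodeBlock *cb, uint32_t lit) { cb->last_literal = lit; }
static void uCond(UCodeBlock *cb, int cond)        { cb->last_cond = cond; }

/* Hypothetical helper: the three-call sequence from the message,
   written once so that each call site shrinks to a single line. */
void jmp_lit(UCodeBlock *cb, uint32_t d32)
{
   uInstr1(cb, JMP, 0, Literal, 0);
   uLiteral(cb, d32);
   uCond(cb, CondAlways);
}
```

Each unconditional jump would then become a single call such as jmp_lit(cb, d32), which is also why the change would touch many lines at once.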
|
From: Nicholas N. <nj...@ca...> - 2003-11-27 12:17:48
|
On Thu, 27 Nov 2003, Nicholas Nethercote wrote:
> I also clarified the difference between REP and REPE a bit
> (briefly, the same code is used for REP and REPE, but REPE only applies to
> scas & cmps, whereas REP applies to lods, stos, ins, outs and movs).

This is unclear: by "the same code" I mean the same byte (0xF3) is used
in the machine code, ie. 0xF3 means REP or REPE, depending on the
following instruction.

N
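As a rough illustration of that point, the sketch below classifies a prefix byte by looking at the opcode that follows it. The function and enum names are hypothetical and this is not the decoder in vg_to_ucode.c. It assumes the standard one-byte x86 string opcodes, ignores any operand-size prefix sitting between the two bytes, and also handles 0xF2 (REPNE), which the message above does not mention.

```c
#include <stdint.h>

typedef enum { PFX_NONE, PFX_REP, PFX_REPE, PFX_REPNE } RepKind;

/* cmps (0xA6/0xA7) and scas (0xAE/0xAF): the ops that test ZF. */
static int is_compare_string_op(uint8_t opc)
{
   return opc == 0xA6 || opc == 0xA7 || opc == 0xAE || opc == 0xAF;
}

/* ins, outs, movs, stos, lods: the ops that take a plain REP. */
static int is_plain_string_op(uint8_t opc)
{
   return opc == 0x6C || opc == 0x6D      /* ins  */
       || opc == 0x6E || opc == 0x6F      /* outs */
       || opc == 0xA4 || opc == 0xA5      /* movs */
       || opc == 0xAA || opc == 0xAB      /* stos */
       || opc == 0xAC || opc == 0xAD;     /* lods */
}

/* The same prefix byte 0xF3 reads as REPE or REP depending on
   the instruction that follows it. */
RepKind classify_rep_prefix(uint8_t prefix, uint8_t opcode)
{
   if (prefix == 0xF3) {
      if (is_compare_string_op(opcode)) return PFX_REPE;
      if (is_plain_string_op(opcode))   return PFX_REP;
   }
   if (prefix == 0xF2 && is_compare_string_op(opcode))
      return PFX_REPNE;
   return PFX_NONE;
}
```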
|
From: Tom H. <th...@cy...> - 2003-11-27 12:22:07
|
In message <Pin...@gr...>
Nicholas Nethercote <nj...@ca...> wrote:
> 2. Are REP prefixes widely used? Because our current implementation of
> them is pretty sucky. In particular, fetching the D-flag via a C call
> every time around the loop must be hurting us badly (the D-flag can never
> change in the middle of a REP-loop -- that requires a CLD/STD, right?)
> I could quite easily pull the C call out the front so it's only done once
> per REP.
Well REPZ and REPNZ are used quite a lot for inlining various string
operations. A quick scan of one of our programs shows 1424 REPZ/REPNZ
prefixes in a 10Mb executable.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Dirk M. <dm...@gm...> - 2003-11-27 12:26:41
|
On Thursday 27 November 2003 13:02, Nicholas Nethercote wrote:
> 1. What is the point of "rep lods"? AFAICT, it loads multiple words from
> the address pointed to by %esi into %eax.

I think you can quickly scan for a 0 byte/word/dword this way.

the REP instructions became quite unpopular lately as doing it the
"normal" ways is meanwhile faster with recent CPUs rather than using the
REPxx stuff.

Afaik no compiler generates those sequences anymore, so you'll only hit in
very old applications that contain 5 year old assembler sequences. As long as
nobody complains, don't bother.
|
From: Nicholas N. <nj...@ca...> - 2003-11-27 12:52:33
|
On Thu, 27 Nov 2003, Dirk Mueller wrote:
> > 1. What is the point of "rep lods"? AFAICT, it loads multiple words from
> > the address pointed to by %esi into %eax.
>
> I think you can quickly scan for a 0 byte/word/dword this way.
No, that's what REP{E,NE} SCAS is for. LODS can only be prefixed with
REP, so the LODS must be done N times, where N is in %ecx.
> Afaik no compiler generates those sequences anymore, so you'll only hit in
> very old applications that contain 5 year old assembler sequences. As long as
> nobody complains, don't bother.
Ok, so far, one vote in favour of speeding these up, one against. I'll
probably do it, because it will be pretty easy.
N
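To make the "done N times, where N is in %ecx" behaviour concrete, here is a minimal emulation of the byte variant (rep lodsb, which loads into %al, the low byte of %eax). The Regs struct and the function name are hypothetical and memory is modelled as a plain array indexed by %esi; this is an illustration of the instruction's semantics, not Valgrind code.

```c
#include <stdint.h>

typedef struct {
   uint32_t eax;  /* accumulator; lodsb writes its low byte (%al) */
   uint32_t ecx;  /* repeat count                                 */
   uint32_t esi;  /* source index, here an offset into mem[]      */
   int      df;   /* direction flag: 0 = forwards, 1 = backwards  */
} Regs;

/* Emulates "rep lodsb": load a byte, step %esi, decrement %ecx,
   and repeat until %ecx reaches zero. There is no exit condition
   on the loaded data, unlike REPE/REPNE SCAS. */
void emulate_rep_lodsb(Regs *r, const uint8_t *mem)
{
   int32_t step = r->df ? -1 : 1;

   while (r->ecx != 0) {
      r->eax = (r->eax & 0xFFFFFF00u) | mem[r->esi];
      r->esi += (uint32_t)step;
      r->ecx--;
   }
}
```

Every iteration overwrites the previous load, so only the last byte read is left in %al once the loop ends.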
|
|
From: Dirk M. <dm...@gm...> - 2003-11-27 12:55:51
|
On Thursday 27 November 2003 13:52, Nicholas Nethercote wrote:
> Ok, so far, one vote in favour of speeding these up, one against. I'll
> probably do it, because it will be pretty easy.

I was not voting against speeding it up. I was voting against spending much
time on which particular REPxx instruction works with LODS etc.
|
From: Nicholas N. <nj...@ca...> - 2003-11-27 13:33:53
|
On Thu, 27 Nov 2003, Dirk Mueller wrote:
> > Ok, so far, one vote in favour of speeding these up, one against. I'll
> > probably do it, because it will be pretty easy.
>
> I was not voting against speeding it up. I was voting against spending much
> time on which particular REPxx instruction works with LODS etc.

Oh, sorry. There's no problem with the prefixes and LODS, they're correct
as is.

As for frequency of REP prefixes, there seem to be plenty in different
programs on my system -- even in valgrind.so (compiled by GCC 3.2.2), so I
will definitely do it.

N
|
From: Jeremy F. <je...@go...> - 2003-11-27 16:59:39
|
On Thu, 2003-11-27 at 04:02, Nicholas Nethercote wrote:
> 2. Are REP prefixes widely used? Because our current implementation of
> them is pretty sucky. In particular, fetching the D-flag via a C call
> every time around the loop must be hurting us badly (the D-flag can never
> change in the middle of a REP-loop -- that requires a CLD/STD, right?)
> I could quite easily pull the C call out the front so it's only done once
> per REP.

I was doing some profiling with oprofile the other day, and the D flag
helper was one of the top 5 functions in valgrind.so when running cc1
(sorry, no concrete numbers on hand at the moment). So that would be
good.

> This would cut code, which is good. But there'll be lots of small
> changes, which would be bad for anyone who has fiddled with vg_to_ucode.c
> in their workspace. Would committing this annoy anyone? I can hold off
> if so.

Fine by me.

J
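The D-flag hoisting discussed in this thread can be sketched with a small emulation of rep stosb. Everything here is hypothetical: get_dflag_step() is a stand-in for the C helper call, with a counter added so the two variants can be compared, and memory is a plain array indexed by %edi. It shows the idea only, not the UCode that Valgrind actually generates.

```c
#include <stdint.h>

static unsigned helper_calls = 0;   /* counts stand-in helper calls    */
static int direction_flag = 0;      /* 0 = forwards, 1 = backwards     */

/* Stand-in for the C helper that fetches the D-flag as a step. */
static int32_t get_dflag_step(void)
{
   helper_calls++;
   return direction_flag ? -1 : 1;
}

/* Helper called on every trip around the loop. */
void rep_stosb_per_iteration(uint8_t *mem, uint32_t edi,
                             uint32_t ecx, uint8_t al)
{
   while (ecx != 0) {
      int32_t step = get_dflag_step();
      mem[edi] = al;
      edi += (uint32_t)step;
      ecx--;
   }
}

/* Helper pulled out the front. This is safe because the D-flag
   cannot change inside the loop without a CLD/STD. */
void rep_stosb_hoisted(uint8_t *mem, uint32_t edi,
                       uint32_t ecx, uint8_t al)
{
   int32_t step = get_dflag_step();

   while (ecx != 0) {
      mem[edi] = al;
      edi += (uint32_t)step;
      ecx--;
   }
}
```

Both variants write the same bytes. The per-iteration version makes one helper call per element, while the hoisted version makes exactly one call per REP, even when %ecx is zero.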