Re: [Valgrind-users] REPE then 0xF

Dominic Mazzoni wrote:
> Was anyone else able to reproduce the REPE then 0xF problem using my test 
> program?  Just curious...

 From IA-32 Intel Architecture Software Developer's Manual, Volume 2: Instruction
Set Reference (24547107.pdf on developer.intel.com; perhaps 24547108.pdf by now):

CMPSS -- Compare Scalar Single-Precision Floating-Point Values
   F3 0F C2 /r ib  CMPSS xmm1, xmm2/m32, imm8

CMPSD -- Compare Scalar Double-Precision Floating-Point Values
   F2 0F C2 /r ib  CMPSD xmm1, xmm2/m64, imm8

MOVQ2DQ -- Move Quadword from MMX to XMM Register
   F3 0F D6  MOVQ2DQ xmm, mm

MOVDQ2Q -- Move Quadword from XMM to MMX Register
   F2 0F D6  MOVDQ2Q mm, xmm

CVTDQ2PD -- Convert Packed Doubleword Integers to Packed Double-Precision Floating-
       Point Values
   F3 0F E6  CVTDQ2PD xmm1, xmm2/m64

CVTPD2DQ -- Convert Packed Double-Precision Floating-Point Values to Packed
       Doubleword Integers
   F2 0F E6  CVTPD2DQ xmm1, xmm2/m128

So seeing "F3 0F ..." is not surprising: it is specified behavior, and very
reasonable for a compiler to generate CMPSS.
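
To make the point concrete, here is a minimal sketch of how a decoder might
recognize these six encodings. It is only an illustration built from the table
above, not Valgrind's actual decoder; the names SSE_PREFIXED and classify are
hypothetical.

```python
# Hypothetical lookup table built only from the six encodings listed above.
# Key: (first byte F3 or F2, opcode byte following 0F) -> mnemonic.
SSE_PREFIXED = {
    (0xF3, 0xC2): "CMPSS",
    (0xF2, 0xC2): "CMPSD",
    (0xF3, 0xD6): "MOVQ2DQ",
    (0xF2, 0xD6): "MOVDQ2Q",
    (0xF3, 0xE6): "CVTDQ2PD",
    (0xF2, 0xE6): "CVTPD2DQ",
}


def classify(code):
    """Return the mnemonic if `code` starts with F3/F2, then 0F, then one of
    the listed opcode bytes; otherwise return None."""
    if len(code) < 3 or code[1] != 0x0F:
        return None
    return SSE_PREFIXED.get((code[0], code[2]))
```

For example, classify(bytes([0xF3, 0x0F, 0xC2, 0xC1, 0x00])) returns "CMPSS",
while a plain "F3 A6" (REPE CMPSB) returns None because no 0F follows the F3.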

Furthermore, past generations of hardware always accepted prefixes in any order.
So in theory "0F F3 ..." would be equivalent.  Perhaps there is a move to
require that 0F be the last prefix (it might simplify decoding), but that
would break backwards compatibility in many cases.  0F could be required
as last only for new instructions, of which CMPSS certainly is one.  But
then the checking to require 0F last, only in these 6 cases, might cost more
than just allowing arbitrary order all the time.  [Basically, the opcode
space is over-full.]

-- 
John Reiser, jr...@Bi...