From: <lut...@fr...> - 2005-03-06 14:55:26
Hi,

here is a big patch to the x86-64 disassembler that implements the changes I announced earlier. It supersedes the patch I sent on 2005-02-19.

It provides, and uses throughout, a framework that takes the prefixes and the width bit into account to determine the operand size and to correctly extend the register fields of the instructions. To test this I had to work through all printers, prefilters, arg-type and instruction-format definitions and rewrite some of them. In the course of this I took the liberty of fixing several bugs that happened to cross my path. For details see below.

I made one change that may be controversial: because the processor sign-extends 32-bit immediate values to 64 bits when operating on qwords, it was more consistent to print most immediates as signed values. I will send a separate mail about this so that this mail does not get too long.

There are lots of things left to do, at least: adding the REX prefix to more instructions, adding more support for the #x66 prefix and adding support for the SSE instructions. Nevertheless the patch is complete in itself, and IMHO the disassembler is in much better shape with it than without it.
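
The following Python sketch only illustrates the arithmetic behind printing immediates as signed values; it is not part of the patch, and the helper names sign_extend and read_imm32 are hypothetical. It reads the 32-bit immediate of 48C745E8F8FFFFFF (MOV QWORD PTR [RBP-24], imm32) and compares the unsigned reading with the signed one and with the qword the processor actually stores after sign extension.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):  # sign bit set
        value -= 1 << bits
    return value


def read_imm32(code: bytes, offset: int) -> int:
    """Read a little-endian 32-bit immediate as an unsigned number."""
    return int.from_bytes(code[offset:offset + 4], "little")


# 48 C7 45 E8 | F8 FF FF FF = MOV QWORD PTR [RBP-24], imm32 (imm32 at offset 4)
inst = bytes.fromhex("48C745E8F8FFFFFF")
raw = read_imm32(inst, 4)
value = sign_extend(raw, 32)
print(raw)                              # 4294967288 (unsigned reading)
print(value)                            # -8 (signed reading)
print(hex(value & 0xFFFFFFFFFFFFFFFF))  # 0xfffffffffffffff8, the qword stored
```

The stored qword is #xFFFFFFFFFFFFFFF8, i.e. -8, whereas 4294967288 would be #x00000000FFFFFFF8 as a qword, a value this instruction cannot store.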
With kind regards, and see below for the details and the patch,

Lutz Euler


The detailed changes (to insts.lisp unless otherwise mentioned, and not repeating what I wrote on 2005-02-19), with examples of their effects:

- General:
  * correct typos in the comments for deftype reg and deftype full-reg
  * correct one or two other typos in comments
  * replace a C-ism by a Lisp-ism in the comment for +default-operand-size+
  * remove instruction-format reg-dir (unused)
  * add type declarations to function print-reg
  * extend *byte-reg-names*, *word-reg-names* and *dword-reg-names* to all 16 registers each (rationale: if the disassembler gets out of sync or is fed data instead of instruction bytes, it should not readily signal an array indexing error but should decode even the registers that the compiler/assembler does not use)

- To correctly determine the address size:
  Currently:
    408A41F1              MOV EAX, [ECX-15]
  With the patch:
    408A41F1              MOV AL, [RCX-15]
  * replace *default-address-size* with +default-address-size+, correct its value to :qword and correct its comment
  * add functions inst-operand-size and inst-operand-size-default-qword
  * change print-addr-reg to always use +default-address-size+
  * remove arg-type addr-reg (no longer needed)

- Treatment of the REX prefix and the width bit to determine the operand size and to extend the register fields:
  Currently:
    4D31F6                XOR R14, RSI
    4F8B440101            MOV RAX, [RCX+RAX+1]
    408A41F1              MOV EAX, [ECX-15]
  With the patch:
    4D31F6                XOR R14, R14
    4F8B440101            MOV R8, [R9+R8+1]
    408A41F1              MOV AL, [RCX-15]
  * add prefilters prefilter-wrxb, prefilter-reg-r, prefilter-reg-b
  * new arg-type wrxb
  * change prefilter-reg/mem to use the new way to get at the REX bits, and make it use REX.B and REX.X to extend the base and index fields (they were not extended at all before)
  * same with prefilter-width; move it towards the beginning of the file so that the prefilters are in the order their fields appear in the instructions
  * adapt arg-type width to this and move it towards the front, too
  * change print-reg to use inst-operand-size
  * add arg-type reg-b (we need to distinguish two register arg-types depending on which REX bit extends them)
  * change all define-instruction-formats with rex to use the new wrxb field and arg-type
  * remove arg-types rex-reg/mem and sized-rex-reg/mem, use reg/mem and sized-reg/mem instead
  * remove function print-rex-reg/mem (no longer used)

- Clean up word-... things:
  (word-reg and reg are already treated identically without my patch; that both exist seems to be a relic of an earlier, incomplete attempt to correctly treat the width bit and the #x66 prefix, already present in the x86 port. The same holds for word-accum.)
  * remove prefilter-word-reg
  * remove arg-types word-reg/mem and word-reg, use reg/mem and reg or reg-b instead
  * remove functions print-word-reg and print-word-reg/mem
  * remove arg-type word-accum, use accum instead

- Treatment of instructions with a default operand size of 64 bits:
  Currently:
    48FF5009              CALL BYTE PTR [RAX+9]
    48FF75F0              PUSH BYTE PTR [RBP-16]
    57                    PUSH EDI
  With the patch:
    48FF5009              CALL QWORD PTR [RAX+9]
    48FF75F0              PUSH QWORD PTR [RBP-16]
    57                    PUSH RDI
  * new function print-reg/mem-with-width
  * new printer function print-sized-reg/mem-default-qword
  * new arg-type sized-reg/mem-default-qword
  * new instruction formats reg/mem-default-qword and rex-reg/mem-default-qword
  * use these in push, pop, call and jmp
  * add function print-reg-default-qword, arg-type reg-b-default-qword and instruction formats reg-no-width-default-qword and rex-reg-no-width-default-qword, and use them in push and pop. (This corrects the printed register size for the variants of these instructions without a REX prefix. These variants are currently not used by the assembler but should be, to reduce code size.)
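
To make the REX handling concrete, here is a minimal Python sketch. It is illustrative only and not SBCL code; decode_rex, operand_size, split_233 and decode_example are hypothetical names. It shows how the REX bits W, R, X and B combine with the ModRM and SIB fields. operand_size is only a rough analogue of inst-operand-size and inst-operand-size-default-qword, and decode_example handles nothing but the two REX-prefixed examples above (both have REX.W set, so 64-bit register names are used throughout).

```python
from dataclasses import dataclass
from typing import Optional

# 64-bit register names, indexed by the 4-bit register number.
REG64 = ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI",
         "R8", "R9", "R10", "R11", "R12", "R13", "R14", "R15"]


@dataclass
class Rex:
    w: int  # 1 = 64-bit operand size
    r: int  # extends ModRM.reg
    x: int  # extends SIB.index
    b: int  # extends ModRM.rm or SIB.base


def decode_rex(byte: int) -> Optional[Rex]:
    """Return the REX bits if `byte` is a REX prefix (#x40-#x4F), else None."""
    if (byte & 0xF0) != 0x40:
        return None
    return Rex(w=(byte >> 3) & 1, r=(byte >> 2) & 1, x=(byte >> 1) & 1, b=byte & 1)


def operand_size(rex: Optional[Rex], has_66: bool, default_qword: bool = False) -> str:
    """REX.W wins, then the #x66 prefix, then the default (dword or qword)."""
    if rex is not None and rex.w:
        return "qword"
    if has_66:
        return "word"
    return "qword" if default_qword else "dword"


def split_233(byte: int) -> tuple:
    """Split a ModRM (mod/reg/rm) or SIB (scale/index/base) byte into 2+3+3 bits."""
    return byte >> 6, (byte >> 3) & 7, byte & 7


def decode_example(code: bytes) -> str:
    """Decode only 'XOR r/m64, r64' (mod=11) and 'MOV r64, [base+index+disp8]'."""
    rex = decode_rex(code[0])
    if rex is None or not rex.w:
        raise ValueError("this sketch expects a REX prefix with W = 1")
    opcode = code[1]
    mod, reg, rm = split_233(code[2])
    reg_name = REG64[(rex.r << 3) | reg]              # REX.R extends ModRM.reg
    if opcode == 0x31 and mod == 0b11:
        return f"XOR {REG64[(rex.b << 3) | rm]}, {reg_name}"  # REX.B extends rm
    if opcode == 0x8B and mod == 0b01 and rm == 0b100:  # SIB and disp8 follow
        scale, index, base = split_233(code[3])
        addr = REG64[(rex.b << 3) | base]             # REX.B extends SIB.base
        index_no = (rex.x << 3) | index               # REX.X extends SIB.index
        if index_no != 4:                             # index 100 without REX.X: none
            addr += "+" + REG64[index_no] + ("" if scale == 0 else f"*{1 << scale}")
        disp = int.from_bytes(code[4:5], "little", signed=True)
        return f"MOV {reg_name}, [{addr}{disp:+d}]"
    raise NotImplementedError("only the two examples above are handled")


print(decode_example(bytes.fromhex("4D31F6")))         # XOR R14, R14
print(decode_example(bytes.fromhex("4F8B440101")))     # MOV R8, [R9+R8+1]
print(operand_size(None, False, default_qword=True))  # qword, as for 57 = PUSH RDI
```

Keeping REX.R separate from REX.B in the sketch reflects the reason given above for needing two register arg-types, reg and reg-b: which REX bit extends a register field depends on where that field sits in the instruction.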

- Sizes and sign extension of immediate data:
  Currently:
    Wrong length of immediate, losing sync:
    48B908000000          MOV RCX, 8
    0000                  ADD [EAX], EAX
    0000                  ADD [EAX], EAX
    Missing sign extension of a 32-bit immediate to 64 bits:
    48C745E8F8FFFFFF      MOV QWORD PTR [RBP-24], 4294967288
  With the patch:
    48B90800000000000000  MOV RCX, 8
    48C745E8F8FFFFFF      MOV QWORD PTR [RBP-24], -8
  * remove arg-types imm-data-upto-dword and imm-data
  * add arg-types signed-imm-data, signed-imm-data-upto-qword and signed-imm-data-default-qword, and make them use inst-operand-size[-default-qword]
  * adapt all uses of these fields
  * remove arg-type signed-imm-dword
  * (in disassem.lisp) allow length 64 in read-signed-suffix
  * add arg-type imm-byte
  * use it in shift-inst-printer-list instead of signed-imm-byte (this makes no difference for the expected immediate values of 0 ... 63, but it matters if someone puts in an unreasonable value)

- To be able to correctly print operand sizes in memory references:
  (for examples see above and movsx below)
  * (in target-insts.lisp) change print-mem-access: change the argument order to honor the conventions from insts.lisp, move the determination of the size out of this function and therefore rename the argument print-size-p to width, extend the comment
  * adapt the caller in insts.lisp and make it use inst-operand-size or inst-operand-size-default-qword as appropriate

- Clean up confusion about the length of op fields with and without the width bit:
  Currently:
    The shift-inst printer misdetects IMUL as RCL, losing sync:
    4869D0                RCL RAX, CL
    0001                  ADD [ECX], EAX
    0000                  ADD [EAX], EAX
  With the patch:
    4869D000010000        IMUL RDX, RAX, 256
  * add a width field to the instruction-format rex-reg, shrink the op field to 4 bits
  * same in rex-reg-reg/mem and rex-reg/mem (op is 7 bits instead of 8)
  * adapt usage in the mov, lea, test and xchg instructions
  * correct the rex-reg/mem-imm usage in the shift instructions and in mov, and remove the comment saying it doesn't work for 8-bit registers yet (it does now)
  * same in arith-inst-printer-list (two places, #x80/81 and #x82/83)
  * remove the #x82 encoding in arith-inst-printer-list (invalid in 64-bit mode)

- Fix bit test instructions:
  (the immediate variant always uses an unsigned byte, independent of the operand size)
  * use arg-type imm-byte instead of imm-data

- Fix movsx[d] and movzx:
  (the size of the source (reg/mem) should be determined by the opcode, the size of the destination (reg) by the operand size)
  Currently:
    48                    BYTE #X48
    0FB6C0                MOVZX EAX, EAX
    48                    BYTE #X48
    63                    BYTE #X63
    C8488BC1              ENTER 35656, 193
  With the patch:
    480FB6C0              MOVZX RAX, AL
    4863C8                MOVSXD RCX, EAX
  * rename print-byte-reg/mem to print-sized-byte-reg/mem, replace arg-type byte-reg/mem with sized-byte-reg/mem (it was used only in SETcc, which can use the sized version, so we avoid adding another arg-type)
  * new functions print-sized-[d]word-reg/mem and arg-types sized-[d]word-reg/mem
  * add instruction-formats [rex-]ext-reg-reg/mem-no-width
  * remove (reg nil :type 'reg) in the define-instructions for movsx, movzx and movsxd (it is superfluous)
  * extend the printers in movsx and movzx to use the fixed-size arg-types
  * movsxd needs only the REX variant because it should not be used without a REX prefix with the W bit set (otherwise it would in effect do a zero extension, see the AMD docs)

- Fix imul:
  Currently:
    Incorrectly identified as IMUL:
    0FAE5DE8              IMUL EBX, [EBP-24]
  With the patch:
    Not decoded; should be STMXCSR (printer not defined yet):
    0F                    BYTE #X0F
    AE                    SCASB
    5D                    POP RBP
  * formerly, 0F AE was incorrectly interpreted as imul (it should be group 15 instead). Therefore change the corresponding imul printer by replacing arg-type ext-reg-reg/mem with ext-reg-reg/mem-no-width and adapting the op field.
  * replace the arg-type imm-word with signed-imm-data in the imul printer for opcode #x69 (so as not to resurrect bug 245a, and others, after the previous changes)
  * add REX variants to the printers for the opcodes #x69 and #x6B
  * remove arg-type imm-word (no longer used)
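
The sizing rule for movsx[d] and movzx (source size from the opcode, destination size from the operand size) can be illustrated with another small Python sketch. It is illustrative only: decode_extend, reg_name and the tables are hypothetical, handle only register-to-register forms with registers 0-7, and take the operand size from REX.W alone, ignoring the #x66 prefix. In line with the patch, MOVSXD is only accepted with REX.W set.

```python
# Source size is fixed by the opcode; destination size follows REX.W.
# Simplified: register-direct ModRM only, registers 0-7 only, no #x66 prefix.
BYTE_REGS_REX = ["AL", "CL", "DL", "BL", "SPL", "BPL", "SIL", "DIL"]
BYTE_REGS_NO_REX = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REGS = {
    "word": ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"],
    "dword": ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"],
    "qword": ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI"],
}

# opcode bytes -> (mnemonic, size of the reg/mem source operand)
OPCODES = {
    (0x0F, 0xB6): ("MOVZX", "byte"),
    (0x0F, 0xB7): ("MOVZX", "word"),
    (0x0F, 0xBE): ("MOVSX", "byte"),
    (0x0F, 0xBF): ("MOVSX", "word"),
    (0x63,): ("MOVSXD", "dword"),
}


def reg_name(size: str, number: int, rex_present: bool) -> str:
    """Byte registers 4-7 are SPL..DIL with any REX prefix, AH..BH without."""
    if size == "byte":
        return (BYTE_REGS_REX if rex_present else BYTE_REGS_NO_REX)[number]
    return REGS[size][number]


def decode_extend(code: bytes) -> str:
    rex_present = (code[0] & 0xF0) == 0x40
    rex_w = rex_present and bool(code[0] & 0x08)
    i = 1 if rex_present else 0
    key = (code[i], code[i + 1]) if code[i] == 0x0F else (code[i],)
    mnemonic, src_size = OPCODES[key]
    if mnemonic == "MOVSXD" and not rex_w:
        raise ValueError("MOVSXD is only decoded with REX.W set")
    i += len(key)
    mod, reg, rm = code[i] >> 6, (code[i] >> 3) & 7, code[i] & 7
    if mod != 0b11:
        raise NotImplementedError("memory operands are not handled here")
    dst_size = "qword" if rex_w else "dword"   # destination: operand size
    return (f"{mnemonic} {reg_name(dst_size, reg, rex_present)}, "
            f"{reg_name(src_size, rm, rex_present)}")


print(decode_extend(bytes.fromhex("480FB6C0")))  # MOVZX RAX, AL
print(decode_extend(bytes.fromhex("4863C8")))    # MOVSXD RCX, EAX
```

Because the source size is looked up from the opcode alone, a REX.W prefix can only widen the destination, which is exactly what goes wrong in the "Currently" output for MOVZX EAX, EAX.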