From: SourceForge.net <no...@so...> - 2012-03-27 14:52:30

Support Requests item #3367437, was opened at 2011-07-14 11:20
Message generated for change (Comment added) made by nobody
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=206208&aid=3367437&group_id=6208

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 1
Private: No
Submitted By: nasm64developer (nasm64developer)
Assigned to: Nobody/Anonymous (nobody)
Summary: PINSRB issue with 8-bit GPR

Initial Comment:
Officially (V)PINSRB (SSE4.1) is defined as Vo(,Ho),Mb,Ib
and Vo(,Ho),Ry,Ib. As that M/R operand really is a byte,
it is okay to let the assembler accept all of possible R
sizes: 8/16/32/64.

NASM's insns.dat has an entry for 32, as well as 8, but
lacks 16 and 64.

However, the entry for 8 suffers from a minor problem:
AH/CH/DH/BH will result in code which actually accesses
SPL/BPL/SIL/DIL instead.
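[Editor's sketch, not part of the original report.] The aliasing described above can be reproduced with a few lines of Python that build the byte sequences by hand. The function and table names here (`encode_pinsrb_xmm0`, `effective_source`, `LOW_BYTE_REGS`, `HIGH_BYTE_REGS`) are hypothetical and exist only for illustration; the byte values follow the listing in this report, and the register-to-number mapping is the standard x86-64 one. It assumes 64-bit mode and a destination of xmm0.

```python
# Hypothetical helper names; this is not NASM code.
# Uniform low-byte registers, indexed by register number (0..15).
LOW_BYTE_REGS = ["al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil"] + [
    f"r{n}b" for n in range(8, 16)
]
# Legacy high-byte registers share numbers 4..7 and are only
# encodable when no REX prefix is present.
HIGH_BYTE_REGS = {"ah": 4, "ch": 5, "dh": 6, "bh": 7}


def encode_pinsrb_xmm0(reg: str, imm: int) -> bytes:
    """Bytes for `pinsrb xmm0, reg, imm` as they appear in the report."""
    if reg in HIGH_BYTE_REGS:
        num, rex = HIGH_BYTE_REGS[reg], None   # high-byte regs: no REX
    else:
        num = LOW_BYTE_REGS.index(reg)
        if num >= 8:
            rex = 0x41                          # REX.B selects r8b..r15b
        elif num >= 4:
            rex = 0x40                          # bare REX selects spl..dil
        else:
            rex = None
    modrm = 0xC0 | (num & 7)                    # mod=11, reg=xmm0, rm=num&7
    out = [0x66]
    if rex is not None:
        out.append(rex)
    out += [0x0F, 0x3A, 0x20, modrm, imm & 0xFF]
    return bytes(out)


def effective_source(reg: str) -> str:
    """Low-byte register whose value the instruction really reads.

    PINSRB decodes its register operand as a full GPR, so rm=4..7
    without REX means ESP/EBP/ESI/EDI, i.e. the byte in SPL/BPL/SIL/DIL.
    """
    if reg in HIGH_BYTE_REGS:
        return LOW_BYTE_REGS[HIGH_BYTE_REGS[reg]]
    return reg


if __name__ == "__main__":
    for r in ("spl", "ah"):
        print(r, encode_pinsrb_xmm0(r, 0).hex(" "), "->", effective_source(r))
```

The `ah` and `spl` encodings differ only in the REX byte, and for this instruction that byte does not change which register is read, which is why the high-byte form silently reads SPL/BPL/SIL/DIL.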

To illustrate:

  .byte 0x66, 0x0F, 0x3A, 0x20, 0xC0, 0x00        # pinsrb xmm0,al,0
  .byte 0x66, 0x0F, 0x3A, 0x20, 0xC1, 0x00        # pinsrb xmm0,cl,0
  .byte 0x66, 0x0F, 0x3A, 0x20, 0xC2, 0x00        # pinsrb xmm0,dl,0
  .byte 0x66, 0x0F, 0x3A, 0x20, 0xC3, 0x00        # pinsrb xmm0,bl,0
  .byte 0x66, 0x40, 0x0F, 0x3A, 0x20, 0xC4, 0x00  # pinsrb xmm0,spl,0
  .byte 0x66, 0x40, 0x0F, 0x3A, 0x20, 0xC5, 0x00  # pinsrb xmm0,bpl,0
  .byte 0x66, 0x40, 0x0F, 0x3A, 0x20, 0xC6, 0x00  # pinsrb xmm0,sil,0
  .byte 0x66, 0x40, 0x0F, 0x3A, 0x20, 0xC7, 0x00  # pinsrb xmm0,dil,0
  .byte 0x66, 0x41, 0x0F, 0x3A, 0x20, 0xC0, 0x00  # pinsrb xmm0,r8b,0
  .byte 0x66, 0x41, 0x0F, 0x3A, 0x20, 0xC1, 0x00  # pinsrb xmm0,r9b,0
  .byte 0x66, 0x41, 0x0F, 0x3A, 0x20, 0xC2, 0x00  # pinsrb xmm0,r10b,0
  .byte 0x66, 0x41, 0x0F, 0x3A, 0x20, 0xC3, 0x00  # pinsrb xmm0,r11b,0
  .byte 0x66, 0x41, 0x0F, 0x3A, 0x20, 0xC4, 0x00  # pinsrb xmm0,r12b,0
  .byte 0x66, 0x41, 0x0F, 0x3A, 0x20, 0xC5, 0x00  # pinsrb xmm0,r13b,0
  .byte 0x66, 0x41, 0x0F, 0x3A, 0x20, 0xC6, 0x00  # pinsrb xmm0,r14b,0
  .byte 0x66, 0x41, 0x0F, 0x3A, 0x20, 0xC7, 0x00  # pinsrb xmm0,r15b,0
  # and
  .byte 0x66, 0x0F, 0x3A, 0x20, 0xC4, 0x00        # pinsrb xmm0,ah,0
  .byte 0x66, 0x0F, 0x3A, 0x20, 0xC5, 0x00        # pinsrb xmm0,ch,0
  .byte 0x66, 0x0F, 0x3A, 0x20, 0xC6, 0x00        # pinsrb xmm0,dh,0
  .byte 0x66, 0x0F, 0x3A, 0x20, 0xC7, 0x00        # pinsrb xmm0,bh,0

On a SSE4.1-capable machine, using 64-bit mode:

  (gdb) disassemble
  Dump of assembler code for function main:
  0x0000000100000f62 <main+0>:    pinsrb $0x0,%eax,%xmm0
  0x0000000100000f68 <main+6>:    pinsrb $0x0,%ecx,%xmm0
  0x0000000100000f6e <main+12>:   pinsrb $0x0,%edx,%xmm0
  0x0000000100000f74 <main+18>:   pinsrb $0x0,%ebx,%xmm0
  0x0000000100000f7a <main+24>:   rex pinsrb $0x0,%esp,%xmm0
  0x0000000100000f81 <main+31>:   rex pinsrb $0x0,%ebp,%xmm0
  0x0000000100000f88 <main+38>:   rex pinsrb $0x0,%esi,%xmm0
  0x0000000100000f8f <main+45>:   rex pinsrb $0x0,%edi,%xmm0
  0x0000000100000f96 <main+52>:   pinsrb $0x0,%r8d,%xmm0
  0x0000000100000f9d <main+59>:   pinsrb $0x0,%r9d,%xmm0
  0x0000000100000fa4 <main+66>:   pinsrb $0x0,%r10d,%xmm0
  0x0000000100000fab <main+73>:   pinsrb $0x0,%r11d,%xmm0
  0x0000000100000fb2 <main+80>:   pinsrb $0x0,%r12d,%xmm0
  0x0000000100000fb9 <main+87>:   pinsrb $0x0,%r13d,%xmm0
  0x0000000100000fc0 <main+94>:   pinsrb $0x0,%r14d,%xmm0
  0x0000000100000fc7 <main+101>:  pinsrb $0x0,%r15d,%xmm0
  0x0000000100000fce <main+108>:  pinsrb $0x0,%esp,%xmm0
  0x0000000100000fd4 <main+114>:  pinsrb $0x0,%ebp,%xmm0
  0x0000000100000fda <main+120>:  pinsrb $0x0,%esi,%xmm0
  0x0000000100000fe0 <main+126>:  pinsrb $0x0,%edi,%xmm0
  ...
  (gdb) run
  The program being debugged has been started already.
  Start it from the beginning? (y or n) y
  [as expected: no #UD exceptions]

You may wish to review insns.dat, to add entries for 16
and 64. (Note that this may affect instructions other
than PINSRB. So a review of all INSR and EXTR is needed.)

Also, you may wish to document that the use of the high
byte registers does result in different low byte registers
in the end. It might be difficult to teach NASM to not
accept the high byte register for just this one case --
so merely documenting it is a perfectly acceptable solution
IMO.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2012-03-27 07:52

Message:
Im thankful for the blog post.Much thanks again. Fantastic.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2012-03-27 07:31

Message:
I really enjoy the post.Much thanks again.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2012-03-27 07:06

Message:
Thank you for your article post.Much thanks again. Want more.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2012-03-27 06:56

Message:
Im thankful for the blog article.Thanks Again. Much obliged.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2012-03-27 06:19

Message:
I really liked your post.Thanks Again. Fantastic.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2012-03-24 04:06

Message:
cf0fG2 Really appreciate you sharing this blog. Really Great.

----------------------------------------------------------------------

Comment By: H. Peter Anvin (hpa)
Date: 2011-07-14 11:58

Message:
The high byte case is already handled by byte code 325 which was created
specifically for this instruction.

----------------------------------------------------------------------

Comment By: Cyrill Gorcunov (cyrillos)
Date: 2011-07-14 11:57

Message:
Thanks a lot, nasm64developer, i'll take a look as only time permits.

side-note: I have your bmi tests files in my queue as well, just didn't
manage to find time slot for merging them. actually i believe _all_
instructions we provide should be represented in this manner, ie text
stream and then hex stream. So nasm should compile text stream and compare
the results with hex stream provided.

second-side-note: if you have some patches don't hesitate to send them us
-- i guess we would merge them ;)

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=206208&aid=3367437&group_id=6208