
Converting LzmaDecOpt.asm from MASM to NASM syntax

Pete
2020-01-18
2021-04-03
  • Pete

    Pete - 2020-01-18

    LZMA SDK 19.00 introduced an optimized assembly decompression module, LzmaDecOpt.asm.

    Has anyone succeeded in converting it from MASM to NASM syntax? It would be a worthwhile project, since it could significantly speed up decompression.

    I've been able to convert the macros, defines, includes, and function calls, but I still can't get it to compile properly.

    Thank you to anyone willing to take a run at this!

     
  • Sam Tansy

    Sam Tansy - 2020-05-17

    Did you try the original 16.02/16.04 files from Windows? They are much simpler and more manageable, and you could build on them.
    Compare the Windows 7-Zip MASM source with the p7zip NASM source to see what's different.
    You can also try assembling the file with MASM and disassembling it to NASM syntax with ndisasm (from the nasm package). You could also have a look at Agner Fog's objconv and c2nasm (a fork, on GitHub).

     
  • Pete

    Pete - 2020-05-17

    I started with the Windows ASM file. The MASM directives don't convert 1:1. The disassembly listing helped somewhat. I found one source, fast_lzma2, which has a modified version of it in Intel syntax (which compiles nicely with gcc), but fast_lzma2 is not compatible with LZMA. Still, Conor's .S file helped a lot in setting this up. Since I wrote this post I've made a lot of headway, and LzmaDecOpt.asm now compiles. It's just a matter of making sure I have the correct pointer sizes and that it actually works as a decompression engine. Stay tuned. Thank you

     
  • Pete

    Pete - 2020-05-20

    Well, one problem solved. The REG_PARAM_0-4 registers had to be changed for *nix in 7zAsm.asm. Thanks to Conor McCarthy. The attached file has a FIXME in it. If anyone knows how to trace this error back to its cause, your help is appreciated. The file is almost complete!

    # for WIN64-x64 ABI:
    .equ REG_PARAM_0, r1
    .equ REG_PARAM_1, r2
    .equ REG_PARAM_2, r8
    .equ REG_PARAM_3, r9
    # for System V AMD64 ABI:
    .equ REG_PARAM_0, r7
    .equ REG_PARAM_1, r6
    .equ REG_PARAM_2, r2
    .equ REG_PARAM_3, r1
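    Why the two lists differ can be sketched in C. This is an illustration under an assumption: that 7zAsm.asm's rN aliases follow the x86 register-encoding order (r0=rax, r1=rcx, r2=rdx, r3=rbx, r4=rsp, r5=rbp, r6=rsi, r7=rdi, then r8-r15), which is consistent with the mapping above. The table and param_reg helper are hypothetical; they only encode which register carries each integer argument under each calling convention.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed 7zAsm.asm aliases: rN follows x86 register-encoding order. */
static const char *const reg_names[16] = {
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
    "r8",  "r9",  "r10", "r11", "r12", "r13", "r14", "r15"
};

/* Integer/pointer argument registers, in order, per calling convention. */
static const int win64_params[4] = { 1, 2, 8, 9 };       /* rcx, rdx, r8, r9 */
static const int sysv_params[6]  = { 7, 6, 2, 1, 8, 9 }; /* rdi, rsi, rdx, rcx, r8, r9 */

/* Name of the register carrying argument `index` (0-based), or NULL when
   that argument is passed on the stack instead. */
static const char *param_reg(int sysv, int index)
{
    if (index < 0)
        return NULL;
    if (sysv)
        return index < 6 ? reg_names[sysv_params[index]] : NULL;
    return index < 4 ? reg_names[win64_params[index]] : NULL;
}
```

    For example, param_reg(1, 0) gives "rdi" (r7), which is why REG_PARAM_0 cannot stay r1 (rcx) on Linux: the function would read its first argument from the wrong register.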
    

    But after some iterations, this error occurs:

    Thread 2 "lrzip" hit Breakpoint 1, 0x0000000000459840 in _LzmaDec_DecodeReal_3 ()
    (gdb) c
    Continuing.
    
    Thread 2 "lrzip" received signal SIGSEGV, Segmentation fault.
    0x000000000045a2aa in copy_match.out ()
    

    Here is the section of copy_match where the error occurs. Search for FIXME.

    ; *** FIXME***
    ; after some iterations, invalid memory address in RDI t0_R
    ; Dump of assembler code for function copy_match.out:
    ;   0x000000000045a2a7 <+0>:     add    %r12,%rdi
    ;=> 0x000000000045a2aa <+3>:     movzbl (%rdi),%ebx
    ;   0x000000000045a2ad <+6>:     add    %rdx,%rdi
    ;   0x000000000045a2b0 <+9>:     neg    %rdx
    ;(gdb) i r r12 rdi ebx rdx
    ;r12            0x7ffff00008c0   140737219922112
    ;rdi            0x7ffefa146981   140733094062465 *** this shows error "Cannot access memory at 0x#####"
    ;ebx            0x0      0
    ;rdx            0x2      2
    ; *** END FIXME ***
            add     t0_R, dic
            movzx   sym, byte [t0_R]
            add     t0_R, cnt_R
            neg     cnt_R
            ; lea     r1, [dicPos - 1]
    copy_common:
            dec     dicPos
            ; cmp   LOC rep0, 1
            ; je    rep0Label
    
            ; t0_R - src_lim
            ; r1 - dest_lim - 1
            ; cnt_R - (-cnt)
    
            IsMatchBranch_Pre
            inc     cnt_R
            jz      copy_end
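    What the faulting instruction is doing can be modeled in C. This is a simplified sketch, not the code from LzmaDec.c: it assumes a circular dictionary in which a match's source lies rep0 bytes behind the write position and may wrap past the start of the buffer. The asm above builds the same kind of source address by adding dic to an offset in t0_R, so if rep0 or the dictionary fields are corrupted, that address points outside the buffer and the movzx faults. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model (not LzmaDec.c itself): copy a match of `len` bytes
   whose source lies `rep0` bytes behind `dic_pos` in a circular
   dictionary of `dic_size` bytes. Returns the new dic_pos. */
static size_t copy_match_sketch(unsigned char *dic, size_t dic_size,
                                size_t dic_pos, size_t rep0, size_t len)
{
    /* Invariants that a corrupted rep0 or dicPos would break. */
    assert(rep0 >= 1 && rep0 <= dic_size);
    assert(dic_pos < dic_size && len <= dic_size - dic_pos);

    /* Source index, wrapping back past the start of the buffer if needed. */
    size_t src = dic_pos >= rep0 ? dic_pos - rep0 : dic_pos + dic_size - rep0;

    /* Byte by byte, so a short distance repeats data (e.g. rep0 == 1). */
    while (len--) {
        dic[dic_pos++] = dic[src];
        if (++src == dic_size)
            src = 0;
    }
    return dic_pos;
}
```

    With "abc" in the buffer, a match with rep0 = 3 and length 6 produces "abcabcabc"; with rep0 = 1 it repeats the last byte. The byte-by-byte loop is deliberate, because source and destination may overlap.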
    
     
  • Sam Tansy

    Sam Tansy - 2020-05-21

    Maybe the author can comment on his code and these registers. He knows what they are.

    --
    Did you try debugging it step by step with a simple input? NASM can emit debug info, and you can load the result into a visual debugger such as Qt Creator, Eclipse, Code::Blocks, or whatever IDE you like.

    You could also try assembling with both MASM and NASM and disassembling the resulting objects to see the difference.
    My favourite method in this situation is to put nops around the interesting fragment (so I can easily find and "frame" it) and compare the output that way.
    You can even run the MASM- and NASM-built programs side by side in a debugger (preferably a visual one) and compare them.

    I found an example of a similar conversion with comments about syntax differences. I'm not sure how relevant it is in your situation, but I hope it helps.

     

    Last edit: Sam Tansy 2020-05-21
  • Igor Pavlov

    Igor Pavlov - 2020-05-21

    To debug LZMA, use smaller files with simpler data:
    1
    12
    123
    11
    111
    1111
    123123

    When some big file doesn't work, reduce it by 20% and try again, then do it again; eventually you will have the smallest file that doesn't work, and you can debug that.
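    The 20% reduction loop can be sketched in C. This is a minimal sketch under two assumptions: the reduction is done by truncating the file, and a hypothetical fails() hook decides whether a given prefix still reproduces the bug (in practice, compress that prefix with the C encoder and decode it with the asm decoder). demo_fails is only a stand-in so the loop can be exercised.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook: nonzero if decoding still fails for the first `len`
   bytes of the test data. */
typedef int (*fails_fn)(const unsigned char *data, size_t len);

/* Stand-in predicate for demonstration only: "fails" when the prefix
   contains a 0xFF byte. */
static int demo_fails(const unsigned char *data, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (data[i] == 0xFF)
            return 1;
    return 0;
}

/* Keep about 80% of the input while the failure still reproduces, and
   return the shortest failing prefix length reached this way.
   Assumes data[0..len) already fails. */
static size_t shrink_failing_input(const unsigned char *data, size_t len,
                                   fails_fn fails)
{
    if (len == 0)
        return 0;
    for (;;) {
        size_t next = len - len / 5;   /* drop 20% */
        if (next == len)
            next = len - 1;            /* tiny inputs: drop one byte */
        if (next == 0 || !fails(data, next))
            return len;
        len = next;
    }
}
```

    For a 100-byte input whose trigger byte sits at offset 50, the loop tries 80, 64, 52, then 42 bytes, and stops at 52 because the 42-byte prefix no longer fails.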

    You can also call lzmadecode for just one symbol (one iteration).
    And you can log the full LZMA state to a file after each symbol and compare it with the states of the original C decoder.
    You can use the simple lc=0 option for that, so the state will be as small as possible.
    I did this when I developed the LZMA decoder in asm, and I didn't use a step-by-step debugger.
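    The state-logging idea can be sketched in C. This is a minimal sketch with assumptions: the DecSnapshot struct and the helpers are hypothetical, and their field names are modeled on CLzmaDec in LzmaDec.h (SDK 19.00). Filling one snapshot per symbol from the C decoder and from the asm decoder is left to the caller. Diffing the two text logs, or calling first_divergence on saved arrays, points to the first symbol where the asm state drifts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-symbol snapshot; field names modeled on CLzmaDec. */
typedef struct {
    size_t   dicPos;
    uint32_t range, code, processedPos;
    uint32_t reps[4];
    uint32_t state, remainLen;
} DecSnapshot;

/* One text line per symbol, so logs from the two decoders can be diffed. */
static void log_snapshot(FILE *f, unsigned long step, const DecSnapshot *s)
{
    fprintf(f, "%lu pos=%zu range=%08lx code=%08lx proc=%lu rep=%lu,%lu,%lu,%lu state=%lu rem=%lu\n",
            step, s->dicPos, (unsigned long)s->range, (unsigned long)s->code,
            (unsigned long)s->processedPos,
            (unsigned long)s->reps[0], (unsigned long)s->reps[1],
            (unsigned long)s->reps[2], (unsigned long)s->reps[3],
            (unsigned long)s->state, (unsigned long)s->remainLen);
}

/* Field-by-field comparison (avoids comparing struct padding). */
static int snap_equal(const DecSnapshot *a, const DecSnapshot *b)
{
    if (a->dicPos != b->dicPos || a->range != b->range || a->code != b->code ||
        a->processedPos != b->processedPos || a->state != b->state ||
        a->remainLen != b->remainLen)
        return 0;
    for (int i = 0; i < 4; i++)
        if (a->reps[i] != b->reps[i])
            return 0;
    return 1;
}

/* Index of the first symbol where the two runs disagree, or n if none. */
static size_t first_divergence(const DecSnapshot *a, const DecSnapshot *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!snap_equal(&a[i], &b[i]))
            return i;
    return n;
}
```

    With lc=0, as suggested above, the probability tables are smallest, so a snapshot like this (plus, if needed, a checksum of the probs array) covers most of the decoder state.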

     

    Last edit: Igor Pavlov 2020-05-21
  • Pete

    Pete - 2020-05-21

    Thanks, @ipavlov, for the suggestions. I'll keep at it. I'm wondering if I'm popping the wrong registers on return.

     
  • Sam Tansy

    Sam Tansy - 2020-05-21

    I'm not in the mood to think too hard, so here's my offer: I guess you adapted LzmaDecOpt.asm to p7zip 16.02, so you have makefiles. Tell me how to add this file (I mean the makefile and stuff) to an existing 7zip-16.02/04 and I will compile and decompile it for you.

     
  • Pete

    Pete - 2020-05-21

    @tansy, actually I adapted this file for lrzip using LZMA SDK 19.00. I was inspired by p7zip and only use the multithreading files Threads.c and Threads.h. I assembled it, created a listing, disassembled it, and ran it through gdb. I'm not sure it's compatible with 16.02. You can assemble it with
    nasm -Dx64 -f elf64 -g -F dwarf -o LzmaDecOpt.o -l LzmaDecOpt.lst LzmaDecOpt.asm (all on one line, of course)

    Be aware that I left out a comment mark on line 992; it must be added or assembly will fail. Change
    (gdb) i r r12 rdi ebx rdx
    to
    ; (gdb) i r r12 rdi ebx rdx

    Thanks.

     
  • Conor McCarthy

    Conor McCarthy - 2020-05-22

    On line 945, I'd check that the values in l_limit, dicPos, and cnt_R are all <= the dictionary size. On line 948, check that sym_R is a valid length. Also, on lines 952 & 953, check that l_dic_Spec has not been corrupted and that 1 <= l_rep0 <= dict size.

     
  • Pete

    Pete - 2020-05-22

    @conor42 l_dicSpec and l_rep0 are corrupt. They seem to be assigned properly at the beginning. Since I used some global substitutions while editing, maybe I confused LOC and LOC_0 somewhere along the way, or GLOB and GLOB_2. Thanks for the tip. Now I can work backwards. Probably something stupid!

    On a very small test file, l_dic_Spec is totally corrupted in copy_match:
    dicSize 33,554,432
    l_limit 0
    dicPos 0
    cnt_R 313
    sym_R 3
    * l_dicSpec 0x697a726c2f746967 INVALID
    * l_rep0 -168965056 (4126002240) Garbage
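    Conor's range checks can be applied mechanically to the dumped values. The sketch below is illustrative: the helper names are hypothetical and the numbers are copied from the dump above. Read as an unsigned 32-bit value, l_rep0 is 4126002240, far above the 33,554,432-byte dictionary. Reading the bogus l_dicSpec value byte by byte in little-endian (x86) order gives the ASCII text "git/lrzi". That suggests, though the thread does not confirm it, that string data, apparently a fragment of a path, was written over that slot.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* "Value must not exceed the dictionary size" (l_limit, dicPos, cnt_R). */
static int within_dict(uint64_t v, uint64_t dic_size)
{
    return v <= dic_size;
}

/* 1 <= rep0 <= dictionary size. */
static int rep0_valid(uint32_t rep0, uint32_t dic_size)
{
    return rep0 >= 1 && rep0 <= dic_size;
}

/* Write the 8 bytes of a 64-bit value as they sit in little-endian memory;
   handy for spotting text that has overwritten a pointer. */
static void le_bytes(uint64_t v, char out[9])
{
    for (int i = 0; i < 8; i++)
        out[i] = (char)((v >> (8 * i)) & 0xFF);
    out[8] = '\0';
}
```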

     
  • Pete

    Pete - 2020-05-22

    Thank you @conor42, @ipavlov. There were two problems.
    1. The COPY_VAR and RESTORE_VAR macros assumed the source and destination member names were the same. I had prefixed all local variable equates with l_, so COPY_VAR copied the variable to the wrong place, and RESTORE_VAR failed the same way. For example, with separate source and destination names:

    %macro COPY_VAR 2 ; %1 = global (source) name, %2 = local (destination) name
            mov     t0, GLOB_2(%1)  ; load the global variable
            mov     LOC_0(%2), t0   ; store it into the local copy
    %endmacro
    
    2. In the following section, I did not use the LOC macro and did not prefix the lpMask variable with l_. The corrected line is:
    ; ---------- LITERAL MATCHED ----------
    
            LIT_PROBS LOC(l_lpMask)
    
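    The COPY_VAR fix can be illustrated with a C analogy. This is a hypothetical sketch, not the asm's data layout: the two structs stand in for the global (GLOB_2) and local (LOC_0) areas. In C, a one-name macro such as loc.name = glob.name simply stops compiling once the local names gain l_. In the asm, the names are plain offset equates, so a mismatched name can still assemble and address the wrong slot, which fits the symptom described above. Passing both names, as the fixed NASM macro does, removes the assumption.

```c
#include <assert.h>

/* Hypothetical stand-ins for the global and local variable areas. */
struct glob_vars { unsigned rep0, lpMask; };
struct loc_vars  { unsigned l_rep0, l_lpMask; };

/* Two-name forms: source (global) and destination (local) names differ. */
#define COPY_VAR(glob, loc, gname, lname)    ((loc).lname = (glob).gname)
#define RESTORE_VAR(glob, loc, gname, lname) ((glob).gname = (loc).lname)
```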

    After these fixes, the file decompresses. I will now clean up the code for readability, test it thoroughly, and benchmark it with lrzip.

    An initial run with lrzip shows a significant improvement. Source file: 2,871,971,840 bytes. Compressed file: 1,873,488,679 bytes.
    Without Assembly
    Decompressing...
    100% 2738.93 / 2738.93 MB
    Average DeCompression Speed: 78.229MB/s
    [OK] - 2871971840 bytes
    Total time: 00:00:35.57

    With Assembly
    Decompressing...
    100% 2738.93 / 2738.93 MB
    Average DeCompression Speed: 130.381MB/s
    [OK] - 2871971840 bytes
    Total time: 00:00:21.83

    Percentage improvement: about 38% less total time (35.57 s vs. 21.83 s).
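    The arithmetic behind that figure, as a small C sketch. The helper names are illustrative. Time saved is measured against the original run time; the throughput gain is measured against the original speed.

```c
#include <assert.h>

/* Percentage of wall-clock time saved: 100 * (before - after) / before. */
static double pct_time_saved(double before_s, double after_s)
{
    return 100.0 * (before_s - after_s) / before_s;
}

/* Percentage throughput gain: 100 * (after / before - 1). */
static double pct_speed_gain(double before_mbs, double after_mbs)
{
    return 100.0 * (after_mbs / before_mbs - 1.0);
}
```

    With the numbers above, pct_time_saved(35.57, 21.83) is about 38.6%, and pct_speed_gain(78.229, 130.381) is about 66.7%. The two ratios differ slightly, presumably because the total time also includes work outside the decoder.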
    Thank you all again for your suggestions and support.

     
  • Sam Tansy

    Sam Tansy - 2020-05-23

    Wow, congrats! This calls for a drink.

    Btw, could someone provide me with the original LzmaDecOpt.asm assembled by MASM (a compiled object for Windows)? I just wanted to test something, but I don't intend to download gigabytes of Visual Studio + SDK + PDK + god knows what XYZDK...

     
  • Sam Tansy

    Sam Tansy - 2021-03-31

    @pete, can you show ../x86/7zAsm.asm? I tried to convert it to test it with p7zip, but without success. NASM is not my cup of tea.

     
    • Pete

      Pete - 2021-03-31

      @tansy, the 19.00 SDK introduced some new features. Perhaps take a peek at my GitHub repo for lrzip-next; go to src/lzma/ASM/x86. Here's the file in any event. HTH

       
      • Sam Tansy

        Sam Tansy - 2021-04-03

        Thanks, I will.

         
