In LZMA SDK 19.00, an optimized decompression Asm module, LzmaDecOpt.asm, was introduced.
Has anyone succeeded in converting this from MASM to NASM syntax? It's a worthwhile project since it could significantly speed up decompression.
While I've been able to convert macros, defines, includes, and function calls, I'm still not getting everything I need to get it to compile properly.
Thank you to anyone willing to take a run at this!
Did you try with the original 16.02/16.04 files from Windows? They are much simpler and more manageable. Then you could build on that.
Compare the Windows 7zip MASM source and the p7zip NASM source to see what's different.
You can also try to assemble that file with MASM and disassemble it to NASM syntax with ndisasm (from the nasm package). You can also have a look at Agner Fog's objconv and c2nasm (fork, on GitHub).
I started with the Windows ASM file. The MASM directives don't convert 1:1. The disassembly listing helped somewhat. I found one source, fast_lzma2, which has a modified version of it in Intel format (which compiles nicely with gcc), but fast_lzma2 is not compatible with LZMA. Still, Conor's .S file helped a lot to set this up. Since I wrote this post I have made a lot of headway, and LzmaDecOpt.asm compiles. It's just a matter of making sure I have the correct pointer sizes and that it will in fact work as a decompression engine. Stay tuned. Thank you.
Well one problem solved. The REG_PARAM_0-4 registers had to be changed for *nix in 7zAsm.asm. Thanks to Conor McCarthy. The attached file has a FIXME in it. Anyone with knowledge as to how to backtrace the error, your help is appreciated. The file is almost complete!
But after some iterations, this error occurs.
Here is the section from copy_match where the error occurs. Search for FIXME.
Maybe the author can give you some comments about his code and these registers. He knows what they are.
--
Did you try to debug it step by step with simple input? Code assembled with NASM can include debug info and be loaded into a visual debugger such as Qt Creator, Eclipse, Code::Blocks, or whatever IDE you like.
You could also try to assemble with both MASM and NASM and disassemble the resulting objects to see the difference.
My favourite method in such a scenario is to put nop around the interesting fragment (so I can easily find and "frame" it) and see the difference this way.
You can even run the NASM- and MASM-compiled programs side by side in a debugger (preferably visual) and compare the difference.
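A rough sketch of the "disassemble both objects and compare" idea, in Python. It assumes each listing has already been saved as text in an ndisasm-style three-column layout (offset, hex bytes, instruction); the function names are made up for illustration, and continuation lines for very long instructions are not handled.

```python
import difflib

def instruction_text(line):
    """Drop the offset and hex-byte columns of one ndisasm-style line."""
    parts = line.split(None, 2)
    return parts[2].strip() if len(parts) == 3 else line.strip()

def diff_listings(masm_lines, nasm_lines):
    """Unified diff of two listings, comparing instruction text only."""
    a = [instruction_text(line) for line in masm_lines]
    b = [instruction_text(line) for line in nasm_lines]
    return list(difflib.unified_diff(a, b, "masm", "nasm", lineterm=""))

# Tiny made-up listings: the second instruction differs.
masm = ["00000000  90                nop",
        "00000001  4889C8            mov rax,rcx"]
nasm = ["00000000  90                nop",
        "00000001  4889D0            mov rax,rdx"]
for line in diff_listings(masm, nasm):
    print(line)
```

Comparing instruction text only keeps differing offsets from flooding the diff, and the nop "frame" shows up as an easy landmark in both listings.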
I found an example of a similar conversion with comments about syntax differences. Not sure how relevant it is in your situation, but I hope it will help.
Last edit: Sam Tansy 2020-05-21
To debug LZMA, use smaller files with simpler data:
1
12
123
11
111
1111
123123
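A minimal Python sketch for producing such test inputs, using lc=0 as suggested further down in this post. It relies on Python's standard lzma module (liblzma), not the LZMA SDK encoder, and the dict_size, lp and pb values are placeholder assumptions rather than settings from this thread.

```python
import lzma

# Sample inputs taken from the list above.
SAMPLES = ["1", "12", "123", "11", "111", "1111", "123123"]

# Single LZMA1 filter; lc=0 keeps the literal-probability state small.
# dict_size, lp and pb are illustrative assumptions.
FILTERS = [{"id": lzma.FILTER_LZMA1, "lc": 0, "lp": 0, "pb": 2,
            "dict_size": 1 << 16}]

def make_test_streams(samples=SAMPLES):
    """Return {sample: compressed bytes} in the legacy .lzma container."""
    return {s: lzma.compress(s.encode("ascii"), format=lzma.FORMAT_ALONE,
                             filters=FILTERS)
            for s in samples}

streams = make_test_streams()
for sample, blob in streams.items():
    print(sample, len(blob), "bytes")
```

Each value can be written to its own file and fed to both the C decoder and the asm decoder.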
When some big file doesn't work, reduce it by 20% and try again, then do it again, until you have the smallest file that doesn't work, and you can debug that.
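The shrinking loop could look roughly like this in Python. The `fails` callback is hypothetical: it stands for "compress this data, run it through the asm decoder, and report whether the output is wrong". Cutting from the end is just one simple choice.

```python
def shrink_failing_input(data, fails, step=0.20):
    """Repeatedly drop about 20% from the end while fails(data) stays true."""
    if not fails(data):
        raise ValueError("start from an input that fails")
    while len(data) > 1:
        cut = max(1, int(len(data) * step))
        candidate = data[:len(data) - cut]
        if not fails(candidate):
            break  # shrinking further hides the bug; keep the current input
        data = candidate
    return data

# Stand-in failure condition for demonstration: the "bug" needs byte X.
original = b"a" * 10 + b"X" + b"b" * 100
smallest = shrink_failing_input(original, lambda d: b"X" in d)
print(len(original), "->", len(smallest))
```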
Also you can call lzmadecode just for one symbol (one iteration).
And you can log the full LZMA state to a file after each symbol to compare with the original states of the decoder in the C version.
You can use the simple lc=0 option for that, so the state will be smallest.
I did it when I developed lzma decoder in asm, and I didn't use step-by-step debugger.
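A small Python sketch of the log comparison, assuming both decoders write one text line of state per decoded symbol in the same format. The line format in the demo is invented for illustration.

```python
from itertools import zip_longest

def first_divergence(reference_lines, candidate_lines):
    """Return (index, ref_line, cand_line) of the first mismatch, or None."""
    pairs = zip_longest(reference_lines, candidate_lines)
    for index, (ref, cand) in enumerate(pairs):
        if ref != cand:
            return index, ref, cand
    return None

# Invented log lines: state after each symbol from the C and asm decoders.
c_log = ["sym=0 range=ffffffff rep0=1", "sym=1 range=7fffffff rep0=1"]
asm_log = ["sym=0 range=ffffffff rep0=1", "sym=1 range=7fffffff rep0=9"]
print(first_divergence(c_log, asm_log))
```

The index of the first mismatch is the symbol at which the asm decoder goes wrong, which narrows down where to look.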
Last edit: Igor Pavlov 2020-05-21
Thanks, @ipavlov, for the suggestions. I'll keep at it. I'm wondering if I'm popping the wrong registers on return.
I'm not in a mood to think too much so I'll tell you what to do: I guess you adapted this LzmaDecOpt.asm to p7zip 16.02, so you have makefile(s). So tell me how to add this file (I mean the makefile and stuff) to the existing 7zip-16.02/04 and I will compile and decompile it for you.
@tansy, actually I adapted this file for lrzip using LZMA SDK 19.00. I was inspired by p7zip and only use the multithreading files Threads.c and Threads.h. I compiled it, created a listing, disassembled it, and ran it through gdb. I'm not sure it's compatible with 16.02. You can compile with
nasm -Dx64 -f elf64 -g -F dwarf -o LzmaDecOpt.o -l LzmaDecOpt.lst LzmaDecOpt.asm
(all on one line, of course)
Be aware I left out a comment mark on line 992, which must be corrected or a compile error will occur. The line reads
(gdb) i r r12 rdi ebx rdx
so add a semicolon:
; (gdb) i r r12 rdi ebx rdx
Thanks.
Line 945, I'd check the values in l_limit, dicPos, and cnt_R are all <= dictionary size. Line 948, check sym_R is a valid length. Also on lines 952 & 953 check l_dic_Spec has not been corrupted, and 1 <= l_rep0 <= dict size.
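These range checks can be written down as a small Python helper, for example to run over values copied out of gdb. This is only a sketch: the parameter names are illustrative, the sym_R length check is left out because the valid range depends on how the length is biased at that point, and a pointer such as l_dic_Spec cannot be validated from its number alone.

```python
def check_copy_match_state(dict_size, limit, dic_pos, cnt, rep0):
    """Return a list of violations (empty if every range check passes)."""
    problems = []
    for name, value in (("limit", limit), ("dicPos", dic_pos), ("cnt", cnt)):
        if not 0 <= value <= dict_size:
            problems.append(f"{name}={value} outside 0..{dict_size}")
    if not 1 <= rep0 <= dict_size:
        problems.append(f"rep0={rep0} outside 1..{dict_size}")
    return problems

# Hypothetical values for a 32 MiB dictionary.
print(check_copy_match_state(1 << 25, limit=0, dic_pos=0, cnt=313, rep0=5))
print(check_copy_match_state(1 << 25, limit=0, dic_pos=0, cnt=313, rep0=0))
```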
@conor42 l_dicSpec and l_rep0 are corrupt. They seem to get assigned properly at the beginning. Since I did use some global substitutions in editing, maybe I confused LOC and LOC_0 somewhere along the way, or GLOB and GLOB_2. Thanks for the tip. Now I can work backwards. Probably something stupid!
On a very small test file, in copy_match, l_dic_Spec is totally corrupted.
dicSize 33,554,432
l_limit 0
dicPos 0
cnt_R 313
sym_R 3
* l_dicSpec 0x697a726c2f746967 INVALID
* l_rep0 -168965056 (4126002240) Garbage
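One way to look at garbage values like these is to reinterpret them, sketched here in Python. The 64-bit value is unpacked as little-endian bytes (x86-64 byte order) and the negative 32-bit value is shown as unsigned. The function name is made up for illustration.

```python
import struct

def reinterpret(value64, signed32):
    """Return (little-endian bytes of a 64-bit value, unsigned view of a 32-bit value)."""
    raw = struct.pack("<Q", value64)
    unsigned32 = signed32 & 0xFFFFFFFF
    return raw, unsigned32

raw, rep0 = reinterpret(0x697A726C2F746967, -168965056)
print(raw, rep0)  # b'git/lrzi' 4126002240
```

The bytes are printable ASCII that looks like a fragment of a path, which would be consistent with the value being read from, or overwritten by, string data rather than a dictionary pointer. That is only a reading of the numbers, not a diagnosis.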
Thank you @conor42, @ipavlov. There were two problems.
1. The COPY_VAR and RESTORE_VAR macros assumed the source and destination member names were the same. I prefixed all local var equates with l_, so COPY_VAR would copy the variable to the wrong place. Of course, RESTORE_VAR would not work either. Ex.
2. [...] section, I did not use the LOC macro and did not prefix the lpMask variable with l_.
After this, the file decompresses. I will now clean up the code for readability, test thoroughly, and benchmark with lrzip.
An initial run with lrzip yields a significant percentage improvement. Source file: 2,871,971,840 bytes. Compressed file: 1,873,488,679 bytes.
Without Assembly
Decompressing...
100% 2738.93 / 2738.93 MB
Average DeCompression Speed: 78.229MB/s
[OK] - 2871971840 bytes
Total time: 00:00:35.57
With Assembly
Decompressing...
100% 2738.93 / 2738.93 MB
Average DeCompression Speed: 130.381MB/s
[OK] - 2871971840 bytes
Total time: 00:00:21.83
Percentage improvement: 38% on time,
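The figures from the two runs above can be cross-checked with a few lines of Python. The helper name is illustrative; the inputs are the reported total times and average speeds.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0

time_saved = -percent_change(35.57, 21.83)    # total time, seconds
speed_gain = percent_change(78.229, 130.381)  # average speed, MB/s
print(f"time: {time_saved:.1f}% less, throughput: {speed_gain:.1f}% more")
# time: 38.6% less, throughput: 66.7% more
```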
Thank you all again for suggestions and support.
Wow, congrats! This calls for a drink.
Btw, is someone able to provide me with the original LzmaDecOpt.asm MASM-compiled object (for Windows)? I just wanted to test something, but I don't intend to download gigabytes of Visual Studio + SDK + PDK + god only knows what XYZDK...
Can you @pete show ../x86/7zAsm.asm? I tried to convert it to test it with p7zip, but without success. NASM is not my cup of tea.
@tansy, the 19.00 SDK introduced some new features. Perhaps take a peek at my GitHub for lrzip-next, go to src/lzma/ASM/x86. Here's the file in any event. HTH
Thanks, I will.