Joachim Henke

What's happening?

  • Followup: RE: fastcall

    For some further hints, I ran both binaries through cachegrind. It seems that the speedup comes more from the reduced number of instructions executed than from anything related to the caches.

    ~/p7zip_9.04$ cg_annotate cachegrind.out.8803
    --------------------------------------------------------------------------------
    I1 cache: 32768 B, 64 B, 8-way associative
    D1 cache:

    2009-11-05 12:38:35 UTC in p7zip

  • Followup: RE: fastcall

    I uploaded the compiled asm output to <http://j-o.users.sourceforge.net/download/7-zip/>. Although both versions were compiled with exactly the same optimization flags, gcc decided to order the functions differently. This might be one reason for the better caching...

    2009-11-04 12:55:43 UTC in p7zip

  • Followup: RE: fastcall

    Well, I can only guess why p7zip in Ubuntu was not compiled with the asm code, but I'm quite sure it has nothing to do with the assembler syntax. Recent versions of GCC and the GNU assembler can even handle Intel syntax. I suppose it's more about portability. Ubuntu's packages are based on Debian's, and Debian runs on many architectures. And there is no...

    2009-11-03 17:31:55 UTC in p7zip

  • Followup: RE: fastcall

    I don't know which Linux distributions compile p7zip with the CRC asm code. You would have to check the source packages or ask the package maintainers. I can only tell that p7zip in Ubuntu 9.10 (probably the most popular desktop distro currently) was not compiled with the asm code, which 7z b -mm=crc clearly shows. As I already mentioned at the end of the third post of this thread, there is...

    2009-11-03 12:57:06 UTC in p7zip

  • Followup: RE: fastcall

    To answer the second question first: doing asm in Linux is quite easy. I just wasn't aware that you were interested in the asm speed with fastcall. Here is the complete patch:

    --- p7zip_9.04/Asm/x86/7zCrcT8U.asm
    +++ p7zip_9.04-fastcall/Asm/x86/7zCrcT8U.asm
    @@ -12,5 +12,5 @@
     %endmacro
    -data_size equ (28)
    +data_size equ (20)
     crc_table equ (data_size +...

    2009-11-03 11:00:52 UTC in p7zip

  • Followup: RE: fastcall

    Yes, I compiled both versions with the same parameters; the only difference was the patch above. I did not compile with the asm code. With fastcall, the generated code is a few instructions shorter, so it might (by chance) fit better in the cache. I repeated the compilation and benchmarks on a Pentium 4 at 2.53 GHz:

    ~$ p7zip_9.04/bin/7zr b
    7-Zip (A) 9.04 beta Copyright...

    2009-11-02 21:45:36 UTC in p7zip

  • Followup: RE: fastcall

    Here is the patch:

    --- C/Types.h
    +++ C/Types.h
    @@ -104,5 +104,10 @@
     #define MY_CDECL
     #define MY_STD_CALL
    +
    +#if __GNUC__ && __i386__
    +#define MY_FAST_CALL __attribute__((__fastcall__, __noinline__))
    +#else
     #define MY_FAST_CALL
    +#endif
     
     #endif

    And here are the benchmark results from a Core2 Duo:

    ~$...

    2009-11-02 15:57:29 UTC in p7zip

  • Followup: RE: fastcall

    On x86 (IA-32 only), GCC supports fastcall (first argument in ECX, second in EDX) via __attribute__((fastcall)). On other architectures you just get: warning: ‘fastcall’ attribute ignored. I'll try to check whether I can measure any speed difference.

    2009-11-02 12:30:52 UTC in p7zip

  • Followup: RE: [PATCH] Linux huge pages support

    Currently, using huge pages in Linux is somewhat cumbersome, and they are used almost exclusively by enterprise software such as database systems. With the upcoming kernel 2.6.32, allocating memory on huge pages will be as easy as calling mmap (from the application side), but the administrator still has to pre-configure how much memory is reserved for huge pages. Well, you can already make use of...

    2009-10-30 16:40:24 UTC in p7zip

  • Followup: RE: [PATCH] Linux huge pages support

    > If there is such a big gain on PowerPC from large pages, why was it not enabled in Linux before?

    Do you mean in Linux generally or in the Linux version of p7zip? First I ported memlat to Linux x86 and verified that it works correctly by comparing its results to those from MemLat32.exe on Windows XP. Then I added support for PowerPC. If you are interested, you can download the...

    2009-10-29 19:07:45 UTC in p7zip
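The 2009-11-03 posts attribute the missing CRC asm in distribution builds to portability. As a rough illustration of what a portable fallback looks like, here is a minimal sketch of a byte-at-a-time, table-driven CRC-32 in plain C, using the reflected polynomial 0xEDB88320 that zip and 7z use. The function names are hypothetical and this is not p7zip's own code; optimized C and asm versions typically process several bytes per step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; not taken from the p7zip sources. */
static uint32_t crc_table[256];

/* Fill the 256-entry lookup table once (reflected CRC-32 polynomial). */
static void crc_table_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t r = i;
        for (int j = 0; j < 8; j++)
            /* 0u - (r & 1) is all ones if the low bit is set, else zero. */
            r = (r >> 1) ^ (0xEDB88320u & (0u - (r & 1)));
        crc_table[i] = r;
    }
}

/* One table lookup per input byte; portable to any architecture. */
static uint32_t crc_update(uint32_t crc, const void *data, size_t size)
{
    const unsigned char *p = data;
    for (; size != 0; size--, p++)
        crc = crc_table[(crc ^ *p) & 0xFF] ^ (crc >> 8);
    return crc;
}

/* Standard CRC-32: start with all ones and invert the final value. */
static uint32_t crc32_buf(const void *data, size_t size)
{
    return crc_update(0xFFFFFFFFu, data, size) ^ 0xFFFFFFFFu;
}
```

Call crc_table_init() once before hashing; crc32_buf("123456789", 9) should then yield the standard CRC-32 check value 0xCBF43926.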
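In the 7zCrcT8U.asm patch from 2009-11-03, data_size drops from 28 to 20. One plausible reading, assuming the routine pushes four registers on entry and that data_size is the [esp + N] offset of its third argument (neither is confirmed by the truncated patch), is that the 8-byte difference is exactly the two 4-byte arguments that fastcall moves into ECX and EDX. The arithmetic as a small C sketch with hypothetical names:

```c
#include <assert.h>

/* Assumptions (not taken from the actual 7zCrcT8U.asm):
   - IA-32: every argument occupies one 4-byte stack slot;
   - the routine saves 4 registers with push on entry;
   - data_size is the [esp + N] offset of the third argument. */
enum { SLOT = 4, SAVED_REGS = 4 };

/* Offset from esp (after the pushes) to the 0-based argument arg_index,
   when the first args_in_regs arguments are passed in registers. */
static int arg_offset(int arg_index, int args_in_regs)
{
    assert(arg_index >= args_in_regs); /* register args have no stack slot */
    int saved = SAVED_REGS * SLOT;     /* pushed registers */
    int return_address = SLOT;         /* pushed by the call instruction */
    return saved + return_address + (arg_index - args_in_regs) * SLOT;
}
```

Under these assumptions, arg_offset(2, 0) gives 16 + 4 + 8 = 28 for cdecl and arg_offset(2, 2) gives 16 + 4 = 20 for fastcall.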
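The 2009-11-02 12:30 post notes that GCC honors fastcall only on IA-32 and warns elsewhere. A minimal sketch of guarding the attribute in the same spirit as the Types.h patch, so other architectures see an empty macro and no warning; FASTCALL and mix() are hypothetical names:

```c
#include <assert.h>

/* On 32-bit x86 with GCC, pass the first two integer arguments in ECX and
   EDX; on other targets expand to nothing, avoiding the "attribute
   ignored" warning. */
#if defined(__GNUC__) && defined(__i386__)
#define FASTCALL __attribute__((fastcall))
#else
#define FASTCALL
#endif

/* With fastcall on IA-32, a and b arrive in registers instead of on the stack. */
static unsigned FASTCALL mix(unsigned a, unsigned b)
{
    return (a << 5) ^ b;
}
```

The calling convention only changes how arguments are passed, so mix(1, 2) returns 34 on every target.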
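For the huge pages discussion (2009-10-30): since Linux 2.6.32, mmap accepts the MAP_HUGETLB flag for anonymous mappings. A minimal sketch that tries huge pages first and falls back to normal pages when none are reserved; it assumes 2 MiB huge pages (typical on x86), and alloc_big()/free_big() are hypothetical helpers, not p7zip code:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumption: 2 MiB huge pages, as on typical x86/x86-64 Linux systems. */
#define HUGE_PAGE_SIZE ((size_t)2 * 1024 * 1024)

/* Map `size` bytes, preferring huge pages. *mapped_size receives the
   length to pass to munmap(); *used_huge tells whether huge pages were used. */
static void *alloc_big(size_t size, size_t *mapped_size, int *used_huge)
{
    void *p;
    assert(size > 0 && mapped_size != NULL && used_huge != NULL);

#ifdef MAP_HUGETLB /* flag exists since Linux 2.6.32 */
    /* A huge-page mapping must cover whole huge pages. */
    size_t rounded = (size + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
    p = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        *mapped_size = rounded;
        *used_huge = 1;
        return p;
    }
    /* Typically fails when the administrator reserved no huge pages. */
#endif

    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *mapped_size = size;
    *used_huge = 0;
    return p;
}

static void free_big(void *p, size_t mapped_size)
{
    munmap(p, mapped_size);
}
```

On a system without reserved huge pages this silently returns normal pages, which matches the point that the administrator still has to pre-configure the huge page pool.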
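The 2009-10-29 post mentions porting memlat to measure memory latency. memlat's own method is not shown in the post; as a generic illustration of how such tools commonly work, here is a pointer-chasing sketch in which every load depends on the previous one, so the average time per step approximates load latency for the chosen working-set size. The function name is hypothetical and results depend heavily on the machine.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Generic pointer-chasing probe (not memlat's actual code). Links `count`
   slots into one random cycle so hardware prefetching cannot guess the next
   address, then returns the average nanoseconds per dependent load, or -1.0
   if allocation fails. */
static double chase_ns_per_load(size_t count, size_t steps, unsigned seed)
{
    assert(count > 0 && steps > 0);
    size_t *next = malloc(count * sizeof *next);
    size_t *order = malloc(count * sizeof *order);
    if (next == NULL || order == NULL) {
        free(next);
        free(order);
        return -1.0;
    }

    /* Random visiting order via Fisher-Yates (rand() is good enough here). */
    srand(seed);
    for (size_t i = 0; i < count; i++)
        order[i] = i;
    for (size_t i = count - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t t = order[i];
        order[i] = order[j];
        order[j] = t;
    }
    /* Each slot stores the index of the slot visited after it. */
    for (size_t i = 0; i < count; i++)
        next[order[i]] = order[(i + 1) % count];

    struct timespec t0, t1;
    size_t p = order[0];
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t s = 0; s < steps; s++)
        p = next[p]; /* each load depends on the previous result */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    volatile size_t sink = p; /* keep the walk from being optimized away */
    (void)sink;
    free(next);
    free(order);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

Growing `count` past the cache sizes (and past what the TLB covers with normal pages) is what makes the effect of larger pages visible in measurements of this kind.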

About Me

  • 2006-10-06 (3 years ago)
  • 1614580
  • j-o (My Site)
  • Joachim Henke
