#195 upx fails to compress ELF executables created by Go compiler

open
None
5
2015-07-25
2011-09-12
Miki Tebeka
No

(default)[06:34] go-play $file hw
hw: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), not stripped
(default)[06:35] go-play $upx hw
Ultimate Packer for eXecutables
Copyright (C) 1996 - 2010
UPX 3.05 Markus Oberhumer, Laszlo Molnar & John Reiser Apr 27th 2010

File size Ratio Format Name
-------------------- ------ ----------- -----------
upx: hw: EOFException: premature end of file

Packed 1 file: 0 ok, 1 error.

You can get the file at http://dl.dropbox.com/u/706094/hw.bz2

Discussion

  • László Molnár

    • assigned_to: nobody --> jreiser
     
  • John Reiser

    John Reiser - 2011-09-12

    Thank you for providing a link to the hw.bz2 example. That really helps!

    For the 'hw' file, the PT_LOADs do not cover everything that gets mapped at execution:
    $ readelf --segments hw
    Type Offset VirtAddr PhysAddr
    LOAD 0x0000000000000c00 0x0000000000400c00 0x0000000000400c00
    Thus in particular the 0xc00 bytes at Offset 0 and VirtAddr 0x400000 are not described.
    Looking further (aided by "readelf --headers hw"), the space after the last section header (offset 456 + 30*64) to the PT_INTERP (offset 0xbe4) is a "hole" that contains nothing and is all zero. Why does the Go language processor arrange executables in this fashion, and what is special about 0xc00?
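
The size of that hole follows from the numbers quoted above. A minimal sketch, assuming the section header table starts at file offset 456 and holds 30 entries of 64 bytes each (as the expression "456 + 30*64" suggests); the function name is illustrative only:

```python
def hole_before_interp(shoff, shnum, shentsize, interp_offset):
    """Return (start, end, size) of the gap between the end of the
    section header table and the start of PT_INTERP."""
    start = shoff + shnum * shentsize   # first byte after the last section header
    return start, interp_offset, interp_offset - start


start, end, size = hole_before_interp(456, 30, 64, 0xbe4)
print(hex(start), hex(end), size)   # 0x948 0xbe4 668
```

Under those assumptions the all-zero region spans file offsets 0x948 to 0xbe4, and the first PT_LOAD begins 0x1c bytes later at 0xc00.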

    Looking at the source code in p_lx_elf.cpp function PackLinuxElf64amd::canPack(),
    // The first PT_LOAD64 must cover the beginning of the file (0==p_offset).
    // Just avoid the "rewind" when unpacking?
    //if (phdr->p_offset != 0) {
    // throwCantPack("invalid Phdr p_offset; try '--force-execve'");
    // return false;
    //}
    and the corresponding test for PackLinuxElf32::canPack() is _not_ commented out.

    So the quick-and-dirty response is to re-instate that commented-out test for 64-bit executables, with the result that upx will decline to pack them unless --force-execve. After that, it requires engineering to deal with the "undescribed space" below 0x400c00. Already there is logic to deal with undescribed space between PT_LOADs or after the last PT_LOAD, but not below the first PT_LOAD.

     
  • Miki Tebeka

    Miki Tebeka - 2011-09-12

    Russ Cox from the Go team gave the following answer (https://groups.google.com/d/msg/golang-nuts/UtrMwo7zM7M/O9WsnZdyfYMJ)

    An ELF PT_LOAD segment is allowed to specify an
    address that is not aligned according to the alignment
    it requests. In that case, the loader rounds the va
    and file offset down to the nearest boundary and
    increases the total size up by the same amount.
    Similarly, the total size is then rounded up to the
    nearest boundary.
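
The rounding described in that answer can be sketched as follows. The alignment of 0x1000 and the file size of 0x1234 are assumptions made for illustration (neither value appears in the thread), and the function name is hypothetical:

```python
def expand_to_boundaries(p_offset, p_vaddr, p_filesz, align):
    """Round a segment's start down and its size up to `align`,
    as the quoted description of the loader says."""
    if (p_vaddr - p_offset) % align != 0:
        raise ValueError("p_vaddr and p_offset are not congruent modulo align")
    slack = p_vaddr % align                      # distance back to the boundary
    size = p_filesz + slack                      # size grows by the same amount
    size = (size + align - 1) // align * align   # then round the size up
    return p_offset - slack, p_vaddr - slack, size


# 0xc00 / 0x400c00 come from the readelf output; 0x1234 and 0x1000 are made up.
off, va, size = expand_to_boundaries(0xc00, 0x400c00, 0x1234, 0x1000)
print(hex(off), hex(va), hex(size))   # 0x0 0x400000 0x2000
```

With these inputs the mapping would start at file offset 0 and address 0x400000, which is how the bytes below 0x400c00 end up mapped even though no PT_LOAD describes them.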

     
  • Nobody/Anonymous

    A reply to the tebeka comment would be interesting...

     
  • John Reiser

    John Reiser - 2012-08-09

    In reply to the tebeka comment of 2011-09-12 10:05:35 PDT: For a PT_LOAD, as long as .p_align divides (.p_vaddr - .p_offset), then it is permissible for the manager of the memory address space to expand the mapped interval to a convenient set of pages which cover the interval of addresses. It is also permissible for the manager of the address space to honor the indicated range _exactly_: the executing process must not depend on bytes that lie outside the interval [.p_vaddr, .p_memsz + .p_vaddr). For instance, dl_iterate_phdr() might be undefined when PT_PHDR lies outside of all PT_LOAD. The decompression into memory by UPX stub at beginning of execution of a compressed program also depends on PT_PHDR being inside the first PT_LOAD. Thus the scheme used by the Go language processor is unreliable.
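
The two conditions in that reply can be written as small checks. This is a sketch only: the PT_PHDR address and size and the PT_LOAD size used below are assumed values chosen for illustration, not taken from the actual 'hw' file, and both function names are hypothetical:

```python
def align_divides(p_vaddr, p_offset, p_align):
    """True when p_align divides (p_vaddr - p_offset)."""
    return p_align <= 1 or (p_vaddr - p_offset) % p_align == 0


def phdr_inside_a_load(phdr_vaddr, phdr_memsz, loads):
    """True when [phdr_vaddr, phdr_vaddr + phdr_memsz) lies entirely
    inside one [p_vaddr, p_vaddr + p_memsz) interval from `loads`."""
    return any(
        vaddr <= phdr_vaddr and phdr_vaddr + phdr_memsz <= vaddr + memsz
        for vaddr, memsz in loads
    )


# First PT_LOAD as reported by readelf; the 0x10000 size is an assumption.
loads = [(0x400c00, 0x10000)]
print(align_divides(0x400c00, 0xc00, 0x1000))       # True
# Assumed PT_PHDR location just after a 64-byte ELF header.
print(phdr_inside_a_load(0x400040, 392, loads))     # False
```

The first check passes, so page-expanding the mapping is permitted; the second fails, which is the situation the reply calls unreliable.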

    The major problem arises during "upx --decompress ./my_app.compressed". It is required that the output be identical to the original never-compressed ./my_app. At compress time then UPX could expand the first PT_LOAD to cover the 0xc00 bytes of Go, by _changing_ the .p_vaddr, .p_filesz, and .p_memsz. But then the --decompress output would have those changes, and be different from the original. There is no convenient place to record the changes, and it is poor practice to add a quirk when Go's format already has problems.

    The easiest way to get things to work is to modify the executable "offline", before compressing via UPX, so that PT_LOAD[0].p_offset==0. Open Watcom 1.9 on MS Windows generates ELF executables with a similar configuration. I will upload a short utility "hemfix.c" which works for that case. Click on "Attached File" near the bottom of this page [there is an invisible button there: rollover and see the pointer change].

     
  • John Reiser

    John Reiser - 2012-08-09

    "fix the hem": lower PT_LOAD[0].p_vaddr to 0
    See updated attachment hemfix.c 2015-07-25.

     

    Last edit: John Reiser 2015-07-25
  • Peter Waller

    Peter Waller - 2014-05-30

    I'm the author of goupx which is just a reimplementation of hemfix which then calls upx. The main motivation was to make hemfix more accessible to the community.

    The docker binary is rather large at 17 MB, so a good candidate for compressing, but it doesn't work! The hem condition fails:

    https://github.com/pwaller/goupx/blob/master/hemfix/hemfix.go#L120

    I've attached the output of readelf --headers.

    Any ideas, John?

    Thanks,
    - Peter
     
  • John Reiser

    John Reiser - 2014-05-31

    I get Page Not Found (HTTP 404 error) when I try to access http://dl.dropboxusercontent.com/u/706094/hw.bz2 . Is there another way for me to get a copy of the file?

     
  • Peter Waller

    Peter Waller - 2014-06-01

    The old binary packs fine. The new one doesn't even with hemfix: http://pwaller.net/tmp/docker.bz2

    (note: I modified the hemfix.c source to replace the Elf32-like constants and types for Elf64. The output is "Not modified", which is the same as I get with the go rewrite, which supports both x64 and i386 in principle)

     
  • John Reiser

    John Reiser - 2014-06-01

    The PT_LOADs overlap in the file; here from offset 0xc00000 to 0xdd6af3 ["readelf --segments docker"]:

      Type           Offset             VirtAddr           PhysAddr
                     FileSiz            MemSiz              Flags  Align
      LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                     0x0000000000dd6af3 0x0000000000dd6af3  R E    200000
      LOAD           0x0000000000c00000 0x0000000001200000 0x0000000001200000
                     0x0000000000200cf4 0x000000000022b350  RW     200000
    

    The high alignment 0x200000 contributes to the problem. upx already deals with the other case: gaps in the file, especially when alignment is low (0x1000), but needs to be enhanced to handle overlap such as with this file "docker".
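
The overlap can be reproduced from the Offset and FileSiz columns above. A minimal sketch; the function name is illustrative and this is not how upx itself performs the check:

```python
def file_overlap(a, b):
    """a and b are (p_offset, p_filesz) pairs.
    Return the shared file range as (start, end), or None if disjoint."""
    start = max(a[0], b[0])
    end = min(a[0] + a[1], b[0] + b[1])
    return (start, end) if start < end else None


first = (0x000000, 0xdd6af3)    # R E segment from the readelf output
second = (0xc00000, 0x200cf4)   # RW segment from the readelf output
print([hex(x) for x in file_overlap(first, second)])   # ['0xc00000', '0xdd6af3']
```

Both segments claim the file bytes from 0xc00000 up to 0xdd6af3, matching the range stated above.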

     
  • Peter Waller

    Peter Waller - 2014-06-01

    Is this something that can be fixed up with a quick change to hemfix in principle, or only fixed by changing UPX?

     
  • John Reiser

    John Reiser - 2014-06-01

    hemfix could get rid of the overlap by making the file roundup(0xdd6af3 - 0xc00000, 0x200000) bigger: Slide the 2nd PT_LOAD (and everything that follows it) so that the Offset becomes 0xe00000. upx itself ought to diagnose it better, then handle it correctly.
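
The arithmetic in that suggestion works out as follows. A sketch using only the values quoted in the thread; `roundup` here is a local helper, not a function from upx or hemfix:

```python
def roundup(n, align):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


overlap = 0xdd6af3 - 0xc00000          # bytes shared by both PT_LOADs
growth = roundup(overlap, 0x200000)    # how much the file would grow
new_offset = 0xc00000 + growth         # where the 2nd PT_LOAD would land
print(hex(overlap), hex(growth), hex(new_offset))   # 0x1d6af3 0x200000 0xe00000
```

The file would grow by 0x200000 bytes (2 MiB), and the second PT_LOAD would move to Offset 0xe00000.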

     
  • Peter Waller

    Peter Waller - 2014-06-01

    Please forgive my ignorance - Does that involve anything other than modifying the header, or do we need to actually insert NUL bytes into the file? If so, where? At the end?

     
  • Peter Waller

    Peter Waller - 2014-06-01

    Right, after having tried it I see the problem. You actually have to rearrange the file. Hm.

     
  • John Reiser

    John Reiser - 2014-06-01

    Now I see what is going on. hemfix is lowering the hem on each PT_LOAD. Don't do that. Instead, lower the hem only on the first PT_LOAD, the one that gets to Offset 0. So, put a "break;" in the loop after lowering the hem.
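
The shape of that loop, with the early exit, might look like the sketch below. It is not the code of hemfix.c or goupx: program headers are modelled as plain dictionaries, the function name is hypothetical, and the second segment's values are invented for the example.

```python
PT_LOAD = 1  # ELF program header type for loadable segments


def lower_first_hem(phdrs):
    """Extend only the first PT_LOAD downward so it starts at file
    offset 0. Return True if a header was modified."""
    modified = False
    for ph in phdrs:
        if ph["p_type"] != PT_LOAD:
            continue
        delta = ph["p_offset"]
        if delta != 0:
            ph["p_offset"] -= delta
            ph["p_vaddr"] -= delta
            ph["p_filesz"] += delta
            ph["p_memsz"] += delta
            modified = True
        break  # never touch the later PT_LOADs
    return modified


phdrs = [
    {"p_type": PT_LOAD, "p_offset": 0xc00, "p_vaddr": 0x400c00,
     "p_filesz": 0x1000, "p_memsz": 0x1000},
    {"p_type": PT_LOAD, "p_offset": 0x2000, "p_vaddr": 0x602000,
     "p_filesz": 0x800, "p_memsz": 0x900},
]
print(lower_first_hem(phdrs), hex(phdrs[0]["p_vaddr"]), hex(phdrs[1]["p_offset"]))
# True 0x400000 0x2000
```

Without the `break`, the loop would go on to the second PT_LOAD and pull its offset down to 0 as well, which is the behavior described above as the bug.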

     
  • Peter Waller

    Peter Waller - 2014-06-01

    Nice. So it turns out that upx works fine on this binary without doing anything to it and we discovered a bug. Your solution works. Nice, thanks!

     
  • Miki Tebeka

    Miki Tebeka - 2015-06-09

    Sorry, didn't see the comment about missing hw.bz2. I've created a new one (with same problem) and uploaded it there.

     
  • John Reiser

    John Reiser - 2015-07-25

    @Miki: source p_lx_elf.cpp has been enhanced to better diagnose Go-language PT_LOAD. Repository at https://www.pysol.org:4443/hg/upx.hg

    Updated attachment hemfix.c

    See also https://github.com/pwaller/goupx, a Go reimplementation of hemfix intended to run on all platforms.

     

    Last edit: John Reiser 2015-07-28
