
#44 /proc/self/exe is wrong for linux ELF

Status: closed-rejected
Owner: nobody
Labels: None
Priority: 5
Updated: 2006-10-13
Created: 2006-09-22
Private: No

Hi,
my software needs to be relocatable and thus reads
from /proc/self/exe its location and uses it to "guess"
the prefix where it has been installed; e.g. if it's in
"/home/me/.local/bin/myprog" using /proc/self/exe it
understands that it has been installed with
prefix=/home/me/.local

However, after compressing myprogram with UPX, it fails
to read (through readlink() API) the /proc/self/exe link.

This is a big problem for relocatable programs...
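
For readers following along, the approach described above might look roughly like the sketch below. The helper names (`strip_last_component`, `prefix_from_exe_path`, `read_self_exe`) are made up for illustration, and the sketch assumes the binary lives in a `bin` directory directly under the prefix, as in the example path.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Drop the last path component in place: "/a/b/c" becomes "/a/b".
 * Returns -1 if the path contains no '/'. */
static int strip_last_component(char *path)
{
    char *slash = strrchr(path, '/');
    if (slash == NULL)
        return -1;
    if (slash == path)
        slash[1] = '\0';   /* keep the root directory itself */
    else
        *slash = '\0';
    return 0;
}

/* Hypothetical helper: "<prefix>/bin/myprog" -> "<prefix>".
 * Returns 0 on success, -1 if the path does not fit or has no '/'. */
static int prefix_from_exe_path(const char *exe, char *out, size_t outlen)
{
    assert(exe != NULL && out != NULL);
    if (strlen(exe) >= outlen)
        return -1;
    strcpy(out, exe);
    if (strip_last_component(out) != 0)   /* remove "myprog" */
        return -1;
    return strip_last_component(out);     /* remove "bin" */
}

/* Read the /proc/self/exe link into out. Returns -1 when the link is
 * missing or unreadable, or when the target may have been truncated. */
static int read_self_exe(char *out, size_t outlen)
{
    assert(out != NULL && outlen > 1);
    ssize_t n = readlink("/proc/self/exe", out, outlen - 1);
    if (n < 0 || (size_t)n >= outlen - 1)
        return -1;
    out[n] = '\0';   /* readlink() does not terminate the string */
    return 0;
}
```

A caller would run `read_self_exe()` into a `PATH_MAX` buffer and pass the result to `prefix_from_exe_path()`. The failure branch of `read_self_exe()` is the case this report is about.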

Discussion

  • John Reiser

    John Reiser - 2006-09-24

    Logged In: YES
    user_id=33687

    /proc/self/exe records a symlink to the first argument of
    execve(), but only as long as the process still has a page
    mapped from that file. When running a compressed executable
    on i386 and some other platforms, UPX does not retain any
    pages from that file once decompression has finished and
    execution of the original program begins, so the linux
    kernel drops the symlink.

    Using /proc/self/exe is not a robust method of discovering
    the path that was specified to execve. In general, systems
    derived from UNIX don't have to provide that information.
    On Linux only, the kernel records the actual path as an
    "extra" environment string. The path that was specified to
    execve() begins just beyond the '\0' which terminates the
    character array of the last original environment variable.
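
A minimal sketch of that lookup follows, assuming the environment strings are still the ones the kernel laid out (nothing has called setenv() or putenv() yet). The function name `string_after_last_env` is hypothetical, and the behaviour relied on is the Linux-specific layout described above, not anything guaranteed by POSIX.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the byte just past the '\0'
 * that terminates the last string in envp[]. On Linux, with an unmodified
 * environment, the thread says this is where the kernel stored the path
 * given to execve(). Returns NULL for an empty environment, where this
 * sketch has no string to step over. */
static const char *string_after_last_env(char *const *envp)
{
    assert(envp != NULL);
    if (envp[0] == NULL)
        return NULL;

    size_t n = 0;
    while (envp[n + 1] != NULL)   /* find the last non-NULL entry */
        ++n;

    return envp[n] + strlen(envp[n]) + 1;
}
```

This would be called first thing in main() with the original envp (the third argument to main(), or environ), before anything modifies the environment. Once the environment has been reallocated, the pointer arithmetic no longer lands in the kernel-provided block.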

    The portable way is to invoke the application via a
    shell-script wrapper which sets an environment variable
    appropriately. Of course, this means that the wrapper must
    change whenever the script is moved.

    The ability of the UPX runtime stub to decompress directly
    to memory, and create an address space which is identical to
    that of running the uncompressed program (without any
    remnants of the compressed file) is considered to be an
    advantage. The resulting disappearance of /proc/self/exe is
    not a bug in UPX.

     
  • John Reiser

    John Reiser - 2006-09-24
    • status: open --> open-rejected
     
  • Francesco Montorsi

    Logged In: YES
    user_id=591038

    I don't have good knowledge of Unix kernel logic but I
    understand why you say that's not a bug in UPX itself.

    However, since the wrapper script approach is not feasible
    (for me) and reading beyond the final \0 which terminates
    the last env var string looks like a sort of "hack" (to me),
    I wonder if UPX cannot provide the symlink information
    provided by /proc/self/exe in some other way to the
    compressed program (e.g. setting an environment variable
    with /proc/self/exe destination as content).
    Is this possible?

    It would be really an invaluable feature for me...

     
  • John Reiser

    John Reiser - 2006-09-24

    Logged In: YES
    user_id=33687

    If UPX were to provide the old contents of /proc/self/exe
    via an environment variable, then the runtime stub would
    grow larger. In addition to calling readlink(), the
    existing env[] would have to be extended by one variable,
    and the argv[] and env[] arrays would have to be moved by a
    variable amount to make room for the string. Guesstimating,
    it would take around 100 bytes of code. This is more than
    it takes to find the existing extra environment string that
    the Linux kernel already has put into a good spot.

     
  • Francesco Montorsi

    Logged In: YES
    user_id=591038

    I've tried to read after the \0 of the last env var of the
    original "environ" of my C++ program.
    And in fact, there's a path there but as you said this is
    the path that was specified to execve() and thus can be a
    relative path. E.g. if I run my test program from the
    console with "./myprog", in that memory address there's
    "./myprog".
    Which is not enough for my relocation aims...

    Would it be possible to make UPX add that env variable (or,
    even better, recreate the /proc/self/exe link, though I
    really don't know if that is possible at all) as an optional
    feature (turned on with a command line option)?
    I think this feature is worth 100 bytes of code ;))

     
  • John Reiser

    John Reiser - 2006-09-25

    Logged In: YES
    user_id=33687

    What is wrong with a relative path? It's good enough for
    the kernel, after all, so it _must_ be good enough for the
    process.

    Having written and tested the code, on i386 it costs 93
    bytes to perform readlink("/proc/self/exe",,) and pass the
    result as a new environ[0] with name "   " [three spaces].
    It also leaves an unused "hole" of 4095 - strlen(link_text)
    bytes on the stack. Closing the hole costs about 40 more
    bytes of code [that part is ugly.] On other $ARCH the cost
    can double.

     
  • Francesco Montorsi

    Logged In: YES
    user_id=591038

    >What is wrong with a relative path? It's good enough for
    >the kernel, after all, so it _must_ be good enough for the
    >process.
    you're right - sorry. I've found that my test program was
    changing the working directory down in some function calls
    and that made it impossible for me to use the relative path
    provided by the kernel. I've now placed the path detection
    code at the beginning of the program and it works.
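
The fix described here, resolving the relative path before anything changes the working directory, could be sketched as below. `resolve_at_startup` is a hypothetical name. realpath() is the standard POSIX call, and it interprets a relative path against the current working directory at the moment of the call, which is why the ordering matters.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert a possibly relative path such as "./myprog"
 * into an absolute, canonical one. It must run before any chdir(), because
 * realpath() resolves relative paths against the current working directory.
 * out must have room for PATH_MAX bytes. Returns 0 on success, -1 on error. */
static int resolve_at_startup(const char *path, char out[PATH_MAX])
{
    assert(path != NULL && out != NULL);
    return realpath(path, out) != NULL ? 0 : -1;
}
```

The resolved string can then be kept in a static buffer for the rest of the run, so later chdir() calls no longer affect it.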

    Just a last question: does this trick work for any linux
    kernel (i.e. should I check linux kernel version before
    reading past last \0 or is it always safe to do it)?
    I couldn't find any mention of this (very useful!) trick
    with google... and I fear that, if this is an undocumented
    feature, it may disappear at some point from linux kernels
    and make my programs segfault...

    About a possible modification of UPX: IMHO such a feature
    is really worth 133 bytes of code. E.g. I'm using UPX on a
    linux program which is about 6 Megabytes uncompressed and
    1.6 when compressed with UPX. 133, 200 or 1000 additional
    bytes to keep it relocatable are absolutely acceptable for me.
    Last, if the feature can be turned off (or on) using an
    option, what's the problem?
    If the developer doesn't care about that small additional
    code, he can use that option and workaround this problem...

    Very very last, if you decide not to add this option,
    then I think it would be a wise thing to document the
    "you-can-read-past-last-\0" trick somewhere in UPX website...

    Thanks a lot!

     
  • John Reiser

    John Reiser - 2006-09-25

    Logged In: YES
    user_id=33687

    The "trick" of pathname to execve() being recorded just
    beyond the terminating '\0' of the last environment string
    has been in Linux for at least 7 years, in all 2.6, 2.4, and
    2.2 kernels; perhaps even in 2.0 and earlier. The kernel
    source is fs/exec.c, function do_execve():
    retval = copy_strings_kernel(1, &bprm->filename, bprm);
    followed by copy_strings() of envp and argv.

    Although 133 bytes is not much to your application, it is
    1/4 of a 512-byte disk sector, and this matters to small
    executables on small flash memory devices (USB memory stick,
    CompactFlash, SecureDigital cards, etc.) Putting the code
    into all the architectures supported by UPX is somewhat
    tedious, and making it optional adds to complexity. I'll
    think about it.

    Your suggestion for more documentation about UPX and
    /proc/self/exe is a good one, and I'll add a note.

     
  • John Reiser

    John Reiser - 2006-10-11

    Logged In: YES
    user_id=33687

    Please try the beta upx-2.90 available from
    http://upx.sourceforge.net/download/unstable/upx-2.90-i386_linux.tar.gz
    which captures the original readlink("/proc/self/exe") in
    the environment variable "   " [three spaces]. Your code
    can still use the "extra" environment string that the Linux
    kernel sets up, too; the upx runtime stub does not touch it.
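
On the application side, picking up that variable could look like the sketch below. The macro and function names are hypothetical. The only thing taken from this thread is the variable name itself (three spaces). The fallback argument is an assumption of this sketch, for the case where the program runs uncompressed or under a stub that does not set the variable.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>

/* Name of the variable described in this thread: exactly three spaces. */
#define UPX_EXE_ENV_NAME "   "

/* Hypothetical helper: return the path saved by the UPX stub, or the
 * caller's fallback when the variable is not set. */
static const char *upx_exe_path_or(const char *fallback)
{
    assert(fallback != NULL);
    const char *saved = getenv(UPX_EXE_ENV_NAME);
    return saved != NULL ? saved : fallback;
}
```

A natural fallback is the result of the program's own readlink("/proc/self/exe") attempt, so the same code path works both compressed and uncompressed.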

    If you are really serious about this sort of thing, then
    your code should be prepared to deal with the possibility
    that even the upx runtime stub is not the first code which
    runs in a new process. Somebody else, such as rtldi
    (http://bitwagon.com/rtldi/rtldi.html) may have been run
    first instead; ad infinitum.

    Ultimately, the _only_ way to identify "the installation
    directory" for sure on a Linux system (or any UNIX,
    including SunOS, *BSD, etc.) is to hardwire it [for
    instance, as /opt/my_app, which is the officially supported
    way under Linux Standard Base], or for a wrapper script to
    pass along that information as an environment variable [and
    then the wrapper script must change when the files move.]
    wxWidgets' attempt to "port" the Win32 idea of installation
    directory is fundamentally _NOT_ portable to non-Win32 systems.

    If the sole purpose is to locate dependent runtime modules
    such as shared libraries, then ld-linux.so.2 honors
    "${ORIGIN}" in DT_NEEDED names and DT_RPATH specifications.
    glibc2 stole this from Solaris when RedHat found it to be a
    checkbox item in a contract for Linux to replace Solaris.
    glibc2 does not document it, and the Solaris documentation
    is obscure, too.

     
  • Francesco Montorsi

    Logged In: YES
    user_id=591038

    I've just tried upx 2.90 and it works like a charm!
    Thanks!

    About the fact that the only sure way under *nix systems to
    get the installation directory is to hardcode it: I know but
    I'm targeting linux-only users which must have /proc mounted
    for other reasons (or the program won't be useful in any
    case) and with a recent kernel version so that I'm quite
    sure that reading /proc/self/exe (or "   " env var) will
    give a highly reliable result.

    I'm marking this "bug" as closed; hope to see 2.90 released
    soon; thanks for your assistance with this issue!

     
  • Francesco Montorsi

    • status: open-rejected --> closed-rejected
     
