#168 Text relocations in mpg123 library

1.24.x
closed-fixed
nobody
mpglib (31)
5
2017-05-30
2012-03-14
snowman123
No

Hi,

I just noticed that, on i686, there is a text relocation in the libmpg123.so library which is a potential security problem:

$ readelf -d /usr/lib/libmpg123.so.0.29.8 |grep TEXT
0x00000016 (TEXTREL) 0x0

This problem seem to be specific to the i686 architecture as there is no such text relocations on x86_64. Both the 1.13.4 and 1.13.6 (I haven't checked 1.13.5) stable versions have that problem as well as the 1.14 beta version.

Let me know if you need more informations.

Thanks.

Discussion

  • Thomas Orgis

    Thomas Orgis - 2012-03-15

    This is a known point if you enable the assembly optimizations on x86. The assembly is written without reference to the global offset table. And I am honest with you: That will not change unless someone provides a patch achieving that and this patch doesn't weaken the performance on register-starved x86.

    You have some options to avoid those relocations:

    1. Build without assembly code: ./configure --with-cpu=i386 (or --with-cpu=generic ... both use plain C code).
    2. Don't use libmpg123 as shared library (./configure --disable-shared).

    If you're about maximum security, I would generally recommend using the plain C decoder as single option, even just because it reduces the amount of code built into the library. Less code, less attack surface.

    It would be nice if you would confirm that limiting the built-in decoders fixes the occurence for you. Then we have a representative entry for this in the mpg123 tracker, too, not just at https://bugs.gentoo.org/show_bug.cgi?id=164504 .

     
  • snowman123

    snowman123 - 2012-03-16

    Thanks for the explanation and suggestions. I'll go with the first option as I maintain mpg123 for the Arch Linux distribution and we tend to prefer dynamic libraries over static ones. I can confirm that using the --with-cpu options for either generic_fpu, i386 or i586 removes the text relocation. I suppose that using --with-cpu=i585 will give more performant code than --with-cpu=generic_fpu. Is that assumption correct?

     
  • Thomas Orgis

    Thomas Orgis - 2012-03-16

    The picture regarding i586 optimization is not very clear, I suggest you do a little benchmarking yourself (time mpg123 -t /some/album/*.mp3).
    Actually, having whitnessed utter insanity with gcc's code generation on an actual x86 machine when investigating performance of the generic decoder... I guess i586 is at least more reliable. Some testing in a 32 bit chroot confirms that there still is some advantage (though in x86-64 mode, the generic code beats 32 bit i586 code on my box).

    And: Are we talking about a secure variant (hardened branch) of Arch packages or "the" mpg123 package for x86? If there is no absolute requirement for the non-presence of textrels, I would not like you dropping mpg123 performance out of principle.

    Of course, I'm nobody to enforce what you do in Arch, I just say I wouldn't like it;-)

     
  • snowman123

    snowman123 - 2012-03-21

    FTR, I asked on the dev mailing list about text relocations but no-one seem to care. So I'll continue building mpg123 with the assembly optimization. I guess this bug report can be closed now.

     
  • Thomas Orgis

    Thomas Orgis - 2012-05-01
    • status: open --> closed
     
  • Won Kyu Park

    Won Kyu Park - 2017-03-07

    this is a quick hack to fix x86 TEXTRELs problem.
    not much tested. a simple test with "mpg123 -t foobar.mp3" seems ok.
    (and mpg123 -t -w foo.wav foobar.mp3. to get the decoded wav file.)

    https://github.com/wkpark/mpg123/tree/textrels
    you can build it with ./configure --with-pic .... and make blahblah...

    $ readelf -d src/libmpg123/.libs/libmpg123.so |grep TEXT
    $
    $ readelf -d src/libmpg123/.libs/libmpg123.so
    Dynamic section at offset 0x56e20 contains 27 entries:
    Tag Type Name/Value
    0x00000001 (NEEDED) Shared library: [libm.so.6]
    0x00000001 (NEEDED) Shared library: [libdl.so.2]
    0x00000001 (NEEDED) Shared library: [libc.so.6]
    0x0000000e (SONAME) Library soname: [libmpg123.so.0]
    0x0000000c (INIT) 0x2f08
    0x0000000d (FINI) 0x441b4
    0x00000019 (INIT_ARRAY) 0x57a74
    0x0000001b (INIT_ARRAYSZ) 4 (bytes)
    0x0000001a (FINI_ARRAY) 0x57a78
    0x0000001c (FINI_ARRAYSZ) 4 (bytes)
    0x6ffffef5 (GNU_HASH) 0x138
    0x00000005 (STRTAB) 0x1160
    0x00000006 (SYMTAB) 0x5f0
    0x0000000a (STRSZ) 2954 (bytes)
    0x0000000b (SYMENT) 16 (bytes)
    0x00000003 (PLTGOT) 0x58000
    0x00000002 (PLTRELSZ) 848 (bytes)
    0x00000014 (PLTREL) REL
    0x00000017 (JMPREL) 0x2bb8
    0x00000011 (REL) 0x1f48
    0x00000012 (RELSZ) 3184 (bytes)
    0x00000013 (RELENT) 8 (bytes)
    ...

    (no TEXTREL entry found)

     
  • Thomas Orgis

    Thomas Orgis - 2017-03-11
    • status: closed --> open
    • Group: --> 1.24.x
     
  • Thomas Orgis

    Thomas Orgis - 2017-03-11

    Thanks for working on this. I need to boot an AMD box in 32 bit mode to verify the changes to 3DNow … and check if this has any performance impact.

    Do you have a comparison of decoding time for a given MP3 input with and without the GOT changes?

     
  • Won Kyu Park

    Won Kyu Park - 2017-03-12

    I have tested with the following conditon.
    OS : ubuntu 16.04.1 X86_64
    uname -a Linux heaven 4.8.0-32-generic #34~16.04.1-Ubuntu SMP Tue Dec 13 17:03:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux


    Reference condition 1) cross compile with -m32 option
    ./configure --with-cpu=x86 --enable-int-quality --enable-shared \ CC=gcc CFLAGS=" -m32 " --host=x86-pc-linux --target=x86-pc-linux

    $ readelf -d usr/local/lib/libmpg123.so |grep TEXT
    0x00000016 (TEXTREL) 0x0
    $

    $make install DESTDIR=/tmp/mpg123-shared/

    (and made a small script to set LD_LIBRARY_PATH=/usr/lib/:/tmp/mpg123-shared/usr/local/lib/ ...)

    mp3 audio files for testing
    A: http://www.loe.org/content/2017-03-10/loe_170210_web.mp3 (~50min)
    B: http://www.loe.org/content/2016-03-25/loe_160325_web.mp3 (~50min)

    selected decoder by mpg123 : SSE (mpg123 -v option)

    test command : time mpg123 -t -w out.wav input.mp3

    mp3 A : 0m6.113s / 0m6.059s / 0m5.545s (real time only)
    mp3 B : 0m7.380s / 0m6.431s / 0m6.351s

    $ md5sum loe_160325_web.wav
    4c2d5dd097f6a2f2d495f460a0d93355 loe_160325_web.wav
    $ md5sum loe_170210_web.wav
    aa6c228a334cef97be09e6efe01b08c9 loe_170210_web.wav


    TEXTRELs patched case 2) cross compiled with -m32 -DPIC option
    ./configure --with-cpu=x86 --enable-shared --enable-int-quality --with-pic \ CC=gcc CFLAGS=" -m32 -DPIC" --host=x86-pc-linux --target=x86-pc-linux

    $ readelf -d usr/local/lib/libmpg123.so |grep TEXT
    $ make install DESTDIR=/tmp/mpg123-notextrels/

    selected decoder by mpg123 : SSE

    $ time mpg123 -t -w out.wav input.mp3
    mp3 A : 0m7.118s / 0m5.614s / 0m6.045s / 0m6.137s (real time only)
    mp3 B : 0m7.148s / 0m7.384s / 0m7.208s / 0m6.798s

    $ md5sum loe_170210_web.wav
    aa6c228a334cef97be09e6efe01b08c9 loe_170210_web.wav (identical)


    and now all remaining decoders MMX/3DNOW/3DNOWEXT works nicely
    (build all decoders separatly by changing -DOPT_blahblah CFLAGS.)

    https://github.com/wkpark/mpg123/tree/no-textrels (patches are merged/rebased)

     
    Last edit: Won Kyu Park 2017-03-12
  • Thomas Orgis

    Thomas Orgis - 2017-03-12

    Thanks for investigating. Good that you got 32 bit support ready for testing (I'm on pure 64 bit since ten years or so … with a 32 bit chroot if need arises, but not on my remaining AMD 3DNow-sporting box yet). What CPU do you have? Some Athlon 64 variant, I presume?

    As to the measurements … they are not conclusive yet.

    textrel
     mp3 A : 0m6.113s / 0m6.059s / 0m5.545s = 5.9 +/- 0.3
     mp3 B : 0m7.380s / 0m6.431s / 0m6.351s = 6.7 +/- 0.6
    
    no textrel
    mp3 A : 0m7.118s / 0m5.614s / 0m6.045s / 0m6.137s = 6.2 +/- 0.6
    mp3 B : 0m7.148s / 0m7.384s / 0m7.208s / 0m6.798s = 7.1 +/- 0.2
    

    There seems to be a tendency for the PIC code to be a bit slower, but this is still within the margin of a standard deviation of the current mpg123. Can you try this:

    1. Read one test file into memory (decode it once, copy to /dev/shm)
    2. Run one instance of mpg123 each, with --loop 10 added.

    That should result in runtimes around one minute and hopefully gives a clearer picture about the performance difference, if any. If there is a difference, we need to discuss how to handle that (recommending static non-PIC builds to people who want utmost performance can be one solution).

    Oh, and then we'll get to the /* FIXME */ I just saw;-)

     
  • Won Kyu Park

    Won Kyu Park - 2017-03-12

    Yes
    I have AMD Phenom(tm) II X4 945 Processor (/proc/cpuinfo says 3dnowext,3dnow,sse2 etc.)

    I've made some script to run loop and also run with --loop 10 option too.

    TOP=`realpath .`
    for in `seq 0 9`; do
    LD_LIBRARY_PATH=/usr/lib:/lib:$TOP/usr/local/lib \
    /usr/bin/time -f "real %e" usr/local/bin/mpg123 -t /dev/shm/loe_170210_web.mp3
    done
    
    LD_LIBRARY_PATH=/usr/lib:/lib:$TOP/usr/local/lib \
    /usr/bin/time usr/local/bin/mpg123 --loop 10 -t /dev/shm/loe_170210_web.mp3
    

    TEXTRELs cases

    reference case #1: decoder SSE:

    $ readelf -d usr/local/lib/libmpg123.so |grep TEXT
     0x00000016 (TEXTREL)                    0x0
    
    • total 47.98sec (= 4.36 4.37 4.41 4.43 4.32 4.34 4.38 4.34 4.31 4.37 4.35)
    • total 43.51sec with --loop 10 option.

    one more try:
    - total 48.03sec (= 4.37 4.37 4.36 4.33 4.35 4.37 4.40 4.36 4.39 4.34 4.39)
    - total 42.19sec (with --loop 10)

    reference case #1-static : decoder SSE / configured with --enable-shared=no option

    $ ldd mpg123
    not a dynamic executable
    
    • total 47.49sec (=4.29 4.30 4.37 4.32 4.31 4.33 4.32 4.32 4.33 4.31 4.29)
    • total 41.60sec (with --loop 10)

    case #2: decoder MMX
    - total 64.38sec (=5.86 5.87 5.86 5.87 5.86 5.85 5.85 5.85 5.83 5.85 5.83)
    - total 56.95sec (with --loop 10)

    case #3: decoder 3dnow
    - total 77.07sec (=7.02 7.02 7.03 7.01 6.98 6.99 7.00 7.01 7.00 7.01 7.00)
    - total 68.69sec (with --loop 10)

    case #4 decoder 3dnowext
    - total 56.18sec (=5.09 5.09 5.09 5.11 5.12 5.10 5.11 5.11 5.14 5.12 5.10)
    - total 49.50sec (with --loop 10)


    No TEXRELs cases

    no textrels case #1 (decoder SSE)

    $ readelf -d usr/local/lib/libmpg123.so |grep TEXT
    $
    
    • total 48.22sec (= 4.40 4.39 4.37 4.38 4.40 4.36 4.38 4.41 4.39 4.36 4.38)
    • total 42.43sec with --loop 10 option

    one more try:
    - total 48.22sec (= 4.41 4.38 4.39 4.38 4.37 4.36 4.39 4.38 4.36 4.41 4.39)
    - total 41.98sec with --loop 10 option

    no textrels case #2 (decoder MMX)
    - total 64.37sec (= 5.87 5.83 5.87 5.91 5.83 5.86 5.83 5.83 5.84 5.86 5.84)
    - total 57.06sec (with --loop 10)

    no textrels case #3 (decoder 3dnow)
    - total 77.90sec (= 7.12 7.08 7.04 7.11 7.06 7.10 7.07 7.06 7.09 7.10 7.07)
    - total 69.48sec (with --loop 10)

    no textrels case #4 (decoder 3dnowext)
    - total 57.47sec (= 5.27 5.24 5.21 5.24 5.20 5.23 5.20 5.22 5.23 5.21 5.22)
    - total 50.43sec (with --loop 10)

     
  • Thomas Orgis

    Thomas Orgis - 2017-03-13

    So the case for SSE and MMX seems to be fine, while 3DNow suffers a bit. Thing is, yor CPU and also the Athlon II X3 I have around, are not the proper platforms to judge 3DNow performance. Indeed, an old Athlon or even K6-2/3 has different characteristics. But then, even I don't have one of those ready for operation anymore (while I do have a real i386 still ready;-).

    I'm leaning towards just applying this change now, to enable folks to enjoy their W^X relocationless mpg123 x86 binaries, but not after verifying things myself on my Athlon II in 32 bit mode. Oh … and again, what about this one FIXME left there?

    Thanks for your work in any case, I will need to have some time to have a close look at his, especially the mangle.h changes (why an assembler macro there when we already got cpp?). I am confident that the next mpg123 release will fix the textrels.

     
    • Won Kyu Park

      Won Kyu Park - 2017-03-13

      ...what about this one FIXME left there?

      the FIXME was in the dct64_3dnowext.S and already fixed in the last patch :)

      and the last commit fix dct64_sse.S to fix crash bug with some configure options (I guess --enable-real=no option)
      https://github.com/wkpark/mpg123/tree/no-textrels2 : merged commits (attached patch file)
      https://github.com/wkpark/mpg123/tree/textrels : all changes

      ... I will need to have some time to have a close look at his, especially the mangle.h changes (why an assembler macro there when we already got cpp?).

      as I already mentioned.. this is a quick hack :)

      currently mangle.h is only included in the *.s files. and I have no idea to insert GOT_GET macro like as x264(they removed their x86 PIC support by 5a8727adddf4 commit in 2008). the libvpx have some useful GOT macro in the vpx_ports/x86_abi_support.asm.
      FFmpeg also drop x86 PIC support by commit 3f87f39cb898 in 2009

      libvpx/x264/FFmpeg implement GET_GOT macro using YASM/NASM's macro and GNU AS have also similar macro feature but not good enough to make GET_GOT work. so I made cpp+macro mix hack. It seems not good enough to merged into mpg123 trunk but I guess enough to post on this issue.

      ... I am confident that the next mpg123 release will fix the textrels.

      Thankyou so much!

       
      Last edit: Won Kyu Park 2017-03-13
  • Thomas Orgis

    Thomas Orgis - 2017-05-07

    I spent quite some time now staring at benchmark results in a 32 bit chroot on my K10 machine and it is really hard to make out any reliable differences. For example, accurate rounding seems to be be generally faster on this setup. But numbers always vary. Getting at differences below 5 % really seems to be too tricky to be worth it nowadays.

    I'll roll the textrel stuff into the next release.

     
  • Thomas Orgis

    Thomas Orgis - 2017-05-30
    • status: open --> closed-fixed
     
  • Thomas Orgis

    Thomas Orgis - 2017-05-30

    So, the current release got rid of the relocations. Thanks for initiating this.

     

Log in to post a comment.

Get latest updates about Open Source Projects, Conferences and News.

Sign up for the SourceForge newsletter:

JavaScript is required for this form.





No, thanks