
#491 lame 3.100 slower than 3.99.5

Category: Speed
Status: closed-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2021-03-10
Created: 2018-03-11
Milestone: infinity-2
Private: No

Compared to 3.99.5 it is considerably slower, as if some CPU features were not being used.

I have attached two screenshots: one with 3.99.5, resulting in a play/CPU ratio of 34x, and one with 3.100, resulting in a ratio of just 14x.

Nothing else changed between the two tests besides the lame executable.

This was reported against the OS/2 port of lame, but it turned out to most probably not be an OS/2-specific problem but a general one; see https://github.com/komh/lame-os2/issues/1#issuecomment-372090686

2 Attachments

Discussion

  • Elio Blanca

    Elio Blanca - 2018-03-16

    I did a series of tests on my own, and I can confirm this.
    On Ubuntu 64-bit I have lame-3.99.5 running at 44x, while lame-3.100 runs at 21x.
    On an old Ubuntu 32-bit system, almost the same results.
    This is indeed a big difference, and it happens regardless of nasm usage.

    I ran gprof on both lame versions (nasm enabled) and pasted the relevant part into the attached report. The first noticeable thing is a big increase in execution time for the "k_34_4" routine, which is almost 1.5x slower. Other routines actually run faster in 3.100, but this may be due to the execution order: when I run lame-3.100 after 3.99.5, the audio data is already cached.
    The code in "k_34_4" appears to be the same, so maybe the root cause is some subtle, expensive implicit conversions taking place?

     
  • Robert Hegemann

    Robert Hegemann - 2018-03-16

    I can't check it for quite some time, but it looks like some debug code is active. Try defining NDEBUG and recompiling the code.
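
    For context, NDEBUG controls the standard assert() macro: without it, every assert compiles to a real runtime check; with it, the macro expands to nothing. A minimal standalone demonstration of the difference (toy file names, not lame's build):

```shell
# Build the same source twice: once with asserts active, once with
# -DNDEBUG so the assert is compiled out.
cat > demo.c <<'EOF'
#include <assert.h>
#include <stdio.h>
int main(void) {
    int bad = -1;
    assert(bad >= 0);   /* fires when asserts are active */
    puts("check skipped");
    return 0;
}
EOF
gcc -o demo_dbg demo.c            # asserts active: this binary aborts
gcc -DNDEBUG -o demo_rel demo.c   # asserts removed: this binary runs through
./demo_rel                        # prints "check skipped"
./demo_dbg 2>/dev/null || echo "aborted as expected"
```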

     
    • Elio Blanca

      Elio Blanca - 2018-04-04

      Mmmhhh, maybe there's more.
      In my environment (Ubuntu 32- and 64-bit distros), defining the NDEBUG macro during the build didn't change anything, and the speed difference is still noticeable.

       
      • Robert Hegemann

        Robert Hegemann - 2018-04-06

        I'll try to find the change Alexander made to the build system which prevents the NDEBUG macro from doing its thing.

        Looks like it came with r6323:
        https://sourceforge.net/p/lame/svn/6323/

         

        Last edit: Robert Hegemann 2018-04-06
        • Elio Blanca

          Elio Blanca - 2018-05-15

          I reverted the NDEBUG-related part of that commit, ran 'autoconf', then ./configure && make. In addition, in my environment I set CPPFLAGS="-DNDEBUG"

          This build of lame 3.100.1 encodes my test wave file at 28.5x (over several runs, of course).
          The lame-3.99.5 shipped with Ubuntu 16.04 encodes the same file at 26.0x.
          So, you caught the problem! And the latest lame achieves a better result too.

          I want to perform further tests where I won't set CPPFLAGS in the environment, to see how fast the build gets.
          Up to now, this seems to be the right direction.

          Edit:
          full comparison table for this build
          --preset extreme
          3.99.5 26.0x
          3.100.1 28.5x

          -q 0 -V 0
          3.99.5 26.1x
          3.100.1 28.3x

          -q 0 -V 0 -m s
          3.99.5 28.6x
          3.100.1 31.0x

          -q 0 -b 320
          3.99.5 6.5x
          3.100.1 6.3x

          -q 0 -b 320 -m s
          3.99.5 6.6x
          3.100.1 6.5x

           

          Last edit: Elio Blanca 2018-05-15
  • Elio Blanca

    Elio Blanca - 2018-05-25

    New test session: same lame 3.100.1 svn 6432, NDEBUG commit reverted, then launched "autoconf" to get a new configure. I took care not to define NDEBUG in my environment.
    Same hardware, same distro.

    --preset extreme
    3.99.5 40.8x
    3.100.1 47.3x

    -q 0 -V 0
    3.99.5 41.1x
    3.100.1 47.5x

    -q 0 -V 0 -m s
    3.99.5 45.6x
    3.100.1 53.8x

    -q 0 -b 320
    3.99.5 11.4x
    3.100.1 11.6x

    -q 0 -b 320 -m s
    3.99.5 11.7x
    3.100.1 12.0x

    [the encodings seem surprisingly faster now, but this is actually because a heavy x265 encoding was running in the background during the previous session]

    Indeed, it seems that commit 6323 caused the slowdown.

     
  • Vyacheslav

    Vyacheslav - 2018-12-04

    Same problem with r6435. Slow encoding on Mac OS 10.11.6, MinGW 5.3, MinGW 7.3 (x64).
    Configure doesn't add optimization flags for gcc. I set them manually:
    export CFLAGS="-O3 -ffast-math -funroll-loops -Wall"
    and ran configure again. The result is excellent: encoding speed is 3x faster.

     
  • Pasi Niemi

    Pasi Niemi - 2018-12-20

    Any chance of releasing a fix for this? It seems that Elio found the problem, and this is blocking my upgrade.

     
  • Jang Ryeol

    Jang Ryeol - 2018-12-28

    Actually, it is a gcc optimization problem rather than the NDEBUG symbol.

    There is a patch for this in the mailing list.

    The case statement fails to match as the braces are stripped when the
    configure.in is converted to the configure script.
    
    This is bad, as the configure script then decides to set GCC to false,
    and no optimisation flags are applied.
    
    Signed-off-by: Joel Stanley <joel@...>
    ---
    v2: The first fix was incorrect, as I was testing the changes
    by hacking configure not configure.in. When generating configure, single
    quotes are removed.
    
    This has been tested on x86_64 and ppc64le.
    
     configure.in | 2 +-
     1 file changed, 1 insertion(+), 1 deletion(-)
    
    diff --git a/configure.in b/configure.in
    index cea313d..aee44a6 100644
    --- a/configure.in
    +++ b/configure.in
    @@ -96,7 +96,7 @@ if test "x${GCC}" = "xyes"; then
            AC_MSG_CHECKING(version of GCC)
            GCC_version="`${CC} --version | sed -n '1s/^[[^ ]]* (.*) //;s/ .*$//;1p'`"
            case "${GCC_version}" in 
    
    -       [0-9]*[0-9]*)
    +       [[0-9]]*[[0-9]]*)
                AC_MSG_RESULT(${GCC_version})
                ;;
            *)
    -- 
    2.17.1
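
    The effect described in the patch is easy to reproduce in any POSIX shell (a standalone sketch with a sample version string, not the actual configure script): once m4 has stripped one level of brackets, the shell sees the literal pattern 0-9*0-9*, which can never match a version number, so configure takes the fallback branch and applies no optimisation flags.

```shell
GCC_version="9.3.0"   # sample version string for illustration

# What the generated configure actually tests after m4 strips the brackets:
case "${GCC_version}" in
    0-9*0-9*) echo "recognised as GCC" ;;
    *)        echo "not recognised" ;;       # this branch is taken
esac

# What the shell should see; the patch doubles the brackets in configure.in
# so that one level survives m4:
case "${GCC_version}" in
    [0-9]*[0-9]*) echo "recognised as GCC" ;;  # this branch is taken
    *)            echo "not recognised" ;;
esac
```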
    
     

    Last edit: Jang Ryeol 2018-12-28
    • Elio Blanca

      Elio Blanca - 2019-01-02

      In the end, this is due to an unfortunate combination of both issues.
      Using the information from both this thread and the mailing list, the lame executable should get a noticeable speed boost.

       
  • Alexander Leidinger

    The NDEBUG part is there on purpose; it provides an early exit on bad input values. This makes lame abort the encoding when it encounters bad input, instead of generating garbage. If we reverted that part, we would open up the executable to buffer overflow issues.

    The configure part for gcc version detection does not look OK to me. It makes the glob pattern bash-specific; a normal POSIX shell does not have the "[[...]]" notation.
    Can someone please test this instead?:

     % svn diff configure.in
    Index: configure.in
    ===================================================================
    --- configure.in        (Revision 6449)
    +++ configure.in        (Arbeitskopie)
    @@ -96,7 +96,7 @@
                    AC_MSG_CHECKING(version of GCC)
                    GCC_version="`${CC} --version | sed -n '1s/^[[^ ]]* (.*) //;s/ .*$//;1p'`"
                    case "${GCC_version}" in
    
    -               [0-9]*[0-9]*)
    +               [:digit:]*[:digit:]*)
                            AC_MSG_RESULT(${GCC_version})
                            ;;
                    *)
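
    For what it's worth, this alternative can be checked in the same standalone way (sample version string, outside autoconf): in a shell pattern a character class is only recognised inside a bracket expression, i.e. [[:digit:]]; a bare [:digit:] is an ordinary bracket expression matching the characters ':', 'd', 'i', 'g' and 't', so it would not match a version number either.

```shell
v="9.3.0"   # sample version string for illustration

case "$v" in
    [:digit:]*) echo "matched" ;;      # only matches ':' 'd' 'i' 'g' 't'
    *)          echo "not matched" ;;  # this branch is taken
esac

case "$v" in
    [[:digit:]]*) echo "matched" ;;    # this branch is taken
    *)            echo "not matched" ;;
esac
```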
    
     
    • Elio Blanca

      Elio Blanca - 2020-05-25

      This makes lame abort the encoding when it encounters bad input, instead of generating garbage. If we reverted that part, we would open up the executable to buffer overflow issues.

      Maybe I'm a bit hard to understand, but shouldn't different problems be solved with different approaches?
      If one feeds lame garbage, then lame will create garbage, but that's OK; it is not lame's responsibility - it's just an encoder.
      On the other hand, do the buffer overflow issues still apply after Robert's many fixes?

       
      • Alexander Leidinger

        Unfortunately there are cases where it not only produces garbage when you feed it garbage, but also crashes, or there are buffer over-/underruns; and this is not OK, especially when we talk about the library being used in a 3rd-party application.

        A part of the fixes for the crashes was to enable the asserts even in release builds, as they catch a lot of such cases. Maybe not all of the asserts are needed, but nobody has done the work to find out which ones are not.

        If someone is interested in doing the work:

        • make sure you compile with the asserts active
        • run a lot of code-fuzzing tests on lame
        • mark every assert that triggers as needed in some way (e.g. with a unique tag in a comment)
        • hope that you ran enough fuzzing to have caught all the possibilities
        • convert all those asserts into a different kind of error return (a library should return with an error instead of abort()ing)
        • submit the changes

        If we have this (and you would have to provide good arguments for the "hope" part), we could switch back the NDEBUG change.

        Alternatively, someone could do a performance analysis (profile-guided) with and without asserts, and compare the two to find the 1-3 asserts which cause this performance drop, then come up with a better way to detect the issues without causing it.
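
        A rough standalone way to quantify what such an analysis would measure (a toy hot loop standing in for lame's inner routines, with assumed file names; not the project's actual profiling setup): build the same loop with asserts active and compiled out, and time both.

```shell
# Toy hot loop with a data-dependent assert; times the cost of keeping
# the check in a release (-O2) build versus compiling it out via NDEBUG.
cat > hot.c <<'EOF'
#include <assert.h>
#include <stdio.h>
int main(void) {
    double sum = 0.0;
    for (long i = 0; i < 50000000L; i++) {
        assert(sum >= 0.0);   /* stands in for lame's internal checks */
        sum += (double)(i % 7);
    }
    printf("%.0f\n", sum);    /* both builds print 149999997 */
    return 0;
}
EOF
gcc -O2 -o hot_asserts hot.c           # release build, asserts still active
gcc -O2 -DNDEBUG -o hot_ndebug hot.c   # release build, asserts removed
time ./hot_asserts
time ./hot_ndebug
```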

         
        • Elio Blanca

          Elio Blanca - 2020-05-25

          Those are actually good points - so good it would be a pity to lose them in a (soon-to-be-closed) ticket. Anyone willing to take them up and develop solutions would be greatly helped by finding such a guide. I think all of this deserves to be pasted into a (revised?) TODO shipped with the lame tarball. What do you think?

           
  • Alexander Leidinger

    Forget what I wrote about the gcc version case; this is not some bashism, this is autoconf suppressing a single [. I understand the issue now and am working on a fix (for more places than just this one specific case statement).

     
  • Alexander Leidinger

    In SVN there is now a fix for the gcc version detection in the configure script. Anyone who had the issue before: can you please test and validate that it works for you?

     
  • Fabian Greffrath

    I have just tested the current trunk with
    gcc.exe (Rev2, Built by MSYS2 project) 9.3.0

    Optimization flags are now applied. However, I had to remove "lame_init_old" from include/libmp3lame.sym in order to get it to compile:

    --- include/libmp3lame.sym (Revision 6455)
    +++ include/libmp3lame.sym (Arbeitskopie)
    @@ -1,5 +1,4 @@
    lame_init
    -lame_init_old
    lame_set_num_samples
    lame_get_num_samples
    lame_set_in_samplerate

     
  • Alexander Leidinger

    The symbol is now removed in SVN. Clang is not as picky as gcc about it...

     
  • Fabian Greffrath

    libmp3lame will need to have its SONAME bumped because of this. It's an ABI-incompatible change!

     
  • Alexander Leidinger

    Theoretically, yes. In short: this is something we should have done 13 years ago; now it is too late.

    Long:
    Practically this has been the case since 2007 (r5557), and we've made several releases since then, so we haven't had this symbol in libmp3lame for 13 years. I have the impression that a "recent" gcc is more picky about this and errors out. A recent clang (v10) is not picky at all: even without lame_init_old removed from the symbol file, there is no trace of it at all in the exported symbols:

    % nm ./libmp3lame/.libs/libmp3lame.so.0.0.0 | grep lame_init
    0000000000028730 T lame_init
    0000000000026e20 T lame_init_bitstream
    0000000000025270 T lame_init_params
    

    And even if there were such a symbol, it would only matter if it were used by a program that uses the library (and then we are back to r5557, 13 years ago).

     

    Last edit: Alexander Leidinger 2020-05-06
  • Alexander Leidinger

    • status: open --> closed-fixed
     
  • Alexander Leidinger

    Fixes are in v3.101.

     
