Compared to 3.99.5 it is considerably slower, as if some CPU features were not being used.
I have attached two screenshots: one with 3.99.5, showing a play/CPU ratio of 34x, and one with 3.100, showing a ratio of just 14x.
Of course, nothing changed between the two tests besides the lame executable.
This was reported against the OS/2 port of lame, but it most probably is not an OS/2-specific problem but a general one; see https://github.com/komh/lame-os2/issues/1#issuecomment-372090686
I did a series of tests on my own, and I can confirm this.
On 64-bit Ubuntu, lame-3.99.5 runs at 44x while lame-3.100 runs at 21x.
On an old 32-bit Ubuntu, the results are almost the same.
This is indeed a big difference, and it happens regardless of nasm usage.
I ran gprof on both lame versions (nasm enabled) and pasted the relevant parts into the attached report. The first noticeable thing is a big increase in the execution time of the "k_34_4" routine, which is almost 1.5x slower. Other routines instead appear to run faster in 3.100, but this may be due to the execution order: when I run lame-3.100 after 3.99.5, the audio data is already cached.
The code in "k_34_4" appears to be the same, so maybe the root cause is some subtle implicit (expensive) conversion taking place?
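For reference, such a profiling run can be reproduced roughly as follows (a minimal sketch; paths and flags are illustrative, not the exact commands used):

./configure CFLAGS="-O3 -pg"                  # build with gprof instrumentation
make
./frontend/lame -q 0 -V 0 test.wav test.mp3   # writes gmon.out in the working directory
gprof ./frontend/lame gmon.out > profile.txt  # flat profile with per-routine times (e.g. k_34_4)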
I can't check it for quite some time, but it looks like there is some debug code active. Try defining NDEBUG and recompiling the code.
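One way to do that with the usual autotools build (just a sketch):

./configure CPPFLAGS="-DNDEBUG"
make clean && make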
Mmmhhh, maybe there's more.
In my environment (32- and 64-bit Ubuntu distros), defining the NDEBUG macro during the build didn't change anything; the speed difference is still noticeable.
I'll try to find the change Alexander made to the build system which prevents the NDEBUG macro from taking effect.
Looks like it came with r6323:
https://sourceforge.net/p/lame/svn/6323/
Last edit: Robert Hegemann 2018-04-06
I reverted the NDEBUG-related part of that commit, ran 'autoconf', then ./configure && make. In addition, I set CPPFLAGS="-DNDEBUG" in my environment.
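Roughly this sequence (a sketch; since only the NDEBUG-related hunk of r6323 was reverted, the full reverse merge below is an approximation):

svn merge -c -6323 .             # reverse-merge r6323 (only its NDEBUG hunk was actually reverted)
autoconf                         # regenerate configure
CPPFLAGS="-DNDEBUG" ./configure
make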
This build of lame 3.100.1 encodes my test wave file at 28.5x (over several runs, of course).
The lame-3.99.5 shipped with Ubuntu 16.04 encodes the same file at 26.0x.
So, you caught the point! And the latest lame achieves a better result, too.
I want to run further tests without setting CPPFLAGS in the environment, to see how fast that build gets.
So far, this seems to be the right direction.
Edit:
Full comparison table for this build (play/CPU ratio, higher is faster):
options               3.99.5    3.100.1
--preset extreme      26.0x     28.5x
-q 0 -V 0             26.1x     28.3x
-q 0 -V 0 -m s        28.6x     31.0x
-q 0 -b 320            6.5x      6.3x
-q 0 -b 320 -m s       6.6x      6.5x
Last edit: Elio Blanca 2018-05-15
New test session: the same lame 3.100.1 (svn 6432) with the NDEBUG commit reverted, then "autoconf" run to get a fresh configure. This time I took care not to define NDEBUG in my environment.
Same hardware, same distro.
options               3.99.5    3.100.1
--preset extreme      40.8x     47.3x
-q 0 -V 0             41.1x     47.5x
-q 0 -V 0 -m s        45.6x     53.8x
-q 0 -b 320           11.4x     11.6x
-q 0 -b 320 -m s      11.7x     12.0x
[The encodings seem surprisingly faster now, but that is actually because a heavy x265 encoding was running in the background during the previous session.]
Indeed, it seems that commit r6323 caused the slowdown.
Same problem with r6435. Slow encoding on Mac OS 10.11.6, MinGW 5.3, MinGW 7.3 (x64).
Configure doesn't add optimization flags for gcc, so I've set them manually:
export CFLAGS="-O3 -ffast-math -funroll-loops -Wall"
and ran configure again. The result is excellent: encoding speed is 3x faster.
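A quick way to verify that the flags actually reached the build (illustrative; the exact Makefile locations depend on the tree):

grep '^CFLAGS' Makefile libmp3lame/Makefile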
Any chance of releasing a fix for this? It seems Elio found the problem, and this is blocking my upgrade.
Actually, it is a gcc optimization problem rather than the NDEBUG symbol.
There is a patch for this on the mailing list.
Last edit: Jang Ryeol 2018-12-28
Finally: this is due to an unfortunate combination of both issues.
Using the information from both this thread and the mailing list, the lame executable should get a noticeable speed boost.
The NDEBUG part is there on purpose: it gives an early exit on bad input values, so that lame aborts the encoding when it encounters bad input instead of generating garbage. If we reverted that part, we would open the executable up to buffer overflow issues.
The configure part for the gcc version detection does not look OK to me. It makes the glob pattern bash-specific; a normal POSIX shell does not support the "[[...]]" notation.
Can someone please test this instead?:
Maybe I'm a bit hard to understand, but shouldn't different problems be solved using different approaches?
If one feeds lame garbage, then lame will create garbage; that's OK - it's not lame's responsibility, it's just an encoder.
On the other hand, do the buffer overflow issues still apply after Robert's many fixes?
Unfortunately there are cases where it not only produces garbage when you feed it garbage, but also crashes or runs into buffer over-/underruns, and that is not OK, especially when we talk about the library being used in a 3rd-party application.
Part of the fixes for those crashes was to enable the asserts even in release builds, as they catch a lot of such cases. Maybe not all of the asserts are needed, but nobody has done the work to find out which ones are not.
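As a toy illustration (not lame code) of what release-build asserts buy, and of what defining NDEBUG changes:

cat > demo.c <<'EOF'
#include <assert.h>
#include <stdio.h>
int main(void)
{
    int index = -1;                /* pretend this came from malformed input */
    assert(index >= 0);            /* aborts cleanly - unless NDEBUG is defined */
    printf("would now access buffer[%d]\n", index);   /* the unsafe path */
    return 0;
}
EOF
cc demo.c -o with_asserts && ./with_asserts                   # aborts at the assert
cc -DNDEBUG demo.c -o without_asserts && ./without_asserts    # falls through to the unsafe path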
If someone is interested in doing the work:
If we have this (and you would have to provide good arguments for the "hope" part), we could switch the NDEBUG change back.
Alternatively, someone could do a performance analysis (profile-guided) with and without asserts, compare the two to detect the 1-3 asserts which cause this performance drop, and come up with a better way to detect the issues without causing the performance drop.
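In practice that comparison could look like this (a sketch along the lines of the gprof run earlier in this thread; paths are illustrative):

./configure CFLAGS="-O3 -pg" && make
./frontend/lame --preset extreme test.wav out.mp3
gprof ./frontend/lame gmon.out > profile_asserts.txt
./configure CFLAGS="-O3 -pg" CPPFLAGS="-DNDEBUG" && make clean all
./frontend/lame --preset extreme test.wav out.mp3
gprof ./frontend/lame gmon.out > profile_noasserts.txt
# compare the flat profiles to spot the few asserts responsible for the drop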
Those are actually good points - so good it would be a pity to lose them in a (soon-to-be-closed) ticket. Anyone willing to take them up and develop solutions would be greatly helped by finding such a guide; I think all of this deserves to be pasted into a (revised?) TODO shipping with the lame tarball. What do you think?
Forget what I wrote about the gcc version case: this is not a bashism, it is autoconf quoting suppressing a single [. I understand the issue now and I am working on a fix (for more places than just this one specific case statement).
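To illustrate (a hypothetical fragment, not the actual configure.ac change): M4 strips one level of [...] quoting from configure.ac, so a shell character class must be written with doubled brackets, otherwise it loses its brackets in the generated configure:

cat > configure.ac <<'EOF'
AC_INIT([demo], [1.0])
case "$gcc_version" in
  [[4-9]].*) echo "modern gcc" ;;
  *)         echo "other" ;;
esac
AC_OUTPUT
EOF
autoconf
grep -A1 'case "$gcc_version"' configure    # the emitted pattern is [4-9].*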
In SVN there is now a fix for the gcc version detection in the configure script. Anyone who had the issue before: can you please test and validate that it works for you?
I have just tested the current trunk with
gcc.exe (Rev2, Built by MSYS2 project) 9.3.0
Optimization flags are now applied. However I had to remove "lame_init_old" from include/libmp3lame.sym in order to get it to compile:
--- include/libmp3lame.sym (Revision 6455)
+++ include/libmp3lame.sym (Arbeitskopie)
@@ -1,5 +1,4 @@
lame_init
-lame_init_old
lame_set_num_samples
lame_get_num_samples
lame_set_in_samplerate
The symbol is now removed in SVN. Clang is not as picky as gcc about it...
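For anyone who wants to double-check, the exported symbols of the built library can be inspected directly (illustrative libtool build path):

nm -D --defined-only libmp3lame/.libs/libmp3lame.so | grep lame_init_old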
libmp3lame will need to have its SONAME bumped because of this. It's an ABI-incompatible change!
Theoretically yes. In short: this is something we should have done 13 years ago; now it is too late.
Long:
Practically, this has been the case since 2007 (r5557), and we've made several releases since then - so libmp3lame hasn't had this symbol for 13 years. I have the impression that a "recent" gcc is more picky about this and errors out. A recent clang (v10) is not picky at all: even without lame_init_old removed from the symbol file, there is no trace of it in the exported symbols:
And even if the symbol were there, it would matter only if it were used by a program which uses the library (and then we are back to r5557, 13 years ago).
Last edit: Alexander Leidinger 2020-05-06
Fixes are in v3.101.