The enclosed patch changes libjpeg-turbo to call libjpeg_general_init when the library is loaded. This function in turn calls architecture-specific routines to query processor capabilities and the like, as appropriate.
On ARM, for example, this means the auxv is examined for NEON.
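For readers unfamiliar with the auxv approach, here is a minimal sketch of what such a check could look like on Linux. It is not taken from the patch: the sketch_* names are hypothetical, the file is read as pairs of native words (an assumption that holds on common Linux ABIs), and the NEON bit position (bit 12 of AT_HWCAP) is assumed to be the 32-bit ARM Linux value.

```c
#include <elf.h>    /* AT_HWCAP, AT_NULL */
#include <stdio.h>

/* Assumption: on 32-bit ARM Linux, NEON is bit 12 of the AT_HWCAP word. */
#define SKETCH_HWCAP_NEON (1UL << 12)

/* Return the AT_HWCAP word from /proc/self/auxv, or 0 if it cannot be read.
 * Each auxv entry is assumed to be a (type, value) pair of native words. */
static unsigned long sketch_read_hwcap(void)
{
  unsigned long entry[2];
  unsigned long hwcap = 0;
  FILE *f = fopen("/proc/self/auxv", "rb");

  if (f == NULL)
    return 0;
  while (fread(entry, sizeof(entry), 1, f) == 1) {
    if (entry[0] == AT_NULL)
      break;                      /* end of the vector */
    if (entry[0] == AT_HWCAP) {
      hwcap = entry[1];
      break;
    }
  }
  fclose(f);
  return hwcap;
}

/* Return 1 if the kernel reports NEON, 0 otherwise. */
static int sketch_have_neon(void)
{
  unsigned long hwcap = sketch_read_hwcap();
#if defined(__arm__)
  return (hwcap & SKETCH_HWCAP_NEON) != 0;
#else
  (void)hwcap;   /* the bit means something else on other architectures */
  return 0;
#endif
}
```

If /proc is not mounted the sketch simply reports no NEON, so a caller would fall back to the plain C code paths.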
In all cases the init_simd code is now called just once, as part of the libjpeg_general_init -> libjpeg_arch_specific_init path. This patch eliminates the noted race condition involving init_simd.
Further, the copious calls to init_simd found in simd/jsimd_arm.c and simd/jsimd_i386.c are removed.
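The patch body is not reproduced in this thread, so the following is only a sketch of one common way to run initialization at load time: the GCC/Clang constructor attribute. Whether the patch uses this exact mechanism is an assumption. The two function names come from the description above; their bodies, the return type of the arch-specific routine, and the sketch_* variables are hypothetical placeholders.

```c
/* Feature mask filled in once, before any library entry point can run. */
static unsigned int sketch_simd_support = 0;
static int sketch_init_calls = 0;   /* only here to make the behavior visible */

/* Hypothetical stand-in for the per-architecture probe (CPUID, auxv, ...). */
static unsigned int libjpeg_arch_specific_init(void)
{
  return 0;   /* placeholder: a real probe would return feature bits */
}

/* Assumption: GCC/Clang run constructor functions when the shared object
 * (or the executable, if statically linked) is loaded, before main(). */
__attribute__((constructor))
static void libjpeg_general_init(void)
{
  sketch_init_calls++;
  sketch_simd_support = libjpeg_arch_specific_init();
}

/* Callers only read the mask; no lazy init_simd() call is needed here. */
static int sketch_can_use_simd(void)
{
  return sketch_simd_support != 0;
}
```

With this layout the mask is written before any thread in the program can call into the library, which is how a load-time hook would remove the race on init_simd.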
make; make test passes on Linux ARM and Linux x86_64. While code exists to support OS X, Windows, and Linux i386, those platforms have not been specifically tested.
Makefile.am | 2
jlibinit.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
jsimd.h | 2
simd/jsimd_arm.c | 148 ++++++-----------------------------------------------
simd/jsimd_i386.c | 48 -----------------
5 files changed, 170 insertions(+), 179 deletions(-)
What are the benefits?
It is not as though /proc/cpuinfo parsing is used now simply because I have not considered the other options. Please take a look at
https://sourceforge.net/tracker/?func=detail&aid=3291291&group_id=303195&atid=1278160
which contains a reference to
http://lists.arm.linux.org.uk/lurker/message/20110426.215344.2ccef634.en.html
FWIW, internally Android also does CPU feature detection by parsing /proc/cpuinfo in the 'android_getCpuFeatures()' function (android.git.kernel.org is down at the moment, so the link is to the NVIDIA repository):
http://nv-tegra.nvidia.com/gitweb/?p=android/platform/ndk.git;a=blob;f=sources/cpufeatures/cpu-features.c;h=c46b884f8eeb9b1b37dbf27c20dde1be5372906c;hb=froyo
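As an illustration of the /proc/cpuinfo approach, here is a minimal sketch that looks for a whole-word entry on a "Features" line. It is not the libjpeg-turbo or Android code. The sketch_* names are hypothetical, and the line layout ("Features : swp half ... neon") is assumed to match what ARM Linux kernels print.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `feature` appears as a whole whitespace-separated word in
 * `list`, so that "vfp" does not match "vfpv3". */
static int sketch_word_in_list(const char *list, const char *feature)
{
  size_t len = strlen(feature);
  const char *p = list;

  if (len == 0)
    return 0;
  while ((p = strstr(p, feature)) != NULL) {
    int starts = (p == list) || p[-1] == ' ' || p[-1] == '\t';
    char end = p[len];
    int ends = (end == '\0' || end == ' ' || end == '\t' || end == '\n');

    if (starts && ends)
      return 1;
    p++;   /* keep searching after this partial match */
  }
  return 0;
}

/* Scan a cpuinfo-style stream for lines starting with "Features" and
 * report whether `feature` is listed after the colon. Lines longer than
 * the buffer are split by fgets and may be missed. */
static int sketch_cpuinfo_has_feature(FILE *f, const char *feature)
{
  char line[1024];

  while (fgets(line, sizeof(line), f) != NULL) {
    if (strncmp(line, "Features", 8) == 0) {
      const char *colon = strchr(line, ':');
      if (colon != NULL && sketch_word_in_list(colon + 1, feature))
        return 1;
    }
  }
  return 0;
}
```

A caller would open /proc/cpuinfo with fopen(), pass the stream and "neon" to sketch_cpuinfo_has_feature(), and treat a failed fopen() as "feature not available".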
The CPU feature detection story on ARM has been really ugly, with no hope on the horizon. Actually, I am myself in favor of
http://lists.linaro.org/pipermail/linaro-dev/2011-September/007282.html
and suggested the same to ARM people a few years ago (with no success). Being able to identify CPU features using only CPU instructions, without touching anything OS-specific, would have solved a lot of problems.
I have several problems with this proposed modification:
-- It introduces significant ELF dependencies. The current code at least theoretically builds and runs on non-ELF x86 systems.
-- It doesn't work on OS X. (see above.) OS X is a non-ELF system.
-- It is not likely to work with non-GCC compilers.
-- It could create problems when statically linking libjpeg-turbo into a dynamic library.
-- Does this work if libjpeg-turbo is statically linked into an application? If so, I don't see how.
-- It fixes a problem which isn't really a problem. I am open to being convinced otherwise, but unless the init_simd() function has an identifiable problem with idempotency, I don't see the danger in accidentally calling it 2 or 3 times (which would only happen in multi-threaded situations.) I do, however, see a lot of potential dangers with relying on the dynamic loader to initialize things for us.
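To make the idempotency point in the last item concrete, here is a minimal sketch of a lazily initialized feature mask guarded by a sentinel. The sketch_* names and the probe body are hypothetical and are not the actual contents of simd/jsimd_arm.c or simd/jsimd_i386.c.

```c
/* Sentinel meaning "the CPU has not been probed yet". */
#define SKETCH_NOT_PROBED (~0U)

static unsigned int sketch_lazy_support = SKETCH_NOT_PROBED;
static int sketch_probe_calls = 0;   /* only here to make the behavior visible */

/* Hypothetical probe: always returns the same mask on a given machine. */
static unsigned int sketch_probe_cpu(void)
{
  sketch_probe_calls++;
  return 0x1U;   /* placeholder feature bit */
}

/* Lazy init: probing again writes the same value, so repeats are harmless
 * as long as the probe itself has no side effects. */
static void sketch_init_simd(void)
{
  if (sketch_lazy_support != SKETCH_NOT_PROBED)
    return;
  sketch_lazy_support = sketch_probe_cpu();
}
```

Two threads that both see the sentinel would each run the probe and store the same mask, which is the "2 or 3 times" case described above. Strictly speaking, unsynchronized writes are a data race under C11, so the sketch illustrates the argument and does not guarantee it.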
Closing as WNF for now. Please feel free to add additional comments.