The Marshmallow compiler or build system enables the SSE4.1 instruction set by default, so the binaries are unable to run on older CPUs without SSE4.1. We need to find a way to work around it.
x86_64 status
What happened in Marshmallow is that the x86_64 target has the following additional features enabled,
compared to Lollipop, where SSE4 was not enabled by the build system make rules.
These settings are in the build project, file core/combo/arch/x86_64/x86_64.mk
x86 status
SSE4, SSE4_1 and SSE4_2 are not enabled in x86 builds, and backward compatibility "would have been" ensured by:
ARCH_X86_HAVE_SSSE3 := false
ARCH_X86_HAVE_MOVBE := false
ARCH_X86_HAVE_POPCNT := false
# Some intrinsic functions used by libcxx only exist for prescott or newer CPUs.
arch_variant_cflags := \
-march=prescott \
In principle, the Marshmallow x86 build should work on AMD Athlon X2 and later (AMD CPUs with SSE3).
Surprise!
NOW ATTENTION: there is a BUT.
In practice, the marshmallow-x86 x86 build will not work on AMD, even though SSSE3 is not set in the build system combo rules.
Similarly, even if we wanted to disable SSE4 in x86_64.mk, it was found in Lollipop that some prebuilt binaries (e.g. clang) have been built to generate targets containing SSE4 instructions (e.g. surfaceflinger), even though there is no trace of -msse4 in the clang build rules or in C/CPPFLAGS.
Concepts currently under evaluation:
1) hack the prebuilt binaries
2) rebuild the toolchains, to have unpolluted compiler binaries
3) (gorgeous) integrate a kernel-trap-based mechanism to emulate SSE4* opcodes (like xnu OPEMU, which uses SSEPlus; the only problem is that it is implemented on top of xnu, which is a BSD-based kernel)
Priorities
In my understanding, x86_64 now requires at least Sandy Bridge, but Intel users having problems could use x86 builds.
AMD users should also be addressed, because since lollipop-x86 AMD owners have had no joy with android-x86.
My idea was to try porting xnu OPEMU to Linux kernel traps, because most of the opcode emulation is already there (SSE4, SSSE3 based on SSEPlus), but porting the stack/state part is of medium-high complexity.
For a kernel expert it may not be so difficult.
One of my many doubts is whether opcode emulation needs to be supported only for user space, because OPEMU implements both kernel- and user-space traps plus emulation.
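On the user-space side of that doubt: when the CPU raises #UD on an unsupported opcode, Linux delivers SIGILL to the process, so the trap-and-skip mechanism can be tried without touching the kernel. A minimal sketch, assuming Linux on x86_64 with a glibc/musl-style `ucontext_t` (`REG_RIP`); it uses `ud2` (0F 0B) as a stand-in for a missing SSE4 opcode and only skips it, it does not emulate anything. `trap_demo` is a hypothetical name.

```c
#define _GNU_SOURCE /* glibc: exposes REG_RIP and the gregs[] field names */
#include <signal.h>
#include <stdint.h>
#include <string.h>

#if defined(__linux__) && defined(__x86_64__)
#include <ucontext.h>

#ifndef REG_RIP
#define REG_RIP 16 /* index of RIP in gregs[] on x86_64 glibc and musl */
#endif

static volatile sig_atomic_t traps_handled = 0;

/* SIGILL handler: the kernel has turned the CPU's #UD exception into a
 * signal. Inspect the bytes at the faulting RIP; if they are an instruction
 * we handle, advance RIP past it. A real emulator would also compute the
 * result and write it into the saved registers before advancing. */
static void sigill_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)info;
    ucontext_t *uc = (ucontext_t *)ctx;
    const uint8_t *ip = (const uint8_t *)(uintptr_t)uc->uc_mcontext.gregs[REG_RIP];

    if (ip[0] == 0x0F && ip[1] == 0x0B) {   /* ud2, our stand-in opcode */
        uc->uc_mcontext.gregs[REG_RIP] += 2; /* skip the 2-byte instruction */
        traps_handled++;
        return;
    }
    signal(SIGILL, SIG_DFL); /* unknown opcode: re-fault with default action */
}

/* Runs one ud2 under the handler; returns the number of trapped
 * instructions (expected 1), or -1 if the handler could not be installed. */
int trap_demo(void)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = sigill_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return -1;

    traps_handled = 0;
    __asm__ volatile("ud2"); /* #UD -> SIGILL -> handler -> resume after it */
    sigaction(SIGILL, &old, NULL);
    return traps_handled;
}
#else
/* Other platforms: saved-register layout differs, demo not implemented. */
int trap_demo(void) { return -1; }
#endif
```

Kernel-side emulation would do the same decode/advance work in the #UD handler instead, which also covers processes that do not install any handler.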
Mauro
Last edit: Mauro Rossi 2016-01-06
Status: open
Milestone: 1.0
Labels: sse4
Created: Tue Jan 05, 2016 04:55 PM UTC by Chih-Wei Huang
Last Updated: Sat Jan 14, 2017 08:44 PM UTC
Owner: Mauro Rossi
I did some research. The problem is that when building with target x86(_64)-linux-android, clang/gcc enables sse4.2 and popcnt by default for x86_64, and ssse3 for x86. So adding -mno-sse4.2 -mno-popcnt -mno-ssse3 to arch_variant_cflags will restore the desired behavior. With this change, I have x86_64 booting on a qemu Conroe cpu.
Now the problem becomes apps with x86 native libs, which by default will contain sse4/popcnt instructions. We may in the end need a kernel trap for this case.
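A quick way to see what a given compiler invocation actually enables is to look at the predefined target macros: GCC and Clang define `__SSSE3__`, `__SSE4_1__`, `__SSE4_2__` and `__POPCNT__` when the corresponding features are on. Compiling this file once with the default Android flags and once with `-mno-sse4.2 -mno-popcnt -mno-ssse3` added should show the difference. A sketch only; `compiled_isa_features` is a hypothetical helper and the exact defaults depend on the toolchain version.

```c
#include <string.h>

/* Writes a space-separated list of the x86 ISA extensions the compiler was
 * allowed to use for this translation unit into buf. The result reflects
 * the compile-time flags (-march, -m..., -mno-...), not the running CPU. */
const char *compiled_isa_features(char *buf, size_t len)
{
    static const char *const names[] = {
#ifdef __SSE3__
        "sse3",
#endif
#ifdef __SSSE3__
        "ssse3",
#endif
#ifdef __SSE4_1__
        "sse4.1",
#endif
#ifdef __SSE4_2__
        "sse4.2",
#endif
#ifdef __POPCNT__
        "popcnt",
#endif
        NULL /* terminator; also keeps the array non-empty */
    };

    if (len == 0)
        return buf;
    buf[0] = '\0';
    for (size_t i = 0; names[i] != NULL; i++) {
        if (buf[0] != '\0')
            strncat(buf, " ", len - strlen(buf) - 1);
        strncat(buf, names[i], len - strlen(buf) - 1);
    }
    return buf;
}
```

An empty string from an x86 build means none of these extensions were enabled for that object file.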
Hi Wu Zhen, any update on https://sourceforge.net/p/android-x86/opengles/5/ (Enable software rendering by mesa llvmpipe)?
I am assuming it is best to start with Mesa 11.2, since that appears to have been integrated recently. However, it doesn't appear to be as trivial as just enabling some flags in external/mesa. I was going to try pursuing this for fun, but I wanted to leverage any work you might have done.
Work in progress as of January 24th
As a follow-up to the "avoid sse4 in x86_64 builds" objective, the full list of changes needed for marshmallow-x86 is reported here:
1) The build project needs changes to avoid invoking SSE4* architecture features, including Wu Zhen's proposed addition to arch_variant_cflags in core/combo for x86_64.mk
https://github.com/maurossi/build/commits/x86_64_ssse3_builds
2) bionic needs ssse3-memcmp-slm.S to replace sse4-memcmp-slm.S
https://github.com/maurossi/bionic/commits/x86_64_ssse3-memcmp
3) external/clang requires the sse4.2 and popcnt features to be force-disabled.
This was inspired by pstglia; I only changed it a little to ensure sse4.2 and popcnt are not used, even if requested.
https://github.com/maurossi/clang/commits/ssse3_x86_64_builds
4) prebuilts/clang/linux-x86/host/3.6 requires a hack to set -sse4.2 and -popcnt instead of +sse4.2 and +popcnt (a binary hex editor was used).
This hack was inspired by pstglia; I only changed it a little.
https://github.com/maurossi/prebuilts_clang_linux-x86_host_3.6/commits/ssse3_x86_64_builds
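The hex-edit approach works because turning `+sse4.2` into `-sse4.2` changes a single byte and keeps the string length, so no offsets inside the binary move. A minimal sketch of the same one-byte transformation, applied to a comma-separated target-feature list (that list format is an assumption for illustration; inside the clang binary the strings may be stored separately). `disable_feature` is a hypothetical helper, not part of clang.

```c
#include <stddef.h>
#include <string.h>

/* Flips "+name" to "-name" for every whole comma-separated token in a
 * target-feature string such as "+sse2,+sse4.2,+popcnt". The edit is done
 * in place and never changes the string length, which is the property a
 * binary hex edit relies on. Returns the number of tokens changed. */
int disable_feature(char *features, const char *name)
{
    size_t n = strlen(name);
    int changed = 0;
    char *tok = features;

    while (*tok != '\0') {
        char *end = strchr(tok, ',');
        size_t tok_len = end ? (size_t)(end - tok) : strlen(tok);

        /* whole-token match only: "+sse4" must not hit "+sse4.2" */
        if (tok_len == n + 1 && tok[0] == '+' && strncmp(tok + 1, name, n) == 0) {
            tok[0] = '-';
            changed++;
        }
        if (end == NULL)
            break;
        tok = end + 1;
    }
    return changed;
}
```

For example, "+sse2,+sse4.2,+popcnt,+cx16" becomes "+sse2,-sse4.2,-popcnt,+cx16" after disabling sse4.2 and popcnt.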
5) The art project used the __memcmp16 ARM builtin for x86 and x86_64 as well, inducing the ARM translator to use instructions that are ILLEGAL on Core 2.
https://github.com/maurossi/art/commits/x86_64_ssse3_builds
NOTE: This solves a specific SIGILL error in the Google Play Store (com.android.vending).
I don't know whether the ARM translator is still the prebuilt libhoudini binary, or whether there is some sse4/popcnt optimization that could be disabled.
The next step would probably avoid this problem, but my assumption is that
__memcmp16 on top of the ARM translator will not perform better than a generic C implementation.
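For reference, a generic C version is short. The sketch below assumes the `__memcmp16` contract is: compare `count` 16-bit code units and return the difference of the first mismatching pair, or 0 if all are equal. `memcmp16_generic` is a hypothetical name used for illustration, not necessarily ART's actual symbol.

```c
#include <stddef.h>
#include <stdint.h>

/* Compares count 16-bit code units (e.g. Java chars). Returns 0 if equal,
 * otherwise the difference of the first mismatching pair, so the sign gives
 * the ordering. Plain C: the instructions used are decided by the compiler
 * flags, not by hand-written SSE4 assembly. */
int32_t memcmp16_generic(const uint16_t *s0, const uint16_t *s1, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (s0[i] != s1[i])
            return (int32_t)s0[i] - (int32_t)s1[i];
    }
    return 0;
}
```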
6) Kernel-trap-based opcode emulation (NEXT EPISODE)
As already assessed, sse4.2 and popcnt may still appear ad libitum in the ARM translator and other x86_64 libraries.
Even though there will be a price to pay in instruction cycles, it should be minimized if all the frequent cases (presumably those related to the ARM translator) are kept under control.
M.
Last edit: Mauro Rossi 2016-01-24
Status update as of March 17th:
The last few weeks were spent studying the OPEMU implementation and possible improvements to the current patch, and starting to try building them on the current 4.4 kernel branch.
OPEMU
x86 emulation of SSSE3, SSE4_1, SSE4_2 and POPCNT, covering all instruction/register cases, requires disassembler functionality: in OPEMU, libudis86 is used to find the opcode and the instruction length, then the emulation is performed using the SSEPlus reference emulation code (no SSE, SSE2 or SSE3 intrinsics optimizations are in place).
The latest implementation dropped the 32-bit instruction cases.
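The decoding step boils down to recognizing the opcode and computing the instruction length, so the trap handler knows how far to advance the instruction pointer. A minimal sketch for a single case, 64-bit-mode POPCNT (`F3 [REX] 0F B8 /r`), computing the length from the standard ModRM/SIB/displacement rules. It ignores other prefixes (segment overrides, 66h, prefix reordering) and is not a general disassembler; `popcnt_length` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the length in bytes of a 64-bit-mode POPCNT instruction starting
 * at code[0], or 0 if the bytes are not a POPCNT this sketch understands.
 * Accepted layout: F3, optional REX (40..4F), 0F B8, ModRM, [SIB], [disp]. */
size_t popcnt_length(const uint8_t *code, size_t avail)
{
    size_t i = 0;

    if (avail < 4 || code[i] != 0xF3)
        return 0;
    i++;
    if ((code[i] & 0xF0) == 0x40)   /* optional REX prefix */
        i++;
    if (i + 3 > avail || code[i] != 0x0F || code[i + 1] != 0xB8)
        return 0;                   /* need 0F B8 plus a ModRM byte */
    i += 2;

    uint8_t modrm = code[i++];
    uint8_t mod = modrm >> 6;
    uint8_t rm = modrm & 7;

    if (mod != 3 && rm == 4) {      /* SIB byte follows */
        if (i >= avail)
            return 0;
        uint8_t base = code[i++] & 7;
        if (mod == 0 && base == 5)
            i += 4;                 /* disp32, no base register */
    } else if (mod == 0 && rm == 5) {
        i += 4;                     /* RIP-relative disp32 */
    }
    if (mod == 1)
        i += 1;                     /* disp8 */
    else if (mod == 2)
        i += 4;                     /* disp32 */

    return i <= avail ? i : 0;
}
```

For example, `popcnt eax, ecx` (F3 0F B8 C1) is 4 bytes, and `popcnt eax, [rsp+8]` (F3 0F B8 44 24 08) is 6 bytes.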
Current status: the first OPEMU building "bricks" are in place here:
https://github.com/maurossi/linux/tree/kernel-4.4_OPEMU_64bit
but build errors are popping up like popcorn, because OPEMU is implemented on top of Darwin/Mach/BSD.
Options for improving the current traps.c patch
If we could assume that apps need to be compatible with first-generation Atoms,
then it would be hazardous for app developers to rely on SSE4_1, SSE4_2 and POPCNT instructions.
This would make it possible to apply all the changes mentioned in the previous post
to avoid SSE4_X/POPCNT instructions, complemented with handling of the new libhoudini.so cases (which are a little different from the previous implementation done on top of kitkat-x86).
To update the list of offending instructions in the latest libhoudini.so libraries for the 32-bit build, the 64-bit build, and the 64-bit kernel with 32-bit build (each has a different libhoudini.so),
a script tool was used, opcode.sh: https://gist.github.com/rindeal/72af275f05d44e10ebca
and the full list of exceptions was collected for the three flavors of libhoudini.so.
The full list of opcodes is here: http://www.mediafire.com/download/k2umj452x5k461j/sse4_opcodes.txt
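I don't know exactly how opcode.sh works internally; an equivalent check can be sketched as a filter over disassembly text that counts instructions whose mnemonic belongs to a given set. This sketch assumes Intel-syntax output from GNU `objdump -d -M intel`, where address, raw bytes and instruction text are separated by tabs; the mnemonic list is a partial, illustrative selection, and `count_mnemonics` is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A few SSE4.1 / SSE4.2 / POPCNT mnemonics (illustrative, not exhaustive). */
static const char *const sse4_mnemonics[] = {
    "popcnt", "crc32", "pcmpestri", "pcmpestrm", "pcmpistri", "pcmpistrm",
    "pcmpgtq", "ptest", "pminsd", "pmaxsd", "pmulld", "pblendvb", "blendvps",
    "pextrb", "pinsrb", "roundsd", "roundss", "dpps", NULL
};

/* Whole-word match of word[0..len) against the list. */
static int is_sse4_mnemonic(const char *word, size_t len)
{
    for (size_t i = 0; sse4_mnemonics[i] != NULL; i++)
        if (strlen(sse4_mnemonics[i]) == len &&
            strncmp(word, sse4_mnemonics[i], len) == 0)
            return 1;
    return 0;
}

/* Counts disassembly lines whose mnemonic is in the list. The instruction
 * text is taken to start after the second tab of each line; lines without
 * two tabs (labels, byte continuation lines) are skipped. */
size_t count_mnemonics(const char *text)
{
    size_t hits = 0;
    const char *line = text;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t line_len = eol ? (size_t)(eol - line) : strlen(line);
        const char *end = line + line_len;

        const char *t1 = memchr(line, '\t', line_len);
        const char *t2 = t1 ? memchr(t1 + 1, '\t', (size_t)(end - (t1 + 1))) : NULL;
        if (t2 != NULL) {
            const char *w = t2 + 1;
            while (w < end && *w == ' ')
                w++;
            const char *we = w;
            while (we < end && !isspace((unsigned char)*we))
                we++;
            if (is_sse4_mnemonic(w, (size_t)(we - w)))
                hits++;
        }
        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return hits;
}
```

Instruction prefixes printed before the mnemonic are not handled, so the count is a lower bound.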
At this point, the hierarchical switch...case set of emulation paths has to be extended with all the SSE4_X/POPCNT instructions; this would take some time, but it is just a mechanical set of steps.
Most of all, I was interested in exploiting the SSE2/SSE3-optimized implementation of the SSEPlus emulation, to improve the current patch's functionality and performance,
for example using the include/map/SSEPlus_MAP_SSE3.h header, which would automatically provide the emulation function definitions according to the available intrinsics (SSE3) without needing to change the current emulated instruction function names.
But when I added the SSEPlus headers to traps.c, I faced a lot of build errors, due to (std) libraries not found and also because of these types, which require some SSE2 flags to build the kernel:
__m128
__m128d
__m128i
__m64
I could not complete the build.
At this point I temporarily abandoned the optimized-instructions approach, to try to complete the set of REFerence C-based instruction emulations, when... I found out about...
x86_emulate.c in Xen
which is able to disassemble and emulate, but in my understanding it does not emulate SSSE3, SSE4_1, SSE4_2 or POPCNT, and it is again C code only (no SSE, SSE2 or SSE3 optimizations).
But I'm not entirely sure.
I think I need to bring a kernel expert on board to speed things up and guide me.
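The reference (plain C) emulation path does not need the `__m128` family of types at all, which is what makes it buildable in a kernel compiled without SSE flags: a 128-bit XMM value can be modelled as an array of integers and each instruction as a loop. A minimal sketch of two SSE4.1 instructions, PMINSD (packed signed 32-bit minimum) and PTEST (ZF set if DEST AND SRC is all zeros, CF set if (NOT DEST) AND SRC is all zeros). The `xmm_t` type and function names are hypothetical, and reading/writing the saved XMM state of the trapped task is not shown.

```c
#include <stdint.h>

/* A 128-bit XMM register value viewed as lanes; no SSE types needed. */
typedef union {
    int32_t  d[4];
    uint64_t q[2];
} xmm_t;

/* PMINSD dest, src: per 32-bit lane, dest = min(dest, src), signed. */
void emu_pminsd(xmm_t *dest, const xmm_t *src)
{
    for (int i = 0; i < 4; i++)
        if (src->d[i] < dest->d[i])
            dest->d[i] = src->d[i];
}

/* PTEST dest, src: only flags change. ZF = ((dest AND src) == 0),
 * CF = ((NOT dest AND src) == 0); AF, OF, PF and SF are cleared. */
void emu_ptest(const xmm_t *dest, const xmm_t *src, int *zf, int *cf)
{
    uint64_t and_bits = 0, andn_bits = 0;
    for (int i = 0; i < 2; i++) {
        and_bits  |= dest->q[i] & src->q[i];
        andn_bits |= ~dest->q[i] & src->q[i];
    }
    *zf = (and_bits == 0);
    *cf = (andn_bits == 0);
}
```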
Mauro
Last edit: Mauro Rossi 2016-03-18
Hi Mauro,
About traps.c: were you thinking of something like this? (see attachment)
I used GCC's built-in popcount, which is implemented in software when -msse4.2 is not used.
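A sketch of what an emulated POPCNT has to produce: the bit count, plus the flags the instruction defines (ZF set when the source is zero; OF, SF, AF, CF and PF cleared). `__builtin_popcountll` is the GCC/Clang builtin; a portable SWAR count is included as the plain-C alternative. The `popcnt_result` struct and the function names are hypothetical.

```c
#include <stdint.h>

/* Result of an emulated POPCNT: destination value and the flag it sets. */
struct popcnt_result {
    uint64_t value;
    int zf; /* set iff source == 0; OF, SF, AF, CF, PF are always cleared */
};

/* Portable bit count (SWAR): sums bits in 2-, 4-, then 8-bit groups and
 * adds the bytes together with a multiply. */
unsigned swar_popcount64(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);
}

/* Emulates POPCNT r64, r/m64 given the already-fetched source operand. */
struct popcnt_result emu_popcnt(uint64_t src)
{
    struct popcnt_result r;
#if defined(__GNUC__)
    r.value = (uint64_t)__builtin_popcountll(src); /* no POPCNT without -mpopcnt */
#else
    r.value = swar_popcount64(src);
#endif
    r.zf = (src == 0);
    return r;
}
```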
Hi,
I'm posting on this old ticket to discuss the possibility of using
Intel® Software Development Emulator as an instruction emulator
for SSE4.1 and SSE4.2, without impacting the kernel.
I read about this on gaming blogs, because some game binaries were released with AVX optimizations and this caused problems for people with Core 2 Duo/Quad, who used sde as a workaround.
"Emulate everything" mode should even emulate SSE3, SSSE3 and the enhanced instruction sets after SSE4.x - https://software.intel.com/en-us/articles/intel-software-development-emulator#everything
Accepting that there is a performance penalty, I'd like to see if we could have x86_64 builds running on Core 2 Duo and Pentium 4 Sk775, and ... x86 on Pentium 4 Sk478.
If it works, even with some overhead, it would be interesting to try x86_64 on AMD64 CPUs,
but is there a way to wrap all executables with the 'sde --' command, or use it to initialize everything in Android code execution?
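One generic way to wrap a program with `sde --` is a small launcher that re-executes its own arguments under SDE. The sketch below only builds the argument vector and calls `execv`; the path `/system/vendor/bin/sde64` is the one used later in this thread, no SDE options beyond `--` are assumed, and `build_sde_argv` / `run_under_sde` are hypothetical names. It does not hook init or zygote, so it only covers processes started through the launcher.

```c
#include <stdlib.h>
#include <unistd.h>

/* Path used in this thread; a 32-bit process would need sde instead. */
#define SDE_PATH "/system/vendor/bin/sde64"

/* Builds {SDE_PATH, "--", prog, args..., NULL} from a launcher's argv
 * (argv[0] is the launcher itself and is dropped). Caller frees. */
char **build_sde_argv(int argc, char *const argv[])
{
    if (argc < 2)
        return NULL;
    /* sde + "--" + (argc - 1) program args + NULL terminator */
    char **out = malloc((size_t)(argc + 2) * sizeof *out);
    if (out == NULL)
        return NULL;
    out[0] = (char *)SDE_PATH;
    out[1] = (char *)"--";
    for (int i = 1; i < argc; i++)
        out[i + 1] = argv[i];
    out[argc + 1] = NULL;
    return out;
}

/* Replaces the current process with "sde -- prog args...";
 * returns -1 only if building argv or execv fails. */
int run_under_sde(int argc, char *const argv[])
{
    char **sde_argv = build_sde_argv(argc, argv);
    if (sde_argv == NULL)
        return -1;
    execv(SDE_PATH, sde_argv);
    free(sde_argv); /* reached only on failure */
    return -1;
}
```

Called from a launcher's `main` as `run_under_sde(argc, argv)`, the invocation `launcher /system/bin/start surfaceflinger` would become `sde64 -- /system/bin/start surfaceflinger`.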
Mauro
Last edit: Mauro Rossi 2017-01-14
Correction: Pentium 4 Sk478 could run x86 builds
As a dumb/silly attempt, I tried using sde by copying its files to
/system/bin and calling "start" through it.
But it didn't work...
x86:/ # sde -- start
In:
Thread: 0
Exception code: ACCESS_INVALID_ADDRESS
Exception Class: 2
Faulty AccessType : 0
Exception address: 0xb2ffab68
C: Tool (or Pin) caused signal 11 at PC 0xb2ffab68
Segmentation fault
2017-01-14 18:50 GMT-02:00 Mauro Rossi maurossi@users.sf.net:
Hi Paulo Sergio,
In my case, the first hurdle was uploading the full set of sde files to the installed system.
I used an adb-push script found on the internet and, with some tweaks, managed to get the full package
into system/vendor/bin.
Then I read about running sde, or, for the x86_64 build, sde64 for 64-bit processes and sde for 32-bit processes(?), while the 32-bit build should be easier to try.
Nevertheless, I tried to launch it on x86_64, using the full path to the executable:
/system/vendor/bin/sde64 -- start
but in the end I saw a "File not found error:" due to some missing libraries.
To see the library dependencies, I used the following command on Ubuntu:
ldd ./sde
but I don't think we currently have libraries with those names in the Android installation,
so we may need to create some hard links to try sde (32-bit),
while I'm completely clueless about how to use sde64 for 64-bit processes and sde for 32-bit ones
in the x86_64 build.
Mauro
Do you have anything for fixing nougat's SIGILL on webview.apk?
Most probably not. The last attempts were with a kernel trap to emulate SSE4,
and with sde using the emulate-all option (sde -- [executable]), but I think SELinux needs to be disabled to continue the sde tests.
So, folks, we are in 2019 already :)
Is there any way to run x86_64 on something like Core 2 Duo or Quad CPUs?
Last edit: Mauro Rossi 2019-04-17
No, those CPUs do not meet the minimum requirements of the x86_64 Android ABI.
In 2019 you can send your comment to AOSP and see what you are able to achieve with them.
Cheers!
Last edit: Mauro Rossi 2019-04-17
Well, but Wu Zhen wrote above: "I have x86_64 booting on a qemu Conroe cpu".
I can't find any working ISOs, though.
I wanted to compile it myself but got an error in the Marshmallow source:
"...OverlayTouchActivity.java:18: The import android.view.WindowManager.LayoutParams.PRIVATE_FLAG_HIDE_NON_SYSTEM_OVERLAY_WINDOWS cannot be resolved"
I have a quick question related to this discussion: my laptop has SSE 4.1. Will it work with Android? Which Android version is best for it?
Sorry, I had been away from SourceForge for a long time; I just logged in and saw your two messages.
Conroe has SSE3, so according to the Android ABI only the x86 ISO will boot.
The SSSE3 and SSE4 emulation implemented by Wu Zhen with SSEPlus does not allow x86_64 to boot on Conroe.
marshmallow-x86 is too old and unmaintained; try nougat-x86, which is still maintained.
If you have a recent AMD processor, x86_64 will work just fine: an AMD E1-6010 works for me.
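To check which ISO a given CPU can boot, the relevant bits are in CPUID leaf 1, register ECX: SSE3 (bit 0), SSSE3 (bit 9), SSE4.1 (bit 19), SSE4.2 (bit 20) and POPCNT (bit 23). The sketch below decodes them and applies the rule from the Android x86_64 ABI (all five required), falling back to the x86 image otherwise. `pick_image` is a hypothetical helper; the CPUID read uses GCC/Clang's `<cpuid.h>` and is compiled only on x86.

```c
#include <stdint.h>

/* CPUID leaf 1, ECX feature bits. */
#define ECX_SSE3   (1u << 0)
#define ECX_SSSE3  (1u << 9)
#define ECX_SSE4_1 (1u << 19)
#define ECX_SSE4_2 (1u << 20)
#define ECX_POPCNT (1u << 23)

/* Extensions the Android x86_64 ABI assumes on top of SSE/SSE2. */
#define X86_64_ABI_NEEDS \
    (ECX_SSE3 | ECX_SSSE3 | ECX_SSE4_1 | ECX_SSE4_2 | ECX_POPCNT)

enum image { IMAGE_X86, IMAGE_X86_64 };

/* Picks the image from a CPUID leaf 1 ECX value: x86_64 only if every
 * required bit is present, otherwise fall back to the x86 image. */
enum image pick_image(uint32_t ecx)
{
    return (ecx & X86_64_ABI_NEEDS) == X86_64_ABI_NEEDS ? IMAGE_X86_64
                                                        : IMAGE_X86;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Reads ECX of CPUID leaf 1 on the running CPU; 0 if the leaf is missing. */
uint32_t cpuid_leaf1_ecx(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return ecx;
}
#endif
```

On the local machine, `pick_image(cpuid_leaf1_ecx())` gives the answer; on a Core 2 Conroe the SSE4.1, SSE4.2 and POPCNT bits are clear, so it returns IMAGE_X86.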
Mauro
Wu Zhen: adding -mno-sse4.2 -mno-popcnt -mno-ssse3 to arch_variant_cflags will restore the desired behavior. with this change, I have x86_64 booting on a qemu Conroe cpu.
now the problem becomes apps with x86 native libs, they by default will generate sse4/popcnt instructions. we may in the end need kernel trap for this case.
This means that you may build the android-x86 OS with binaries that do not use those instructions,
but the applications will still use all of them, and the SSEPlus-based emulation is currently incomplete, as far as I know.
It's a lot easier to pick the x86 ISO than to dig into low-level instruction emulation code in kernel traps. It is so complex that only experts can deal with it, and there are better ways to use the little time available: I cannot afford to spend years learning when, in a few years, you just get the next CPU family. It does not make any sense.
Mauro