
#3 Remove or work around SSE4.1 requirement

Milestone: 1.0
Status: open
Labels: sse4
Last Updated: 2020-06-12
Created: 2016-01-05
Private: No

The Marshmallow compiler or build system enables the SSE4.1 instruction set by default, so the binaries cannot run on older CPUs without SSE4.1. We need to find a way to work around it.


Discussion

  • Mauro Rossi

    Mauro Rossi - 2016-01-06

    x86_64 status

    In Marshmallow, the x86_64 target has the following additional features enabled compared to Lollipop,
    where SSE4 was not enabled by the build system make rules:

    ARCH_X86_HAVE_SSE4 := true
    ARCH_X86_HAVE_SSE4_1 := true
    ARCH_X86_HAVE_SSE4_2 := true
    

    These settings are in the build project, in the file core/combo/arch/x86_64/x86_64.mk.
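To check whether a particular CPU actually provides the features these settings assume, one can read CPUID leaf 1. The sketch below is a minimal example, assuming GCC or Clang on x86 (it uses their <cpuid.h> helper __get_cpuid); the bit positions are those of CPUID.01H:ECX in the Intel SDM, and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the ISA extensions discussed in this ticket. */
struct x86_features {
    bool sse3, ssse3, sse4_1, sse4_2, movbe, popcnt;
};

/* Decode CPUID leaf 1 ECX (bit positions per the Intel SDM). */
static struct x86_features decode_cpuid1_ecx(uint32_t ecx)
{
    struct x86_features f;
    f.sse3   = (ecx >> 0)  & 1u;
    f.ssse3  = (ecx >> 9)  & 1u;
    f.sse4_1 = (ecx >> 19) & 1u;
    f.sse4_2 = (ecx >> 20) & 1u;
    f.movbe  = (ecx >> 22) & 1u;
    f.popcnt = (ecx >> 23) & 1u;
    return f;
}

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>

/* Query the running CPU; returns false if leaf 1 is not available. */
static bool query_x86_features(struct x86_features *out)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    *out = decode_cpuid1_ecx(ecx);
    return true;
}
#endif
```

Running query_x86_features() on the target machine and looking at sse4_1, sse4_2 and popcnt gives a quick answer to whether an x86_64 build configured this way can run there.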

    x86 status

    SSE4, SSE4_1 and SSE4_2 are not enabled in x86 builds, and backward compatibility "would have been" ensured by:

    ARCH_X86_HAVE_SSSE3 := false
    ARCH_X86_HAVE_MOVBE := false
    ARCH_X86_HAVE_POPCNT := false
    
    # Some intrinsic functions used by libcxx only exist for prescott or newer CPUs.
    arch_variant_cflags := \
        -march=prescott \
    

    In principle, the Marshmallow x86 build should work on AMD Athlon X2 and later (AMD CPUs with SSE3).

    Surprise!

    NOW ATTENTION: "BUT, and there is a BUT":
    in practice the marshmallow-x86 x86 build will not work on AMD, even though SSSE3 is not set in the build system combo rules.

    Similarly, even if we disabled SSE4 in x86_64.mk, experiments on Lollipop showed that some prebuilt binaries (e.g. clang) were built to generate targets that use SSE4 (e.g. surfaceflinger), even though there is no trace of -msse4 in the clang build rules or in the C/CPPFLAGS.
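One way to see which extensions a compiler invocation has enabled on its own, even when no -msse4 appears anywhere in the flags, is to look at the feature macros GCC and Clang predefine (__SSSE3__, __SSE4_1__, __SSE4_2__, __POPCNT__). A minimal sketch, assuming GCC or Clang; compiling it with the same target and flags used for a module such as surfaceflinger would show what the toolchain actually turned on. The enum and function names are hypothetical.

```c
#include <stdio.h>

enum isa_bit { ISA_SSE2, ISA_SSE3, ISA_SSSE3, ISA_SSE4_1, ISA_SSE4_2, ISA_POPCNT, ISA_COUNT };

/* Report which ISA extensions the compiler enabled for this translation unit.
 * The macros come from -march / -m<feature> flags and the target's defaults,
 * not from the CPU the program later runs on. */
static unsigned compiled_isa_mask(void)
{
    unsigned mask = 0;
#ifdef __SSE2__
    mask |= 1u << ISA_SSE2;
#endif
#ifdef __SSE3__
    mask |= 1u << ISA_SSE3;
#endif
#ifdef __SSSE3__
    mask |= 1u << ISA_SSSE3;
#endif
#ifdef __SSE4_1__
    mask |= 1u << ISA_SSE4_1;
#endif
#ifdef __SSE4_2__
    mask |= 1u << ISA_SSE4_2;
#endif
#ifdef __POPCNT__
    mask |= 1u << ISA_POPCNT;
#endif
    return mask;
}

static void print_compiled_isa(void)
{
    static const char *const names[ISA_COUNT] = {
        "sse2", "sse3", "ssse3", "sse4.1", "sse4.2", "popcnt"
    };
    unsigned mask = compiled_isa_mask();
    for (int i = 0; i < ISA_COUNT; i++)
        printf("%-7s %s\n", names[i], (mask >> i) & 1u ? "enabled" : "off");
}
```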

    Concepts currently under evaluation:

    1) hack the prebuilt binaries

    2) rebuild the toolchains, to have unpolluted compiler binaries

    3) (gorgeous) integrate a kernel-trap-based mechanism to emulate SSE4* opcodes (like xnu OPEMU, which uses SSEPlus; the only problem is that it is implemented on top of xnu, which is a BSD-based kernel)
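Option 3 relies on the CPU raising #UD for an unsupported opcode, which Linux delivers to the process as SIGILL; an emulator then computes the result, writes it into the saved register state and advances the instruction pointer past the faulting instruction. The user-space sketch below shows only that trap-and-resume skeleton, assuming x86_64 Linux with glibc: it executes a deliberate ud2 (2 bytes) and the handler simply skips it. A real emulator (OPEMU does this inside the kernel) would decode the instruction to learn its length and semantics instead of hard-coding 2. All names are hypothetical.

```c
#define _GNU_SOURCE            /* exposes REG_RIP in glibc's <sys/ucontext.h> */
#include <signal.h>
#include <string.h>
#include <ucontext.h>

#ifndef REG_RIP
#define REG_RIP 16             /* glibc's x86_64 index, if _GNU_SOURCE came too late */
#endif

static volatile sig_atomic_t sigill_count = 0;

static void on_sigill(int sig, siginfo_t *info, void *uctx)
{
    ucontext_t *ctx = (ucontext_t *)uctx;
    (void)sig;
    (void)info;
    /* A real emulator would decode the instruction at this RIP, store its
     * result into ctx->uc_mcontext, then add the decoded length. This sketch
     * only handles ud2 (0F 0B), which is always 2 bytes long. */
    ctx->uc_mcontext.gregs[REG_RIP] += 2;
    sigill_count++;
}

static int install_sigill_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigill;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGILL, &sa, NULL);
}

/* Executes an instruction the CPU rejects; execution resumes right after it. */
static void run_illegal_instruction(void)
{
    __asm__ volatile("ud2");
}
```

Whether such emulation is needed only for user space (the doubt raised below) matters here: this pattern only covers user space, while OPEMU also traps kernel-mode faults.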

    Priorities

    As I understand it, x86_64 now requires at least Sandy Bridge, but Intel users having problems could use x86 builds.

    AMD users should also be addressed, because since lollipop-x86 AMD owners have had no joy with android-x86.

    My idea was to try porting xnu OPEMU to Linux kernel traps, because most of the opcode emulation is already there (SSE4 and SSSE3, based on SSEPlus), but porting the stack/state part is of medium-high complexity.

    For a kernel expert it may not be so difficult.

    One of my many doubts is whether opcode emulation needs to be supported only for user space, because OPEMU implements traps and emulation for both kernel and user space.

    Mauro

     

    Last edit: Mauro Rossi 2016-01-06
  • Wu Zhen

    Wu Zhen - 2016-01-07

    I did some research. The problem is that when building for the x86(_64)-linux-android target, clang/gcc enables sse4.2 and popcnt by default for x86_64, and ssse3 for x86. So adding -mno-sse4.2 -mno-popcnt -mno-ssse3 to arch_variant_cflags restores the desired behavior. With this change, I have x86_64 booting on a QEMU Conroe CPU.
    Now the problem becomes apps with x86 native libs: by default they will be built with sse4/popcnt instructions. In the end we may need a kernel trap for this case.
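For app native libraries, a common way to avoid SIGILL without giving up the fast path is runtime dispatch: compile only one function with the extra ISA enabled and select it after checking the CPU. A minimal sketch, assuming GCC or Clang on x86 (__attribute__((target("popcnt"))) and __builtin_cpu_supports are compiler extensions supported by both); the function names are hypothetical and nothing here is specific to android-x86.

```c
#include <stdint.h>

/* Portable fallback: clear the lowest set bit until none remain. */
static unsigned popcount_generic(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

#if defined(__i386__) || defined(__x86_64__)
/* Only this function is compiled with POPCNT enabled, so the compiler may
 * emit the popcnt instruction here and nowhere else. */
__attribute__((target("popcnt")))
static unsigned popcount_hw(uint32_t x)
{
    return (unsigned)__builtin_popcount(x);
}
#endif

/* Pick the implementation at run time: CPUs without POPCNT never execute it. */
static unsigned popcount_dispatch(uint32_t x)
{
#if defined(__i386__) || defined(__x86_64__)
    if (__builtin_cpu_supports("popcnt"))
        return popcount_hw(x);
#endif
    return popcount_generic(x);
}
```

This only helps for code the app developer controls; prebuilt libraries that use POPCNT unconditionally still need the kernel trap discussed above.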

     
    • Xemi

      Xemi - 2016-03-25

      Hi Wu Zhen, any update on https://sourceforge.net/p/android-x86/opengles/5/ (Enable software rendering by mesa llvmpipe)?

      I am assuming it is best to start on Mesa 11.2, since that appears to have been integrated recently. However, it doesn't appear to be as trivial as just enabling some flags in external/mesa. I was going to try pursuing this for fun, but I wanted to leverage any work you might have done.

       
    • boulder

      boulder - 2019-03-27
      Post awaiting moderation.
  • Mauro Rossi

    Mauro Rossi - 2016-01-24

    Work in progress as of January 24th

    As a follow-up to the "avoid sse4 in x86_64 builds" objective, here is the full list of changes needed for marshmallow-x86:

    1) The build project needs changes to avoid enabling SSE4* architecture features, including Wu Zhen's proposed addition to arch_variant_cflags in the core/combo x86_64.mk

    https://github.com/maurossi/build/commits/x86_64_ssse3_builds

    2) bionic needs ssse3-memcmp-slm.S to replace sse4-memcmp-slm.S

    https://github.com/maurossi/bionic/commits/x86_64_ssse3-memcmp

    3) external/clang requires the sse4.2 and popcnt features to be forcibly disabled.
    This was inspired by pstglia; I only changed it a little to enforce that sse4.2 and popcnt are not used, even if requested.

    https://github.com/maurossi/clang/commits/ssse3_x86_64_builds

    4) prebuilts/clang/linux-x86/host/3.6 requires a hack to set -sse4.2 and -popcnt instead of +sse4.2 and +popcnt (a binary hex editor was used).
    This hack was also inspired by pstglia; I only changed it a little.

    https://github.com/maurossi/prebuilts_clang_linux-x86_host_3.6/commits/ssse3_x86_64_builds

    5) The art project used the __memcmp16 ARM builtin for x86 and x86_64 as well, causing the ARM translator to use instructions that are ILLEGAL on Core 2

    https://github.com/maurossi/art/commits/x86_64_ssse3_builds

    NOTE: This solves a specific SIGILL error in the Google Play Store (com.android.vending).
    I don't know whether the ARM translator is still the prebuilt libhoudini binary, or whether there is some sse4/popcnt optimization that could be disabled.
    The next step would probably avoid this problem, but my assumption is that
    __memcmp16 on top of the ARM translator will not perform better than a generic C implementation.

    6) kernel trap based opcode emulation (NEXT EPISODE)

    As already noted, sse4.2 and popcnt may still appear ad libitum in the ARM translator and other x86_64 libraries.
    There will be a price to pay in instruction cycles, but it should be minimized if all the frequent cases (presumably those related to the ARM translator) are kept under control.
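The hex edit in item 4 works because "+sse4.2" and "-sse4.2" (likewise "+popcnt" and "-popcnt") have the same length, so the patched binary keeps every offset intact. A minimal sketch of that idea on an in-memory buffer; the helper name is hypothetical, and this is not the tool that was actually used on the prebuilt clang.

```c
#include <stddef.h>
#include <string.h>

/* Flip every "+<feature>" to "-<feature>" in buf[0..len). Only one byte
 * changes per match, so length and offsets are preserved, which is what
 * makes patching a binary in place safe. Returns the number of patches. */
static size_t disable_feature(unsigned char *buf, size_t len, const char *feature)
{
    size_t flen = strlen(feature);
    size_t count = 0;
    if (flen == 0 || len < flen + 1)
        return 0;
    for (size_t i = 0; i + flen + 1 <= len; i++) {
        if (buf[i] == '+' && memcmp(buf + i + 1, feature, flen) == 0) {
            buf[i] = '-';
            count++;
            i += flen;   /* continue after the matched feature name */
        }
    }
    return count;
}
```

Note that a plain substring match like this would also hit longer names sharing the same prefix, so in a real binary each match deserves a manual check.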

    M.
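For comparison with item 5, a generic C version of a 16-bit memcmp can be as simple as the loop below: it compares two arrays of 16-bit code units and returns the difference of the first mismatching pair, or 0 if they are equal. This is a sketch under that assumed contract, not a copy of ART's code; memcmp16_generic is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Compare count 16-bit units. Returns the difference of the first mismatching
 * pair (negative, zero or positive), or 0 if all units are equal. Being plain
 * C, it only uses SSE4.x if the compiler flags (or target defaults) allow it. */
static int32_t memcmp16_generic(const uint16_t *s0, const uint16_t *s1, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (s0[i] != s1[i])
            return (int32_t)s0[i] - (int32_t)s1[i];
    }
    return 0;
}
```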

     

    Last edit: Mauro Rossi 2016-01-24
  • Mauro Rossi

    Mauro Rossi - 2016-03-17

    Status update as of March 17th:

    The last few weeks were spent studying the OPEMU implementation and possible improvements to the current patch, and starting to try building them in the current 4.4 kernel branch.

    OPEMU

    x86 emulation of SSSE3, SSE4_1, SSE4_2 and POPCNT, covering all instruction/register cases, requires disassembler functionality: in OPEMU, libudis86 is used to find the opcode and instruction length, then the emulation is performed using the SSEplus reference emulation code (no SSE, SSE2 or SSE3 intrinsics optimizations are in place).
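To illustrate what "reference emulation code" means for one SSSE3 instruction, here is a plain-C model of 128-bit PSHUFB: each destination byte is selected from the original destination by the low 4 bits of the matching control byte, or zeroed when the control byte's top bit is set (semantics per the Intel SDM). The xmm128 union stands in for __m128i so it builds without SSE compiler flags; the names are hypothetical and this is not code taken from SSEplus or OPEMU.

```c
#include <stdint.h>

/* Plain-C stand-in for an XMM register, usable without SSE compiler flags. */
typedef union {
    uint8_t  u8[16];
    uint16_t u16[8];
    uint32_t u32[4];
    uint64_t u64[2];
} xmm128;

/* PSHUFB dst, ctl: result byte i is 0 if bit 7 of ctl byte i is set,
 * otherwise the original dst byte selected by (ctl byte i & 0x0F). */
static void emulate_pshufb128(xmm128 *dst, const xmm128 *ctl)
{
    const xmm128 data = *dst;   /* copies first, so dst may alias ctl */
    const xmm128 mask = *ctl;
    for (int i = 0; i < 16; i++) {
        uint8_t c = mask.u8[i];
        dst->u8[i] = (c & 0x80) ? 0 : data.u8[c & 0x0F];
    }
}
```

In a trap handler, dst and ctl would be filled from the saved XMM state (or memory operand) identified by the decoder, and the result written back before resuming.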

    The latest implementation dropped the 32-bit instruction cases.

    The current status is that the first OPEMU building "bricks" are in place here:
    https://github.com/maurossi/linux/tree/kernel-4.4_OPEMU_64bit

    but build errors are popping up like popcorn, because OPEMU is implemented on top of Darwin/Mach/BSD.

    Options for improving the current traps.c patch

    If we could assume that apps need to be compatible with first-generation Atom CPUs,
    then it would be a hazard for app developers to rely on SSE4_1, SSE4_2 and POPCNT instructions.
    That would make it possible to apply all the changes mentioned in the previous post
    to avoid SSE4_X/POPCNT instructions, complemented by handling the new libhoudini.so cases (which are a little different from the previous implementation done on top of kitkat-x86).

    To update the list of offending instructions in the latest libhoudini.so libraries for the 32-bit build, the 64-bit build, and the 64-bit kernel with 32-bit build (each of which has its own libhoudini.so),
    a script was used, opcode.sh: https://gist.github.com/rindeal/72af275f05d44e10ebca
    and the full list of exceptions was collected for the three flavors of the libhoudini.so library.

    The full list of opcodes is here: http://www.mediafire.com/download/k2umj452x5k461j/sse4_opcodes.txt

    At this point the hierarchical switch...case set of emulation paths has to be extended with all the SSE4_X/POPCNT instructions. This will take some time, but it is just a mechanical set of steps.
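The first level of such a switch...case hierarchy is recognizing which extension a faulting instruction belongs to. Below is a deliberately tiny sketch for a few encodings from the Intel SDM (POPCNT = F3 0F B8 /r, CRC32 = F2 0F 38 F0/F1 /r, MOVBE = 0F 38 F0/F1 without F2, PCMPGTQ = 66 0F 38 37 /r), assuming 64-bit mode and handling only these legacy prefixes plus an optional REX byte. A real decoder such as libudis86 covers far more, including the ModRM/SIB/displacement bytes needed to compute the instruction length. Names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum isa_op { OP_OTHER, OP_POPCNT, OP_CRC32, OP_MOVBE, OP_PCMPGTQ };

/* Classify the instruction starting at p (n bytes available), 64-bit mode. */
static enum isa_op classify_insn(const uint8_t *p, size_t n)
{
    int has66 = 0, hasF2 = 0, hasF3 = 0;
    size_t i = 0;

    /* Mandatory/legacy prefixes relevant here; others are not handled. */
    for (; i < n; i++) {
        if (p[i] == 0x66) has66 = 1;
        else if (p[i] == 0xF2) hasF2 = 1;
        else if (p[i] == 0xF3) hasF3 = 1;
        else break;
    }
    /* Optional REX prefix (40..4F) immediately before the opcode. */
    if (i < n && (p[i] & 0xF0) == 0x40)
        i++;
    if (i + 1 >= n || p[i] != 0x0F)
        return OP_OTHER;

    switch (p[i + 1]) {
    case 0xB8:                                    /* F3 0F B8 */
        return hasF3 ? OP_POPCNT : OP_OTHER;
    case 0x38:                                    /* 0F 38 three-byte map */
        if (i + 2 >= n)
            return OP_OTHER;
        switch (p[i + 2]) {
        case 0xF0:
        case 0xF1:
            return hasF2 ? OP_CRC32 : OP_MOVBE;   /* F2 selects CRC32 */
        case 0x37:
            return has66 ? OP_PCMPGTQ : OP_OTHER;
        default:
            return OP_OTHER;
        }
    default:
        return OP_OTHER;
    }
}
```

Each returned case would then branch into its own emulation routine, which is where the per-instruction work described above goes.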

    Most of all, I was interested in exploiting the SSE2/SSE3-optimized implementation of the SSEplus emulation to improve the current patch's functionality and performance,
    for example using the include/map/SSEPlus_MAP_SSE3.h header, which would automatically provide the emulation function definitions according to the intrinsics available (SSE3), without needing to change the current emulated instruction function names.

    But when I added the SSEplus headers to traps.c, I faced a lot of build errors due to (std) libraries not being found, and also because of these types, which require some SSE2 flags to build in the kernel:

    __m128
    __m128d
    __m128i
    __m64

    I could not complete the build.

    At this point I temporarily abandoned the optimized instructions approach, to try to complete the set of REFerence C-based instruction emulations, when... I found out about...

    x86_emulator.c in XEN

    It is able to disassemble and emulate, but as I understand it, it does not emulate SSSE3, SSE4_1, SSE4_2 or POPCNT, and it is again C code only (no SSE, SSE2 or SSE3 optimizations).
    But I'm not entirely sure.

    I think I need to bring a kernel expert on board to speed things up and guide me.
    Mauro

     

    Last edit: Mauro Rossi 2016-03-18
  • Paulo Travaglia

    Paulo Travaglia - 2016-03-24

    Hi Mauro,

    About traps.c: were you thinking of something like this? (see attachment)

    I used gcc's built-in popcount, which is implemented in software when -msse4.2 is not used.
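Besides the result, an emulator must also reproduce POPCNT's flag effects: per the Intel SDM, it clears OF, SF, ZF, AF, CF and PF, then sets ZF only when the source is zero. A minimal sketch in the spirit of the traps.c attachment (whose contents are not shown here), using __builtin_popcountll, which is implemented in software when POPCNT is not enabled; it assumes 32- and 64-bit operand sizes only (a 32-bit result zero-extends into the 64-bit register, as in 64-bit mode), and the function name is hypothetical.

```c
#include <stdint.h>

/* RFLAGS bit positions (Intel SDM). */
#define RFLAGS_CF (1u << 0)
#define RFLAGS_PF (1u << 2)
#define RFLAGS_AF (1u << 4)
#define RFLAGS_ZF (1u << 6)
#define RFLAGS_SF (1u << 7)
#define RFLAGS_OF (1u << 11)

/* Emulate POPCNT dst, src for a 32- or 64-bit operand size. */
static void emulate_popcnt(uint64_t src, int opsize_bits, uint64_t *dst, uint64_t *rflags)
{
    uint64_t value = (opsize_bits == 32) ? (src & 0xFFFFFFFFu) : src;

    /* A 32-bit write zero-extends into the full 64-bit register. */
    *dst = (uint64_t)__builtin_popcountll(value);

    /* Clear OF, SF, ZF, AF, CF, PF; then ZF = (source == 0). */
    *rflags &= ~(uint64_t)(RFLAGS_OF | RFLAGS_SF | RFLAGS_ZF |
                           RFLAGS_AF | RFLAGS_CF | RFLAGS_PF);
    if (value == 0)
        *rflags |= RFLAGS_ZF;
}
```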

     
  • Mauro Rossi

    Mauro Rossi - 2017-01-14

    Hi,

    I'm posting on this old ticket to discuss the possibility of using the
    Intel® Software Development Emulator (sde) as an instruction emulator
    for SSE4.1 and SSE4.2, without any impact on the kernel.

    I read about this in gaming blogs: some game binaries were released with AVX optimizations, which caused problems for people with Core 2 Duo/Quad CPUs, who used sde as a workaround.

    "Emulate everything" mode should even emulate SSE3, SSSE3 and the instruction set extensions that came after SSE4.x - https://software.intel.com/en-us/articles/intel-software-development-emulator#everything

    Accepting that there is a performance penalty, I'd like to see whether we could have x86_64 builds running on Core 2 Duo and Pentium 4 Sk775, and ... x86 on Pentium 4 Sk478.

    If it works, even with some overhead, it would be interesting to try x86_64 on amd64 CPUs,
    but is there a way to wrap all executables with the 'sde --' command, or to use it to initialize all code execution in Android?
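One way to picture "wrapping all executables" is a small launcher that re-executes its target under sde, using the sde -- <command> form that appears in this thread. A minimal sketch, assuming sde is reachable at a path you pass in; the helper names are hypothetical, and how such a wrapper could be hooked into Android's process startup (init, zygote) is exactly the open question here, not something this code answers.

```c
#include <stddef.h>
#include <unistd.h>

/* Build {sde_path, "--", argv[0], ..., argv[argc-1], NULL} into out.
 * Returns the number of entries before the NULL, or -1 if out is too small. */
static int build_sde_argv(const char *sde_path, int argc, char *const argv[],
                          const char **out, size_t cap)
{
    if (argc < 0 || cap < (size_t)argc + 3)
        return -1;
    size_t k = 0;
    out[k++] = sde_path;
    out[k++] = "--";
    for (int i = 0; i < argc; i++)
        out[k++] = argv[i];
    out[k] = NULL;
    return (int)k;
}

/* Replace the current process with "sde -- <argv...>"; returns only on error. */
static int exec_under_sde(const char *sde_path, int argc, char *const argv[])
{
    const char *args[64];
    if (build_sde_argv(sde_path, argc, argv, args, 64) < 0)
        return -1;
    execv(sde_path, (char *const *)args);
    return -1;
}
```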

    Mauro

     

    Last edit: Mauro Rossi 2017-01-14
  • Mauro Rossi

    Mauro Rossi - 2017-01-14

    Correction: Pentium 4 Sk478 could run x86 builds

     
    • Paulo Travaglia

      Paulo Travaglia - 2017-01-15

      As a dumb/silly attempt, I tried using sde by copying its files to
      /system/bin and calling "start" through it.
      But it didn't work...

      x86:/ # sde -- start
      In:
      Thread: 0
      Exception code: ACCESS_INVALID_ADDRESS
      Exception Class: 2
      Faulty AccessType : 0
      Exception address: 0xb2ffab68
      C: Tool (or Pin) caused signal 11 at PC 0xb2ffab68
      Segmentation fault


      Status: open
      Milestone: 1.0
      Labels: sse4
      Created: Tue Jan 05, 2016 04:55 PM UTC by Chih-Wei Huang
      Last Updated: Sat Jan 14, 2017 08:44 PM UTC
      Owner: Mauro Rossi


  • Mauro Rossi

    Mauro Rossi - 2017-01-19

    Hi Paulo Sergio,

    In my case, the first hurdle was uploading the full set of sde files to the installed system.
    I used an adb-push script found on the internet, and with some tweaks I got the full package
    into system/vendor/bin.

    Then I read that you should run sde or, for an x86_64 build, sde64 for 64-bit processes and sde for 32-bit processes(?), while a 32-bit build should be easier to try.

    Nevertheless, I tried to launch it on x86_64, using the full path of the executable:

    /system/vendor/bin/sde64 -- start

    but in the end I got a "File not found" error due to some missing libraries.

    To see the library dependencies, I used the following command on Ubuntu:

    ldd ./sde

    but I don't think we currently have libraries with those names in an Android installation;
    we may need to create some hard links to try sde (32-bit),

    while I'm completely clueless about how to use sde64 for 64-bit processes and sde for 32-bit ones
    in an x86_64 build.

    Mauro

     
  • Xuefer

    Xuefer - 2017-01-21

    Do you have anything for fixing Nougat's SIGILL on webview.apk?

     
    • Mauro Rossi

      Mauro Rossi - 2017-05-20

      Most probably not. The last attempts were a kernel trap to emulate SSE4,
      and using sde with the emulate-all option (sde -- [executable]), but I think SELinux needs to be disabled to continue the sde tests.

       
  • boulder

    boulder - 2019-03-27

    So, folks, we are in 2019 already :)
    Is there any opportunity to run x86_64 on something like Core 2 Duo or Quad CPUs?

     

    Last edit: Mauro Rossi 2019-04-17
  • Mauro Rossi

    Mauro Rossi - 2019-04-17

    No, those CPUs do not meet the minimum requirements of the x86_64 Android ABI.
    In 2019 you can send your comment to AOSP and see what you are able to achieve with them.
    Cheers!

     

    Last edit: Mauro Rossi 2019-04-17
  • boulder

    boulder - 2019-04-17

    Well, but Wu Zhen wrote above: "I have x86_64 booting on a qemu Conroe cpu".
    I can't find any working ISOs, though.
    I wanted to compile it myself but got an error on the Marshmallow source:
    "...OverlayTouchActivity.java:18: The import android.view.WindowManager.LayoutParams.PRIVATE_FLAG_HIDE_NON_SYSTEM_OVERLAY_WINDOWS cannot be resolved"

     
  • Akram Mokhtar

    Akram Mokhtar - 2020-01-26

    I have a quick question related to this discussion: my laptop has SSE 4.1. Will it work with Android? Which Android version is best for it?

     
  • Mauro Rossi

    Mauro Rossi - 2020-06-12

    Sorry, I have been away from SourceForge for a long time; I just logged in and saw your two messages.

    Conroe has SSE3, so according to the Android ABI only the x86 ISO will boot.
    The SSSE3 and SSE4 emulation implemented by Wu Zhen with SSEplus does not make it possible to boot x86_64 on Conroe.

    marshmallow-x86 is too old and unmaintained; try nougat-x86, which is still maintained.

    If you have a recent AMD processor, x86_64 will work just fine; an AMD E1-6010 works for me.
    Mauro

     
  • Mauro Rossi

    Mauro Rossi - 2020-06-12

    Wu Zhen: adding -mno-sse4.2 -mno-popcnt -mno-ssse3 to arch_variant_cflags will restore the desired behavior. with this change, I have x86_64 booting on a qemu Conroe cpu.
    now the problem becomes apps with x86 native libs, they by default will generate sse4/popcnt instructions. we may in the end need kernel trap for this case.

    This means that you can build the android-x86 OS with binaries that do not use those instructions,
    but the applications will still use all of them, and the SSEplus-based emulation is incomplete, as far as I know.

    It's a lot easier to pick the x86 ISO than to dig into low-level instruction emulation code in kernel traps. It is so complex that only experts can deal with it, and there are better ways to use the little time available: I cannot afford to spend years learning when, in those years, the next CPU family simply arrives. It does not make any sense.

    Mauro

     
