#113 Lack of runtime cpu flags detection

Milestone: 1.3.1
Status: open
Owner: Erik
Labels: None
Priority: 5
Updated: 2015-10-03
Created: 2014-12-07
Creator: prezi
Private: No

In the new flac 1.3.1, using SSE/AVX/AVX2 for encoding is a great thing. But it is now no longer possible to prepare a single x86 binary that works both on CPUs without SSE/AVX/AVX2 and on CPUs with those extensions. Recompiling is not the answer: e.g. I need it for a portable embedded system for x86.

Discussion

  • Erik

    Erik - 2014-12-07
    • assigned_to: Erik
     
    • prezi

      prezi - 2014-12-07

      On 07/12/14 20:07, Erik wrote:

      • assigned_to: Erik
      • Comment:

      Have you /tested/ this on such an embedded system? If so, please
      provide more information about the CPU flags of that system.


      This is a Fujitsu Futro A250.
      The system is Buildroot on uClibc. Guys from buildr

      cat /proc/cpuinfo

      processor : 0
      vendor_id : AuthenticAMD
      cpu family : 5
      model : 10
      model name : Geode(TM) Integrated Processor by AMD PCS
      stepping : 2
      cpu MHz : 498.027
      cache size : 128 KB
      physical id : 0
      siblings : 1
      core id : 0
      cpu cores : 1
      apicid : 0
      initial apicid : 0
      fdiv_bug : no
      f00f_bug : no
      coma_bug : no
      fpu : yes
      fpu_exception : yes
      cpuid level : 1
      wp : yes
      flags : fpu de pse tsc msr cx8 sep pge cmov clflush mmx mmxext 3dnowext 3dnow
      bogomips : 996.05
      clflush size : 32
      cache_alignment : 32
      address sizes : 32 bits physical, 32 bits virtual
      power management:
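
The flags line in the dump above can be checked mechanically for the extensions in question. This is a minimal sketch, assuming the `/proc/cpuinfo` layout shown here; the helper name `cpu_flags` is hypothetical and is not part of FLAC or any library.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Flags line copied from the Geode dump above.
GEODE = "flags : fpu de pse tsc msr cx8 sep pge cmov clflush mmx mmxext 3dnowext 3dnow"

flags = cpu_flags(GEODE)
# Extensions named in this ticket that the CPU does not report.
missing = [ext for ext in ("sse", "sse2", "avx", "avx2") if ext not in flags]
print("missing:", ", ".join(missing))  # prints "missing: sse, sse2, avx, avx2"
```

On a live Linux system the same helper could be fed `open("/proc/cpuinfo").read()` instead of the pasted line.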

      This is the summary from the configure script:

      Configuration summary :

       FLAC version : ........................ 1.3.1
      
       Host CPU : ............................ i586
       Host Vendor : ......................... buildroot
       Host OS : ............................. linux-uclibc
      
       Compiler is GCC : ..................... yes
       GCC version : ......................... 4.7.3
       Compiler is Clang : ................... no
       SSE optimizations : ................... yes
       Asm optimizations : ................... yes
       Ogg/FLAC support : .................... yes
      

      strace flac -8 01_On_ira.wav

      execve("/usr/bin/flac", ["flac", "-8", "01_On_ira.wav"], [/* 23 vars */]) = 0
      mmap2(NULL, 4096, PROT_READ|PROT_WRITE,
      MAP_PRIVATE|MAP_ANONYMOUS|MAP_UNINITIALIZED, -1, 0) = 0xb776a000
      open("/home/prezi/buildroot-2014.02/output/build/flac-1.3.1/src/libFLAC/.libs/libFLAC.so.8",
      O_RDONLY) = -1 ENOENT (No such file or directory)
      open("/lib/libFLAC.so.8", O_RDONLY) = -1 ENOENT (No such file or
      directory)
      open("/lib/libFLAC.so.8", O_RDONLY) = -1 ENOENT (No such file or
      directory)
      open("/usr/lib/libFLAC.so.8", O_RDONLY) = 3
      fstat(3, {st_mode=S_IFREG|0755, st_size=182477, ...}) = 0
      mmap2(NULL, 4096, PROT_READ|PROT_WRITE,
      MAP_PRIVATE|MAP_ANONYMOUS|MAP_UNINITIALIZED, -1, 0) = 0xb7769000
      read(3,
      "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\260k\0\0004\0\0\0"...,
      4096) = 4096
      mmap2(NULL, 184320, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
      0xb773c000
      mmap2(0xb773c000, 179384, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3,
      0) = 0xb773c000
      mmap2(0xb7768000, 2253, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3,
      0x2c000) = 0xb7768000
      close(3) = 0
      munmap(0xb7769000, 4096) = 0
      open("/home/prezi/buildroot-2014.02/output/build/flac-1.3.1/src/libFLAC/.libs/libm.so.0",
      O_RDONLY) = -1 ENOENT (No such file or directory)
      open("/lib/libm.so.0", O_RDONLY) = 3
      fstat(3, {st_mode=S_IFREG|0755, st_size=49156, ...}) = 0
      mmap2(NULL, 4096, PROT_READ|PROT_WRITE,
      MAP_PRIVATE|MAP_ANONYMOUS|MAP_UNINITIALIZED, -1, 0) = 0xb7769000
      read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0
      \33\0\0004\0\0\0"..., 4096) = 4096
      mmap2(NULL, 57344, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb772e000
      mmap2(0xb772e000, 45740, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3,
      0) = 0xb772e000
      mmap2(0xb773a000, 4100, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3,
      0xb000) = 0xb773a000
      close(3) = 0
      munmap(0xb7769000, 4096) = 0
      open("/home/prezi/buildroot-2014.02/output/build/flac-1.3.1/src/libFLAC/.libs/libgcc_s.so.1",
      O_RDONLY) = -1 ENOENT (No such file or directory)
      open("/lib/libgcc_s.so.1", O_RDONLY) = 3
      fstat(3, {st_mode=S_IFREG|0644, st_size=86464, ...}) = 0
      mmap2(NULL, 4096, PROT_READ|PROT_WRITE,
      MAP_PRIVATE|MAP_ANONYMOUS|MAP_UNINITIALIZED, -1, 0) = 0xb7769000
      read(3,
      "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\200\36\0\0004\0\0\0"..., 4096)
      = 4096
      mmap2(NULL, 90112, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7718000
      mmap2(0xb7718000, 85408, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3,
      0) = 0xb7718000
      mmap2(0xb772d000, 448, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3,
      0x15000) = 0xb772d000
      close(3) = 0
      munmap(0xb7769000, 4096) = 0
      open("/home/prezi/buildroot-2014.02/output/build/flac-1.3.1/src/libFLAC/.libs/libc.so.0",
      O_RDONLY) = -1 ENOENT (No such file or directory)
      open("/lib/libc.so.0", O_RDONLY) = 3
      fstat(3, {st_mode=S_IFREG|0755, st_size=484111, ...}) = 0
      mmap2(NULL, 4096, PROT_READ|PROT_WRITE,
      MAP_PRIVATE|MAP_ANONYMOUS|MAP_UNINITIALIZED, -1, 0) = 0xb7769000
      read(3,
      "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\260\275\0\0004\0\0\0"...,
      4096) = 4096
      mmap2(NULL, 512000, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
      0xb769b000
      mmap2(0xb769b000, 481832, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3,
      0) = 0xb769b000
      mmap2(0xb7711000, 4879, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3,
      0x75000) = 0xb7711000
      mmap2(0xb7713000, 19240, PROT_READ|PROT_WRITE,
      MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7713000
      close(3) = 0
      munmap(0xb7769000, 4096) = 0
      open("/home/prezi/buildroot-2014.02/output/host/usr/i586-buildroot-linux-uclibc/sysroot/usr/lib/libogg.so.0",
      O_RDONLY) = -1 ENOENT (No such file or directory)
      open("/lib/libogg.so.0", O_RDONLY) = -1 ENOENT (No such file or
      directory)
      open("/lib/libogg.so.0", O_RDONLY) = -1 ENOENT (No such file or
      directory)
      open("/usr/lib/libogg.so.0", O_RDONLY) = 3
      fstat(3, {st_mode=S_IFREG|0755, st_size=18866, ...}) = 0
      mmap2(NULL, 4096, PROT_READ|PROT_WRITE,
      MAP_PRIVATE|MAP_ANONYMOUS|MAP_UNINITIALIZED, -1, 0) = 0xb7769000
      read(3,
      "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0@\23\0\0004\0\0\0"...,
      4096) = 4096
      mmap2(NULL, 24576, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7695000
      mmap2(0xb7695000, 18448, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3,
      0) = 0xb7695000
      mmap2(0xb769a000, 2482, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3,
      0x4000) = 0xb769a000
      close(3) = 0
      munmap(0xb7769000, 4096) = 0
      open("/home/prezi/buildroot-2014.02/output/host/usr/i586-buildroot-linux-uclibc/sysroot/usr/lib/libm.so.0",
      O_RDONLY) = -1 ENOENT (No such file or directory)
      open("/lib/libm.so.0", O_RDONLY) = 3
      fstat(3, {st_mode=S_IFREG|0755, st_size=49156, ...}) = 0
      close(3) = 0
      open("/home/prezi/buildroot-2014.02/output/host/usr/i586-buildroot-linux-uclibc/sysroot/usr/lib/libgcc_s.so.1",
      O_RDONLY) = -1 ENOENT (No such file or directory)
      open("/lib/libgcc_s.so.1", O_RDONLY) = 3
      fstat(3, {st_mode=S_IFREG|0644, st_size=86464, ...}) = 0
      close(3) = 0
      open("/home/prezi/buildroot-2014.02/output/host/usr/i586-buildroot-linux-uclibc/sysroot/usr/lib/libc.so.0",
      O_RDONLY) = -1 ENOENT (No such file or directory)
      open("/lib/libc.so.0", O_RDONLY) = 3
      fstat(3, {st_mode=S_IFREG|0755, st_size=484111, ...}) = 0
      close(3) = 0
      open("/lib/libc.so.0", O_RDONLY) = 3
      fstat(3, {st_mode=S_IFREG|0755, st_size=484111, ...}) = 0
      close(3) = 0
      open("/lib/libc.so.0", O_RDONLY) = 3
      fstat(3, {st_mode=S_IFREG|0755, st_size=484111, ...}) = 0
      close(3) = 0
      stat("/lib/ld-uClibc.so.0", {st_mode=S_IFREG|0755, st_size=24581, ...}) = 0
      open("/lib/libgcc_s.so.1", O_RDONLY) = 3
      fstat(3, {st_mode=S_IFREG|0644, st_size=86464, ...}) = 0
      close(3) = 0
      open("/lib/libc.so.0", O_RDONLY) = 3
      fstat(3, {st_mode=S_IFREG|0755, st_size=484111, ...}) = 0
      close(3) = 0
      mmap2(NULL, 4096, PROT_READ|PROT_WRITE,
      MAP_PRIVATE|MAP_ANONYMOUS|MAP_UNINITIALIZED, -1, 0) = 0xb7769000
      set_thread_area({entry_number:-1 -> 6, base_addr:0xb77696a0,
      limit:1048575, seg_32bit:1, contents:0, read_exec_only:0,
      limit_in_pages:1, seg_not_present:0, useable:1}) = 0
      mprotect(0xb773a000, 4096, PROT_READ) = 0
      mprotect(0xb7711000, 4096, PROT_READ) = 0
      mprotect(0xb7771000, 4096, PROT_READ) = 0
      ioctl(0, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS,
      {B38400 opost isig icanon echo ...}) = 0
      ioctl(1, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS,
      {B38400 opost isig icanon echo ...}) = 0
      time(NULL) = 1417982133
      brk(0) = 0x84ee000
      brk(0x84ef000) = 0x84ef000
      write(2, "\n", 1
      ) = 1
      write(2, "flac ", 5flac ) = 5
      write(2, "1.3.1", 51.3.1) = 5
      write(2, ", Copyright (C) 2000-2009 Josh "..., 72, Copyright (C)
      2000-2009 Josh Coalson, 2011-2014 Xiph.Org Foundation
      ) = 72
      write(2, "flac comes with ABSOLUTELY NO WA"..., 76flac comes with
      ABSOLUTELY NO WARRANTY. This is free software, and you are
      ) = 76
      write(2, "welcome to redistribute it under"..., 80welcome to
      redistribute it under certain conditions. Type `flac' for details.

      ) = 80
      stat64("01_On_ira.wav", {st_mode=S_IFREG|0644, st_size=31488620, ...}) = 0
      open("01_On_ira.wav", O_RDONLY|O_LARGEFILE) = 3
      ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS,
      0xbf914204) = -1 ENOTTY (Inappropriate ioctl for device)
      brk(0x84f0000) = 0x84f0000
      read(3, "RIFFdz\340\1WAVE", 12) = 12
      stat64("01_On_ira.flac", 0xbf9141c8) = -1 ENOENT (No such file or
      directory)
      stat64("01_On_ira.wav", {st_mode=S_IFREG|0644, st_size=31488620, ...}) = 0
      stat64("01_On_ira.flac", 0xbf914168) = -1 ENOENT (No such file or
      directory)
      brk(0x84f3000) = 0x84f3000
      read(3, "fmt ", 4) = 4
      read(3, "\20\0\0\0", 4) = 4
      read(3, "\1\0", 2) = 2
      read(3, "\2\0", 2) = 2
      read(3, "D\254\0\0", 4) = 4
      read(3, "\20\261\2\0", 4) = 4
      read(3, "\4\0", 2) = 2
      read(3, "\20\0", 2) = 2
      fstat64(3, {st_mode=S_IFREG|0644, st_size=31488620, ...}) = 0
      _llseek(3, 0, [36], SEEK_CUR) = 0
      read(3, "data", 4) = 4
      read(3, "@z\340\1", 4) = 4
      --- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPN, si_addr=0x8052ecd} ---
      +++ killed by SIGILL +++
      Illegal instruction

      I made a small embedded (~40 MB) system with mpd, libflac, ffmpeg, vorbis
      and so on to play music on a computer like this one and on any other
      x86 machine. The kernel is compiled as generic, and ffmpeg is built with
      CPU detection. Of course a system tailored for specific hardware
      would be faster, but this one is handy because it is versatile. It would be
      great to enjoy faster encoding on new systems while still being able to
      use it on such ancient hardware.

      sincerely

       
  • Erik

    Erik - 2014-12-07

    Have you tested this on such an embedded system? If so, please provide more information about the CPU flags of that system.

     

    Last edit: Erik 2014-12-07
  • lvqcl

    lvqcl - 2014-12-07

    The problem is in this code in configure.ac:


    if test "x$asm_optimisation$sse_os" = "xyesyes" ; then
        XIPH_ADD_CFLAGS([-msse2])
    fi
    

    so by default flac requires SSE2 instructions.

    The solution is to configure FLAC with the --disable-sse option, or to edit configure.ac and remove the aforementioned text from it.

     
    • prezi

      prezi - 2014-12-07

      This is my problem:

      In the new flac 1.3.1, using SSE/AVX/AVX2 for encoding is a great thing.
      But it is now no longer possible to prepare a single x86 binary that
      works both on CPUs without SSE/AVX/AVX2 and on CPUs with those
      extensions. Recompiling is not the answer: e.g. I need it for a
      portable embedded system for x86.


      What you propose means compiling two versions of flac/libflac on x86
      to make use of the hardware capabilities, depending on whether
      sse2/avx is available or not. This is not a solution. Otherwise, say
      so in the changelog. In version 1.2.0 you added something like this:

      libraries:

      • Added runtime detection of SSE OS support for most operating systems.

      Now I propose to add:

      libraries:

      • Removed runtime detection of SSE OS support for most operating systems.

      OK, I can deal with it, but this is a serious change of policy. Can you
      tell me how people from Debian and other binary-based distributions
      will cope with this?
      In any case you should admit this in the changelog. After all, what do
      you think "--enable-runtime-cpudetect" in ffmpeg/libav is for?

      sincerely

       
    • prezi

      prezi - 2014-12-07

      The Debian guys assumed that SSE is available on every amd64 CPU, so
      for amd64 they use --enable-sse.
      For i386 they use --disable-sse. Cute.

      sincerely

       
  • Erik

    Erik - 2014-12-07

    I would be very much in favour of making CPU capabilities fully detectable at run time. I'm even willing to take a look at this myself if no one else does.
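
Run-time detection generally means building every code path into one binary and picking one at start-up from the flags the CPU reports. The following is a minimal sketch of that dispatch idea in Python, not FLAC's actual code; the kernel names, the flag sets and the `select_kernel` helper are all hypothetical.

```python
def dot_generic(a, b):
    """Portable fallback that needs no CPU extensions."""
    return sum(x * y for x, y in zip(a, b))


def dot_sse2(a, b):
    """Stand-in for an SSE2-optimised routine (hypothetical; same result here)."""
    return sum(x * y for x, y in zip(a, b))


# Ordered from most to least demanding: (required flags, implementation).
KERNELS = [
    ({"sse", "sse2"}, dot_sse2),
    (set(), dot_generic),
]


def select_kernel(cpu_flags):
    """Pick the first implementation whose required flags are all present."""
    for required, impl in KERNELS:
        if required <= cpu_flags:
            return impl
    raise RuntimeError("no usable implementation")


geode = {"fpu", "mmx", "mmxext", "3dnow", "3dnowext"}  # no SSE at all
modern = {"fpu", "mmx", "sse", "sse2"}

print(select_kernel(geode).__name__)   # prints "dot_generic"
print(select_kernel(modern).__name__)  # prints "dot_sse2"
```

The empty requirement set on the last entry guarantees that a CPU without any extensions still gets a working routine instead of an illegal instruction.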

     
    • lvqcl

      lvqcl - 2014-12-07

      FLAC doesn't really require the -msse2 option, and its en/decoding speed isn't much slower without it (at least when FLAC is compiled with GCC 4.9+).

      Several days ago I tested FLAC encoding speed on an Intel Nehalem with the -8 preset. Encoding time for an album grabbed from a CD:

      64-bit FLAC: 30.2 sec
      32-bit FLAC: 33.8 sec
      32-bit FLAC and -msse2 is removed: 35.8 sec

      So without the -msse2 option, 32-bit FLAC becomes ~6% slower.
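
The ~6% figure follows from the two 32-bit timings above. A small worked example (the helper name `slowdown_percent` is only for illustration):

```python
def slowdown_percent(baseline_sec, slower_sec):
    """Relative increase in encoding time, in percent of the baseline."""
    return (slower_sec - baseline_sec) / baseline_sec * 100.0


with_sse2 = 33.8     # 32-bit FLAC, built with -msse2
without_sse2 = 35.8  # 32-bit FLAC, -msse2 removed

print(f"{slowdown_percent(with_sse2, without_sse2):.1f}% slower without -msse2")
# prints "5.9% slower without -msse2"
```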

      P.S. About Debian: they use the following:

      # Enable SSE only on amd64
      ifeq ($(DEB_HOST_ARCH_CPU),amd64)
          OPTFLAGS = --disable-altivec --enable-sse
      # Enable Altivec only on ppc64
      else ifeq ($(DEB_HOST_ARCH_CPU),ppc64)
          OPTFLAGS = --enable-altivec --disable-sse
      else
          OPTFLAGS = --disable-asm-optimizations --disable-sse --disable-altivec
      endif  
      

      So they have 3 architectures for FLAC: amd64, ppc64 and "other", and IA32 (aka i386) is among the others. It seems that the Debian devs don't care much about IA32.

       

      Last edit: lvqcl 2014-12-07
      • prezi

        prezi - 2014-12-08

        Even less. On an Atom ION 230: with SSE 14.16 sec, without SSE 14.74 sec,
        which in this case is 4%.
        That is in any case much better than 1.3.0, which took 18.67 sec here.
        You convinced me.

        sincerely

         
    • prezi

      prezi - 2014-12-08

      OK, I'm very happy to be the whistle-blower :D
      You all do a great job, indeed.

      sincerely

       
  • Erik

    Erik - 2015-10-03

    Ticket moved from /p/flac/bugs/423/

     
