
#264 CPU benchmarks

Group: v1.0_(example)
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2014-05-03
Created: 2014-04-25
Creator: Andrew Bird
Private: No

Hi there,
I compiled fbench (http://www.fourmilab.ch/fbench/fbench.html) for Linux (with GCC) and DOS (with Turbo C++) and compared the run times in the various Dosemu modes. I did this because at present I run Dosemu on 64-bit hardware with a 32-bit kernel so that I can use the cpuemu off mode. I hope that CPU emulation will eventually perform well enough that running the machine in 64-bit mode becomes practical.

It seems that emulation is still a couple of orders of magnitude slower than cpuemu off.

On my AMD 5600+ (2.8GHz) with a 32-bit kernel (lower time is better):

Linux

Native: 0.0052

DOS (Git branch devel)

cpuemu off     : 0.0065
cpuemu vm86    : 0.7100
cpuemu full    : 0.7000
cpuemu vm86sim : 0.4000
cpuemu fullsim : 0.4000

DOS (Git branch simx86-no-mprotect with self merged devel)

cpuemu off     : 0.0060
cpuemu vm86    : 0.6700
cpuemu full    : 0.6700
cpuemu vm86sim : 0.4000
cpuemu fullsim : 0.3900

Changing the hog threshold made no difference.
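
For reference, the ratios behind "a couple of orders of magnitude" can be worked out from the devel-branch figures above. This is only a minimal Python sketch: the names `NATIVE`, `DEVEL` and `slowdown` are made up for illustration and are not part of fbench or Dosemu.

```python
# Figures copied from the devel-branch table above (lower is better).
NATIVE = 0.0052
DEVEL = {
    "off": 0.0065,
    "vm86": 0.7100,
    "full": 0.7000,
    "vm86sim": 0.4000,
    "fullsim": 0.4000,
}


def slowdown(times, native=NATIVE):
    """Return each mode's figure as a multiple of the native figure."""
    return {mode: value / native for mode, value in times.items()}


if __name__ == "__main__":
    for mode in ("off", "vm86", "full", "vm86sim", "fullsim"):
        print("{:>8}: {:6.1f}x".format(mode, slowdown(DEVEL)[mode]))
```

On these figures cpuemu off comes out at 1.25x native, the two JIT modes at roughly 135x to 137x, and the two simulator modes at roughly 77x.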

1 Attachments


Discussion

  • Andrew Bird

    Andrew Bird - 2014-04-25

    I've attached the DOS executable. Run it as 'fbench 100000' and check the 'scaled value' it prints.

     
  • Bart Oldeman

    Bart Oldeman - 2014-04-25

    Interesting: the simulator is faster than the JIT here. As this is an FPU benchmark there may be some expensively-emulated FPU instructions in tight loops. I'll have a look.

    For CPU benchmarks you could have a look also at emulators.com, e.g. here:
    http://www.emulators.com/docs/nx11_flags.htm

     
  • Andrew Bird

    Andrew Bird - 2014-04-25

    Hi Bart,
    Thanks for the link; I found its content very interesting. I'm wondering whether some sort of automated benchmark test should be added to Dosemu. Do you think there would be any value in my adding one? Would the tests have to be compiled at build time, or is it acceptable to ship DOS executables as is?

    Thanks,

    Andrew


  • Bart Oldeman

    Bart Oldeman - 2014-04-27

    There is a problem with the fwait instruction -- it was incorrectly marked as being interpreted, but in effect forced the JIT to recompile over and over again. (Note: fwait should cause an FPU exception if appropriate but FPU exceptions are not correctly implemented by cpu_emu at the moment).

    Could you try the attached patch?

    As for test cases, there is one adapted from QEMU in src/tests, which needs to be compiled by DJGPP. Of course any more tests are always welcome as long as the license is ok. It would be best not to have the binaries shipped with dosemu source code.

     
    • Andrew Bird

      Andrew Bird - 2014-04-27

      Hi Bart,
      Here are a couple of runs of my as yet unfinished (ab)use of the Python
      unittest suite. A value of 1x means native speed; any other value is the
      slowdown factor relative to native, i.e. 2x is half speed.

      devel branch (0ff16be5bc75f8e07ed20d54a4b8e782258a43d1) without patch

      bash-4.2$ python test/test_bench.py
      TestFbench ... FAIL

      ======================================================================
      FAIL: TestFbench
      ----------------------------------------------------------------------
      CPUEMU
             off :  OK  target = 1.5x, result = 1.2x
            vm86 : FAIL target = 75.0x, result = 132.0x
            full : FAIL target = 75.0x, result = 136.0x
         vm86sim :  OK  target = 150.0x, result = 80.0x
         fullsim :  OK  target = 150.0x, result = 80.0x

      ----------------------------------------------------------------------
      Ran 1 test in 119.546s

      FAILED (failures=1)

      devel branch (0ff16be5bc75f8e07ed20d54a4b8e782258a43d1) with patch

      bash-4.2$ python test/test_bench.py
      TestFbench ... ok

      ======================================================================
      PASS: TestFbench
      ----------------------------------------------------------------------
      CPUEMU
             off :  OK  target = 1.5x, result = 1.2x
            vm86 :  OK  target = 75.0x, result = 24.0x
            full :  OK  target = 75.0x, result = 24.0x
         vm86sim :  OK  target = 150.0x, result = 80.0x
         fullsim :  OK  target = 150.0x, result = 80.0x

      ----------------------------------------------------------------------
      Ran 1 test in 64.919s

      OK

      So it looks good to me: the vm86 and full modes are now only 24x slower
      than native, whereas they were 132x/136x before. Well done!
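
The pass/fail logic behind output like the above can be sketched roughly as follows. This is not the actual test/test_bench.py, which is not shown in this thread: `check`, `make_test_case` and the module-level names are hypothetical, the figures are the ones from the first post, and comparing with `<=` is an assumption.

```python
import unittest

# Figures from the first post in this thread (devel branch, before any patch).
NATIVE = 0.0052
RESULTS = {"off": 0.0065, "vm86": 0.7100, "full": 0.7000,
           "vm86sim": 0.4000, "fullsim": 0.4000}
# Targets as printed in the runs above, kept in display order.
TARGETS = [("off", 1.5), ("vm86", 75.0), ("full", 75.0),
           ("vm86sim", 150.0), ("fullsim", 150.0)]


def check(results, targets, native):
    """Return (report_lines, all_ok) for slowdown factors against targets."""
    lines, all_ok = [], True
    for mode, target in targets:
        result = results[mode] / native   # 1.0 == native, 2.0 == half speed
        ok = result <= target             # assumption: at or below target passes
        all_ok = all_ok and ok
        lines.append("{:>10} : {:^4} target = {:.1f}x, result = {:.1f}x".format(
            mode, "OK" if ok else "FAIL", target, result))
    return lines, all_ok


def make_test_case(results, targets, native):
    """Build a TestCase class for one benchmark so the runner can be reused."""
    class BenchTest(unittest.TestCase):
        def test_slowdown_within_targets(self):
            lines, all_ok = check(results, targets, native)
            # The report is attached to the failure message.
            self.assertTrue(all_ok, "\nCPUEMU\n" + "\n".join(lines))
    return BenchTest
```

To run it, bind the class at module level, for example `TestFbench = make_test_case(RESULTS, TARGETS, NATIVE)`, and call `unittest.main()`. With the pre-patch figures used here the vm86 and full modes exceed their 75x target, so the test fails in the same pattern as the first run above.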

      Regarding the benchmarking:
      I figured including binaries would be a problem. Is there such a thing as
      a C cross compiler that runs on Linux and produces DOS binaries?
      Does the no-binaries rule also apply to FreeDOS files such as command.com
      and kernel.sys? Currently I'm running dosemu from the development directory
      without installing it. For each test I create a tmp-c directory and populate
      it with a clean autoexec.bat and config.sys, the FreeDOS files, and the
      dosemu-derived tools.

      Thanks,

      Andrew


      Attachment: tables.c.diff (396 Bytes; text/x-patch)



  • Andrew Bird

    Andrew Bird - 2014-04-29

    Hi Bart,
    I'm not sure whether you were able to read the test results (I notice you haven't applied your patch to git), so I'll repost them here rather than via email, where perhaps I can format them properly. As you can see, your patch really helps: the timing is now only 24x slower than native, whereas it was 132x/136x before. I wonder what's going on with the DJGPP compiled version hitting near native speeds in the vm86 CPUEMU mode. Or perhaps it's something the TURBO C++ version does that hurts badly?

    TURBO C++ compiled version (attached in initial post) current devel branch

    TestFbench ... FAIL
    
    ======================================================================
    FAIL: TestFbench
    ----------------------------------------------------------------------
    CPUEMU
           off :  OK  target = 1.5x, result = 1.2x
          vm86 : FAIL target = 75.0x, result = 132.0x
          full : FAIL target = 75.0x, result = 136.0x
       vm86sim :  OK  target = 150.0x, result = 80.0x
       fullsim :  OK  target = 150.0x, result = 80.0x
    
    ----------------------------------------------------------------------
    

    TURBO C++ compiled version (attached in initial post) current devel branch with your FWAIT patch

    TestFbench ... ok
    
    ======================================================================
    PASS: TestFbench
    ----------------------------------------------------------------------
    CPUEMU
           off :  OK  target = 1.5x, result = 1.2x
          vm86 :  OK  target = 75.0x, result = 24.0x
          full :  OK  target = 75.0x, result = 24.0x
       vm86sim :  OK  target = 150.0x, result = 80.0x
       fullsim :  OK  target = 150.0x, result = 80.0x
    
    ----------------------------------------------------------------------
    

    DJGPP GCC 4.9 compiled version (attached here) current devel branch with your FWAIT patch

    TestFbench ... ok
    
    ======================================================================
    PASS: TestFbench
    ----------------------------------------------------------------------
    CPUEMU
           off :  OK  target = 1.5x, result = 1.2x
          vm86 :  OK  target = 75.0x, result = 1.2x
          full :  OK  target = 75.0x, result = 4.0x
       vm86sim :  OK  target = 150.0x, result = 1.2x
       fullsim :  OK  target = 150.0x, result = 61.2x
    
    ----------------------------------------------------------------------
    
     
  • Bart Oldeman

    Bart Oldeman - 2014-04-29

    Hi,

    I just haven't got around to committing the patch yet but I will tonight.
    DJGPP with vm86 IS native (DPMI). With "full" it still uses the JIT, and the 4x slowdown is much better than with Turbo C. If you can compile the Turbo C++ version with native FPU code (the default is to try emulation first and then native, which involves some self-modifying code), perhaps you will see something better too.

     
  • Andrew Bird

    Andrew Bird - 2014-04-30

    Hi Bart,
    I'm still working on making the benchmark test runner reusable for other tests, and I've taken on board your comments about not shipping binaries.
    I didn't get the chance to rebuild the TURBO C++ version with -fp87, but I'm primarily interested in improving the performance of existing programs rather than tweaking new code (not that you implied that!). Is there a way of analysing an EXE under DOSEMU or otherwise to determine how many times a particular instruction gets run, sort of like gprof but at the instruction level?

     
  • Bart Oldeman

    Bart Oldeman - 2014-05-01

    Hi Andrew,
    attached is a patch that brings the performance of the Turbo C version close to that of DJGPP in JIT mode (from 24x to 4.5x in my test). It's a bit dirty though, so I won't apply it to git as is.

    Long explanation: the code is full of instructions such as
    int 39h (there are 8 interrupts for this, 34h-3Bh; see also Ralf Brown's list)
    where FP instructions such as fwait and fld would be if -fp87 were used. The int 39h handler emulates the FP instruction if there is no coprocessor, but if there is one it patches the int 39h into (for example) fwait; fld ...

    The latter is what happens. The JIT creates a new translation block for every patched int 3x, with jmps in between the blocks. It would be better to have just one block containing many FPU instructions, which is what the attached source patch achieves: it forces retranslation if the "int 3x" is patched.
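
To make the patching step concrete, here is a rough Python model of what such an interrupt handler does to the code bytes when a coprocessor is present. It is neither DOSEMU code nor the Turbo C++ runtime: `patch_fp_int` is a made-up name, and the mapping of vectors 34h-3Bh onto the x87 escape opcodes D8h-DFh is the commonly documented convention, assumed here rather than taken from this thread.

```python
INT_OPCODE = 0xCD      # x86 "int imm8"
FWAIT_OPCODE = 0x9B    # x86 "fwait"
FP_INT_FIRST = 0x34    # first floating point emulation vector
FP_INT_LAST = 0x3B     # last of the 8 vectors mentioned above
ESC_FIRST = 0xD8       # first x87 escape opcode (assumed to pair with 34h)


def patch_fp_int(code, offset):
    """Rewrite "int 3x" at `offset` into "fwait; ESC" in place.

    `code` is a bytearray. Returns True if the two bytes were patched,
    False if they are not one of the emulation interrupts.
    """
    if offset + 1 >= len(code) or code[offset] != INT_OPCODE:
        return False
    vector = code[offset + 1]
    if not FP_INT_FIRST <= vector <= FP_INT_LAST:
        return False
    code[offset] = FWAIT_OPCODE
    code[offset + 1] = ESC_FIRST + (vector - FP_INT_FIRST)
    return True


if __name__ == "__main__":
    # int 39h followed by the operand bytes of a 16-bit memory reference.
    code = bytearray([0xCD, 0x39, 0x06, 0x34, 0x12])
    patch_fp_int(code, 0)
    print(" ".join("{:02X}".format(b) for b in code))
```

Under that assumption the example bytes become 9B DD 06 34 12, which decodes as fwait followed by fld qword ptr [1234h]. Because the bytes change underneath code that has already been translated, each int 3x site gets rewritten the first time it runs, which is why the JIT ends up with many small blocks unless it retranslates.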

    As for gprof style functionality, no it's not there unless of course you add some code to DOSEMU itself.

    As for 16-bit compilers, that is an old issue. OpenWatcom can produce DOS binaries directly from Linux, but many distributions consider its license not free enough.

     
  • Andrew Bird

    Andrew Bird - 2014-05-02

    Hi Bart,
    Your patch helped with the Turbo C++ compiled version of fbench: on my hardware I got 5.2x, down from an initial 26.0x, a substantial improvement. Regarding the vm86sim/fullsim timing, is there any possibility of a speed-up? I ask because I have another (integer) benchmark that is really weak there.

    Current devel branch - no patch

    ==============================================================
    FAIL: TestFbenchTc
    -------------------------------------------------------------- 
    CPUEMU
           off :  OK  target <= 2.0x, result = 1.2x
          vm86 : FAIL target <= 2.0x, result = 26.0x
          full : FAIL target <= 5.0x, result = 26.0x
       vm86sim : FAIL target <= 2.0x, result = 79.6x
       fullsim : FAIL target <= 75.0x, result = 78.8x                             
    

    Current devel branch + cd.diff applied

    ============================================================== 
    FAIL: TestFbenchTc
    -------------------------------------------------------------- 
    CPUEMU
           off :  OK  target <= 2.0x, result = 1.2x
          vm86 : FAIL target <= 2.0x, result = 5.2x
          full : FAIL target <= 5.0x, result = 5.2x
       vm86sim : FAIL target <= 2.0x, result = 78.0x
       fullsim : FAIL target <= 75.0x, result = 78.0x
    
     
  • Andrew Bird

    Andrew Bird - 2014-05-03

    Hi Bart,
    I retested with your latest devel c1ddb275b8ca54fe66b8b6144cf0bb5c861d8f76 and these are the results. It's looking a lot better. Have you reached the point yet where there's no low hanging fruit?

    ===============================================================
    PASS: TestFbenchDjgpp
    ---------------------------------------------------------------
    CPUEMU
           off :  OK  target <= 2.0x, result = 1.0x
          vm86 :  OK  target <= 2.0x, result = 1.2x
          full :  OK  target <= 5.0x, result = 4.0x
       vm86sim :  OK  target <= 2.0x, result = 1.2x
       fullsim :  OK  target <= 75.0x, result = 61.2x
    
    ===============================================================
    PASS: TestFbenchTcc
    ---------------------------------------------------------------
    CPUEMU
           off :  OK  target <= 2.0x, result = 1.0x
          vm86 :  OK  target <= 8.0x, result = 4.3x
          full :  OK  target <= 8.0x, result = 4.3x
       vm86sim :  OK  target <= 80.0x, result = 64.7x
       fullsim :  OK  target <= 80.0x, result = 65.0x
    
    ---------------------------------------------------------------
    Ran 2 tests in 618.609s
    
    OK
    
     
