Hi there,
I compiled fbench (http://www.fourmilab.ch/fbench/fbench.html) for Linux (with GCC) and for DOS (with Turbo C++) and compared the runtimes in the various Dosemu modes. The reason for doing this is that at present I run Dosemu on 64-bit hardware with a 32-bit kernel so that I can use the cpuemu off mode. At some point I hope that CPU emulation will reach an acceptable level of performance, so that running the machine in 64-bit mode becomes possible.
It seems that emulation is still a couple of orders of magnitude slower than cpuemu off.
On my AMD 5600+ (2.8GHz) (32-bit kernel) (lower time is better):

Linux
Native: 0.0052

DOS (Git branch devel)
cpuemu off : 0.0065
cpuemu vm86: 0.7100
cpuemu full: 0.7000
cpuemu vm86sim: 0.4000
cpuemu fullsim: 0.4000

DOS (Git branch simx86-no-mprotect with devel merged in)
cpuemu off : 0.0060
cpuemu vm86: 0.6700
cpuemu full: 0.6700
cpuemu vm86sim: 0.4000
cpuemu fullsim: 0.3900

The hog threshold made no difference.
Attached is a DOS executable; run it like 'fbench 100000' and check the 'scaled value' it prints.
Interesting: the simulator is faster than the JIT here. As this is an FPU benchmark there may be some expensively-emulated FPU instructions in tight loops. I'll have a look.
For CPU benchmarks you could have a look also at emulators.com, e.g. here:
http://www.emulators.com/docs/nx11_flags.htm
Hi Bart,
Thanks for the link; I found its content very interesting. I'm wondering whether some sort of automated benchmark should be added to Dosemu. Do you think there's any value in me adding one? Would the tests have to be compiled at build time, or is it acceptable to ship DOS executables as is?
Thanks,
Andrew
[support-requests:#264] CPU benchmarks
Status: open
Group: v1.0_(example)
Created: Fri Apr 25, 2014 01:16 PM UTC by Andrew Bird
Last Updated: Fri Apr 25, 2014 01:18 PM UTC
Owner: nobody
There is a problem with the fwait instruction -- it was incorrectly marked as interpreted, which in effect forced the JIT to recompile over and over again. (Note: fwait should raise an FPU exception when appropriate, but FPU exceptions are not correctly implemented by cpu_emu at the moment.)
Could you try the attached patch?
As for test cases, there is one adapted from QEMU in src/tests, which needs to be compiled with DJGPP. Of course more tests are always welcome as long as the license is OK. It would be best not to ship the binaries with the dosemu source code.
Hi Bart,
Here are a couple of runs of my, as yet unfinished, (ab)use of the Python
unittest framework. A value of 1 means native speed; anything else is the
slowdown factor relative to it, i.e. 2x is half speed.
devel branch (0ff16be5bc75f8e07ed20d54a4b8e782258a43d1) without patch
bash-4.2$ python test/test_bench.py
TestFbench ... FAIL
======================================================================
FAIL: TestFbench
CPUEMU
off : OK target = 1.5x, result = 1.2x
vm86 : FAIL target = 75.0x, result = 132.0x
full : FAIL target = 75.0x, result = 136.0x
vm86sim : OK target = 150.0x, result = 80.0x
fullsim : OK target = 150.0x, result = 80.0x
Ran 1 test in 119.546s
FAILED (failures=1)
devel branch (0ff16be5bc75f8e07ed20d54a4b8e782258a43d1) with patch
bash-4.2$ python test/test_bench.py
TestFbench ... ok
======================================================================
PASS: TestFbench
CPUEMU
off : OK target = 1.5x, result = 1.2x
vm86 : OK target = 75.0x, result = 24.0x
full : OK target = 75.0x, result = 24.0x
vm86sim : OK target = 150.0x, result = 80.0x
fullsim : OK target = 150.0x, result = 80.0x
Ran 1 test in 64.919s
OK
So it looks good to me: timing is now only 24x slower than native, whereas it
was 132x/136x before. Well done!
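The pass/fail lines in the runs above boil down to comparing a measured slowdown factor against a per-mode target. A minimal sketch of that check (the mode names and target factors come from the output above; the measured times and the `check` helper are illustrative, not the actual test/test_bench.py):

```python
# Toy model of the slowdown-factor check: a mode passes when its
# runtime, expressed as a multiple of the native time, stays at or
# under a per-mode target. The measured times below are illustrative.

NATIVE_TIME = 0.0052  # seconds for the native Linux fbench run

# (mode, measured time in seconds, target slowdown factor)
RESULTS = [
    ("off",     0.0062, 1.5),
    ("vm86",    0.125,  75.0),
    ("full",    0.125,  75.0),
    ("vm86sim", 0.416,  150.0),
    ("fullsim", 0.416,  150.0),
]

def check(mode, measured, target, native=NATIVE_TIME):
    """Return (verdict, factor); factor is measured/native, rounded to 0.1."""
    factor = round(measured / native, 1)
    return ("OK" if factor <= target else "FAIL"), factor

if __name__ == "__main__":
    for mode, measured, target in RESULTS:
        verdict, factor = check(mode, measured, target)
        print(f"{mode:8}: {verdict} target = {target}x, result = {factor}x")
```

A real runner would of course obtain the measured times by actually launching dosemu in each mode and parsing fbench's output.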
Regarding the benchmarking:
I figured including binaries would be a problem. Is there such a thing as
a C cross-compiler that runs on Linux and produces DOS binaries?
Does the binary rule also apply to FreeDOS objects like command.com and
kernel.sys? Currently I'm running dosemu from the development directory
without installing. For each test I create a tmp-c directory and populate it
with a clean autoexec.bat, config.sys, the FreeDOS objects, and the
dosemu-derived tools.
Thanks,
Andrew
Hi Bart,
I'm not sure whether you were able to read the test results (I notice you didn't apply your patch to git), so I'll repost them here rather than via email, and perhaps I can format them properly. As you can see, your patch really helps: the timing is now only 24x slower than native, whereas it was 132x/136x before. I wonder what's going on with the DJGPP-compiled version hitting near-native speed in vm86 CPUEMU; or perhaps it's something the Turbo C++ version does that hurts badly?
TURBO C++ compiled version (attached in initial post) current devel branch
TURBO C++ compiled version (attached in initial post) current devel branch with your FWAIT patch
DJGPP GCC 4.9 compiled version (attached here) current devel branch with your FWAIT patch
Hi,
I just haven't got around to committing the patch yet but I will tonight.
DJGPP with vm86 IS native (DPMI); with "full" it still uses the JIT, and the 4x slowdown is much better than with Turbo C. If you can compile the Turbo C++ version with native FPU (the default is to try emulation, then native, which involves some self-modifying code), perhaps you'll see something better too.
Hi Bart,
I'm still working on making the benchmark test runner reusable for other tests, and I've taken on board your comments about not shipping binaries.
I didn't get the chance to rebuild the Turbo C++ version with -fp87, but I'm primarily interested in helping the performance of existing programs rather than tweaking new code (not that you implied that!). Is there a way of analysing an EXE under DOSEMU, or otherwise, to determine how many times a particular instruction gets run, sort of like gprof but at the instruction level?
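If an instruction trace can be produced somehow (e.g. by adding logging to the emulator or stepping under a debugger), the frequency question above is easy to answer offline. A minimal sketch, assuming a hypothetical trace format of one disassembled instruction per line, mnemonic first (DOSEMU itself does not emit such a trace):

```python
# Count how often each mnemonic appears in a hypothetical instruction
# trace (one disassembled instruction per line, mnemonic first).
# The trace format is an assumption for illustration.
from collections import Counter
import io

def top_instructions(trace, n=5):
    """Return the n most common mnemonics in an iterable of trace lines."""
    counts = Counter()
    for line in trace:
        fields = line.split()
        if fields:                       # skip blank lines
            counts[fields[0].lower()] += 1
    return counts.most_common(n)

sample = io.StringIO(
    "fld dword ptr [bx]\n"
    "fmul st, st(1)\n"
    "fwait\n"
    "fld dword ptr [si]\n"
)
print(top_instructions(sample))
```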
Hi Andrew,
attached is a patch to improve performance for the Turbo C version to be similar to DJGPP in JIT mode (from 24x to 4.5x in my test). It's a bit dirty though so I won't apply it to git as is.
Long explanation: the code is full of instructions such as
int 39 (there are 8 interrupts used for this, 34-3b; see also Ralf Brown's interrupt list)
placed where FP instructions such as fwait and fld would be if -fp87 were used. The int 39 interrupt handler will emulate the FP instruction if there is no coprocessor, but will patch the int 39 into (for example) fwait; fld ... if there is one.
The latter is what happens here. The JIT creates a new translation block for every patched int 3x, with jmps between the blocks. It would be better to have a single block containing many FPU instructions, which is what the attached source patch achieves: it forces retranslation when an "int 3x" is patched.
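The fragmentation effect described above can be modelled with a toy counter (purely illustrative; the real JIT lives in C inside simx86): if translation must stop at every instruction that was an int 3x when first seen, a patched FPU-heavy sequence splinters into many tiny blocks, whereas retranslating after the patch yields one block.

```python
# Toy illustration of translation-block fragmentation, not the real
# simx86 code: count blocks when translation ends at every "int3x"
# versus translating the already-patched instruction stream.

def count_blocks(stream):
    """Each 'int3x' ends the current translation block."""
    blocks = 0
    in_block = False
    for insn in stream:
        if not in_block:
            blocks += 1          # start a fresh translation block
            in_block = True
        if insn == "int3x":
            in_block = False     # JIT cannot translate past the trap
    return blocks

before_patch = ["mov", "int3x", "int3x", "int3x", "int3x", "ret"]
after_patch  = ["mov", "fwait", "fld", "fmul", "fstp", "ret"]

print(count_blocks(before_patch))  # many small blocks, jmps in between
print(count_blocks(after_patch))   # a single block after retranslation
```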
As for gprof style functionality, no it's not there unless of course you add some code to DOSEMU itself.
As for 16-bit compilers, that is an old issue. OpenWatcom can produce DOS binaries directly from Linux, but many distributions consider its license not free enough.
Hi Bart,
Your patch helped with the Turbo C++ compiled version of fbench: on my hardware I got 5.2x, down from an initial 26.0x, a substantial improvement. Regarding the vm86sim/fullsim timings, is there any possibility of a speed-up? I have another (integer) benchmark that is really weak there.
Current devel branch - no patch
Current devel branch + cd.diff applied
Hi Bart,
I retested with your latest devel (c1ddb275b8ca54fe66b8b6144cf0bb5c861d8f76) and these are the results. It's looking a lot better. Have you reached the point yet where there's no low-hanging fruit left?