At least for some inputs, uCsim is surprisingly slow on my Power 9 system. For most tasks (compiling SDCC; regression testing) that system is by far the fastest one I have, but there are a few oddities. The following are all times from systems with otherwise practically no load running Debian GNU/Linux:
Command used: time ../../sim/ucsim/mos6502.src/ucsim_mos6502 gen/uc6502/tst_gcc-torture-execute-arith-rand-ll.ihx < ./ports/uc6502/uCsim.cmd
My laptop (Ryzen 4800H):
Runtime: 3.145461 sec
real 0m3,150s
user 0m3,138s
sys 0m0,012s
My Raspi 4:
Runtime: 12.779123 sec
real 0m12,787s
user 0m12,737s
sys 0m0,016s
nemesis (dual 22-core SMT4 Power 9):
Runtime: 30.998519 sec
real 0m31,003s
user 0m30,993s
sys 0m0,013s
Apparently uCsim is so slow here that this test times out during normal regression testing. I'll try to look into the details (where in uCsim we spend the time) later. But I also noticed a general trend: on most of my systems, far more time is spent in SDCC than in uCsim during regression testing. For power64, however, it is more balanced. Using the default timeouts, usually two uc6502 tests fail due to timeouts, and a few z80-related ones are on the edge of failing (they usually pass as long as there is not too much load on the system: make -j 80 is still fine, make -j 120 tends to fail).
Now, that Power 9 system might not have the WOF tables configured correctly, so the CPUs might be running at a lower TDP, and Power 9 is a somewhat older architecture. Being a bit slower than the Ryzen 4800H for some single-threaded workloads is no surprise. But it definitely shouldn't be slower than the Raspi.
Dear Philipp!
Can you attach the out file of the test you mentioned?
Daniel
I have verified that the .ihx files are identical on the Raspi 4 and the Power9 machine.
P.S.: The number of ticks simulated is the same, too:
nemesis (power64):
raspi-rebstock (aarch64):
Last edit: Philipp Klaus Krause 2023-10-03
This is the gprof output on powerpc64. I also tried valgrind, but it crashed.
P.S.: And for comparison also the gprof output on the Raspi 4. For some reason, when compiled with -pg, both nemesis and the raspi take about 112s (according to time), which is much longer than what I measured without -pg.
Last edit: Philipp Klaus Krause 2023-10-03
Dear Philipp,
Would you please check the effect of the cperiod value on runtime? Write:
cperiod=100
into the uCsim.cmd file (before run) and try some other values as well, such as 10000, 100000, 500000.
Daniel
It doesn't make much of a difference. uCsim is apparently slightly faster for higher values of cperiod (31.1s at cperiod=100000 vs 31.3s at cperiod=100).
Dear Philipp,
Can you tgz and send me the full content of the results/uc6502 directory, please?
Daniel
Here they are. If it helps, I could also give you an account on the machine.
Yes, I think an account would be useful, so I could run more tests.
I wrote a simple CPU speed measurement (1 thread, no IO) and checked several machines I can access. nemesis was surprisingly slow. I have no idea how uCsim could run faster on it.
Did you try your "simple CPU speed measurement" on a Raspi 3 or 4? If yes, how does it perform there vs. nemesis?
This is the result of my tests:
szoba2 is an rpi2 and szoba3 is an rpi3.
Last edit: Daniel Drotos 2023-12-16
MFlop is measured with floating point operations, but it is not really relevant to uCsim. The kips column means kilo-instructions per second, and it is measured with a loop that is similar to uCsim's instruction simulation.