CodeProfiling

Code profiling

  1. Overview
  2. Results
    1. Global profiling (no particular code area in mind)
    2. Hardware and OS software configurations
    3. Profiling methods
    4. Gaming scenarios

Overview

For various other tasks, we need information about where CPU time is actually spent in the game, under various run-time conditions.

Because we want to enhance the quality of the 3D models and textures, increase the game realism, ... we need solutions to increase the frame rates, or at least keep them at their current level.

But for the moment, we only have impressions and ideas about this, no real figures, so we don't know where to focus our work.

Tasks that may need profiling data :

  • D13 Multi-threading preparation : which parts of the code would need to be multi-threaded, and which would not benefit from it.
  • D36 Optimize robots computations : do we really need to work on this ? Only at race start (initialization), or during the race too ?
  • A15 Fix tracks that kill FPS : which parts of the code are involved in these frame rate drops ? Can we fix this ? How ?
  • A06, A07, A08, A09 Higher resolution textures for cars : we'd like to know if these enhancements change the frame rates or not.
  • A16 to A31 Track textures rework / enhancements : same.
  • A39 3D cockpit model : same.

Results

Here are the profiling results we got from various :

  • hardware and software configurations,
  • build configurations (Debug / Release),
  • profiling methods,
  • gaming scenarios,
  • other parameters (chosen physics engine, ...).

Important notes :

  • These variants should be kept as few as possible, for better repeatability and easier comparisons.
  • On the other hand, we'd ideally like to have at least one variant for each supported OS (Linux, Windows XP ... Windows 7 ? ... even Mac OS X ?). Moreover, different profiling purposes will likely result in outputs that are really not comparable.

See the tables below for details about these variants.

See also the callgrind format files in svn:/trunk/doc/docdevel/profiling for full details (the files from which the following tables have been built).

Global profiling (no particular code area in mind)

All figures below are absolute cumulative % elapsed time, except for the lines giving frame rates (frames/s) or update rates (number/s).

Normal racing, oprofile method

Only the significant upper levels of the call graph are given here (indented through the '*' char).

Hard / soft configuration CML1 CML1 CML1 CML1 CML1 CML1 CML1 CML1 CML1 CML1
Profiling method MO1 MO1 MO1 MO1 MO1 MO1 MO1 MO1 MO1 MO1
Gaming scenario SGL1 SGL1 SGM1 SGM1 SGL1 SGL1 SGM1 SGM1 SGH1 SGH1
Physics engine Simu V2 Simu V3 Simu V2 Simu V3 Simu V2 Simu V3 Simu V2 Simu V3 Simu V2 Simu V3
Build configuration Debug Debug Debug Debug Release Release Release Release Release Release
Speed Dreams 82,50 86,00 87,50 97,20 ? ? ? ? ? ?
* libm.so 2,00 5,80 4,70 15,00 1,90 5,20 4,50 14,00 10,10 35,00
* libc.so 5,80 5,30 4,60 3,80 6,40 5,90 6,00 5,60 5,90 3,60
* no-vmlinux (Linux kernel) 9,00 9,50 10,20 10,60 9,10 9,00 8,70 8,60 8,70 8,10
* ReUpdate 8,50 13,20 23,80 36,70 ? ? ? ? ? ?
* * ReOneStep 5,40 10,20 21,20 34,00 ? ? ? ? ? ?
* * * Simu Vx simUpdate 3,80 8,45 16,90 30,20 0,90 1,90 2,90 6,10 8,00 16,30
* * * Simplix TDriver::drive 0,75 0,60 2,40 2,00 0,40 0,80 1,20 2,10 2,80 5,00
* * * USR drive 0,50 0,45 1,60 1,40 0,30 0,60 0,80 1,80 2,20 3,80
* SSGGraph refresh 51,70 47,30 38,60 28,20 71,10 67,70 65,10 51,30 52,10 26,90
* * /dev/zero (??) 5,50 4,90 5,60 2,90 6,20 5,10 7,80 5,00 6,30 2,10
* * cGrScreen::update 2,65 2,50 2,20 2,20 16,20 16,10 15,50 12,90 15,50 8,20
* * * cGrScreen::camDraw 1,80 1,70 1,75 1,80 ? ? ? ? ? ?
* * * cGrBoard::refreshBoard 0,70 0,60 0,45 0,40 ? ? ? ? ? ?
* * libGLCore.so.185. 44,00 40,00 30,00 19,00 49,30 46,00 42,80 32,50 11,20 13,70
* * libGL.so.185. 2,00 1,90 1,40 0,80 2,90 2,70 2,90 1,80 1,50 0,70
* * libOpenAL.so 2,70 2,90 5,00 6,20 2,70 2,90 3,90 4,10 4,20 4,30
Frames per second : min-max (mean) 60-135(87) 60-125(83) 24-62(35) 20-61(34) 59-181(93) 58-185(99) 32-91(46) 28-88(43) 17-58(34) 7-33(22)
Robots update rate (N per second) 48 51 49 48 52 ? ? ? ? 46
Simu update rate (N per second) 496 510 497 502 542 ? ? ? ? 500

Notes:

  • The robots and Simu update rates were collected in separate test sessions, after activating some instrumentation code in raceengine.cpp (#define LogEvents 1). This instrumentation code writes strings to the console and thus slows down the frame rate, which means that these update rates (the last 2 lines of the table) are slightly over-estimated.
  • For the Release build variants, some figures are missing, and some others have been extrapolated to a certain extent from the Debug variants, because CPU consumption was collected mainly on a per-shared-library basis. librobottools is thus assumed to be shared between USR and Simplix in the same ratio.

Comments on the previous table (Medium to high Linux 64 configuration, oprofile tool, debug and release build configurations) :

  1. Simu V3 actually eats significantly more CPU than Simu V2. And even though we intentionally didn't assign the libm.so figure to the physics engine in the call tree, it probably has to be added to the "Simu Vx simUpdate" one in a significant ratio (at least as much as the difference between the Simu V2 and Simu V3 cases).
  2. This higher CPU consumption does NOT result in a significantly lower frame rate, except for the heavy gaming scenario, where the game remains playable with Simu V2, but not with Simu V3 (30% FPS loss).
  3. If we don't need to decrease RCM_MAX_DT_ROBOTS, working on optimizing / multi-threading the robots code is probably not a top priority task (from far).
  4. Frame rate gains are probably to be looked for :
    1. in the graphics module : here, the graphical world model seems to be a good target (3D models, textures, level-of-detail management, ...), far more than direct C code optimizations ; running it at least partially in parallel with the physics engine module is probably also a fruitful target.
    2. in the physics engine module : probably hard to optimize code-wise ; running it at least partially in parallel with the graphics module is probably a more fruitful target.

Note: More variants need to be tested before the above assertions can be considered proven. Windows build configurations, in particular, are lacking.
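
The FPS loss mentioned in comment 2 can be cross-checked against the SGH1 mean frame rates in the table (34 fps with Simu V2, 22 fps with Simu V3); a quick sketch:

```shell
# Relative frame-rate loss from Simu V2 to Simu V3 in the heavy
# scenario SGH1 (Release build), using the mean fps from the table.
awk 'BEGIN {
  v2 = 34; v3 = 22                     # mean frames per second
  printf "FPS loss: %.0f%%\n", 100 * (v2 - v3) / v2
}'
```

The result, about 35 %, matches the order of magnitude of the roughly 30 % loss quoted above.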

Normal racing, callgrind method

Only the significant upper levels of the call graph are given here (indented through the '*' char).

Hard / soft configuration CML2
Profiling method MO2
Gaming scenario SGL1
Physics engine Simu V2
Build configuration Debug
Speed Dreams 100.00
* libm.so 10.40
* libc.so 3.44
* no-vmlinux (Linux Kernel) 8.77
* ReUpdate 41.55
* * ReOneStep 41.31
* * * Simu Vx simUpdate 34.08
* * * Simplix TDriver::drive 3.07
* * * USR drive 1.79
* SSGGraph refresh 0.00
* * cGrScreen::update 0.00
* * * cGrScreen::camDraw 0.00
* * * cGrBoard::refreshBoard 0.00
* * libGLCore.so.185. 0.00
* * libGL.so.185. 0.00
* * libOpenAL.so 27.61
Frames per second : min-max (mean) 6/3600
Robots update rate (N per second) 50?
Simu update rate (N per second) 500?

Notes:

  • The measured frame rate is actually 6 frames per hour (a consequence of the callgrind profiling method).

Blind racing (no graphics), oprofile method

Only the significant upper levels of the call graph are given here (indented through the '*' char).

Hard / soft configuration CML1 CML1 CML1 CML1 CML1 CML1 CML1 CML1
Profiling method MO1 MO1 MO1 MO1 MO1 MO1 MO1 MO1
Gaming scenario SGL2 SGL2 SGL2 SGL2 SGL3 SGL3 SGL3 SGL3
Physics engine Simu V2 Simu V3 Simu V2 Simu V3 Simu V2 Simu V3 Simu V2 Simu V3
Build configuration Debug Debug Release Release Debug Debug Release Release
Speed Dreams 94,5 96,9 82,9 80,6 85,4 79,2 83,2 81,7
* libc.so 2,1 2,4 6,0 2,8 4,6 1,3 6,0 3,2
* no-vmlinux 17,4 13,2 17,6 11,9 19,2 14,6 17,9 11,3
* ReUpdate 61,3 61,3 56,3 64,4 61,6 63,3 57,7 65,7
* * ReOneStep 60,3 60,9 ? ? 60,6 62,9 ? ?
* * * Simu Vx simUpdate 47,7 55,4 48,2 63,1 43,9 55,5 54,8 63,6
* * librobottools.so 4,1 9,1 4,3 9,8 4,2 9,6 5,0 9,9
* * libm.so 13,3 19,8 26,0 40,9 12,8 19,5 25,4 41,6
* * * robot::drive 7,4 3,0 5,4 2,1 11,8 5,0 5,5 2,1
* * librobottools.so 0,7 0,1 0,7 0,1 0,7 0,1 0,8 0,1
* * libm.so 0,4 0,2 2,2 0,4 0,9 0,2 1,9 0,1
Robots update rate (N per second) 3,4 1,6 4,7 2,2 2,9 1,5 4,0 2,1
Simu update rate (N per second) 35,2 16,7 47,3 22,2 29,5 15,2 41,3 21,6

Comments on the previous table (Medium to high Linux 64 configuration, oprofile tool, debug and release build configurations) :

  1. USR or Simplix doesn't make any significant difference ... because only one driver is racing (not enough program counter hits).
  2. Again, Simu V3 eats more CPU than Simu V2 : from 20 to 30 % more.
  3. A good part of the difference lies in the calls to the maths library (libm) : with Simu V3, around 50 % of the CPU time is spent in this lib (whereas only around 30-35 % with Simu V2).
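
As a sanity check on the 50 % vs 30-35 % figures, the raw libm figures from the table can be normalised by the "Speed Dreams" totals; for instance, taking the Release build / SGL2 columns (libm 26,0 of a total 82,9 for Simu V2, 40,9 of 80,6 for Simu V3):

```shell
# libm share of the total Speed Dreams CPU time (Release build, SGL2 columns).
awk 'BEGIN {
  printf "Simu V2: %.0f%%\n", 100 * 26.0 / 82.9
  printf "Simu V3: %.0f%%\n", 100 * 40.9 / 80.6
}'
```

This prints "Simu V2: 31%" and "Simu V3: 51%", consistent with the comment above.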

Note : The profiling data files for these tests, especially the "Debug build" ones, give more details about how the main components of Simu V3 are involved.

Hardware and OS software configurations

  1. Configuration CML1 "Medium Linux" (Linux 64, Athlon 64x2 2400 MHz, nVidia 8800 GT)
    • Hardware
    • CPU : AMD Athlon 64x2 4600+ (2400 MHz, 2 GB DDR2 800)
    • Video : nVidia 8800 GT 512 Mb
    • Base software :
    • Linux Mandriva Linux 2010.0 x86_64 (up-to-date)
    • Proprietary nVidia driver 185.18.36
    • KDE 4.5
    • Speed Dreams
    • svn 2452
    • Debug / Release build with OPTION_DEBUG = ON.
    • Simu V2 / Simu V3 physics engine
    • Home-compiled OpenAL 1.11.753 / ALUT 1.1
    • Other dependencies as standard Mandriva packages
  2. Configuration CML2 (Linux 64, Athlon 64x2 5000+ (2593 MHz), nVidia 8600 GT, Xinerama)
    • Hardware
    • CPU : AMD Athlon(tm) 64 X2 Dual Core Processor 5000+ (2593 MHz, 2 GB memory)
    • Video : nvidia 8600 GT 256 Mb
    • Base software :
    • Gentoo Linux (up-to-date)
    • Proprietary nVidia driver 190.42
    • Speed Dreams
    • Debug / Release build with OPTION_DEBUG = ON
    • OpenAL 1.11.753, FreeAlut 1.1 (standard Gentoo packages)
    • Other dependencies as standard Gentoo packages

Profiling methods

Method MO1 : Linux oprofile CPU_CLK_UNHALTED events

Here's how to set up, start and stop profiling, and how to produce reports (needs oprofile, valgrind with cachegrind, kcachegrind).

sudo opcontrol --no-vmlinux --event=default --callgraph=32
sudo cat /root/.oprofile/daemonrc
  SESSION_DIR=/var/lib/oprofile
  CHOSEN_EVENTS_0=CPU_CLK_UNHALTED:100000:0:1:1
  NR_CHOSEN=1
  SEPARATE_LIB=1
  SEPARATE_KERNEL=0
  SEPARATE_THREAD=0
  SEPARATE_CPU=0
  VMLINUX=none
  IMAGE_FILTER=
  BUF_SIZE=65536
  CPU_BUF_SIZE=0
  CALLGRAPH=32
  XENIMAGE=none
sudo opcontrol --start-daemon
sudo opcontrol --reset
speed-dreams
# Start the race (New race) and wait for the real race start (3D world / blind-mode screen).
sudo opcontrol --start
# Keep SD window visible (no screen saver ...)
# Let the race go until the first racer crosses the end line.
sudo opcontrol --shutdown # Recommended daemon shutdown/start before each profiling session.
opreport -cgf >opreport-cgf.out.txt # Extract call tree + profiling data
# Convert to cachegrind format for easy analysis in kcachegrind GUI
cat opreport-cgf.out.txt | op2calltree.py >opreport-cgf.op2calltree.py.cachegrind
kcachegrind opreport-cgf.op2calltree.py.cachegrind &

Notes:

  • op2calltree.py contributed by Nathaniel Smith <njs@…>, with small improvements by Jean-Philippe.
  • the similar op2calltree Perl script that is shipped with oprofile doesn't support call tree information (it only produces a flat report)
  • you can also use gprof2dot.py to produce a PNG image of the call tree with detailed profiling info :

    opreport -gdf | gprof2dot.py -f oprofile | dot -Tpng -o opreport-gdf.gprof2dot.png
    

Method MO2 : Linux valgrind (tool = callgrind)

Start Speed Dreams in valgrind with the callgrind tool and make sure the result is stored in a file which can be read by kcachegrind.

Notes:

  • because valgrind simulates the CPU, it is quite slow, and thus the number of frames is very low (0.2 fps); this means that the graphical code is called far less often than in a normal run;
  • simulated time runs much slower than real time;
  • the resulting file can be read by kcachegrind.
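
For reference, a typical callgrind session might look like the following (the exact paths and options used for the table above are not recorded; the extra flags just give more detail in kcachegrind):

```shell
# Run Speed Dreams under valgrind's callgrind tool; profiling data is
# written to callgrind.out.<pid> in the current directory.
valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes speed-dreams
# Open the resulting file in kcachegrind for analysis:
kcachegrind callgrind.out.*
```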

Gaming scenarios

Scenario SGL1 "Global light 1"

  • Desktop configuration :
    • Sync To Vblank OFF
  • Game options :
    • Display : Window, 1280x1024x24, Best
    • OpenGL : Texture compression on, max 8192
    • Graphic : Default, Sky dome 0, no dynamic time
    • AI : Pro level
    • Sound : OpenAL
    • Simulation : Simu V2 or Simu V3
  • Gaming conditions (light load)
    • Quick Race 3 laps (Weather : Morning, Scarce clouds)
    • Cockpit view F2 (fixed focus) with rear-view mirror in the 2nd driver's car (no change during the race)
    • E-Track6
    • 2 USR and 2 Simplixes drivers on SC Boxer 96 :
1 USR Arne Fischer SC Boxer 96
2 USR Hans Meyer SC Boxer 96
3 Simplix Yuuki Kyousou SC Boxer 96
4 Simplix Haruna Say SC Boxer 96

Notes:

  • The profiling data collected from such Quick Races over 10, 5 and 3 laps are very similar, so it was decided to use 3 laps (shorter test sessions).
  • E-Track6 seems to be a track where the current Simu V2 setups are quite compatible with Simu V3.
  • Same for the USR and Simplix drivers on the SC Boxer 96 car : their Simu V2 setups are quite compatible with Simu V3.

Scenario SGM1 "Global medium 1"

Same as Scenario SGL1 "Global light 1", but :

  • Gaming conditions (medium load)
    • Cockpit view F2 (fixed focus) in the 3rd driver's car (no change during the race)
    • 6 USR and 6 Simplixes drivers on SC Boxer 96, SC Cavallo 360 and SC Murasama NSX :
1 USR Arne Fischer SC Boxer 96
2 Simplix Yuuki Kyousou SC Boxer 96
3 USR Hans Meyer SC Boxer 96
4 Simplix Haruna Say SC Boxer 96
5 USR Tony Davies SC Cavallo 360
6 Simplix Vittorio Basso SC Cavallo 360
7 USR Ken Rayner SC Cavallo 360
8 Simplix Sal Moretti SC Cavallo 360
9 USR Mick Donna SC Murasama NSX
10 Simplix Arnaud Beauchamp SC Murasama NSX
11 USR Greg Wilson SC Murasama NSX
12 Simplix Jacques Prewitt SC Murasama NSX

Notes:

  • As for the SC Boxer 96 car, the Simu V2 setups for USR and Simplix drivers on the SC Cavallo 360 and SC Murasama NSX seem quite compatible with Simu V3.

Scenario SGH1 "Global heavy 1"

Same as Scenario SGM1 "Global medium 1", but :

  • Gaming conditions (heavy load)
    • Cockpit view F2 (fixed focus) in the 2nd driver's car (no change during the race)
    • 12 USR and 12 Simplixes drivers on SC cars (all possible ones) :
1 USR Arne Fischer SC Boxer 96
2 Simplix Yuuki Kyousou SC Boxer 96
3 USR Hans Meyer SC Boxer 96
4 Simplix Haruna Say SC Boxer 96
5 USR Tony Davies SC Cavallo 360
6 Simplix Vittorio Basso SC Cavallo 360
7 USR Ken Rayner SC Cavallo 360
8 Simplix Sal Moretti SC Cavallo 360
9 USR Mick Donna SC Murasama NSX
10 Simplix Arnaud Beauchamp SC Murasama NSX
11 USR Greg Wilson SC Murasama NSX
12 Simplix Jacques Prewitt SC Murasama NSX
13 USR Jackie Graham SC Spirit 300
14 Simplix Marisol Carrillo SC Spirit 300
15 USR Steve Magson SC Spirit 300
16 Simplix Luis Barreto SC Spirit 300
17 USR Mark Duncan SC FMC GT4
18 Simplix Brad Newman SC FMC GT4
19 USR Chuck Davis Jr SC FMC GT4
20 Simplix Micheal Ashbury SC FMC GT4
21 USR Stefan Larsson SC Lynx 220
22 Simplix Augustus Booth SC Lynx 220
23 USR Don Nelson SC Lynx 220
24 Simplix Jeremy Carmicheal SC Lynx 220

Scenario SGL2 "Global light 2"

  • Game options :
    • AI : Pro level
    • Simulation : Simu V2 or Simu V3
  • Gaming conditions (light load)
    • Practice Race 20 laps, Results-only mode (Weather : Morning, Clear sky)
    • E-Track6
    • 1 USR on SC Boxer 96 (Arne Fischer)

Notes:

  • The profiling data collected from 20 and 40 laps are very similar, while they were quite different for 10 laps, so it was decided to use 20 laps (shorter test sessions).
  • Same as for SGL1.

Scenario SGL3 "Global light 3"

Same as Scenario SGL2 "Global light 2", but :

  • Gaming conditions (light load)
    • 1 Simplix on SC Boxer 96 (Yuuki Kyousou)

Related

Wiki: Index
Wiki: PreparingMultithreading
Wiki: TheWayToRelease2
