pascal p5c Blog

fast, portable Pascal compiler using gnu c as intermediate code

Brought to you by: trevorbl

Use profiling to speed up your p5c pascal programs

The idea behing profiling is to identify so called hot-spots in your program - ie those parts where the program spends most of its time.
When we know where they are, we can rework the code to make it run faster.
There's no point doubling the speed of code that runs 1% of the time because it won't be noticed. On the other hand, doubling the speed of code that runs, say, 20% of the time should save 10% on your program's execution time.

Here, we'll be looking to make p5c run faster. I'm running on linux,and will be using the profile module of valgrind to collect the profiling data, and kcachegrind to examine it. (These are 2 amazing programs.) You can adapt the techniques here to your own setup. Users of other systems might find alternative tools here

step 1 - build a version of the program under test

This puts the line numbers and symbol names in the compiled program and enables the results to be displayed in terms of program line numbers and function names (or procedure names).

Our prgram under test is p5c so to remove some confusion, we'll create a new version of p5c and call it p5c.t.pas
Replace p5c.t with your program name in what follows.

Put these 2 lines at the top of p5c.t.pas:

{$n+,d- -- force line numbers, omit debug code }
{@@# line 1 "p5c.pas" @@}

This ensures that the intermediate c file knows the line numbers of p5c.pas, and will allow gcc to put these into the generated binary image.

Now build the image:

$P5CDIR/pc -g p5c.t.pas

The -g option tells gcc to put line numbers and other symbolic information into the generated image, p5c.t

step 2 - gather profiling data

You may need to remove old profile data first:

rm -f callgrind.out.*

Now use valgrind to run your program.

It needs a command line of the form valgring options your-command command-line-args
In this example, the options specify the callgrind module.
The command is p5c.t and the arguments are respectively the pascal source file that we will compile, and the gcc intermediate file:

valgrind --tool=callgrind p5c.t iso7185-tests/iso7185pat.pas iso7185-tests/iso7185pat.c

With the bash shell, this can be abbreviated:

valgrind --tool=callgrind p5c.t iso7185-tests/iso7185pat.{pas,c}

Since valgring simulates p5c.t cycle-by-cycle, this could take a while. The payback is we get 100% accurate profiling.

We could run again, compiling different source files and have several sets of results if we wanted to.

step 3 - examine the results

Use kcachegrind for this. The results of the previous test or tests are in files called callgrind.out.<some-id></some-id>

kcachegrind callgrind.out.*

This shows the results in a page like this:

this

It's helpful to select "Source File" from the drop down box in the to[p left window, then select p5c.t.pas to hide data for library calls etc,
Then select "Source Code" and "Call Graph" in the tabs on the 2 right hand side windows.
Finally, select "Relative %" on the tool bar.

The bottom left window shows the detailed profiling data.
We can see that the program spends 99.99% of its time in main() and its children, then 99.74% in a procedure called programme (shown as programme_1) and so on. in order og decreasing time spent.

Now click on the"self" tab at the top of this window:

https://sourceforge.net/p/pascal-p5c/blog/2017/07/how-to-use-profiling-to-speed-up-p5c-pascal-programs/attachment/profile2.png

Now we see how much time is spent in each procedure or function, but excluding time spent calling any other functions.

It shows 46% of the time is spent in a single function called lcase, Click on that line.
The top right window shows the source code of the lcase function, and the bottom right window shows it's call graph.

Similarly, 35 % of the time is spent in a function called strequvf.

Both of tehse are very small functions, and improve,ments to the eficienct of these will have a dramatic effect on the compile speed of p5c.

But wait! We also see that lcase is called 42 milion times, and the call graph shows that it is called nearly always from strequvf.

Given that p5c.t was compiling souce code a few tens of k bytes long, we would expect to be able to reduce the number of times lcase is called as well, so there's even more efficiency gains.

To be continued ...

Posted by 2020-08-28 | Draft

Attachments: