Summary: gprof is a performance analysis tool for Linux. To use it, compile and link your C code with gcc’s -pg option, run the program and reproduce the issue, then run gprof against the gmon.out file the program generated.


Background

I was writing some cross-platform C code that worked well on my Windows 10 PC, but didn’t work so well on a Raspberry Pi 4 running Raspberry Pi OS. The program would slow down dramatically, and a quick analysis showed that there was a CPU bottleneck. I needed a profiler to help me see where the program’s CPU time was being spent. I decided to use gprof.

Steps

First, recompile your code with gcc’s -pg option. The option must be applied at both the compilation and linking steps. It adds instrumentation that records function calls at runtime.
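If you want something small to try this on before profiling a real project, here’s a minimal sketch. Everything in it is made up for illustration: the file names in the comment, the function names, and the buffer size are my assumptions, not anything gprof requires. It deliberately has no main() so it can be dropped into an existing program; call run_workload() from your own main().

```c
/*
 * Build in separate steps so that -pg reaches both the compiler and the
 * linker (workload.c, main.c, and demo are hypothetical names):
 *
 *     gcc -pg -c workload.c -o workload.o
 *     gcc -pg -c main.c -o main.o
 *     gcc -pg workload.o main.o -o demo
 */
#include <stddef.h>

/* Deliberately expensive: visits every pair of bytes, so O(n^2). */
unsigned long hot_function(const unsigned char *data, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            total += (unsigned long)(data[i] ^ data[j]);
    return total;
}

/* Cheap by comparison: a single pass over the buffer, so O(n). */
unsigned long cheap_function(const unsigned char *data, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Calls both functions 'repeats' times and returns a checksum so the
 * compiler cannot discard the work. */
unsigned long run_workload(int repeats)
{
    static unsigned char buffer[2048];
    unsigned long checksum = 0;

    for (size_t i = 0; i < sizeof buffer; i++)
        buffer[i] = (unsigned char)(i & 0xFF);

    for (int r = 0; r < repeats; r++) {
        checksum += hot_function(buffer, sizeof buffer);
        checksum += cheap_function(buffer, sizeof buffer);
    }
    return checksum;
}
```

One thing to keep in mind: at higher optimization levels gcc may inline small functions, in which case they won’t show up as separate entries in the profile.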

Once you rebuild your program, run it again and reproduce the issue. Ideally, allow the slow part of the program to execute multiple times so you get a good sample of the problem. Exit the program normally (by returning from main or calling exit), and you should see that a new file was generated in the current working directory: gmon.out. This file contains the data collected by the -pg instrumentation.
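The profile data is written as the program exits normally, so a program that you usually stop with Ctrl+C won’t leave a gmon.out behind. If that describes your program, one option is to turn Ctrl+C into a clean shutdown. Here’s a minimal sketch using the standard signal() function; the function names and the main-loop outline in the comment are hypothetical, and your program’s shutdown path may need more than this.

```c
#include <signal.h>

/* Set by the signal handler, polled by the main loop. */
static volatile sig_atomic_t quit_requested = 0;

/* Only sets a flag; the real shutdown happens in the main loop. */
static void on_sigint(int signum)
{
    (void)signum;
    quit_requested = 1;
}

/* Installs the Ctrl+C handler. Returns 0 on success, -1 on failure. */
int install_quit_handler(void)
{
    return signal(SIGINT, on_sigint) == SIG_ERR ? -1 : 0;
}

/* Returns nonzero once Ctrl+C has been pressed. */
int quit_was_requested(void)
{
    return quit_requested != 0;
}

/*
 * Intended use (run_one_frame is a hypothetical stand-in for your loop body):
 *
 *     install_quit_handler();
 *     while (!quit_was_requested())
 *         run_one_frame();
 *     return 0;    // normal exit from main, so gmon.out gets written
 */
```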

The next step is to process gmon.out using gprof. The tool takes two arguments: the executable filename that was used to generate the gmon.out file, and the path to the gmon.out file. This tool generates a lot of output text, so you may want to redirect it to a text file. Run it like this:

$ gprof /path/to/exe /path/to/gmon.out > gmon.txt

Open the gmon.txt file and you’ll see results that look something like this:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 80.01      0.12     0.12      392     0.31     0.31  FunctionA
 20.00      0.15     0.03     2480     0.01     0.01  FunctionX
  0.00      0.15     0.00   169740     0.00     0.00  FunctionB
  0.00      0.15     0.00   144896     0.00     0.00  FunctionY

Considerations

Be careful how you interpret the results. The generated output includes an explanation of what each column means. In the flat profile shown above, “cumulative seconds” is a running sum of the self seconds for that row and the rows above it, while “total ms/call” is the column that includes time spent in the functions a given function calls. Keep the caller/callee relationships in mind; the output also includes a call graph, which can be helpful in understanding them. Also, consider that gprof doesn’t track calls made into libraries that weren’t built with the -pg option, so time spent in those libraries may instead accrue to the calling function.
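To make the column arithmetic concrete, here’s a small sketch that recomputes two of the flat-profile columns from the sample rows above. It assumes the column definitions that gprof prints in its own output: self ms/call is self seconds divided by the number of calls, and cumulative seconds is a running sum of self seconds down the table. The struct and function names are mine, not part of gprof.

```c
#include <stddef.h>

/* One row of the flat profile; the field names are illustrative. */
struct flat_row {
    const char *name;
    double self_seconds;   /* the "self seconds" column */
    unsigned long calls;   /* the "calls" column */
};

/* Average self time per call in milliseconds; 0 if the row has no calls. */
double self_ms_per_call(const struct flat_row *row)
{
    if (row->calls == 0)
        return 0.0;
    return row->self_seconds * 1000.0 / (double)row->calls;
}

/* Running sum of self seconds for rows[0] through rows[index]. */
double cumulative_seconds(const struct flat_row *rows, size_t index)
{
    double sum = 0.0;
    for (size_t i = 0; i <= index; i++)
        sum += rows[i].self_seconds;
    return sum;
}
```

For FunctionA, 0.12 seconds over 392 calls works out to roughly 0.31 ms per call, and summing the self seconds of the first two rows gives 0.15, which matches the table.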

My Findings

Going back to the problem I had on Raspberry Pi, it turns out I had some rather inefficient code that was making too many calls to the OS graphics libraries. On Windows, this problem was masked by the excellent performance of the graphics driver. On Raspberry Pi OS, the graphics stack couldn’t keep up, and the problem became clear. Since I can’t rely on the performance of a particular system’s graphics implementation, I’m going to optimize my own code, minimizing calls to the OS graphics libraries.