Profiling C code on Linux with gprof
gprof is a performance analysis tool for Linux. Use it by compiling your C code with the -pg option for gcc, reproducing the issue, and then running gprof against the previously generated gmon.out file.
I was writing some cross-platform C code that worked well on my Windows 10 PC but didn’t work so well on a Raspberry Pi 4 running Raspberry Pi OS. The program would slow down dramatically, and a quick analysis showed a CPU bottleneck. I needed a profiler to see where the program’s CPU time was being spent, so I decided to use gprof.
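First, your code needs to be recompiled with gcc’s -pg option. The option must be applied at both the compilation and the linking step. The -pg option generates instrumentation that records function calls (including which function made each call) at runtime. It also makes the program periodically sample its program counter, which is where gprof’s timing figures come from.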
Once you’ve rebuilt your program, run it again and reproduce the issue. Ideally, let the slow part of the program execute multiple times so you get a good sample of the problem. Exit the program normally (for example, by returning from main), and you should see that a new file was generated in the current working directory: gmon.out. This file contains the data collected by the instrumentation.
The next step is to process gmon.out with gprof. The tool takes two arguments: the executable that generated the gmon.out file and the path to the gmon.out file. It produces a lot of text, so you may want to redirect its output to a file. Run it like this:
$ gprof /path/to/exe /path/to/gmon.out > gmon.txt
Open the gmon.txt file, and you’ll see results that look something like this:
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 80.01      0.12     0.12      392     0.31     0.31  FunctionA
 20.00      0.15     0.03     2480     0.01     0.01  FunctionX
  0.00      0.15     0.00   169740     0.00     0.00  FunctionB
  0.00      0.15     0.00   144896     0.00     0.00  FunctionY
Be careful how you interpret the results. The generated output includes an explanation of what each column means. The cumulative seconds column is a running total of the self seconds of a function and every function listed above it; it’s the total ms/call column that includes time spent in the functions a function calls. Keep the caller/callee relationships in mind (the call graph section of the output shows which functions call which, and that helps here). Also, consider that gprof doesn’t track calls made into libraries that weren’t built with the -pg option, so time spent in those libraries may instead accrue to the calling function.
Going back to the problem I had on Raspberry Pi, it turns out I had some rather inefficient code that was making too many calls to the OS graphics libraries. On Windows, this problem was masked by the excellent performance of the graphics driver. On Raspberry Pi OS, the graphics stack couldn’t keep up, and the problem became clear. Since I can’t rely on the performance of a particular system’s graphics implementation, I’m going to optimize my own code, minimizing calls to the OS graphics libraries.