Overview

Linux perf is a tool that allows counting and sampling of various events in the hardware and in the kernel. Hardware events are available via performance monitoring units (PMU); they measure CPU cycles, cache misses, branches, etc. Kernel events include scheduling context switches, page faults, block I/O, etc.

perf is good for answering these and similar questions:

What is the breakdown of CPU time (or cache misses or branch mispredictions) by function?
What was the total number of page faults generated by two different runs of my program?
How many context switches occurred during the execution?
How many times did my program issue a sched_yield system call?
Which functions triggered most page faults?
How much block I/O did my program generate?

Despite claims that perf can also tell you where and why your program was blocking, I was not able to get it to do so after many attempts. For alternative ways to perform off-CPU analysis, read this post by Brendan Gregg or use WiredTiger's Operation Tracking.

What follows is a quick cheat sheet of how to use perf with WiredTiger. Most of the information in this cheat sheet comes from this excellent tutorial by Brendan Gregg. If you need more information, please refer to that source.

Running perf with WiredTiger

Build WiredTiger as usual. You do not need to add any special compilation flags.

To run perf with WiredTiger, you simply insert the name of the perf command before the WiredTiger executable. For example, if you want to record default perf statistics for a wtperf job, you'd run:

perf stat wtperf <wtperf options>

If you want to profile CPU time of wtperf, you'd run:

perf record wtperf <wtperf options>

Any environment variables you care about will go before the perf command.
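
As a minimal sketch of why this placement works: a VAR=value prefix applies to the one command that follows it and to every process that command starts, so a variable placed before perf also reaches the program perf launches. DEMO_VAR is a made-up name used only for illustration, and sh stands in for perf and wtperf so that the snippet runs anywhere. With perf the same shape would be VAR=value perf stat wtperf <wtperf options>.

```shell
# DEMO_VAR is a hypothetical variable name used only to show the mechanism.
# The prefix sets it for this one command and for the processes it starts.
DEMO_VAR=hello sh -c 'printf "%s\n" "$DEMO_VAR"'

# The variable is not left behind in the calling shell.
printf 'after: [%s]\n' "${DEMO_VAR:-}"
```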

To run perf on an already running WiredTiger process, supply the -p <PID> option to either of these commands:

perf record -p <PID>

or

perf stat -p <PID>

Counting events

The command perf stat will report the total count of events for the entire execution.

perf stat wtperf <wtperf options>

By default it will report things like task clock, context switches, CPU migrations, page faults, cycles, instructions, branches and branch misses.

Here is a sample output:

   55275.953046      task-clock (msec)         #  1.466 CPUs utilized
      7,349,023      context-switches          #  0.133 M/sec
         38,219      cpu-migrations            #  0.691 K/sec
        160,562      page-faults               #  0.003 M/sec
173,665,597,703      cycles                    #  3.142 GHz
 90,146,578,819      instructions              #  0.52  insn per cycle
 22,199,890,600      branches                  #  401.619 M/sec
    387,770,656      branch-misses             #  1.75% of all branches

   37.710693533 seconds time elapsed
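
The annotations after the # sign are derived from the raw counters, and it can help to see how. The sketch below recomputes four of them from the numbers in the sample output above, assuming only that awk is available. The variable names and the derive function are made up for this example.

```shell
# Counter values copied from the sample output above, thousands separators removed.
task_clock_ms=55275.953046
elapsed_s=37.710693533
cycles=173665597703
instructions=90146578819
branches=22199890600
branch_misses=387770656

# Recompute the figures perf prints after the '#' sign.
derive() {
    awk -v tc="$task_clock_ms" -v el="$elapsed_s" -v cy="$cycles" \
        -v ins="$instructions" -v br="$branches" -v bm="$branch_misses" 'BEGIN {
        cpu_s = tc / 1000                      # CPU time in seconds
        printf "CPUs utilized:   %.3f\n", cpu_s / el
        printf "insn per cycle:  %.2f\n", ins / cy
        printf "branch miss pct: %.2f\n", 100 * bm / br
        printf "clock GHz:       %.3f\n", cy / cpu_s / 1e9
    }'
}
derive
```

The results should agree with the annotations in the sample: 1.466, 0.52, 1.75 and 3.142. Note that the per-second rates are relative to task-clock (CPU time), not to elapsed wall-clock time.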

Alternatively, you can decide what events you want to include. For example, if you wanted to count last-level cache (LLC) misses, one way to do this is like so:

perf stat -e LLC-load-misses wtperf <wtperf options>
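
Selecting a single event also makes it easy to compare two runs, for example their page-fault totals. The sketch below assumes each run was recorded with perf stat's CSV mode (-x,) and written to a file with -o. The helper names and the file names run1.csv and run2.csv are made up for illustration.

```shell
# Assumed recording step (not run here):
#   perf stat -x, -o run1.csv -e page-faults wtperf <wtperf options>
#   perf stat -x, -o run2.csv -e page-faults wtperf <wtperf options>

# Print the value of one event from a "perf stat -x," CSV file.
# Fields used: 1 = counter value, 3 = event name (possibly with a ":u"-style suffix).
counter_from_csv() {    # usage: counter_from_csv <file> <event>
    awk -F, -v ev="$2" '$3 == ev || index($3, ev ":") == 1 { print $1 }' "$1"
}

# Print "<a> <b> <b - a>" for the same event in two CSV files.
compare_counter() {     # usage: compare_counter <file_a> <file_b> <event>
    a=$(counter_from_csv "$1" "$3")
    b=$(counter_from_csv "$2" "$3")
    printf '%s %s %s\n' "$a" "$b" "$((b - a))"
}
```

A call such as compare_counter run1.csv run2.csv page-faults would then print both totals and their difference. The arithmetic assumes both values are plain integers, so it will fail if perf reported the event as not counted.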

Or if you wanted all default events and all cache events, run:

perf stat -d wtperf <wtperf options>

To learn about all events you can count or sample with perf, use the command:

perf list

To count all system calls by type, run:

perf stat -e 'syscalls:sys_enter_*' <command>
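
The wildcard matches several hundred tracepoints, most of which never fire, so the raw listing is long. One way to trim it, sketched below, is to ask perf stat for CSV output (-x,) and keep only the system calls that were actually made, busiest first. rank_syscalls is a made-up helper name. perf stat prints its counts on stderr, which is why the pipeline in the comment redirects it.

```shell
# Assumed invocation (not run here):
#   perf stat -x, -e 'syscalls:sys_enter_*' wtperf <wtperf options> 2>&1 >/dev/null | rank_syscalls

# Read "perf stat -x," CSV on stdin and print "count event" for every
# syscall tracepoint that fired at least once, highest count first.
rank_syscalls() {
    awk -F, '$1 ~ /^[0-9]+$/ && $1 > 0 && $3 ~ /^syscalls:sys_enter_/ { print $1, $3 }' |
        sort -k1,1nr
}
```

Looking for syscalls:sys_enter_sched_yield in the result is one way to answer the sched_yield question from the overview.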

Sampling events (profiling)

Unlike perf stat, which counts all events, perf record samples events. The simplest way to invoke it is like so:

perf record <command>

This will sample on-CPU functions.

Examining the output

By default, the output will be placed into a file named perf.data. To examine it, run:

perf report

perf report will show, for each sampled event, where the samples were taken, sorted from the most frequently occurring to the least frequently occurring. Call stacks are shown only if they were recorded with the -g option.

At the time of this writing, the default sampling frequency is 4000 Hz; that is, perf aims to take roughly 4000 samples per second on each CPU where the profiled program is running. This number may change, so to be sure, check the setting in the header of the perf output, like so:

perf report --header

If you want to change the output file, use the -o option:

perf record -o <new_output_file> <command>

And run the reporting with the -i option:

perf report -i <new_output_file>
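
When you profile several configurations, it is easy to overwrite one recording with the next. A minimal sketch of one way to avoid that is to give every recording a labeled, timestamped file name. The helper names and the naming scheme are made up for illustration; only the -o and -i options come from perf itself.

```shell
# Build a file name of the form perf.<label>.<YYYYmmdd-HHMMSS>.data
perf_data_name() {      # usage: perf_data_name <label>
    printf 'perf.%s.%s.data\n' "$1" "$(date +%Y%m%d-%H%M%S)"
}

# Record a command into a freshly named file, then print that name
# so it can be passed to "perf report -i".
record_named() {        # usage: record_named <label> <command> [args...]
    label=$1; shift
    out=$(perf_data_name "$label")
    perf record -o "$out" "$@" && printf '%s\n' "$out"
}
```

For example, record_named evict wtperf followed by the wtperf options would leave a file named like perf.evict.<timestamp>.data and print its name as the last line of output.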

Here is a way to look at the output with all the call stacks expanded:

perf report -n --stdio

More examples

Sample on-cpu functions with call stacks:

perf record -g <command>

Sample as above, but using a specific sampling frequency (99 hertz):

perf record -F 99 -g <command>
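
With -F, the number of samples depends on how long the program actually runs on a CPU, because on-CPU sampling takes no samples while it is blocked. As a rough sketch, the expected sample count is the frequency multiplied by the CPU time in seconds. estimate_samples is a made-up helper that uses whole seconds to keep the arithmetic in plain shell.

```shell
# Rough estimate of on-CPU samples: frequency (Hz) x CPU time (seconds).
estimate_samples() {    # usage: estimate_samples <hz> <cpu_seconds>
    echo $(( $1 * $2 ))
}

# About 55 seconds of CPU time (task-clock) sampled at 99 Hz.
estimate_samples 99 55      # prints 5445
```

A rate such as 99 rather than 100 is commonly chosen so that sampling does not run in lockstep with activity that happens on round-number timer intervals.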

Sample last-level cache load misses with call stacks:

perf record -e LLC-load-misses -g <command>

Sample cache misses for a running process:

perf record -e LLC-load-misses -g -p <PID>

Sample page faults:

perf record -e page-faults -g <command>