perf command
Performance analysis tool for Linux, providing hardware counter statistics and tracing capabilities.
Overview
perf is a Linux profiling tool that reads the CPU's hardware performance counters to gather statistics about program execution. It can count CPU performance events, trace system calls, profile applications, and analyze both hardware and software events. It is maintained in the Linux kernel source tree (tools/perf) and helps identify performance bottlenecks in applications and in the system as a whole.
Options
stat
Runs a command and gathers performance counter statistics
$ perf stat ls
Documents Downloads Pictures Videos
Performance counter stats for 'ls':
0.93 msec task-clock # 0.781 CPUs utilized
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
89 page-faults # 0.096 M/sec
1,597,086 cycles # 1.724 GHz
1,221,363 instructions # 0.76 insn per cycle
245,931 branches # 265.518 M/sec
10,764 branch-misses # 4.38% of all branches
0.001189061 seconds time elapsed
0.001090000 seconds user
0.000000000 seconds sys
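The derived figures in the right-hand column can be reproduced from the raw counts. The following is a minimal sketch, not part of perf itself: the helper names strip_commas and ratio are hypothetical, and the numbers are copied from the sample output above.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; perf does not provide them.

# Strip thousands separators so "1,221,363" becomes "1221363".
strip_commas() {
  printf '%s\n' "$1" | tr -d ','
}

# ratio NUMERATOR DENOMINATOR [SCALE]
# Prints NUMERATOR / DENOMINATOR * SCALE with two decimals (SCALE defaults to 1).
ratio() {
  awk -v n="$(strip_commas "$1")" -v d="$(strip_commas "$2")" -v s="${3:-1}" \
    'BEGIN { if (d == 0) { print "n/a"; exit } printf "%.2f\n", n / d * s }'
}

# Counts taken from the sample "perf stat ls" output.
echo "insn per cycle:   $(ratio 1,221,363 1,597,086)"
echo "branch miss rate: $(ratio 10,764 245,931 100)%"
```

With the sample counts this yields 0.76 instructions per cycle and a 4.38% branch miss rate, matching the annotations perf prints.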
record
Records performance data for later analysis
$ perf record -g ./myprogram
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.064 MB perf.data (1302 samples) ]
report
Displays performance data from a previous recording
$ perf report
# Samples: 1302
#
# Overhead Command Shared Object Symbol
# ........ ....... ................. ..............
#
35.71% myprogram myprogram [.] process_data
24.58% myprogram libc-2.31.so [.] malloc
15.21% myprogram myprogram [.] calculate_result
top
Shows a live, continuously updated profile of the whole system, similar to top but based on performance counter samples
$ perf top
Samples: 42K of event 'cycles', 4000 Hz, Event count (approx.): 10456889073
Overhead Shared Object Symbol
12.67% [kernel] [k] _raw_spin_unlock_irqrestore
4.71% [kernel] [k] finish_task_switch
2.82% [kernel] [k] __schedule
2.40% firefox [.] 0x00000000022e002d
list
Lists available events for monitoring
$ perf list
List of pre-defined events (to be used in -e):
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]
cache-references [Hardware event]
cache-misses [Hardware event]
branch-instructions OR branches [Hardware event]
branch-misses [Hardware event]
...
-e, --event
Specifies which events to monitor (used with subcommands such as stat and record)
$ perf stat -e cycles,instructions,cache-misses ./myprogram
Performance counter stats for './myprogram':
1,234,567,890 cycles
987,654,321 instructions # 0.80 insn per cycle
5,432,109 cache-misses
1.234567890 seconds time elapsed
-p, --pid
Monitors a specific process by its PID
$ perf record -p 1234
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.452 MB perf.data (2371 samples) ]
-g, --call-graph
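Enables call-graph (stack chain/backtrace) recording. -g uses the default unwinding method, while --call-graph takes an explicit method such as fp, dwarf, or lbr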
$ perf record -g ./myprogram
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.128 MB perf.data (2567 samples) ]
Usage Examples
Profiling CPU usage of a command
$ perf stat -d ls -la
total 56
drwxr-xr-x 9 user user 4096 May 5 10:00 .
drwxr-xr-x 28 user user 4096 May 4 15:30 ..
-rw-r--r-- 1 user user 8980 May 5 09:45 file.txt
Performance counter stats for 'ls -la':
1.52 msec task-clock # 0.812 CPUs utilized
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
102 page-faults # 0.067 M/sec
3,842,901 cycles # 2.530 GHz
5,779,212 instructions # 1.50 insn per cycle
1,059,631 branches # 697.128 M/sec
36,789 branch-misses # 3.47% of all branches
1,254,898 L1-dcache-loads # 825.590 M/sec
45,632 L1-dcache-load-misses # 3.64% of all L1-dcache accesses
0.001871938 seconds time elapsed
0.001871000 seconds user
0.000000000 seconds sys
Recording and analyzing application performance
$ perf record -g ./myapplication
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.253 MB perf.data (3842 samples) ]
$ perf report
# To display the perf.data header info, please use --header/--header-only options.
#
# Samples: 3K of event 'cycles'
# Event count (approx.): 3842000000
#
# Overhead Command Shared Object Symbol
# ........ ....... ................. ..............
#
35.42% myapplication myapplication [.] process_data
21.67% myapplication libc-2.31.so [.] malloc
15.89% myapplication myapplication [.] calculate_result
Monitoring specific hardware events
$ perf stat -e L1-dcache-loads,L1-dcache-load-misses,L1-dcache-stores ./myprogram
Performance counter stats for './myprogram':
123,456,789 L1-dcache-loads
2,345,678 L1-dcache-load-misses # 1.90% of all L1-dcache accesses
98,765,432 L1-dcache-stores
2.345678901 seconds time elapsed
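When the counts need further processing, they can be pulled out of the text that perf stat prints. The sketch below assumes the plain "count event-name" layout shown above, with event names appearing exactly as requested; counter_value is a hypothetical helper, and the input is the sample output rather than a live run.

```shell
#!/bin/sh
# counter_value EVENT
# Hypothetical helper: reads perf stat text on stdin and prints the count
# for EVENT with thousands separators removed.
counter_value() {
  awk -v ev="$1" '$2 == ev { gsub(/,/, "", $1); print $1; exit }'
}

# Sample lines copied from the output above (not a live measurement).
stats='123,456,789 L1-dcache-loads
2,345,678 L1-dcache-load-misses # 1.90% of all L1-dcache accesses
98,765,432 L1-dcache-stores'

loads=$(printf '%s\n' "$stats" | counter_value L1-dcache-loads)
misses=$(printf '%s\n' "$stats" | counter_value L1-dcache-load-misses)

# Miss ratio = load misses / loads, as a percentage.
awk -v m="$misses" -v l="$loads" \
  'BEGIN { printf "L1d load miss ratio: %.2f%%\n", m / l * 100 }'
```

Note that perf stat writes its report to standard error, so a live run would need its stderr redirected before being piped into a helper like this.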
Tips:
Run as Root for Full Access
Many perf features require root privileges. Use sudo perf to access all hardware counters and system-wide profiling capabilities.
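How much perf allows without root is governed by the kernel.perf_event_paranoid sysctl. The sketch below reads the current level and prints a rough summary based on the kernel's sysctl documentation. describe_paranoid is a hypothetical helper, and some distributions define additional, stricter levels that it does not distinguish.

```shell
#!/bin/sh
# describe_paranoid LEVEL
# Hypothetical helper: summarizes what users without extra capabilities
# may do at a given perf_event_paranoid level.
describe_paranoid() {
  lvl=$1
  if [ "$lvl" -le -1 ]; then
    echo "almost all events allowed for all users"
  elif [ "$lvl" -eq 0 ]; then
    echo "CPU events and kernel profiling allowed; raw tracepoints restricted"
  elif [ "$lvl" -eq 1 ]; then
    echo "kernel profiling allowed; CPU event access restricted"
  else
    echo "kernel profiling restricted; user-space profiling only"
  fi
}

# Report the current setting when the sysctl file is readable.
paranoid_file=/proc/sys/kernel/perf_event_paranoid
if [ -r "$paranoid_file" ]; then
  current=$(cat "$paranoid_file")
  echo "perf_event_paranoid=$current: $(describe_paranoid "$current")"
fi
```

If the reported level is too restrictive for the task at hand, running perf through sudo as described above is the simplest workaround.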
Use Flame Graphs for Visualization
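Convert perf data to flame graphs for easier analysis. The commands below use the FlameGraph scripts by Brendan Gregg and assume they have been cloned into ./FlameGraph: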
$ perf record -g ./myprogram
$ perf script | FlameGraph/stackcollapse-perf.pl | FlameGraph/flamegraph.pl > flamegraph.svg
Focus on Hotspots
When analyzing performance data, concentrate on functions with the highest overhead percentages first, as these represent the best optimization opportunities.
Reduce Overhead During Recording
For production profiling, sample at a lower frequency with -F to reduce the performance impact:
$ perf record -F 99 -g -p 1234
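The effect of -F on data volume can be estimated with simple arithmetic: samples are taken roughly FREQUENCY times per second on each busy CPU. The sketch below is an upper-bound estimate under that assumption; expected_samples is a hypothetical helper, and real counts are usually lower because idle CPUs contribute few samples.

```shell
#!/bin/sh
# expected_samples FREQ_HZ SECONDS [BUSY_CPUS]
# Hypothetical helper: rough upper bound on the number of samples recorded.
expected_samples() {
  echo $(( $1 * $2 * ${3:-1} ))
}

# 30 seconds on one busy CPU at 99 Hz, and at the 4000 Hz rate that
# appears in the perf top sample output.
echo "99 Hz:   $(expected_samples 99 30) samples"
echo "4000 Hz: $(expected_samples 4000 30) samples"
```

For a 30-second recording this gives about 2,970 samples at 99 Hz versus 120,000 at 4000 Hz, which is why a low rate suits long-running production captures.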
Annotate Source Code
Use perf annotate to see which instructions, and with debug information which source lines, account for the most samples:
$ perf annotate -d ./myprogram
Frequently Asked Questions
Q1. What's the difference between perf stat and perf record?
A. perf stat counts events and prints a summary once the command completes, while perf record captures samples into perf.data that can be analyzed later with perf report.
Q2. How can I profile a running process?
A. Use perf record -p PID to attach to a running process by its process ID.
Q3. How do I interpret the output of perf report?
A. The "Overhead" column shows the percentage of samples attributed to each function, helping identify performance bottlenecks. Higher percentages indicate functions consuming more CPU time.
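To focus on the largest entries, the rows of a saved report can be filtered and summed. The sketch below assumes the flat "Overhead Command Shared Object Symbol" layout shown in the examples above and symbol names without spaces; top_symbols is a hypothetical helper, and the input rows are the sample data, not a live report.

```shell
#!/bin/sh
# top_symbols THRESHOLD
# Hypothetical helper: reads perf report rows on stdin and prints each symbol
# whose overhead is at least THRESHOLD percent, plus a running total.
top_symbols() {
  awk -v min="$1" '
    /^#/ || NF == 0 { next }            # skip header comments and blank lines
    {
      pct = $1
      sub(/%$/, "", pct)                # "35.71%" -> "35.71"
      if (pct + 0 >= min + 0) {
        total += pct
        printf "%s %.2f%% (cumulative %.2f%%)\n", $NF, pct, total
      }
    }'
}

# Rows copied from the sample perf report output.
top_symbols 20 <<'EOF'
# Overhead Command Shared Object Symbol
35.71% myprogram myprogram [.] process_data
24.58% myprogram libc-2.31.so [.] malloc
15.21% myprogram myprogram [.] calculate_result
EOF
```

With a 20% threshold only process_data and malloc are listed, and together they account for 60.29% of the samples.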
Q4. Can perf profile GPU performance?
A. Standard perf primarily focuses on CPU and system performance. For GPU profiling, vendor tools such as NVIDIA's Nsight profilers (which supersede the older nvprof) or AMD's ROCm profiler are more appropriate.
Q5. How can I reduce the size of perf.data files?
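A. Lower the sampling rate with -F (--freq), or limit how long data is collected by giving perf record a bounded workload, for example perf record -F 99 -p PID -- sleep 10. Note that the -a option selects system-wide collection across all CPUs; it does not limit the duration.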
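One common way to bound a recording is to give perf record a short-lived workload such as sleep, so that collection stops when it exits. The sketch below only prints the command line so it can be reviewed first. bounded_record_cmd is a hypothetical helper, and the PID, duration, and 99 Hz default are placeholder values.

```shell
#!/bin/sh
# bounded_record_cmd PID SECONDS [FREQ_HZ]
# Hypothetical helper: prints a perf record command that samples PID at
# FREQ_HZ (default 99) and stops once "sleep SECONDS" finishes.
bounded_record_cmd() {
  printf 'perf record -F %s -g -p %s -- sleep %s\n' "${3:-99}" "$1" "$2"
}

# Dry run: show the command for a 10-second capture of PID 1234.
bounded_record_cmd 1234 10
```

This prints perf record -F 99 -g -p 1234 -- sleep 10, which can then be run as shown once the values look right.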
References
https://perf.wiki.kernel.org/index.php/Main_Page
Revisions
- 2025/05/05 First revision