Trivial sampling profiler for GNU/Linux

Rationale

I've found that there are no things (probably apart from "Zoom") that can annotate source in a way to help me to find functions that slow down the program without causing much CPU calculations.

Most profilers (oprofile, valgrind --tool=callgrind) show things like string processing, some calculations, inner loops while the actual delay is in read, fsync, nanosleep and other syscalls.

In other words, I need not only to know why program hogs CPU, but why program waits.

To do it right either oprofile needs to be able to interrupt my program and probe samples (not just measure the thing that executing now), or callgrind need to "look at the clock" while probing (not just count calls).

I've asked at stackoverflow.com ( 1 2 3 4 ), but received no positive answers (not checked proprietary "Zoom" although).

While thinking how it is to be done, I've quickly hacked the proof-of-concept sampling profiler that moslty uses standard tools.

It shows how many times each line has appeared on stack.

Codes

The profiler consists of three parts:

Code snippet:

void dump_trace() { FILE* f = fopen("/tmp/qqq", "a"); void *array[10]; size_t size; size_t i; size = backtrace (array, 10); fprintf (f, "\n"); for (i = 0; i < size; ++i) { char* address = array[i]; if(i>1) { --address; /* We want caller, not the next instruction */ } fprintf (f, "%p\n", address); } fclose(f); } void __attribute__((constructor)) dump_trace_setup() { struct sigaction sa = {dump_trace}; sigaction(SIGUSR1, &sa, NULL); }

dump_trace writes the next backtrace to /tmp/qqq; dump_trace_setup sets up issuing backtraces when the program receives SIGUSR1 signal.

Timer

for((;;)) { killall -USR1 prog; }

(I know it's bad, but can a good thing here be that simple?)

Converter

addr2annotation:

#!/bin/sh if [ -z "$1" ]; then echo "Usage: cat input | addr2line -e program | addr2annotation \`pwd\`/soucefile.c | less" exit 1; fi; sort | uniq -c | grep $1 | perl -ne '/^\s*(\d+).*:\s*(\d*)\s*$/ and print "$2 $1\n"' | sort -n | annotate.pl $1

It takes lines produced by addr2line, calculates statistics (sort | uniq -c), then simplifies format for annotate.pl ("line_number number_of_occurences")

annotate.pl:

#!/usr/bin/perl -w use strict; open F, "<", $ARGV[0] or die($!); my $l = 0; for(;;) { $_ = <STDIN>; last unless $_; /(\d+)\s+(\d+)/; my ($line, $count) = ($1, $2); while(<F>) { ++$l; if($l==$line) { printf "%7d|", $count; print; last; } else { print " |"; print; } } } while(<F>) { print " |"; print; }

It just reads source file (command-line argument), pairs of numbers ("line_number number_of_occurences") from stdin and prints the annotated source code.

Example session:

vi@vi-notebook:~/code/profile_test$ cat prog.c #include <unistd.h> #include <stdio.h> #include <execinfo.h> #include <signal.h> func1() { usleep(10000); } func2() { int i; for(i=0; i<10; ++i) { usleep(0); } } func3() { func1(); func2(); int i, u=3; for(i=0; i<1000000; ++i) { u*=u; u+=17; u/=3; } } int main() { int i; func1(); func2(); for(i=0; i<10; ++i) { func3(); func2(); func3(); } func3(); return 0; } void dump_trace() { FILE* f = fopen("/tmp/qqq", "a"); void *array[10]; size_t size; size_t i; size = backtrace (array, 10); fprintf (f, "\n"); for (i = 0; i < size; ++i) { char* address = array[i]; if(i>1) { --address; /* We want caller, not the next instruction */ } fprintf (f, "%p\n", address); } fclose(f); } void __attribute__((constructor)) dump_trace_setup() { struct sigaction sa = {dump_trace}; sigaction(SIGUSR1, &sa, NULL); } vi@vi-notebook:~/code/profile_test$ gcc -g3 prog.c -o prog vi@vi-notebook:~/code/profile_test$ for((;;)) { killall -USR1 prog 2> /dev/null; }& [1] 22269 vi@vi-notebook:~/code/profile_test$ for((;;)) { ./prog ; } # wait for some time while it's collecting probes ^C vi@vi-notebook:~/code/profile_test$ kill %1 [1]+ Terminated for ((1; 1; 1)) do killall -USR1 prog 2> /dev/null; done vi@vi-notebook:~/code/profile_test$ head /tmp/qqq 0x804862c 0xffffe400 0x8048535 0x8048569 0x80485d7 0x496d5c75 0x8048490 0x804862c vi@vi-notebook:~/code/profile_test$ cat /tmp/qqq | addr2line -e prog | head ??:0 /home/vi/code/profile_test/prog.c:51 ??:0 /home/vi/code/profile_test/prog.c:8 /home/vi/code/profile_test/prog.c:19 /home/vi/code/profile_test/prog.c:36 ??:0 ??:0 ??:0 /home/vi/code/profile_test/prog.c:51 vi@vi-notebook:~/code/profile_test$ cat /tmp/qqq | addr2line -e prog | addr2annotation `pwd`/prog.c |#include <unistd.h> |#include <stdio.h> |#include <execinfo.h> |#include <signal.h> | | |func1() { 27726| usleep(10000); |} | |func2() { | int i; | for(i=0; i<10; ++i) { 52670| usleep(0); | } |} | |func3() { 26662| func1(); 30278| func2(); | int i, u=3; 538| for(i=0; i<1000000; ++i) { 205| u*=u; | u+=17; 10762| u/=3; | } |} | |int main() { | | int i; | 911| func1(); 1278| func2(); 11505| for(i=0; i<10; ++i) { 47336| func3(); 17947| func2(); 46859| func3(); | } 4724| func3(); | return 0; |} | | |void dump_trace() { | FILE* f = fopen("/tmp/qqq", "a"); | void *array[10]; | size_t size; | size_t i; | 119058| size = backtrace (array, 10); | fprintf (f, "\n"); | | for (i = 0; i < size; ++i) { | char* address = array[i]; | if(i>1) { | --address; /* We want caller, not the next instruction */ | } | fprintf (f, "%p\n", address); | } | | fclose(f); |} | |void __attribute__((constructor)) dump_trace_setup() { | struct sigaction sa = {dump_trace}; | sigaction(SIGUSR1, &sa, NULL); |} |

We see both calculations-based delays (at the end of func3) and wait-based ones (func1 and func2). Also it is accumulating: time for callees are automatically included.

Possible enhancements

Contact

Vitaly "_Vi" Shukela.

vi0oss@gmail.com

License: May use the codes anywhere