This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
basic-block and profile-based optimizing (was Re: New attribute"infrequent"?)
- To: Jan Hubicka <jh at suse dot cz>
- Subject: basic-block and profile-based optimizing (was Re: New attribute"infrequent"?)
- From: Scott A Crosby <crosby at qwes dot math dot cmu dot edu>
- Date: Fri, 31 Aug 2001 07:51:13 -0400 (EDT)
- cc: <gcc at gcc dot gnu dot org>, <pfk at fuchs dot offl dot uni-jena dot de>, <libc-alpha at sources dot redhat dot com>, <aj at suse dot de>
On Mon, 27 Aug 2001, Jan Hubicka wrote:
> Hi,
> several branch prediction papers consider classification of function calls to
> discover calls to exceptional code (error handling, exit etc.) and predict them
> as not happening. While this improves the call heuristics coverage
> imporatatnly, I didn't considered it very important to implement, but after
> generalizing it a bit, it kept in my mind for a few months now as potentially
> attractive.
>
Wouldn't it be more general to use profiling for this? We've got an
infrastructure for doing both funciton-level profiling, and even
basic-block level profiling.
GCC could analyze the function count profiling and automatically determine
infrequently used functions, then treat them as you suggest.
Since there is basic-block profiling too, at least in theory, one could
move seldom-used basic blocks to the end of an object file, or even into
their own section? I've seen some strange code (Like the linux kernel)
where this is being done manually.
This infrastructure could be used to determine which loops to unroll,
which functions to inline, etc. (If a large function is called frequently,
but almost always from only one other place, inline it.)
The basic-block profiling infrastructure doesn't seem to be well known. I
suspect this may be because the gprof/gcov manpages have been out of date
with respect to the gprof texinfo. I think that whats most needed is to
increase the visibility of what has been accomplished already.
As for the code, it definitely needs cleanup. There's '-fprofile-arcs'
'-a' for two different ways of profiling basic blocks, and apparently two
different ways output formats. There's also two different programs for
analyzing this info: gprof and gcov. Then, there's out of date manpages
for both; you have to read the texinfo.
Its taken some puzzling and experimentation for me to get this far. I'm
attaching some random snippets from the texinfo docs, and at the end, I've
got two working illustrations for how to use gcov&gprof for basic block
profiling, and sample outputs from both.
This work was done with:
GCC version 2.95.2 20000220 (Debian GNU/Linux) -- /usr/lib/gcc-lib/i386-linux/2.95.2/spec
GCOV version 1.5
GPROF version 2.9.5
Scott
-------- Pulled from various manuals..
`-a'
Generate extra code to write profile information for basic blocks,
which will record the number of times each basic block is
executed, the basic block start address, and the function name
containing the basic block. If `-g' is used, the line number and
filename of the start of the basic block will also be recorded.
If not overridden by the machine description, the default action is
to append to the text file `bb.out'.
---
And we get a file 'bb.out' with stuff like:
File /tmp/fffxxx/xbattleai-1.1.2/load.d, 80 basic blocks
Block # 1: executed 1 time(s) address= 0x80525a4 function= dump_board line= 31 file= load.c
Block # 2: executed 0 time(s) address= 0x80525e8 function= dump_board line= 40 file= load.c
Block # 3: executed 1 time(s) address= 0x8052602 function= dump_board line= 44 file= load.c
Block # 4: executed 0 time(s) address= 0x80526cb function= dump_board line= 52 file= load.c
Block # 5: executed 1 time(s) address= 0x80526e7 function= dump_board line= 54 file= load.c
Block # 6: executed 1 time(s) address= 0x8052701 function= dump_board line= 58 file= load.c
Block # 7: executed 121 time(s) address= 0x8052710 function= dump_board line= 58 file= load.c
...
---
Gprof can even annotate the source code, in more than one way, with this
info.
`-A[SYMSPEC]'
`--annotated-source[=SYMSPEC]'
The `-A' option causes `gprof' to print annotated source code. If
SYMSPEC is specified, print output only for matching symbols.
*Note Annotated Source::.
It can also suggest function ordering:
`--function-ordering'
The `--function-ordering' option causes `gprof' to print a
suggested function ordering for the program based on profiling
data. This option suggests an ordering which may improve paging,
tlb and cache behavior for the program on systems which support
arbitrary ordering of functions in an executable.
The exact details of how to force the linker to place functions in
a particular order is system dependent and out of the scope of this
manual.
---
GCC also has a way to guess branch probabilities:
`-fprofile-arcs'
Instrument "arcs" during compilation. For each function of your
program, GCC creates a program flow graph, then finds a spanning
tree for the graph. Only arcs that are not on the spanning tree
have to be instrumented: the compiler adds code to count the
number of times that these arcs are executed. When an arc is the
only exit or only entrance to a block, the instrumentation code
can be added to the block; otherwise, a new basic block must be
created to hold the instrumentation code.
`-fbranch-probabilities'
After running a program compiled with `-fprofile-arcs' (*note
Options for Debugging Your Program or `gcc': Debugging Options.),
you can compile it a second time using `-fbranch-probabilities',
to improve optimizations based on guessing the path a branch might
take.
--
I could not get it to generate both gprof and gcov profiling output with
one execution.
--
The 'gprof' technique for basic-block profiling.
CFLAGS= -g -pg -a
Compile and run the program, then:
# perl /usr/share/doc/binutils/gprof/bbconv.pl < bb.out >bbFOO.out
# gprof -l -A -x BINARY_NAME gmon.out bbFOO.out | less
And each line is annotated with how many times it branches to each basic
block it branches to. [1]
--
The 'gcov' alternative is:
CFLAGS= -g -pg -fprofile-arcs -ftest-coverage
Compile the program (generating *.bb and *.bbg files)
Run the program (which should generate *.da files upon normal termination.)
# gcov -b FOO (the 'FOO' in 'FOO.c')
Then examine the FOO.c.gcov file.
--
[1] From gprof texinfo
ulg updcrc(s, n)
uch *s;
unsigned n;
2 ->{
register ulg c;
static ulg crc = (ulg)0xffffffffL;
2 -> if (s == NULL) {
1 -> c = 0xffffffffL;
1 -> } else {
1 -> c = crc;
1 -> if (n) do {
26312 -> c = crc_32_tab[...];
26312,1,26311 -> } while (--n);
}
2 -> crc = c;
2 -> return c ^ 0xffffffffL;
2 ->}
In this example, the function was called twice, passing once through
each branch of the `if' statement. The body of the `do' loop was
executed a total of 26312 times. Note how the `while' statement is
annotated. It began execution 26312 times, once for each iteration
through the loop. One of those times (the last time) it exited, while
it branched back to the beginning of the loop 26311 times.
[2] From gcov texinfo.
main()
{
1 int i, total;
1 total = 0;
11 for (i = 0; i < 10; i++)
branch 0 taken = 91%
branch 1 taken = 100%
branch 2 taken = 100%
10 total += i;
1 if (total != 45)
branch 0 taken = 100%
###### printf ("Failure\n");
call 0 never executed
branch 1 never executed
else
1 printf ("Success\n");
call 0 returns = 100%
1 }