This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

basic-block and profile-based optimizing (was Re: New attribute"infrequent"?)

On Mon, 27 Aug 2001, Jan Hubicka wrote:

> Hi,
> several branch prediction papers consider classification of function calls to
> discover calls to exceptional code (error handling, exit etc.) and predict them
> as not happening.  While this improves the call heuristics coverage
> imporatatnly, I didn't considered it very important to implement, but after
> generalizing it a bit, it kept in my mind for a few months now as potentially
> attractive.

Wouldn't it be more general to use profiling for this? We've got an
infrastructure for doing both funciton-level profiling, and even
basic-block level profiling.

GCC could analyze the function count profiling and automatically determine
infrequently used functions, then treat them as you suggest.

Since there is basic-block profiling too, at least in theory, one could
move seldom-used basic blocks to the end of an object file, or even into
their own section? I've seen some strange code (Like the linux kernel)
where this is being done manually.

This infrastructure could be used to determine which loops to unroll,
which functions to inline, etc. (If a large function is called frequently,
but almost always from only one other place, inline it.)

The basic-block profiling infrastructure doesn't seem to be well known. I
suspect this may be because the gprof/gcov manpages have been out of date
with respect to the gprof texinfo. I think that whats most needed is to
increase the visibility of what has been accomplished already.

As for the code, it definitely needs cleanup. There's '-fprofile-arcs'
'-a' for two different ways of profiling basic blocks, and apparently two
different ways output formats. There's also two different programs for
analyzing this info: gprof and gcov. Then, there's out of date manpages
for both; you have to read the texinfo.

Its taken some puzzling and experimentation for me to get this far. I'm
attaching some random snippets from the texinfo docs, and at the end, I've
got two working illustrations for how to use gcov&gprof for basic block
profiling, and sample outputs from both.

This work was done with:
  GCC version   2.95.2 20000220 (Debian GNU/Linux)  --  /usr/lib/gcc-lib/i386-linux/2.95.2/spec
  GCOV version  1.5
  GPROF version 2.9.5


-------- Pulled from various manuals..

     Generate extra code to write profile information for basic blocks,
     which will record the number of times each basic block is
     executed, the basic block start address, and the function name
     containing the basic block.  If `-g' is used, the line number and
     filename of the start of the basic block will also be recorded.
     If not overridden by the machine description, the default action is
     to append to the text file `bb.out'.

And we get a file 'bb.out' with stuff like:
File /tmp/fffxxx/xbattleai-1.1.2/load.d, 80 basic blocks

    Block # 1: executed   1 time(s) address= 0x80525a4 function= dump_board        line=  31 file= load.c
    Block # 2: executed   0 time(s) address= 0x80525e8 function= dump_board        line=  40 file= load.c
    Block # 3: executed   1 time(s) address= 0x8052602 function= dump_board        line=  44 file= load.c
    Block # 4: executed   0 time(s) address= 0x80526cb function= dump_board        line=  52 file= load.c
    Block # 5: executed   1 time(s) address= 0x80526e7 function= dump_board        line=  54 file= load.c
    Block # 6: executed   1 time(s) address= 0x8052701 function= dump_board        line=  58 file= load.c
    Block # 7: executed 121 time(s) address= 0x8052710 function= dump_board        line=  58 file= load.c

Gprof can even annotate the source code, in more than one way, with this

     The `-A' option causes `gprof' to print annotated source code.  If
     SYMSPEC is specified, print output only for matching symbols.
     *Note Annotated Source::.

It can also suggest function ordering:
     The `--function-ordering' option causes `gprof' to print a
     suggested function ordering for the program based on profiling
     data.  This option suggests an ordering which may improve paging,
     tlb and cache behavior for the program on systems which support
     arbitrary ordering of functions in an executable.

     The exact details of how to force the linker to place functions in
     a particular order is system dependent and out of the scope of this

GCC also has a way to guess branch probabilities:
     Instrument "arcs" during compilation.  For each function of your
     program, GCC creates a program flow graph, then finds a spanning
     tree for the graph.  Only arcs that are not on the spanning tree
     have to be instrumented: the compiler adds code to count the
     number of times that these arcs are executed.  When an arc is the
     only exit or only entrance to a block, the instrumentation code
     can be added to the block; otherwise, a new basic block must be
     created to hold the instrumentation code.
     After running a program compiled with `-fprofile-arcs' (*note
     Options for Debugging Your Program or `gcc': Debugging Options.),
     you can compile it a second time using `-fbranch-probabilities',
     to improve optimizations based on guessing the path a branch might

I could not get it to generate both gprof and gcov profiling output with
one execution.
The 'gprof' technique for basic-block profiling.
    CFLAGS= -g -pg -a
Compile and run the program, then:
# perl /usr/share/doc/binutils/gprof/ < bb.out >bbFOO.out
# gprof -l -A -x BINARY_NAME gmon.out bbFOO.out | less

And each line is annotated with how many times it branches to each basic
block it branches to. [1]

The 'gcov' alternative is:
   CFLAGS= -g -pg -fprofile-arcs -ftest-coverage

Compile the program (generating *.bb and *.bbg files)

Run the program (which should generate *.da files upon normal termination.)

# gcov -b FOO   (the 'FOO' in 'FOO.c')

Then examine the FOO.c.gcov file.

[1] From gprof texinfo
                     ulg updcrc(s, n)
                         uch *s;
                         unsigned n;
                 2 ->{
                         register ulg c;

                         static ulg crc = (ulg)0xffffffffL;

                 2 ->    if (s == NULL) {
                 1 ->	c = 0xffffffffL;
                 1 ->    } else {
                 1 ->	c = crc;
                 1 ->        if (n) do {
             26312 ->            c = crc_32_tab[...];
     26312,1,26311 ->        } while (--n);
                 2 ->    crc = c;
                 2 ->    return c ^ 0xffffffffL;
                 2 ->}

   In this example, the function was called twice, passing once through
each branch of the `if' statement.  The body of the `do' loop was
executed a total of 26312 times.  Note how the `while' statement is
annotated.  It began execution 26312 times, once for each iteration
through the loop.  One of those times (the last time) it exited, while
it branched back to the beginning of the loop 26311 times.

[2] From gcov texinfo.
                1      int i, total;

                1      total = 0;

               11      for (i = 0; i < 10; i++)
     branch 0 taken = 91%
     branch 1 taken = 100%
     branch 2 taken = 100%
               10        total += i;

                1      if (total != 45)
     branch 0 taken = 100%
           ######        printf ("Failure\n");
     call 0 never executed
     branch 1 never executed
                1        printf ("Success\n");
     call 0 returns = 100%
                1    }

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]