This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



BBRO declared harmful (on H8/300 & others)


I'm still waiting for my patch mentioned in:

http://gcc.gnu.org/ml/gcc-patches/2002-10/msg01334.html

to be approved, but nobody seems to be reviewing it, possibly because
the problem is not well understood.

So here's a more detailed explanation of the problem.

The comment in bb-reorder.c that explains how basic block reordering
(BBRO) works states:

"  (1) Consider:

                if (p) goto A;          // predict taken
                foo ();
              A:
                if (q) goto B;          // predict taken
                bar ();
              B:
                baz ();
                return;

       We'll currently reorder this as

                if (!p) goto C;
              A:
                if (!q) goto D;
              B:
                baz ();
                return;
              D:
                bar ();
                goto B;
              C:
                foo ();
                goto A;"

This reordering rests on one implicit assumption:

1) The fall-through case of a branch is faster than the taken case,
   so BBRO converts the predicted case to a fall-through and moves the
   non-predicted case out of line.

This is not true on the H8/300. Its short conditional branches take
4 states and its long conditional branches take 6 states, regardless
of whether they are taken.

So BBRO winds up taking code like this:

        cmp.w   #18,er1         ; 4 states
        bne     label           ; 4 states
        ...
label:

The above code costs:

branch not taken: 4 + 4 = 8 states
branch taken    : 4 + 4 = 8 states

and converts this to:

        cmp.w   #18,er1         ; 4 states
        beq     label1          ; 4 states
label2:

label1:
        ...
        jmp     @label2:24      ; 4 states

The reordered code costs:

branch not taken: 4 + 4     = 8 states
branch taken:     4 + 4 + 4 = 12 states
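
The arithmetic above can be tallied with a small sketch. It only uses
the state counts quoted in this message (4 states each for cmp.w, the
short conditional branch, and the jmp); the function and constant names
are made up for illustration and are not part of GCC.

```python
# States per instruction, taken from the figures quoted in this message.
CMP_W = 4        # cmp.w #18,er1
BCC_SHORT = 4    # short conditional branch, same cost taken or not taken
JMP = 4          # jmp @label2:24, as annotated above


def original_layout(branch_taken: bool) -> int:
    """cmp.w + bne: the cost does not depend on the branch outcome."""
    return CMP_W + BCC_SHORT


def reordered_layout(branch_taken: bool) -> int:
    """cmp.w + beq: the taken path also pays for the jmp back to label2."""
    cost = CMP_W + BCC_SHORT
    if branch_taken:
        cost += JMP
    return cost


for taken in (False, True):
    print(taken, original_layout(taken), reordered_layout(taken))
```

Under these assumptions the reordered layout never beats the original
one; it only adds the jmp to the out-of-line path.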

So to summarize:

BBRO is a bad optimization on processors without cache where the
conditional branches require a fixed number of clocks regardless
of whether the branch is taken or not taken.

On these processors, BBRO does not improve the predicted branch case
and penalizes the non-predicted case, which results in a performance LOSS.
Not only does it create slower code, but it also generates more branches,
which increases code size and decreases code density.
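
To put a rough number on the loss, here is a minimal sketch of the
expected cost of the compare-and-branch sequence. The probability
p_unlikely (how often the non-predicted path runs) is a hypothetical
input, not something measured; the default state counts are the ones
quoted earlier in this message.

```python
def expected_states(p_unlikely, reordered,
                    cmp_states=4, branch_states=4, jmp_states=4):
    """Expected states for the compare-and-branch sequence.

    p_unlikely: assumed probability that the non-predicted path runs.
    reordered:  True for the BBRO layout, False for the original one.
    """
    base = cmp_states + branch_states  # paid on every path, taken or not
    if not reordered:
        return base
    # After reordering, only the out-of-line path pays for the extra jmp.
    return base + p_unlikely * jmp_states


for p in (0.0, 0.25, 0.5):
    print(p,
          expected_states(p, reordered=False),
          expected_states(p, reordered=True))
```

With constant-cycle branches the reordered cost is never below the
original cost for any probability; the two are equal only when the
non-predicted path is never executed.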

Therefore BBRO should be turned off for the H8/300, and other processors
without cache which have constant-cycle conditional branches.

Toshi

