This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
BBRO declared harmful (on H8/300 & others)
- From: tm <tm at mail dot kloo dot net>
- To: kazu at cs dot umass dot edu, law at redhat dot com
- Cc: gcc-patches at gcc dot gnu dot org, toshiyasu dot morita at hsa dot hitachi dot com
- Date: Thu, 31 Oct 2002 13:16:41 -0800 (PST)
- Subject: BBRO declared harmful (on H8/300 & others)
I'm still waiting for my patch mentioned in:
http://gcc.gnu.org/ml/gcc-patches/2002-10/msg01334.html
to be approved, but nobody seems to be reviewing or approving it,
possibly because the problem is not well understood.
So here's a more detailed explanation of the problem.
If you look at the comment in bb-reorder.c which explains the
workings of BBRO, it states:
" (1) Consider:
        if (p) goto A;          // predict taken
        foo ();
      A:
        if (q) goto B;          // predict taken
        bar ();
      B:
        baz ();
        return;

We'll currently reorder this as

        if (!p) goto C;
      A:
        if (!q) goto D;
      B:
        baz ();
        return;
      D:
        bar ();
        goto B;
      C:
        foo ();
        goto A;"
This reordering rests on one implicit assumption:
1) The fall-through case for a branch is faster than the taken case.
So BBRO converts the predicted case to a fall-through and moves the
non-predicted case out of line.
This is not true on the H8/300. Its short conditional branches take
4 states and its long conditional branches take 6 states, regardless
of whether they are taken or not taken.
So BBRO winds up taking code like this:
cmp.w #18,er1 ; 4 states
bne label ; 4 states
...
label:
The above code:
branch not taken: 4 + 4 = 8 states
branch taken : 4 + 4 = 8 states
and converts this to:
cmp.w #18,er1 ; 4 states
beq label1 ; 4 states
label2: ; predicted path continues here
...
label1: ; non-predicted path, moved out of line
...
jmp @label2:24 ; 4 states
So the above code:
branch not taken: 4 + 4 = 8 states
branch taken: 4 + 4 + 4 = 12 states
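The two tallies above can be checked with a small sketch. The state
counts are the ones quoted in this message; the function and constant
names are made up for illustration and do not come from GCC.

```python
# State counts quoted in this message (H8/300, short branch form).
CMP = 4   # cmp.w #18,er1
BCC = 4   # conditional branch: same cost whether taken or not taken
JMP = 4   # jmp @label2:24, the extra jump the reordering introduces


def original_states(branch_taken: bool) -> int:
    """Original layout: both paths pay only for the compare and the branch."""
    return CMP + BCC


def reordered_states(branch_taken: bool) -> int:
    """Reordered layout: the taken path runs out of line and jumps back."""
    return CMP + BCC + (JMP if branch_taken else 0)


for taken in (False, True):
    print("taken" if taken else "not taken",
          original_states(taken), reordered_states(taken))
```

The reordered layout never beats the original on either path, and it
costs one extra jmp whenever the out-of-line block runs.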
So to summarize:
BBRO is a bad optimization on processors without cache where the
conditional branches require a fixed number of clocks regardless
of whether the branch is taken or not taken.
On these processors, BBRO does not improve the predicted branch case
and penalizes the non-predicted case which results in a performance LOSS.
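To make the trade-off concrete, here is a rough expected-cost sketch.
It assumes, as in the bb-reorder.c comment, that the predicted path
takes the branch in the original layout and falls through after
reordering. The function name, the probability p, and the second
machine are hypothetical; only the H8/300 state counts come from this
message.

```python
def expected_states(p, taken, not_taken, jmp, cmp=4):
    """Return (original, reordered) average states for one compare-and-branch.

    p is the probability that the predicted path is the one executed.
    """
    # Original layout: the predicted path takes the branch.
    original = cmp + p * taken + (1 - p) * not_taken
    # Reordered layout: the predicted path falls through; the other path
    # takes the inverted branch and then pays for the jump back.
    reordered = cmp + p * not_taken + (1 - p) * (taken + jmp)
    return original, reordered


# H8/300 figures from this message: a short branch is 4 states either way.
h8_original, h8_reordered = expected_states(p=0.9, taken=4, not_taken=4, jmp=4)

# Hypothetical machine (not from this message): a taken branch costs 2 more.
hyp_original, hyp_reordered = expected_states(p=0.9, taken=6, not_taken=4, jmp=4)

print(h8_original, h8_reordered)
print(hyp_original, hyp_reordered)
```

With a fixed branch cost the reordered average can only match the
original (at p = 1) or exceed it. The reordering pays off only when a
taken branch costs more than a fall-through, as on the hypothetical
second machine.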
Not only does it create slower code, but it also generates more branches,
which increases code size and decreases code density.
Therefore BBRO should be turned off for the H8/300, and other processors
without cache which have constant-cycle conditional branches.
Toshi