Signed divide by 2 pessimization

Marek Michalkiewicz
Sun Sep 10 04:22:00 GMT 2000

Geoff Keating <> wrote:
> This is a sign that x86 should be fixed.  It's bad to work around
> problems with a backend in the frontend.  Why is the x86 BRANCH_COST
> set too low?

That was just my conclusion after some simple tests I did on a
Pentium III.  But the x86 is a complicated beast, and there is
probably no single value of BRANCH_COST that is best in all cases.
(This was already discussed in May.)

What I did was to change the test to look like this:

  (BRANCH_COST < shift_cost[size - 1] * (abs_d != 2) + shift_cost[size - lgup])

(compare the cost of a branch with the cost of one or two shifts)
(compare the cost of a branch with the cost of one or two shifts).
This does the right thing on the AVR, but the x86 BRANCH_COST also
defaults to 1, which makes the condition always true and results in
smaller but slower code in my simple tests (which should by no means
be considered good benchmarks - things like branch prediction make
benchmarking difficult).  Changing the x86 BRANCH_COST to 2 produces
the fastest code for this case, but I don't know what other (bad)
effects that might have, so I prefer to leave it alone.

The existing (abs_d != 2 && BRANCH_COST < 3) test just happens to do
the right thing for the x86.  On the AVR, it makes the code for divide
by 2 both slower and larger (about twice as big - and that's a little
8-bit micro where code size is very important).  I don't even know
what the effect of this change would be on dozens of other GCC
targets, or on non-Intel x86 chips - I just can't test that.

Because it's all so complicated, I'd like a simple and safe change
which can be made now - make (abs_d != 2 && BRANCH_COST < 3) the
default, but allow the target to override it.  Let's do it now, let
the interested targets decide, and worry about improving it later.

The problem potentially affects many small embedded targets, where
shifts (especially by many bits - if a single machine instruction
can only shift by 1) are often much more expensive than branches
(fast because there is no big prefetch queue to flush).

I understand we already have many target macros, so one more is not
very welcome.  On the positive side, that's just fine-tuning (only
affects code size and speed, not correctness), so most targets need
not define it and will get a (sometimes) reasonable default.  People
writing new ports don't need to worry about it either, until they
get the port working and start fine-tuning it.

The patch I sent in May (which was rejected), along with links to the
related discussion, is here:

Please consider this change again.  Thanks!
