This is the mail archive of the mailing list for the GCC project.

Re: [PATCH] PR middle-end/18293: Fast-path expand_mult for 2^N

On Wed, 8 Dec 2004, Paul Schlie wrote:
> > Roger Sayle wrote:
> >> Zack Weinberg wrote:
> >> [We do still strength-reduce x*2 all the way to x+x, right?]
> > Yes, we do.  expand_shift contains the optimization that x<<1 is x+x,
> > and even that x<<2 is t=x+x,t+t on platforms where rtx_costs
> > of N additions is lower than the rtx_cost of a left shift by N
> > bits.
> - wouldn't ever expect it to be advantageous, even on 8-bit
>   single-bit-shift per cycle targets.

On the Intel Pentium 4, an addition has a latency of a single cycle,
but a shift by a constant has a latency of four cycles.  This means that
a multiplication by eight is faster when implemented as three
consecutive additions than as a single shift instruction.  The P4 is a
significant GCC target.
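
As a rough source-level illustration of the two forms being compared
(the function names are hypothetical and are not GCC identifiers; the
real choice is made during RTL expansion from the target's rtx_costs,
not in C source):

```c
/* Multiply by 8 with a single left shift by a constant. */
unsigned mul8_shift(unsigned x)
{
    return x << 3;
}

/* Multiply by 8 with three dependent additions: x -> 2x -> 4x -> 8x.
   Each addition doubles the running value, so three of them give 8x. */
unsigned mul8_adds(unsigned x)
{
    unsigned t = x + x;   /* 2x */
    t = t + t;            /* 4x */
    t = t + t;            /* 8x */
    return t;
}
```

Both functions return the same value for every unsigned input, since
unsigned arithmetic wraps identically in either form.  Which one is
cheaper depends entirely on the target's relative add and shift costs.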

> However on a similar topic, has any further thought been given to
> reverting:
> (bool)(x & <pow2-const>) => (bool)(x >> <log2-const>)
> as at best it's a target specific optimization, and generally grossly
> counter productive on targets which do not shift efficiently? [and
> doubly painful if the (x >> <log2-const>) expression is needlessly
> promoted to int on small targets]?

I agree that this performance regression still needs to be fixed
for 4.0.0, but I've not yet confirmed that the RTL if-conversion
pass will catch this optimization when appropriate.  Without it
we'll just be exchanging one performance regression for another.
I promise to look into it today.  Can you remind me of the PR#?
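
For reference, a minimal sketch of the two ways of testing a single
bit that are at issue, using bit 3 (mask 8) as an arbitrary example.
The function names are hypothetical, and the trailing "& 1" in the
shift form is an assumption added here so that both forms agree for
every input:

```c
/* Test bit 3 by masking: one AND, then a compare against zero. */
int bit3_mask(unsigned x)
{
    return (x & 8u) != 0;
}

/* Test bit 3 by shifting: shift right by log2(8) = 3, then keep only
   the low bit.  The "& 1" is an assumption of this sketch. */
int bit3_shift(unsigned x)
{
    return (x >> 3) & 1u;
}
```

On a target without an efficient multi-bit shift, the second form may
cost several single-bit shifts where the first needs only an AND and a
test.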

